The big thing that makes javascript slow (that the optimizer can’t really fix) is complex data structures.
For example, if you want to implement an editable string, you have to do so using an array of smaller strings and hope the optimizer can figure it out, or slice() and re-join a single “immutable” string. Either way, your javascript code will be slower than the C/rust equivalent because the language is less expressive. Javascript has the same problem with b-trees, skip lists, ropes, gap buffers, AABB trees, and so on. You can implement all of this stuff in javascript - you just end up with much more memory indirection than you want. And for trees with different internal nodes and leaves, your code will give the optimizer indigestion.
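The slice-and-rejoin approach looks roughly like this (a sketch; `insertAt` is a made-up helper, not a standard API):

```javascript
// Naive "editable string": every insert copies the whole string.
// O(n) work per edit, plus garbage from the discarded old copies.
function insertAt(str, pos, text) {
  return str.slice(0, pos) + text + str.slice(pos);
}

let doc = "hello world";
doc = insertAt(doc, 5, ",");  // "hello, world"
```

In C or rust you'd mutate a buffer (or a rope node) in place; here every keystroke allocates a fresh string, and there's no way to express the in-place alternative.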
In my experience, basically any of these data structures will run about 10-200x slower in javascript than their native counterparts (assuming both are well optimized). To me this is the big performance advantage wasm has - it lets us build high-performance data structures.
I’m working on a CRDT in Rust. The fastest JS implementation I know of takes about 1 second to replay an editing trace from a user. In rust I can replay the same trace in 0.01 seconds (100x faster), by using a skip list. In wasm I can do it in about 0.03 seconds.
Adding to this great comment with my own experience at work where we extensively use signal processing and time series compression algorithms to visualize large amounts of biological signals in the browser.
WebAssembly is less than 10%-20% slower than native code in our benchmarks and tests for algorithms like fast wavelet transform and bit packing.
WA allows us to aggressively optimize ahead of time when compiling, and be less sensitive to JS engine performance pitfalls around JIT optimization and garbage collection.
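As a toy illustration of the kind of bit packing involved (not our production code, just the general idea):

```javascript
// Pack four 8-bit samples into one 32-bit word, and unpack them.
// Native / wasm code can do this kind of thing over whole buffers
// with predictable performance.
function pack4(a, b, c, d) {
  return ((a & 0xff) << 24) | ((b & 0xff) << 16) | ((c & 0xff) << 8) | (d & 0xff);
}

function unpack4(word) {
  return [
    (word >>> 24) & 0xff,
    (word >>> 16) & 0xff,
    (word >>> 8) & 0xff,
    word & 0xff,
  ];
}
```

The win from AOT compilation is that loops like this compile to tight integer code once, instead of depending on the JIT deciding to optimize them.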
Throwing in my data: for physics simulation code, WASM vs native was identical, provided the native code was compiled with automatic SIMD optimizations disabled. On the one hand, that speaks well of the idea of web assembly, on the other, disabling SIMD is highly artificial. It's good to see basic SIMD being made standard, though the 256-bit wide SIMD instructions are still being finalized, and those will be necessary to really have a chance at evening out performance vs. native.
> Javascript has the same problem with b-trees, skip lists, ropes, gap buffers, AABB trees, and so on. You can implement all of this stuff in javascript - you just end up with much more memory indirection than you want. And for trees with different internal nodes and leaves, your code will give the optimizer indigestion.
I'm building a text editor right now. I'm just using a giant array with one element for each line. I anticipated that it would get slow, and that potentially I can push this into WebAssembly.
The problem is that you're going to need to get the data back out of WASM eventually, back into JavaScript and then back to the DOM.
I focused on other optimizations, and strangely my text editor is able to keep a 9 million LoC file (over 500mb) in memory. The trick is to not render everything to the DOM, just the stuff people are looking at. That's the other bottleneck.

V8 is surprisingly smart. Random access into a small section of that 9 million element array is still fast, and even localized shifts are fast. You'd expect access to this array to be non-linear, too. I'm puzzled.

For comparison, VSCode uses a more esoteric data structure called a PieceTree, but for me it becomes unresponsive at 500k LoC. To be fair, it has other sorts of overhead and processing layers to deal with, namely building an AST with potentially thousands of layers of hierarchy / code folding. Speaking of which, VSCode is built entirely in TypeScript and has a reputation for being a fast and snappy editor.
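The "only render what's visible" trick boils down to something like this (a minimal sketch with made-up names, assuming fixed-height lines):

```javascript
// Virtualized rendering: only materialize DOM nodes for the lines
// currently in the viewport, plus a small overscan buffer.
function visibleRange(scrollTop, viewportHeight, lineHeight, totalLines) {
  const overscan = 5;
  const first = Math.max(0, Math.floor(scrollTop / lineHeight) - overscan);
  const last = Math.min(
    totalLines,
    Math.ceil((scrollTop + viewportHeight) / lineHeight) + overscan
  );
  return [first, last];
}
```

With a 600px viewport and 20px lines, you only ever touch ~40 DOM nodes, no matter how many lines the backing array holds.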
At any rate, I don't buy the argument that because Rust is fast and WebAssembly is comparably fast, it makes sense to push your data structure - especially a text buffer - down into WebAssembly. The bottleneck is serializing data back and forth between WebAssembly and JavaScript. For text editors, I'm not sure this overhead is worth it, especially when V8 / Chrome is doing some crazy JS optimization behind the scenes. The appealing bit with WebAssembly is potentially that objects will have a smaller memory footprint, so we could push past this 9 million LoC limit (where the Chrome tab just runs out of memory), rather than fixing any algorithmic slowdown.
> The problem is that you're going to need to get the data back out of WASM eventually, back into JavaScript and then back to the DOM.
Could you paint everything in a canvas element to avoid the DOM? That has been my approach in a small project, although it admittedly can complicate things.
Don't do that. There's all sorts of very complex OS-level UX controls which hook into a text box. If you implement your own text box using canvas, all this stuff will be broken by default.
For example:
- Ctrl+Left / right for moving a word at a time. This is Alt+Left/right on a mac I think. There's another keyboard shortcut to go to the start / end of a line, or the start / end of the input element. These shortcuts are OS-specific and can be overridden in the user's OS level settings - which you don't have access to from javascript. There's a mountain of shortcuts - like Ctrl+A, Ctrl+X/C/V, etc. They all vary per OS. You have to implement them correctly on every operating system.
- International character input methods. Eg, how would you type Japanese / Korean / Chinese characters into your text element?
- Undo/redo support
- Text selection via the mouse, the keyboard and on mobile by touch-dragging to select.
- Accessibility - like voiceover and dictation.
No matter how dedicated you are, I guarantee your custom canvas element will never correctly re-implement all the features already available in a simple HTML text input element.
I am aware of the tradeoffs and generally I agree. That is what I meant by it can overcomplicate things. It is all tradeoffs, and for some usages it might be a valid one to remove the dependency on the DOM.
I see it as an option for a heavy app in the browser. This is AFAIK similar to what Figma does (or flutter). Also, I am not referring specifically to text editors, although I could imagine something like vim working with this approach in the browser without trying to replicate an HTML text input element.
> At any rate, I don't buy the argument that because Rust is fast and WebAssembly is comparably fast, then it makes sense to push your data structure, especially a text buffer, down into WebAssembly.
To store a buffer in a plain text editor, with no syntax highlighting? I'm not going to fight you on that. I'm glad pure javascript is fast enough for your needs.
But doing collaborative text editing with a CRDT adds some extra requirements that your text editor's buffer might not have. For example:
- Hydrating the document state on load. To cut down on file size, diamond types files currently just store the entire editing history. When you open them up, we replay the whole editing history into a buffer. And using Wasm + jumprope[1], this is <1ms even for very large documents. The same would not be true in javascript, and I might need to use bigger files.
- Merging concurrent changes. (Like, merging a long lived branch). This currently involves getting fancy with a B-tree. This is fast enough in rust. It would be orders of magnitude slower in javascript.
- Undo support. When you hit undo, you need to ignore any more recent typing from other users and just undo the local user's last change. This is subtle.
- When edits happen, we need to convert between your (line, JS column) tuple and whatever the CRDT uses internally. Diamond types specifies edit positions by counting unicode characters from the start of the document. So we need to convert between (line,col) and unicode offsets whenever local or remote changes happen to broadcast or apply the changes.
Rust + Wasm let me make all of these operations essentially instant. I can do that because I can use an appropriate data structure & algorithm for each operation.
And, sure, Javascript's Array is pretty fast. But if you ever need something more complex than Array, your performance will suffer massively. Personally I feel much better writing this code in rust + wasm.
I'd integrate diamond types with your text editor by just passing edits back and forth across the module boundary. I agree that it probably doesn't make sense to share a text buffer between javascript and wasm.
> I'm glad pure javascript is fast enough for your needs.
In this case, the bottleneck at 9 million LoC is not CPU cycles but memory usage. That's where I am considering pushing down into WebAssembly, but whatever function-inlining optimization V8 is doing is probably still faster than the overhead of a JS<->WebAssembly function call. I have no doubt WebAssembly will edge out again.
> Merging concurrent changes. (Like, merging a long lived branch). This currently involves getting fancy with a B-tree. This is fast enough in rust. It would be orders of magnitude slower in javascript.
I guess my point is why do you need balanced trees? Is this a CRDT specific thing? Can you implement CRDT with just an array of lines / gap buffer?
> Undo support. When you hit undo, you need to ignore any more recent typing from other users and just undo the local user's last change. This is subtle.
From my understanding, the whole point of CRDTs or any command-based design is to express changes to the state as deltas. In which case, you only have to stack / remember the set of commands and don't have to store the state on every change.
I'm not sure if this overlaps with the data structure choice, other than implementation details.
> And, sure, Javascript's Array is pretty fast. But if you ever need something more complex than Array, your performance will suffer massively. Personally I feel much better writing this code in rust + wasm.
I was just looking into TypedArrays. From my understanding, the JS runtime will fall back to a HashMap- or DoublyLinkedList-style representation when the array elements are not homogenous. In the case where the JS runtime knows the array is homogenous, it will use TypedArray-like fixed-length buffers that are contiguous in memory. So on the JS side, Arrays can be as fast as TypedArrays thanks to the compiler. Consequently, in some benchmarks they will effectively be the same.
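A quick sketch of the difference (the engine behavior described in the comments is the commonly documented V8 behavior, not something the spec guarantees):

```javascript
// A homogeneous plain Array *may* be stored contiguously as packed
// doubles by the engine; a Float64Array is guaranteed contiguous.
const plain = [1.5, 2.5, 3.5];                    // engine may keep this packed
const typed = new Float64Array([1.5, 2.5, 3.5]);  // always contiguous, fixed length

// TypedArrays trade flexibility for predictability: no push/pop,
// no holes, no mixed element types, no representation changes.
const sum = typed.reduce((a, b) => a + b, 0);     // 7.5

// One mixed-type write can deoptimize the plain array's storage:
// plain.push("oops");  // may force a slower, boxed representation
```
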
The question is whether the speed of a native array is faster than the speed of a TypedArray such that it pays for the WASM<->JS overhead. And I guess this depends on the application and the access patterns of that interop.
> In this case, the bottleneck at 9 million LoC is not CPU cycles but memory usage. That's where I am considering pushing down into WebAssembly
How often does this come up in practice? I can't think of many files I've opened which were 9 million lines long. And you say "LoC" (lines of code). Are you doing syntax highlighting on 9 million lines of source code in javascript? That's impressive!
> I guess my point is why do you need balanced trees? Is this a CRDT specific thing? Can you implement CRDT with just an array of lines / gap buffer?
Of course! It's just going to be slower. I made a simple reference implementation of Yjs, Automerge and Sync9's list types in javascript here[1]. This code is not optimized, and it takes 30 seconds to process an editing trace that diamond types (in native rust) takes 0.01 seconds to process. We could speed that up - yjs does the same thing in 1 second. But I don't think javascript will ever run as fast as optimized rust code.
The b-tree in diamond types is used for merging. If you're merging 2 branches, we need to map insert locations from the incoming branch into positions in the target (merged) branch. As items are inserted, the mapping changes dynamically. The benchmark I've been using for this is how long it takes to replay (and re-merge) all the changes in the most edited file in the nodejs git repository. That file has just shy of 1M single character insert / delete operations. If you're curious, the causal graph of changes looks like this[2].
Currently it takes 250ms to re-merge the entire causal graph. This is much slower than I'd like, but we can cache the merged positions in about 4kb on disk or something so we only need to do it once. I also want to replace the b-tree with a skip list. I think that'll make the code faster and smaller.
A gap buffer in javascript might work ok... if you're keen, I'd love to see that benchmark. The code to port is here: [3]
> Undo support -> In which case, you only have to stack / remember the set of commands and not have to store the state on every change. I'm not sure if this overlaps with the data structure choice, other than implementation details.
Yeah, I basically never store a snapshot of the state. Not on every change. Not really at all. Everything involves sending around patches. But you can't just roll back the changes when you undo.
Eg: I type "aaa" at position 0 (the start of the document). You type "bbb" at the start of the document. The document is now "bbbaaa". I hit undo. What should happen? Surely, we delete the "aaa" - now at position 3.
Translating from position 0 to position 3 is essentially the same algorithm we need to run in order to merge.
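That position translation can be sketched like this (a toy, operational-transform-style helper; `transformPosition` is a made-up name, not the diamond types API):

```javascript
// Shift a stored position to account for a concurrent insert.
// If `insertLen` characters were inserted at or before `pos`,
// the original position moves right by that many characters.
function transformPosition(pos, insertPos, insertLen) {
  return insertPos <= pos ? pos + insertLen : pos;
}

// I typed "aaa" at position 0; you concurrently typed "bbb" at 0.
// The doc is now "bbbaaa". To undo my "aaa", its original position
// 0 must be translated past your 3-character insert:
const undoPos = transformPosition(0, 0, 3);  // 3 - delete "aaa" there
```

A real CRDT does this against the whole causal graph of concurrent edits rather than one insert at a time, which is where the b-tree / skip list earns its keep.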
> I was just looking into TypedArrays.
I tried optimizing a physics library a few years ago by putting everything in typedarrays and it was weirdly slower than using raw javascript arrays. I have no idea why - but maybe that's fixed now.
TypedArrays are useful, but they're no panacea. You could probably write a custom b-tree on top of a typedarray in javascript if you really want to - assuming your data also fits into typedarrays. But at that point you may as well just use wasm. It'll be way faster and more ergonomic.
What are your thoughts on the Javascript YJS CRDT implementation? I'm admittedly pretty new to the space, so I don't know much about performance comparisons. Haven't noticed any significant performance problems with it (using it for collaborative editing on ~100,000 word documents), but would love hearing what you think.
Eh. It's fine. I think it's the best CRDT out there right now (but Kevin had a few years' head start on me!). I haven't done a lot with yjs personally. And I don't agree with a few of Kevin Jahns's design decisions, but none of them are deal-breakers:
- The way deleted data is encoded in yjs is awkward. Versions store a list of which content has been deleted. So versions are bigger than they need to be, and you can't replay the editing history without stored snapshots. I think the knowledge of which elements have been deleted should just be saved into the binary format alongside the inserts.
- For javascript objects, yjs pretends each value is actually a list (with everything after the first element ignored) so it can reuse the logic for list editing. This is overcomplicated compared to Shelf or something like it.
- Yjs's internal structure is a linked list with cached offsets. This works ok in practice because inserts usually appear close to each other, but it means yjs has O(n^2) performance to process n edits rather than O(n log n). Diamond types (via wasm) is about 30x faster as a result - though as you say, yjs is still plenty fast enough in practice for most applications. And this gap will narrow once Yrs is working well.
- Yjs can have an interleaving problem when concurrent items are prepended at the same location in a list.
- The current (new) diamond types file format stores the origin-position + parents information instead of storing left-origin / right-origin fields. This lets DT have a smaller file size while storing more information than yjs stores. And I think this format is easier for applications to encode and work with, and it will make pruning easier when we get to it. We should also be able to load & save faster than yjs - but I'm not benchmarking that yet.
But all of this stuff is pretty minor. Yjs works well in practice, and it's quite mature. Performance will only get better with the yrs rust implementation. For diamond types, I had the benefit of being able to copy some fantastic implementation ideas from Yjs. If diamond types is better than yjs, it's because I'm standing on its shoulders.