It’s not inevitable. It’s just a specific form of leadership failure that comes from grabby, short-term business people being in charge of everything.
As a counterexample, the Economist and the NY Times don’t feel anywhere near this bad. They make you pay, but subscribers get a much nicer product in return.
Part-time CRDT researcher here. I think CRDTs would work great for Google Docs. Google Docs has a problem where, if too many people open the document at the same time, it needs to lock the document. I assume this is because the document is "owned" by a single computer, or because they use DB-transactional ordering on edits.
If I implemented Google Docs today, I'd use a CRDT between all the servers. This would let you have multi-master server scaling, and everything on the server side would be lock-free.
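To make the lock-free, multi-master point concrete, here's a toy sketch - not Google Docs' actual design, and far simpler than a real text CRDT. It's a last-writer-wins map whose merge is commutative and idempotent, which is the property that lets every server accept writes and sync with any other server in any order:

```rust
use std::collections::HashMap;

// A toy last-writer-wins map. (timestamp, writer id) breaks ties.
// Purely illustrative - nothing here is from Google Docs or diamond types.
#[derive(Clone, Debug, PartialEq)]
struct LwwMap {
    entries: HashMap<String, (u64, u32, String)>, // key -> (time, writer, value)
}

impl LwwMap {
    fn new() -> Self {
        LwwMap { entries: HashMap::new() }
    }

    fn set(&mut self, key: &str, time: u64, writer: u32, value: &str) {
        self.entries.insert(key.into(), (time, writer, value.into()));
    }

    // Keep whichever entry has the higher (time, writer) pair.
    // Because merge is commutative and idempotent, no server needs to
    // "own" the document, and no locks are needed between servers.
    fn merge(&mut self, other: &LwwMap) {
        for (k, v) in &other.entries {
            let newer = match self.entries.get(k) {
                Some(mine) => (v.0, v.1) > (mine.0, mine.1),
                None => true,
            };
            if newer {
                self.entries.insert(k.clone(), v.clone());
            }
        }
    }
}

fn main() {
    // Two "servers" accept concurrent writes to the same key.
    let mut server_a = LwwMap::new();
    let mut server_b = LwwMap::new();
    server_a.set("title", 1, 1, "Draft");
    server_b.set("title", 2, 2, "Final");

    // Merging in either direction converges to the same state.
    let mut ab = server_a.clone();
    ab.merge(&server_b);
    let mut ba = server_b.clone();
    ba.merge(&server_a);
    assert_eq!(ab, ba);
    assert_eq!(ab.entries["title"].2, "Final");
    println!("converged: {:?}", ab.entries["title"]);
}
```

A real text CRDT has to do much more work to order concurrent inserts, but the merge-anywhere property is the same.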
Between the server and the client, CRDTs would be fine. With eg-walker, you can just send the current document state to the user with no history (until it's actually needed). In eg-walker, you can merge any remote change so long as you have history going back to a common fork point. If merges happen and the client is missing history, just have the client fetch the historical data from the server.
Alternatively, you can bridge from a CRDT to OT for the client/server comms. The client-side code is a bit simpler that way, but clients may need server affinity - which would complicate your server setup.
Early versions of Google Docs didn't even implement OT correctly. There were various cases where, if two people bolded and unbolded text in a certain order while connecting and disconnecting their wifi, the documents would go out of sync. You'd end up looking at different documents, even after everything reconnected.
I (obviously) care a lot about fixing this sort of bug. But it's important to remember that in many applications, it matters a lot less than people think.
> In general, you don't really get to compact tombstones meaningfully without consensus so you really are pushing at least remnants of the entire log around to each client indefinitely.
This is often much less of a problem in practice than people think. Git repositories also grow without bound, but nobody seems to really notice or care. For diamond types (my library), I LZ4-compress the text content itself within the .dt files. The metadata is so small that in many cases the space savings from LZ4 compression make the resulting .dt file - including full change history - smaller than the document on disk.
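To illustrate why the metadata stays so small, here's a toy run-length sketch. Diamond types' actual format is much more sophisticated, and the struct fields here are made up - but the core intuition is real: people mostly type in forward runs, so per-keystroke metadata collapses into a handful of spans.

```rust
// One compressed span of sequential edits by one user.
// Hypothetical fields, for illustration only.
#[derive(Debug, PartialEq)]
struct Run {
    agent: u32,     // who typed it
    first_seq: u64, // their first sequence number in this run
    pos: usize,     // document position of the first insert
    len: usize,     // how many consecutive keystrokes it covers
}

// Collapse a stream of (agent, seq, pos) insert ops into runs.
fn compress(ops: &[(u32, u64, usize)]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &(agent, seq, pos) in ops {
        // Does this op directly continue the previous run?
        let extends = match runs.last() {
            Some(r) => {
                r.agent == agent
                    && seq == r.first_seq + r.len as u64
                    && pos == r.pos + r.len
            }
            None => false,
        };
        if extends {
            runs.last_mut().unwrap().len += 1;
        } else {
            runs.push(Run { agent, first_seq: seq, pos, len: 1 });
        }
    }
    runs
}

fn main() {
    // 1000 keystrokes typed left-to-right by one user.
    let ops: Vec<(u32, u64, usize)> =
        (0..1000).map(|i| (1, i as u64, i as usize)).collect();
    let runs = compress(&ops);
    assert_eq!(runs.len(), 1);
    assert_eq!(runs[0].len, 1000);
    println!("{} ops -> {} run(s)", ops.len(), runs.len());
}
```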
If anyone really cares about this problem, there are a couple other approaches:
1. In our eg-walker algorithm, we don't use stable GUIDs to identify insert positions. That's where other algorithms go wrong, because they need to store these GUIDs indefinitely in case someone references them. For eg-walker, we just store relative positions. This makes the algorithm work more like git, where you can do shallow clones. And everything works, until you want to merge a branch which forked off earlier than your clone point. Then you should be able to download the earlier edits back to the common branch point and merge. To support merging, you only need to store the metadata of earlier edits, not the inserted content. The metadata is usually tiny (~1-4 bits per keystroke is common for text, with compression).
2. Mike Toomim's Antimatter algorithm is a distributed consensus protocol which can detect when it's safe to purge the metadata for old changes. It works even in the presence of network partitions. (It's conservative by default - if a partition happens, the network will keep metadata around until that peer comes back, or until some timeout.)
> The same few guys (Martin Kleppmann, Kevin Jahns, Joseph Gentle, probably others) pop up all over the more recent optimisations.
Alex Good is another. He's behind a lot of the huge improvements to automerge over the last few years. He's wicked smart.
The man himself. Yeah, agreed - you guys have solved it. For me it's more a misfired CRUD-dev instinct that sees it as wasteful. Just a different paradigm, and not big in practice.
I've got eg-walker & Diamond Types in my reading/YouTube backlog. Diamond Types went further down the backlog because of the "wip" on the repo! I will look into Antimatter too.
> It is more a misfired crud dev instinct for me that sees it as wasteful.
You're not alone. Lots of people bring this up. And it can be a real problem for some applications, if they store ephemeral data (like mouse cursor positions) within the CRDT itself.
That's partly because repositories rarely need to be cloned in their entirety. As such, even when you need to do it and it's a couple hundred MB taking a few minutes, it's tolerated.
In situations where a document needs to be cold-loaded often, the size of the document is felt more acutely. Figma has a notion of GC-ing tombstones. But the tombstones in question aren't even something that gets created in regular editing; they arise in a much narrower case having to do with local copies of shared components. Even that caused problems for a small portion of files -- but if a file got that large, it was also likely to be important.
> even when you need to do it and it's a couple hundred MB taking a few minutes, it's tolerated.
Well-written CRDTs should grow more slowly than git repositories.
> In situations where a document needs to be cold loaded often, the size of the document is felt more acutely.
With eg-walker (and similar algorithms), you can usually just load and store the native document snapshot at the current point in time, and work with that. And by that I mean just the actual JSON or text or whatever the application is interacting with.
Many current apps will first download an entire automerge document (with full history), then use the automerge API to fetch the data. Instead, using eg-walker and similar approaches, you can just download two things:
- Current state (e.g. a single string for a text file, or raw JSON for other kinds of data) alongside the current version
- History. This can be as detailed and go as far back as you want. If you download the full history (with data), you can reconstruct the document at any point in time. If you only download the metadata, you can't go back in time, but you can still merge changes. Merging requires history - but only the metadata, and only as far back as the fork point with whatever you're merging.
If you're working online (e.g. Figma), you can just download history lazily.
For client/server editing, in most cases you don't need any history at all. You can just fetch the current document snapshot (e.g. a text or JSON object), and users can start editing immediately. It only gets fancy when you try to merge concurrent changes. But that's quite rare in practice.
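As a sketch of that snapshot-first flow - all names here are hypothetical, this is not the diamond types or automerge API - the client cold-loads only the plain document plus a version marker, and asks for historical ops only when a merge actually needs them:

```rust
// Hypothetical snapshot-first loading scheme, for illustration only.

#[derive(Clone, Debug)]
struct Snapshot {
    content: String, // the plain document text - all most clients need
    version: u64,    // where this snapshot sits in the edit history
}

// An operation recorded in history: insert `text` at `pos`.
#[derive(Clone, Debug)]
struct Op {
    version: u64,
    pos: usize,
    text: String,
}

// The "server" keeps full history but hands out only the snapshot
// by default. History is fetched lazily, and only back as far as
// the version the client already has.
struct Server {
    snapshot: Snapshot,
    history: Vec<Op>,
}

impl Server {
    fn load(&self) -> Snapshot {
        self.snapshot.clone()
    }

    fn ops_since(&self, version: u64) -> Vec<Op> {
        self.history
            .iter()
            .filter(|op| op.version > version)
            .cloned()
            .collect()
    }
}

fn main() {
    let server = Server {
        snapshot: Snapshot { content: "hello world".into(), version: 2 },
        history: vec![
            Op { version: 1, pos: 0, text: "hello".into() },
            Op { version: 2, pos: 5, text: " world".into() },
        ],
    };

    // Client cold-loads just the snapshot and can edit immediately.
    let doc = server.load();
    assert_eq!(doc.content, "hello world");

    // No concurrent changes since our version -> no history needed at all.
    let missing = server.ops_since(doc.version);
    assert!(missing.is_empty());
    println!("loaded {} bytes, fetched {} historical ops",
             doc.content.len(), missing.len());
}
```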
> If you're working online (e.g. Figma), you can just download history lazily.
You can download the history lazily; that's a special case of incrementally loading the document.
If the history is only history, then sure. But my understanding was that we were talking about something that may need to be referenced but can eventually be GCed (e.g. tombstone-style). Then lazy loading turns user operations that need to reference that data from synchronous operations into asynchronous ones. That's a massive jump in complexity. Incrementally loading data is not the hard part. The hard part is that product features need to be built on the assumption that the data is incrementally loaded.
Something that not all collaborative apps will care about, but Figma certainly did, is that the client may go offline with unsynced changes for an indefinite amount of time. So the tombstones may need to be referenced by new-looking old changes, which increases the likelihood of hitting "almost-never" edge cases by quite a bit.
Yeah this is the way. You will lightly bother some people by being talkative. But it’s ok. So long as you’re sensitive to their desire not to talk, you’ll be fine. Nobody will murder you in the night or kick you out of the village.
I live in an apartment (condo). I’ve been practicing making small talk with people in the elevator. The conversations aren’t all winners. Lots of people are closed off or don’t want to chat. But no matter. Elevator conversations are disposable. And most people are genuinely lovely. It’s a fun challenge trying to brighten the days of strangers.
Part of the problem is that each generation of designers wants to leave their mark on the product - often by undoing the work of the previous generation of designers. They're not entirely wrong. Design has fashions, like clothes. I enjoy that the industrial design of laptops and phones changes every few years. But good UX isn't good because it's fashionable. Good UX doesn't go out of date. They've got to learn to stop fixing what isn't broken.
E.g., macOS's new system preferences panel is worse than the old one. And it's stupid putting the Windows start menu in the middle of the screen, where you can't click it as easily with the mouse.
We're a new industry. So long as we keep iterating on our tools, this will continue to happen. Obsolescence is - in this case - an indicator of progress.
I don’t subscribe to the notion that we are a "new industry".
The industry is already well past 80 years old, and we can easily add the computation and record-keeping jobs that existed before they were digitalized.
"Centuries" of experience in other fields feels to me exactly like the case of the guy with 20 years of software development experience on his CV who can't write FizzBuzz when you ask.
So much knowledge has been lost, and newcomers don't study centuries of history to build a house or become a salesperson. You might study battles from a century ago, but they are mostly irrelevant.
Rust doesn't eliminate all bugs. But anecdotally, by the time the type checker and borrow checker have humbled me, my programs really do often work the first time I run them. It's quite remarkable.
This isn't a special thing about Rust. All languages sit on a spectrum from "detect all bugs statically" to "detect all bugs dynamically". Rust programs run correctly "first time" more often than JavaScript, more often than TypeScript - but still less often than Haskell.
You can still write bugs in Rust, obviously. I've written plenty. As you say, so has Cloudflare. But strong typing does find a lot of bugs in practice.
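As a tiny illustration of the kind of bug strong typing surfaces up front (my own toy example, nothing to do with Cloudflare's actual bug): in JavaScript, a missing map key silently yields `undefined` and blows up later at runtime. In Rust, the lookup returns an `Option`, and the compiler refuses to let you ignore the missing case:

```rust
use std::collections::HashMap;

// `get` returns Option<&u32>, so "user not found" must be handled
// here, at compile time, instead of crashing in production.
fn discount_for(user: &str, discounts: &HashMap<&str, u32>) -> u32 {
    match discounts.get(user) {
        Some(pct) => *pct,
        None => 0, // explicit, deliberate default for unknown users
    }
}

fn main() {
    let mut discounts = HashMap::new();
    discounts.insert("alice", 20);

    assert_eq!(discount_for("alice", &discounts), 20);
    // In JS this lookup would quietly produce undefined; here the
    // None arm makes the fallback behaviour explicit.
    assert_eq!(discount_for("mallory", &discounts), 0);
    println!("ok");
}
```

Deleting the `None` arm is a compile error, which is exactly the "humbled by the type checker before the first run" experience described above.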