Loro's rich text CRDT

207 points by czx111331 2 years ago | 47 comments

jitl 2 years ago

I'm curious about this line describing REG:

> The REG algorithm excels with its fast local update speeds and eliminates concerns about tombstone collection in CRDTs. For instance, if an operation has been synchronized across all endpoints, no new operations will occur concurrently with it, allowing it to be safely removed from the history.

If you remove these ops from history, does that remove the ability to time travel (per the home page "An antidote to regret, enabling historical edits traversal") or merge branches? How can we be sure an operation is synchronized?

If dropping these ops is necessary for speed/storage optimization but disables time-travel, is it possible to put the removed historical/tombstone ops into a "cold storage" that's optional and only loaded for time-travel use?

josephg 2 years ago

Hi! I invented replayable event graphs. I'm writing a paper at the moment about it, which hopefully should be out in a month or so. Send me a private email and I can mail you the current draft if you like.

> If you remove these ops from history, does that remove the ability to time travel

Yes it does. You also need the ops from history to be able to merge changes. You can only merge changes so long as you have the operations going back to the point at which the fork happened.
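To make the "back to the point at which the fork happened" requirement concrete, here is a minimal Python sketch. It assumes a hypothetical event graph stored as a dict from op id to parent op ids; `ancestors` and `ops_needed_to_merge` are made-up names for illustration, not diamond types' or Loro's API. It only finds which ops lie after the fork and does not perform the merge itself.

```python
from typing import Dict, Iterable, Set

# Hypothetical event graph: op id -> ids of the ops it was made on top of.
Graph = Dict[str, Set[str]]

def ancestors(graph: Graph, heads: Iterable[str]) -> Set[str]:
    """Every op reachable from `heads` (inclusive) by following parent links."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        op = stack.pop()
        if op not in seen:
            seen.add(op)
            stack.extend(graph[op])
    return seen

def ops_needed_to_merge(graph: Graph, local_heads: Iterable[str],
                        remote_heads: Iterable[str]) -> Set[str]:
    """Ops made after the fork: in one branch's history but not the other's."""
    return ancestors(graph, local_heads) ^ ancestors(graph, remote_heads)

# o1 -> o2 is shared history; the document then forks into a1 and b1 -> b2.
graph: Graph = {"o1": set(), "o2": {"o1"}, "a1": {"o2"}, "b1": {"o2"}, "b2": {"b1"}}
print(sorted(ops_needed_to_merge(graph, ["a1"], ["b2"])))  # ['a1', 'b1', 'b2']
```

In this toy model o1 and o2 never show up in the result, which is the sense in which history before the fork point isn't needed for this particular merge, while dropping any op after the fork would make it impossible.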

> is it possible to put the removed historical/tombstone ops into a "cold storage" that's optional and only loaded for time-travel use?

Absolutely. And this is very practically useful. For example, you could have a web page which loads the current state of a document (just a string. Unlike CRDTs, it needs no additional metadata!). Then if some merge happens while you have the document open, the browser could just fetch the operations from the server back as far as it needs to be able to merge. But in normal operation, none of the historical operations need to be loaded at all.
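A rough Python sketch of that "load history only when needed" idea. For simplicity it assumes a linear history where versions are integer indices into the server's op log (a real event graph is a DAG), and `fetch_ops` stands in for a hypothetical server request; none of these names come from Loro or diamond types.

```python
from typing import Callable, List, Tuple

# Hypothetical op format: (version, payload). Versions index a linear log here.
Op = Tuple[int, str]

class LazyDocument:
    """Keeps only the current text; historical ops are fetched on demand."""

    def __init__(self, text: str, version: int,
                 fetch_ops: Callable[[int, int], List[Op]]):
        self.text = text
        self.version = version
        self._fetch_ops = fetch_ops      # hypothetical call: ops in [start, end)
        self.history: List[Op] = []      # empty in normal operation
        self.history_start = version     # we currently hold ops [history_start, version)

    def ensure_history_from(self, fork_version: int) -> List[Op]:
        """Fetch only the ops missing between the fork point and what we hold."""
        if fork_version < self.history_start:
            older = self._fetch_ops(fork_version, self.history_start)
            self.history = older + self.history
            self.history_start = fork_version
        return [op for op in self.history if op[0] >= fork_version]

# Simulated server log and a fetch function that records each request.
server_log: List[Op] = [(i, f"op{i}") for i in range(10)]
requests: List[Tuple[int, int]] = []

def fetch(start: int, end: int) -> List[Op]:
    requests.append((start, end))
    return server_log[start:end]

doc = LazyDocument("hello", version=10, fetch_ops=fetch)
doc.ensure_history_from(7)   # fetches ops 7..9
doc.ensure_history_from(8)   # already loaded, no new request
doc.ensure_history_from(5)   # fetches only ops 5..6
print(requests)  # [(7, 10), (5, 7)]
```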

All this said, with text documents the overhead of just keeping the historical operations is pretty tiny anyway. In my testing using diamond types (same algorithm, different library), storing the entire set of historical operations usually increases the file size by less than 50% compared to just storing the final text string. It's much more efficient on disk than git, and more efficient than other CRDTs like automerge and Yjs. So I think most of the time it's easier to just keep the history around and not worry about the complexity.
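The comment doesn't say how diamond types keeps history that small. One technique commonly used for this (an assumption here, not something the comment states) is run-length encoding: someone typing a sentence produces many single-character inserts at consecutive positions, which can be stored as one run. A toy Python sketch with a hypothetical op format:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical raw op: (agent, position, inserted character).
Keystroke = Tuple[str, int, str]

@dataclass
class InsertRun:
    agent: str
    pos: int
    content: str

def run_length_encode(ops: List[Keystroke]) -> List[InsertRun]:
    """Merge consecutive inserts typed left-to-right by the same agent."""
    runs: List[InsertRun] = []
    for agent, pos, ch in ops:
        last = runs[-1] if runs else None
        if last is not None and last.agent == agent and last.pos + len(last.content) == pos:
            last.content += ch          # continues the previous run
        else:
            runs.append(InsertRun(agent, pos, ch))
    return runs

ops = [("alice", i, c) for i, c in enumerate("hello")] + [("bob", 5, "!")]
for run in run_length_encode(ops):
    print(run)
# InsertRun(agent='alice', pos=0, content='hello')
# InsertRun(agent='bob', pos=5, content='!')
```

Six ops collapse into two records. A real encoder also has to handle deletes, parent pointers and agent ids, which this sketch ignores.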

jacquesm 2 years ago

Is this a special case of operational transforms, vv or something else entirely?

czx111331 2 years ago

> If you remove these ops from history, does that remove the ability to time travel (per the home page "An antidote to regret, enabling historical edits traversal") or merge branches?

Yes. But squash can be supported.

> How can we be sure an operation is synchronized?

In Loro, we not only record the real-world timestamp efficiently, similar to Git, but also capture the DAG information. This approach ensures that if an operation (op) is particularly old, it will have many other ops depending on it. By utilizing both pieces of information, we can determine the operations that are likely synced across all peers. For peers like servers, it's feasible to preserve all operations. However, we can remove some operations in scenarios such as opening the document online for the first time.
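The reply doesn't give the exact rule, so here is a hypothetical Python sketch of a heuristic combining the two signals it mentions: wall-clock age and how many ops depend on an op in the DAG. The thresholds (`min_age_s`, `min_descendants`) and the data layout are invented for illustration, and the result is only "likely synced", not a guarantee.

```python
from typing import Dict, Set

def descendant_counts(parents: Dict[str, Set[str]]) -> Dict[str, int]:
    """For each op, how many ops (transitively) depend on it."""
    children: Dict[str, Set[str]] = {op: set() for op in parents}
    for op, ps in parents.items():
        for p in ps:
            children[p].add(op)
    counts: Dict[str, int] = {}
    for op in parents:
        seen: Set[str] = set()
        stack = list(children[op])
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(children[c])
        counts[op] = len(seen)
    return counts

def likely_synced(parents: Dict[str, Set[str]], timestamps: Dict[str, float],
                  now: float, min_age_s: float, min_descendants: int) -> Set[str]:
    """Ops that are both old enough and built upon by enough later ops."""
    counts = descendant_counts(parents)
    return {op for op in parents
            if now - timestamps[op] >= min_age_s and counts[op] >= min_descendants}

# A linear chain o1 <- o2 <- o3 <- o4 with timestamps in seconds.
parents = {"o1": set(), "o2": {"o1"}, "o3": {"o2"}, "o4": {"o3"}}
timestamps = {"o1": 0, "o2": 10, "o3": 1000, "o4": 1005}
print(sorted(likely_synced(parents, timestamps, now=1010,
                           min_age_s=500, min_descendants=2)))  # ['o1', 'o2']
```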

> If dropping these ops is necessary for speed/storage optimization but disables time-travel, is it possible to put the removed historical/tombstone ops into a "cold storage" that's optional and only loaded for time-travel use?

Yes. This is not supported at the moment, but we hope to implement it before version 1.0.

kiitos 2 years ago

> if an operation has been synchronized across all endpoints, no new operations will occur concurrently with it, allowing it to be safely removed from the history.

This assumes that the set of endpoints (really, nodes) is both well-known by all other nodes in the network, and stable over time (meaning new nodes will never be added).

Even if this assumption can be made safely (which is not a given), the GC process described here is still an optimization, which would be subverted when even a single node in the network became slow or broken.
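The usual way to make "synchronized across all endpoints" precise is causal stability: if every known peer reports a version vector, the ops that everyone has seen are bounded by the element-wise minimum of those vectors. A minimal Python sketch (the names and the `(peer, counter)` op-id scheme are assumptions, not Loro's internals) that also shows the point above: one lagging peer holds back collection for everyone.

```python
from typing import Dict, List

VersionVector = Dict[str, int]  # peer id -> number of that peer's ops received

def stable_frontier(known_peers: List[VersionVector]) -> VersionVector:
    """Element-wise minimum: the ops that every *known* peer has received."""
    peer_ids = set().union(*known_peers)
    return {pid: min(vv.get(pid, 0) for vv in known_peers) for pid in peer_ids}

def is_collectable(op_peer: str, op_counter: int, frontier: VersionVector) -> bool:
    """Op ids are (peer, 0-based counter); stable once every known peer has it."""
    return op_counter < frontier.get(op_peer, 0)

alice = {"a": 10, "b": 4}
bob = {"a": 9, "b": 4}
carol = {"a": 2, "b": 1}   # a slow or offline peer

print(is_collectable("a", 5, stable_frontier([alice, bob])))         # True
print(is_collectable("a", 5, stable_frontier([alice, bob, carol])))  # False
```

The frontier is only meaningful over a fixed, known peer set: if a new peer could join starting from an old version, no frontier computed over the current peers would be safe.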

It's also basically orthogonal to the concept of "tombstones", which are still required if you want to delete anything from the data structure.

czx111331 2 years ago

Similar to OT, in certain scenarios, it's sufficient to ensure that only a subset of peers have the complete data, while others don't need the full history. For instance, in real-time collaboration scenarios with a central server, we can, just like OT, allow clients to hold only a shallow clone instead of the complete history. This approach results in minimal overhead for the clients.
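A shallow clone (like the "squash" mentioned earlier in the thread) amounts to replacing a prefix of history with the document state it produces. A toy Python sketch, assuming a hypothetical linear history of insert-only `(position, text)` ops; a real CRDT history is a DAG and also contains deletes.

```python
from typing import List, Tuple

# Hypothetical insert-only op: (position, inserted text).
Op = Tuple[int, str]

def apply_insert(text: str, op: Op) -> str:
    pos, ins = op
    return text[:pos] + ins + text[pos:]

def shallow_clone(history: List[Op], since: int) -> Tuple[str, List[Op]]:
    """Squash ops before index `since` into a snapshot; keep only the later ops."""
    snapshot = ""
    for op in history[:since]:
        snapshot = apply_insert(snapshot, op)
    return snapshot, history[since:]

history = [(0, "Hello"), (5, " world"), (11, "!")]
snapshot, recent = shallow_clone(history, since=2)
print(repr(snapshot), recent)  # 'Hello world' [(11, '!')]
```

A client holding `snapshot` plus `recent` can still display the document and apply new edits, but it can't time-travel to, or merge anything that forked from, a point before index 2; the server keeps the full `history` for that.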

lewisjoe 2 years ago

It's great work improving on Peritext using Joseph's latest CRDT work. Much-needed literature in the "applying CRDTs to rich text" space.

But I'm surprised that this one, too, hasn't focused as much on rich-text block elements (like lists, tables & sections) as it has on text attributes (like bold and italics).

mikebelanger 2 years ago

Looks neat! Would there be a way of intercepting state and making 'snapshots' into a more traditional format, like SQL, or even a JSON file?

It sounds like this defaults to the server storing the whole state in Loro's binary format, and likewise on the client side. Nothing wrong with the format, but this is an early project, and nobody wants their data in something that's potentially unstable, or something that might get corrupted.

czx111331 2 years ago

We are carefully stabilizing our encoding format and will have clear storage format documentation in version 1.0. I agree that a more transparent format can give users a better sense of control, and we will try to create a human-readable format for exporting CRDT data (the kind that includes operation history). As for the application state, Loro already supports direct export in JSON format.
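For the SQL part of the original question, one hypothetical way to keep a human-readable copy next to the binary format is to store each JSON export as a row in SQLite. The JSON shape below is made up, since the thread doesn't show what Loro's export looks like; the sketch uses only Python's standard `json` and `sqlite3` modules.

```python
import json
import sqlite3

# Hypothetical JSON export of application state; Loro's real output may differ.
exported = json.dumps({"title": "Draft", "body": "Hello world", "tags": ["crdt", "notes"]})

def store_snapshot(conn: sqlite3.Connection, doc_id: str, state_json: str) -> None:
    """Keep each snapshot as a plain JSON text row, readable without Loro."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " doc_id TEXT NOT NULL,"
        " state TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO snapshots (doc_id, state) VALUES (?, ?)", (doc_id, state_json))
    conn.commit()

def latest_snapshot(conn: sqlite3.Connection, doc_id: str):
    """Return the most recent snapshot for `doc_id` as a dict, or None."""
    row = conn.execute(
        "SELECT state FROM snapshots WHERE doc_id = ? ORDER BY id DESC LIMIT 1",
        (doc_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None

conn = sqlite3.connect(":memory:")
store_snapshot(conn, "doc-1", exported)
print(latest_snapshot(conn, "doc-1")["body"])  # Hello world
```

This preserves only the application state, not the operation history, so it serves as an escape hatch for the data rather than a replacement for the CRDT encoding.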

mikebelanger 2 years ago

Ok, I appreciate you reaching back. And my bad, I didn't see that it already supports direct export of JSON.

CodeGroyper 2 years ago

Supposedly you can always make a deep copy or a backup of anything you have.

rubymamis 2 years ago

Slightly off-topic - I don't think real-time collaboration is suitable for text-based formats. I believe collaboration similar to working with git is superior:

1. Fork the text

2. Submit proposal

3. Review

4. Merge/Cancel

EDIT: To expand on this slightly: there are many reasons for this intuition. The main one, IMO, is that people like to work on text privately before showing it to others. Another is the fear of having your text interrupted by someone else. There might be even more reasons.
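The four steps map onto a small workflow. A hypothetical Python sketch using the standard `difflib` module for the review step; to stay short, it rejects a proposal if the text changed since the fork instead of attempting a real three-way merge.

```python
import difflib
from dataclasses import dataclass

@dataclass
class Proposal:
    base: str             # text at the moment of forking
    edited: str           # the author's private working copy
    status: str = "open"

def fork(text: str) -> Proposal:
    """Step 1: take a private copy of the text."""
    return Proposal(base=text, edited=text)

def review(p: Proposal) -> str:
    """Steps 2-3: show reviewers a unified diff of the proposed change."""
    return "".join(difflib.unified_diff(
        p.base.splitlines(keepends=True), p.edited.splitlines(keepends=True),
        fromfile="current", tofile="proposal"))

def merge(p: Proposal, current: str) -> str:
    """Step 4a: accept only if nobody changed the text since the fork."""
    if current != p.base:
        p.status = "conflict"
        return current
    p.status = "merged"
    return p.edited

def cancel(p: Proposal) -> None:
    """Step 4b: discard the proposal."""
    p.status = "cancelled"

doc = "Intro\nBody\n"
p = fork(doc)
p.edited = "Intro\nBetter body\n"   # private edits, invisible to others
print(review(p))
doc = merge(p, doc)
print(p.status)  # merged
```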

erlend_sh 2 years ago

Would be nice to see cola included in the benchmarks: https://nomad.foo/blog/cola

NeutralForest 2 years ago

Looks dope, could be nice for collaborative writing like a multi-author blog post or for docs.

doublerabbit 2 years ago

Hmm, not helpful. I'm on iOS, so there is no console.

> Application error: a client-side exception has occurred (see the browser console for more information).