Loro: Reimagine state management with CRDTs

jitl 2 years ago

The demo code for Loro looks very easy to use; I love how they infer a CRDT from an example plain JS object. I’ve played with a Zod schema <-> Yjs CRDT translator and found it kinda annoying to maintain. However, this looks so easy that I worry about apps being built with too little thought about long-term data modeling. Migrations on CRDTs are challenging, so it’s important to “get it right” at the beginning. I’m curious how this design holds up in longer-term, more complex CRDT apps.

czx111331 2 years ago

The code in the blog is from Vue Pinia, a state management library, and not from Loro. It serves as an example demonstrating that CRDTs can be modeled similarly. Thus, you might expect to use Loro in a similar way.

Indeed, schema migration is challenging. There is a lot to explore, both in reducing the need for migration and in ensuring a smooth transition when it's required. We plan to tackle this issue in the future.

jayunit 2 years ago

If you're looking at declaring schemas for CRDT docs in Yjs, I've found https://syncedstore.org/docs/ productive if you're open to using TypeScript.

Agreed on migrations!

vosper 2 years ago

> Migrations on CRDTs are challenging, so it’s important to “get it right” at the beginning.

Any tips or dos/don'ts you could share for how to "get it right" at the beginning? I'm hoping to build an application using CRDTs sometime in the future.

jitl 2 years ago

I don’t have anything better for you than “be careful” and “prototype a lot”. Adding new fields is fine, changing the meaning or type of existing fields is annoying.

You should try to brainstorm a lot of future feature improvements and ideas you might want in 2 years, and consider whether your v1 design would block them or need a challenging migration to support the hypothetical v100 design. A classic example: if you need a one-to-one relationship in v1, think about how you’d support it as one-to-many or many-to-many. It might be better to start with a schema that supports many-to-many to avoid shenanigans down the line.
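A minimal sketch of why the many-to-many shape is the safer starting point, assuming a toy model (not Loro or Yjs APIs) in which a one-to-one field is a last-writer-wins register and a many-to-many relation is a map keyed by related ID, merged by key union. `LwwRegister`, `mergeLww`, `mergeIdSet`, and the task-assignee scenario are hypothetical.

```typescript
// Hypothetical toy types, not Loro or Yjs APIs.

// A last-writer-wins register: higher timestamp wins, ties broken by peer ID.
interface LwwRegister<T> {
  value: T;
  timestamp: number;
  peer: string;
}

function mergeLww<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.peer > b.peer ? a : b;
}

// A grow-only relation stored as a map keyed by related ID; merge is key union.
// (Supporting removal would need tombstones or an observed-remove set.)
type IdSet = Record<string, true>;

function mergeIdSet(a: IdSet, b: IdSet): IdSet {
  const out: IdSet = {};
  for (const k of Object.keys(a)) out[k] = true;
  for (const k of Object.keys(b)) out[k] = true;
  return out;
}

// v1 shape, one-to-one: two peers assign the same task concurrently.
const oneToOne = mergeLww(
  { value: "alice", timestamp: 5, peer: "A" },
  { value: "bob", timestamp: 5, peer: "B" },
); // one of the two assignments is silently dropped

// Many-to-many shape: both concurrent assignments survive the merge.
const manyToMany = mergeIdSet({ alice: true }, { bob: true });
```

With the register, one of two concurrent assignments is lost; with the keyed map, a later "multiple assignees" feature is just more data rather than a change to the field's type.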

I linked the Cambria essay below which has a lot of food for thought.

enva2712 2 years ago

I don’t see why migrations on CRDTs would be categorically different from migrations on other data types, though maybe there’s less open-source tooling to leverage at the moment. I’ve got some basic code written on top of Automerge to handle them in a side project.

jitl 2 years ago

It’s more challenging because you need to accept updates in format v1 indefinitely even after you apply the v1->v2 migration, which makes it more like maintaining an API with backwards compatibility than the usual SQL migration pattern. For example, let’s say you start with a last-write-wins CRDT for an object of shape { name: string, bio: string } in v1, and then decide bio should use a convergent rich text data type like Peritext. If you do a naive migration and change the type of the field in-place, how do you handle updates from old peers doing a LWW set on bio, when now the data type you expect is a Peritext delta?
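To make the bio example concrete, here is a minimal sketch of a v2 peer that still has to accept v1 updates. It assumes updates carry a schema version and models the v2 bio as a plain character array standing in for a Peritext-style rich-text type; `SetBioV1`, `InsertBioV2`, `DeleteBioV2`, and `applyUpdate` are hypothetical names. Translating a v1 set into "replace the whole bio" is only one possible choice, and a lossy one.

```typescript
// Hypothetical update formats, tagged with the schema version that produced them.
interface SetBioV1 {
  schema: 1;
  kind: "setBio"; // last-write-wins set of the whole string
  text: string;
}
interface InsertBioV2 {
  schema: 2;
  kind: "insertBio";
  index: number;
  text: string;
}
interface DeleteBioV2 {
  schema: 2;
  kind: "deleteBio";
  index: number;
  count: number;
}
type Update = SetBioV1 | InsertBioV2 | DeleteBioV2;

// v2 document: bio is a character sequence standing in for a rich-text CRDT.
// (Plain index-based edits are a simplification; they are not convergent by themselves.)
interface ProfileV2 {
  name: string;
  bio: string[];
}

function applyUpdate(doc: ProfileV2, update: Update): void {
  switch (update.kind) {
    case "setBio":
      // Old peers may keep sending v1 sets indefinitely, so v2 code must translate them.
      // Translating to "replace everything" discards any concurrent v2 edits.
      doc.bio = update.text.split("");
      break;
    case "insertBio":
      doc.bio.splice(update.index, 0, ...update.text.split(""));
      break;
    case "deleteBio":
      doc.bio.splice(update.index, update.count);
      break;
  }
}
```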

And if you’re really peer-to-peer, older peers whose software doesn’t understand a new format need to avoid applying updates that break compatibility for them.
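One way an older peer could avoid applying updates it doesn't understand, sketched under the assumption that every update is wrapped in an envelope carrying a schema number: apply what the local build supports and park the rest until the software is upgraded. `VersionedUpdate` and `GatedPeer` are hypothetical, and the `applied` list stands in for actually applying an update to a document.

```typescript
// Hypothetical envelope: every update says which schema version produced it.
interface VersionedUpdate {
  schema: number;
  payload: string;
}

class GatedPeer {
  // Stand-in for "applied to the local document".
  readonly applied: VersionedUpdate[] = [];
  readonly parked: VersionedUpdate[] = [];
  private supportedSchema: number;

  constructor(supportedSchema: number) {
    this.supportedSchema = supportedSchema;
  }

  receive(update: VersionedUpdate): void {
    if (update.schema <= this.supportedSchema) {
      this.applied.push(update);
    } else {
      // Too new for this build: keep it so nothing is lost, but don't apply it.
      this.parked.push(update);
    }
  }

  // Called once the user's software understands newer schemas.
  upgrade(newSchema: number): void {
    this.supportedSchema = newSchema;
    const pending = this.parked.splice(0, this.parked.length);
    for (const u of pending) this.receive(u);
  }
}
```

Parking rather than dropping means the old peer still ends up with every update once it upgrades. A real system may also need to respect causal ordering, since later updates can depend on a parked one.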

Geoffrey Litt documents the challenges in Cambria, see https://www.inkandswitch.com/cambria/

enva2712 2 years ago

The lens solution is clever, and I’m glad to see lenses catching on outside the Haskell ecosystem.

I think the primary issue you raise comes down to whether or not the peers can be verified to be updated before applying the migration op. If they can, you can go the standard SQL route: deploy a diff adding forward compatibility, apply the migration op, then deploy a diff removing backward compatibility. This is the route I’ve taken, since in my case the peers are web browsers, so I’ve got a pretty reliable deployment system. The lens solution is much better, though, since it lifts this requirement.
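Cambria's lenses translate changes between schema versions in both directions, which is what lets peers on different versions keep exchanging updates. A toy illustration, assuming patches are simple `{ op: "set", path, value }` records and a lens that only renames a field; `SetPatch`, `Lens`, `renameField`, and `compose` are hypothetical and much simpler than Cambria's own lens language.

```typescript
// Hypothetical patch shape: set a top-level field to a value.
interface SetPatch {
  op: "set";
  path: string;
  value: unknown;
}

// A lens converts patches forward (old -> new schema) and backward (new -> old).
interface Lens {
  forward(p: SetPatch): SetPatch;
  backward(p: SetPatch): SetPatch;
}

function renameField(from: string, to: string): Lens {
  return {
    forward: (p) => (p.path === from ? { op: p.op, path: to, value: p.value } : p),
    backward: (p) => (p.path === to ? { op: p.op, path: from, value: p.value } : p),
  };
}

// Lenses compose: v1 -> v3 is v1 -> v2 followed by v2 -> v3, and the reverse.
function compose(a: Lens, b: Lens): Lens {
  return {
    forward: (p) => b.forward(a.forward(p)),
    backward: (p) => a.backward(b.backward(p)),
  };
}
```

A newer peer could run incoming old-version patches through `forward` and send `backward`-translated patches to peers still on the old version, so neither side needs to know whether the other has upgraded.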

bzmrgonz 2 years ago

This is amazing. Please share it with the big projects that need it the most: Collabora and LibreOffice. Also, a product which the world needs badly.. would be a software which would abstract git and present text to lawyers as regular word processor, but in the backend it's git for the win.

rapnie 2 years ago

> Also, a product which the world needs badly.. would be a software which would abstract git and present text to lawyers as regular word processor, but in the backend it's git for the win.

Ink & Switch's Upwelling [0] goes in that direction. A must-watch is Martin Kleppmann's Strange Loop 2023 talk "New Algorithms for Collaborative Editing" [1], which explains things excellently.

[0] https://www.inkandswitch.com/upwelling/

[1] https://yewtu.be/watch?v=Mr0a5KyD6BU

singhrac 2 years ago

This looks really neat. I appreciate that you reference the previous work in this area (by josephg, Ink & Switch, Fugue etc.).

I think the roadmap says that WASM is next as a target, and that makes sense for prioritization. Would you also consider Flutter/Dart as a target, even if at the level of "we checked once that flutter_rust_bridge can call Loro"?

aatd86 2 years ago

Can someone explain to me what happens when there is a destructive update on one side while the other side is still relying on some old version?

Can this even be reconciled?

Or is it append-only, i.e. with no delete operation?

UIs have delete operations in general.

ablob 2 years ago

You can implement delete operations by marking fragments as deleted. That way you can still refer to them from a positional perspective, but they will eventually disappear. The exact behavior depends on the implementation; the only requirement is that every "user" reaches the same state eventually.

  A: delete Line 4@marker
  B: append to the end of line 4@marker "abcd"

could then result in either line 4 with only "abcd", or "abcd" at the beginning of line 5 (as long as both sides resolve to the same state). You're right that this is tricky to get right; the difficulty is inherent to asynchronous mutation from multiple sources.
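A minimal sketch of the tombstone approach for a sequence, in the style of the RGA algorithm and not Loro's actual internals: each element has a unique ID, an insert names the element it goes after, and a delete only records a tombstone. `SeqReplica` and the op shapes are hypothetical. Because a deleted element stays in the structure, a concurrent "insert after" still has a valid anchor, and the rendered text depends only on the set of operations received, not on their order.

```typescript
// Toy RGA-style sequence with tombstones (an assumed model, not Loro's internals).
interface InsertOp {
  kind: "insert";
  id: string;    // globally unique, non-empty
  after: string; // ID of the element this goes after; "" means the start
  ts: number;    // Lamport-style timestamp, larger than the anchor's
  char: string;
}
interface DeleteOp {
  kind: "delete";
  id: string;
}
type SeqOp = InsertOp | DeleteOp;

class SeqReplica {
  private inserts: Record<string, InsertOp> = {};
  private tombstones: Record<string, true> = {};

  apply(op: SeqOp): void {
    if (op.kind === "insert") this.inserts[op.id] = op;
    else this.tombstones[op.id] = true; // mark deleted; the element stays as an anchor
  }

  text(): string {
    // Group elements under the element they were inserted after.
    const children: Record<string, InsertOp[]> = {};
    for (const id of Object.keys(this.inserts)) {
      const op = this.inserts[id];
      if (!children[op.after]) children[op.after] = [];
      children[op.after].push(op);
    }
    let out = "";
    const visit = (anchor: string): void => {
      // Newer inserts at the same anchor come first; ties break on ID.
      const kids = (children[anchor] || []).slice().sort((a, b) =>
        a.ts !== b.ts ? b.ts - a.ts : a.id < b.id ? 1 : -1,
      );
      for (const k of kids) {
        if (!this.tombstones[k.id]) out += k.char;
        visit(k.id); // a deleted element still anchors what was inserted after it
      }
    };
    visit("");
    return out;
  }
}
```

For example, starting from `xy`, if A deletes `x` while B concurrently inserts `z` after `x`, both replicas render `zy` whichever order the operations arrive in.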

aatd86 2 years ago

I guess my question is whether the state being reached is a legit one in this case.

What is the source of truth eventually?

I suspect there must be a hierarchy that decides it. It's probably a kind of race condition/Byzantine generals problem.

two_handfuls 2 years ago

These systems are based on the idea that, rather than directly editing the shared document, each program sends a stream of “update objects.” An example would be “create new line $xyz after line $abc.”

These “update objects” can be combined to get the current document state.

They have the property that if two programs receive the same set of update objects, regardless of the order they get them in, they end up with the same current document state.

They define any state resulting from applying the operations as “legit.” In order for that to feel “legit” to the users you want to very carefully choose your operations and their semantics.
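A minimal sketch of that order-independence, assuming the simplest kind of update object: a last-writer-wins write to one field, tagged with a logical timestamp and a peer ID. The names are hypothetical. `applyFieldUpdate` keeps the update with the highest `(ts, peer)` pair per field, and since picking a maximum doesn't depend on the order of the inputs, any delivery order produces the same state.

```typescript
// Hypothetical update object: a last-writer-wins write to a single field.
interface FieldUpdate {
  key: string;
  value: string;
  ts: number;   // logical timestamp chosen by the sender
  peer: string; // tie-breaker; assumes (ts, peer) is unique per update
}

type DocState = Record<string, FieldUpdate>;

// Keep, per key, the update with the highest (ts, peer) pair.
function applyFieldUpdate(state: DocState, u: FieldUpdate): void {
  const cur = state[u.key];
  if (!cur || u.ts > cur.ts || (u.ts === cur.ts && u.peer > cur.peer)) {
    state[u.key] = u;
  }
}

// Fold a batch of updates, in whatever order they arrived, into a plain view.
function materialize(updates: FieldUpdate[]): Record<string, string> {
  const state: DocState = {};
  for (const u of updates) applyFieldUpdate(state, u);
  const view: Record<string, string> = {};
  for (const k of Object.keys(state)) view[k] = state[k].value;
  return view;
}
```

The tie-break on `peer` is an example of carefully chosen semantics: it is arbitrary, but every replica makes the same arbitrary choice.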

lann 2 years ago

The answer depends on the specific CRDT algorithm in use. For complex data structures like the ones behind collaborative text editing your intuition that the updates end up looking hierarchical is generally correct.

allenu 2 years ago

I don’t know about the linked project’s strategy, but I’ve implemented something CRDT-like, and for object deletion I’ve had to use tombstones. (In my case I timestamp the tombstone state in case I want to bring the object back.)
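A sketch of what a timestamped tombstone could look like, assuming an object's deletion status is stored as its own last-writer-wins register: deleting writes `deleted: true` with a timestamp, and a later restore writes `deleted: false` with a newer one. This is a guess at the general pattern, not the commenter's code; `DeletionState` and `mergeDeletion` are hypothetical.

```typescript
// Deletion status of one object, stored as a last-writer-wins register.
interface DeletionState {
  deleted: boolean;
  ts: number; // when the status last changed
}

// Merge two replicas' states: the newer timestamp wins. On a tie, delete wins
// (an arbitrary but deterministic choice), so merge(a, b) and merge(b, a) agree.
function mergeDeletion(a: DeletionState, b: DeletionState): DeletionState {
  if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
  return a.deleted ? a : b;
}
```

Because the tombstone keeps the object around instead of erasing it, restoring is just a newer write of the flag.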

rudasn 2 years ago

Judging by the demo GIF on the page, the performance of this looks really interesting.

I wonder if this is something that can be used for versioning database columns / fields.

hugodutka 2 years ago

We've been using https://github.com/electric-sql/electric for real-time sync for the past month or so and it's been great. Rather than making you think about CRDTs explicitly, Electric syncs an in-browser SQLite DB (WASM-powered) with a central Postgres instance. As a developer, you get local-first performance and real-time sync between users. And it's actually faster to ship an application without writing any APIs, just using the database directly. The only downside is that Electric is immature and we often run into bugs, but as a startup we're willing to deal with that in exchange for shipping faster.

CMCDragonkai 2 years ago

I see that the libraries are written in Rust. Would this work in a Node.js app, either as WASM or as a native plugin?

anentropic 2 years ago

The docs show installing and using a WASM version from JS: https://www.loro.dev/docs/tutorial/get_started

moklick 2 years ago

Looks great! We will check it out! And nice to see that you are using React Flow in your example

chris_st 2 years ago

Curious how this compares with Automerge/Automerge-repo [0]. Looks like Automerge is at 2.0.

0: https://automerge.org/blog/2023/11/06/automerge-repo/

matharmin 2 years ago

How does this compare to Yjs/y-crdt?

curtisblaine 2 years ago

I couldn't find this in the docs, but is it easy to sync two remote instances over the network, and is that transport-agnostic? What about saving state on the server, so different devices can sync with each other without having to be online at the same time?

czx111331 2 years ago

Yes, it's agnostic to the network layers. The output of doc.exportFrom() is just a binary array.
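Because the export is just bytes, any transport will do. A sketch of the server-side storage asked about above, assuming only that a document can export its updates as a `Uint8Array` and import such bytes; the `SyncableDoc` interface mirrors the `doc.exportFrom()` call mentioned here but is hypothetical, not Loro's published API, and `UpdateRelay` and `syncDevice` are illustrative names. The relay keeps an append-only log of opaque blobs, so devices that are never online at the same time can each catch up from their last cursor.

```typescript
// Hypothetical minimal interface; a real CRDT library's method names may differ.
interface SyncableDoc {
  exportFrom(): Uint8Array;
  import(bytes: Uint8Array): void;
}

// In-memory relay: an append-only log of opaque update blobs.
class UpdateRelay {
  private log: Uint8Array[] = [];

  push(bytes: Uint8Array): number {
    this.log.push(bytes);
    return this.log.length; // new end-of-log position
  }

  pullSince(cursor: number): { updates: Uint8Array[]; cursor: number } {
    return { updates: this.log.slice(cursor), cursor: this.log.length };
  }
}

// One sync round for a device: upload local state, then apply everything new.
function syncDevice(doc: SyncableDoc, relay: UpdateRelay, cursor: number): number {
  relay.push(doc.exportFrom());
  const { updates, cursor: next } = relay.pullSince(cursor);
  for (const u of updates) doc.import(u);
  return next;
}
```

This relies on importing the same update twice being harmless, as it is for CRDT merges, since a device also downloads the blob it just uploaded. A real client would export only the changes since its last sync rather than its whole state.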

Inviz 2 years ago

Well, it's honestly about time. I've tried to build something like this personally with OTs, but it can be pretty brutal with all the fuzzing and N-way merges. I even chose one rich-text editor just because it supports OT (then I learned it's only in the commercial version, which isn't even available to small-timers).

I like the completeness of the Loro solution: the state, the rich text, the tree. The local-first database approach sounds like a great idea. I'm wondering how large the code-size overhead of using this is, though.

meiraleal 2 years ago

Great post explaining CRDTs and the tool.

bxff 2 years ago

Congratulations on the launch! Cannot wait to see Loro in action.