Distributed systems are networks comprised of nodes that interact directly with each other - no central server required. These nodes can store and edit files offline and in geographically diverse locations, and each node accessing the file has a copy. When it's time to bring the changes together and synchronize the file copies, there needs to be a mechanism in place that handles this efficiently and accurately. Conflict-free Replicated Data Types (CRDTs) are an excellent solution.
Understanding CRDTs
CRDTs were first introduced in the paper Conflict-Free Replicated Data Types by Nuno Preguiça, Carlos Baquero, and Marc Shapiro. The authors define a CRDT as an abstract data type that can modify any replica without coordinating with another replica. When two replicas have received the same set of updates, they reach the same state deterministically by adopting mathematically sound rules to guarantee state convergence.
There are two types of CRDTs: operation-based and state-based. Operation-based CRDTs, or commutative replicated data types (CmRDTs), change state by transmitting only the update operation. State-based CRDTs, or convergent replicated data types (CvRDTs), send their full local state to the other replicas, and a function merges the state. Operation-based CRDTs are not necessarily idempotent, but state-based CRDTs are.
CRDTs ensure eventual consistency without centralized coordination by linking back to previous versions of the file. As state changes occur, more branches are created and merged in the data structure.
But what if two valid writes on the same item must be consolidated? How do we determine which is the "correct one"? This depends on how each application developer chooses to handle merges. Some may employ LWW (Last Writer Wins) and use timestamps to determine which user was the last to submit changes and use that user's edits. Others may use a vector clock to look at causal relationships between the edits and algorithmically choose the accepted edit.
Real-World Applications
There are many exciting use cases for CRDTs. Collaborative text editing and collaborative design tools use CRDTs to handle merge conflicts and work offline without tieing themselves to the server-client model, which can increase latency and costs. Fission built Webnative File System (WNFS) as a local-first file system for decentralized apps and uses CRDTs to enable concurrent writes with version control.
Another example is distributed databases. Fission's RhizomeDB is a local-first edge database for querying decentralized data, and it uses CRDTs as its framework. The CRDTs can make the database idempotent, depending on what the user needs to do. For example, if a user updates their address, the system will register that change. If an admin also attempts to update the user's address, no change will occur, and a duplicate entry won't be created. However, if the user wants to submit a form or ticket, ideally, a new one will be created each time because it may be for different reasons. CRDTs enable both options without a central server.
Benefits and Limitations
There are several advantages of CRDTs over traditional synchronization methods. As we've already touched on, CRDTs don't require centralized control and can work offline and asynchronously. This helps improve scalability.
CRDTs are designed to be fault-tolerant. They can continue to function correctly even when nodes are temporarily disconnected from each other, and they can automatically converge to a consistent state once the network is restored.
CRDTs also provide eventual consistency, ensuring that all replicas will eventually converge to the same state. This allows for flexibility and responsiveness while maintaining data integrity.
But it's important to note that CRDTs can also have some limitations. CRDTs may require additional metadata or information to track the causality of updates and resolve conflicts. This can lead to increased storage overhead compared to traditional synchronization methods, especially for large datasets.
There is also a steeper learning curve for developers using more traditional synchronization methods. That, combined with the expectation of implementing strong encryption, can be a big project to undertake.
CRDTs are a game-changer for distributed systems and local-first applications and databases.
If you'd like to learn more about CRDTs, we encourage you to check out the following papers:
Conflict-Free Replicated Data Types