An Advent CAPlendar.
Published 01 December 2025 at Yours, Kewbish. 3,189 words. Subscribe via RSS.
Introduction
For a couple years in high school, I dedicated my Decembers to Wastl’s Advent of Code, a daily series of Christmas-themed programming puzzles. The regular cadence of problems kept the completionist in me hooked, and the problems were satisfyingly challenging beyond my ability. I lacked the background to make it through most days after the first week or two on my own, so I’d sheepishly poke around the subreddit the next morning for hints. Through AOC, I learned a lot of Python builtins like counters, zips, and list comprehensions, and it was always a fun activity to look forward to during December.
Last year, I found out about a similar project called Advent of Distributed Systems. AODS breaks up the Fly.io Distributed Systems challenges into weekly miniprojects, aiming to get you through to the final Raft KV store by the end of the month.
This fall, I realized I had a growing paper backlog and a bucket list of startups and technologies I’d like to get around to reading someday, but never made the time for. Inspired by the subject matter of AODS and the cadence of AOC, I’m putting together my own holiday countdown.
This post is my 2025 Advent CAPlendar, exploring topics in distributed systems (hence the title pun), databases, formal verification, and security. I’m planning to read some foundational papers like DynamoDB and Zookeeper as well as some newer works out of MIT PDOS and Berkeley SkyLab. There are also some neat systems startups and protocols that I keep hearing about, so I’m also planning to glean what I can about how they work from their documentation. You can see a calendar of topics below1.
2025 Advent CAPlendar
I’m also trying something new and starting a popup newsletter to accompany this series. Signing up will get you a short weekend roundup list of the posts that week. To assuage the privacy-conscious: I’ll only use this for Advent CAPlendar-related posts, I’ll be running this by hand (no external data processors, etc.), and I promise I’ll be too busy studying for finals to think about spamming you otherwise!
1. The CAP and CAL Theorems
I figured it’d be most fitting to start this calendar by referencing the titular pun itself. The CAP theorem is idea that a distributed system can only satisfy two of the following three guarantees:
- Consistency: reads return the most recently written value. This behaviour is also known as the linearizability consistency level.
- Availability: every request by a non-failing node receives a response. Note that this doesn’t mean every request receives a response, which I think is the typical definition of uptime via high availability; it’s predicated here on being received by a non-failing node.
- Partition tolerance: the system keeps behaving as-is even as arbitrary nodes fail.
The idea is that if you have a network partition, you can decide to either eschew consistency and just return responses, even if they’re not the latest values, or stick to your guns and not respond to requests, sacrificing availability.
The CAP theorem got picked on a bunch. The most obvious argument is that the pithy “pick 2 out of 3” framing conveniently sidesteps that you can’t ignore partition tolerance, or that you might end up with less than two properties. As well, if your system doesn’t fit the definitions of consistency (e.g. isn’t linearizable), the CAP theorem shrugs its shoulders and gives up — it won’t really tell you anything. Martin Kleppmann’s paper also highlights some vagueness in definitions in the original CAP proof. This is summarized in Dominik Tornow’s blog post, which illustrates how the conjecture and proof are a bit like passing ships in the night given their definitions are a bit off.
The CAP theorem could be a good rule of thumb, but I think its variants that focus on latency instead are far more interesting. PACELC is an extension of CAP, stating that under Partitioning, you have to choose between Availability and Consistency, Else you’ll have to make the tradeoff between Latency and Consistency. This makes sense, since in the absence of partitions stronger consistency models will still impose more acknowledgements and latency.
Kleppmann’s paper goes further, modelling availability in terms of operation latency to match more industry-standard intuitions of uptime SLAs. “If a service can sustain low operation latency, even as network delay increases dramatically, it is more tolerant of network problems than a service whose latency increases.” sounds easier to compare to real systems. He also defines some operation latency lower bounds in term of network latency at different consistency levels which clearly match intuition.
Almost a decade later, Edward Lee takes this even further, defining the CAL theorem. This trades off against Consistency, Availability, and network Latency, but at a more gradual level than the all-or-nothing assumptions of the CAP theorem. The CAL theorem subsumes the CAP theorem, which is defined as a special case. It’s a pity this isn’t more widely known, otherwise I’d have a cleaner title for this Advent series.
I think the neatest part of Lee’s paper is the Lingua Franca framework he introduces. It’s a coordination language for distributed, realtime software for cyber-physical systems like autonomous vehicles. The key feature I picked up on was the fault handlers that trigger when you hit some apparent latency bounds, in which you can specify callbacks and explicitly define whether you’d like to relax your consistency or availability properties. If you choose to sacrifice availability, it provides centralized coordination primitives; if not it provides decentralized coordination with error bounds on clock synchronization to relax consistency. Lee has a video tutorial for how to do so here. It looks like a nice abstraction layer to keep the CAL theorem top of mind while developing a distributed system.
I think the most valuable part of the CAP/CAL theorems for me is the quick gut check mental model. The CAP theorem appeals to instinctive sense. I also had the opportunity to attend Lee’s Distinguished Lecture at UBC, and it was just as clearly laid out and intuitive. Writing this up has highlighted some of the caveats of each model, though, which I’ll need to remember to keep in mind.
2. I’m FLPping Out
The CAP theorem is one of the classic impossibility results in distributed systems, and back at DARE last year I learned about another: the FLP. Unlike the CAP theorem, FLP’s initialism doesn’t serve as a neat mnemonic: it stands for Fischer, Lynch, and Paterson, each very storied researchers that I’d recommend checking out if you have some time.
The FLP states that every asynchronous consensus algorithm may not terminate if there’s at least one silently faulty process. Asynchronous in this case means that we can’t bound the amount of time it takes to pass messages, and the paper also assumes no access to synchronized clocks (this would make things easier with timeouts.) Consensus means that all non-faulty nodes decide on the same value, like 1 in the set {0, 1}, for instance. If you rotate the FLP on its head a bit, it shows that it’s impossible to determine if the system will never terminate and has failed or if latency’s just high.
They model consensus via one big message buffer system, shared by all processes and requiring explicit send/receive notifications. They define a configuration as the set of processes running, their internal states, and this message buffer. A reachable config is a config with some series of messages being applied on top. They outlaw protocols that always decide either 0 or 1, and define a concept of ‘partial correctness’ that means all configurations have exactly one decision value, which could be null, and some reachable configuration will eventually decide 0 or 1. I’ll hand-wave a bit below so look into the paper if you feel unconvinced.
The proof proceeds as follows:
- Lemma 1: Given two configurations that have disjoint sets of processes, you can replay the events of one onto the other and vice versa and you’ll end up with the same final configuration because the sets of processes doing work are entirely distinct.
- Lemma 2: Any such protocol has an initial configuration that can eventually lead to a configuration that’s decided 0 and a separate configuration that’s decided 1 — this is called an initial, bivalent config. Bivalent means we can’t decide on either 0 or 1, which is bad news.
- Lemma 3: From a bivalent config C, applying an event to any configs reachable from C will still result in at least one bivalent config being reachable. You can see where this is going now.
- Lemma 4: If we have to start with a bivalent config, and can only get to states where we still have some bivalent configs, we’ll never get into a config that only has univalent configs that have decided on other values. This means there can’t be such a protocol that always terminates.
The paper also includes an example protocol that does achieve asynchronous terminating consensus if we assume a fixed number of live processes and that processes can’t die and become faulty during consensus. This consensus protocol creates a clique of known live nodes, where each node uses information received from only the other clique members to deterministically decide. This does terminate but is significantly more constrained.
I like that in the conclusion they mention “These results do not show that such problems cannot be “solved” in practice; rather, they point up the need for more refined models of distributed computing […]”. It’s quite a poetic takeaway. Something similar arose with the extensions to the CAP theorem, and indeed I can find plenty of recent work still iterating on modelling the problem and defining the impossibility boundary: here, here, and here, just to name a few.
The FLP paper itself is very short and approachable, so I’d recommend giving it a direct read! The definitions aren’t too onerous to wade through and the contradiction structure follows quite intuitively. The paper is quite theoretical and technical, but I need to get better at understanding and eventually producing work like this to do more convincing research.
3. Consistency Canvases
In my first week of my internship at the Sorbonne, I was talking about consistency models with my mentor, who promptly whipped out an annotated, thick paper with crazy-looking diagrams illustrating the different consistency anomalies given certain models. He offhandedly said he could send it to me so I could give it a read later, but just the brief glimpse of the diagrams scared me off enough to never press the topic while I was in Paris. However, I was thinking about consistency again while brainstorming ideas for this calendar, and this certainly feels like a paper I’d put off, so I asked him if he still remembered which paper it was. I was uncertain how to feel when the PDF landed in my inbox, but I wanted to keep an open mind.
The terrifying behemoth in question is Cerone et al’s A Framework for Transactional Consistency Models with Atomic Visibility, which I was happy to learn was only 14 pages (not a behemoth at all!) It formally defines consistency models (read atomic, snapshot isolation, etc.) for transactional systems. Prior work on formalizing models was one-off and scattered, “often tied to database internals”. Cerone et al. present a more unified, general approach that can be used for higher-level reasoning about applications using said transactional systems.
They define six axioms:
- INT (internal consistency): Within a transaction, sequential semantics are respected, so reads observe the most recently written value, avoiding unrepeatable reads.
- EXT (external consistency): A transaction observes the maximum value determined by the visibility and arbitration relations, avoiding dirty reads.
- TRANSVIS (transitive visibility): If a transaction T1 is visible to another, T2, and T2 is itself visible to another, T3, T1 should also be visible to T3.
- NOCONFLICT (no concurrent updates): Two concurrent transactions may not both modify the same object.
- PREFIX (prefix consistency): All transactions become visible in the same order defined by arbitration order; if T1 is observed by T2, T2 also observes all of T1’s arbitration relation predecessors.
- TOTALVIS (total visibility/serializability): The visibility relation is totally ordered.
The six consistency models they define all satisfy some of these axioms. For example, read atomic, the weakest level, only satisfies INT (internal consistency) and EXT (external consistency); snapshot isolation, the strongest non-serializable model, satisfies PREFIX (arbitration order prefix) and NOCONFLICT (no concurrent modifications) as well. In order, the six models they define are read atomic, causal consistency, parallel snapshot isolation, prefix consistency, snapshot isolation, and serializability.
The diagrams I saw were the illustrations of anomalies that can occur if certain axioms don’t hold. These are neatly formatted on page 6, which alongside page 10 lay out the key ideas of the paper. For instance, if TOTALVIS (total order/serializability) is not satisfied, then write skew can occur. This means that modifying two different objects (e.g. withdrawing from two separate accounts) while expecting a shared condition to hold (e.g. total balance over some threshold) can violate the condition unless there’s some total order. These diagrams were much clearer than I’d thought — they use concrete examples like bank accounts or posting comments to make the anomalies more concrete.
Interestingly, their work also outlines some more implementation-driven algorithms that satisfy their framework’s models, “closer to the intuition of practitioners”. They prove how these operational specifications align with their axioms and properties. On a meta-level, they’re structured a bit like the ones in an ICDT submission we made for specifying database backends — straightforward and ready to be directly implemented. They define a set of constraints separate from the model axioms earlier in the paper, and present a similar mapping of consistency model to constraints needed. Each of the original axioms follows from some combination of these constraints: for example, the transitive visibility axiom follows from causal delivery because replicas will only receive a transaction’s logs after receiving its predecessors.
As I’d mentioned yesterday, I want to improve my appetite for theory-heavy work, so reading this paper was a good chance to exercise that skill again. It certainly wasn’t as intimidating as I thought it’d be. The consistency diagrams look complex at first glance but are fairly straightforward even with the formal lens. I’m itching to get back to some more applied research, though, which will be the focus of the next few days.
4. CRDTs Don’t Grow On Trees
On the topic of consistency, I wanted to read about some CRDTs: causal trees and replicated growable arrays. Both are very similar ideas for ensuring that concurrent edits to a doc (e.g. two friends working on a paper while both offline) are resolved in an orderly, meaningful way. Their basic variants operate on a per-character basis, but more advanced variants support editing ranges of text.
The main idea with causal trees is that each user is assigned an entity ID, and an idea similar to Lamport timestamps is used to track time. The start of the text is the root of the tree, and when a user adds a character, an edge is created from the character’s causal predecessor. If another user concurrently edits, the tree forks at the common causal predecessor and continues branching. A user’s timestamp, which is attached to each character added, increments on each character added and is the maximum timestamp of all merging branches. Intuitively, this means timestamps represent how much content a user has seen before they insert something else. I’d highly recommend checking out the visualizations on Farley’s post for an interactive look.
To merge branches, the head events are first sorted by decreasing timestamp (since this means the user saw more recent content as they were making their edit), then by decreasing entity ID (to deterministically create a total order). To recover the final text, we then take a depth-first pre-order traversal of the tree.
Deleting text uses a common CRDT strategy called tombstoning, where the inserted character is not actually removed from the tree, but just marked as a tombstone and skipped on final rendering. This is because if you hard-delete the node in the tree, no matter what design decision you take, you’ll end up with tradeoffs. Surely you can’t just delete the children of the node along with it, so this means you’ll have to reattach them somewhere. Reattaching to the root of the tree is easy and deterministic but doesn’t make semantic sense (characters would jump to the start of the text). Reattaching to the deleted node’s parent sounds like a good idea, but if there are concurrent operations, things might end up out of order across clients if the immediate ancestor hasn’t been received yet. Sort order would also change among the ancestor’s siblings. All this means it’s just easier to preserve the structure of the tree and skip the node later.
The original causal trees paper goes into more detail suggesting a data format based on Unicode to represent the timestamp numbers with special characters like ॐ representing the root of the tree. It gets pretty crazy: there’s a diagram for how each data format converts to others and what methods combine them. Said methods are implemented not in a typical programming language, but via PCRE regexes. The author makes a fair point for this: they assert that higher-level languages like JS would have too much overhead creating individual objects for each character due to GC, hence the need for a text-only format and portable regex manipulation. I enjoy a cheeky regex puzzle from time to time, so it was really neat seeing regexes be pushed here, but this was a bit of an arcane choice. There are also a lot of fiber-arts-related names for terms in the paper, which makes it a bit challenging to initially parse (in my head “weave” and “weft” occupy too similar of a semantic space).
There’s another very similar CRDT called a replicated growable array. Instead of a tree, though, it’s a flat list that’s easy to read back. Each edit is still tagged with a timestamp and entity ID. For concurrent edits, the insertion pointers are offset for each concurrent client. In practice, it’s faster due to simpler traversal and implementation, so RGAs end up being used in production libraries like Automerge and Yjs.
Thinking through why tombstones were necessary over raw deletions was a fun challenge, and I also liked coming to terms with the odd vocabulary of the causal types paper. Reading it first made understanding the RGA paper much easier, so I’d recommend this order. Neither paper has overwhelming amounts of formalism either, which I appreciated. I got surprisingly interested in these couple papers now that I’ve gone through the basic theory of CRDTs a few times, so I’m looking forward to continuing this trend a bit in tomorrow’s writeup.
-
After spending a while doing infra work, it was refreshing to go back to my frontend roots and put this together! I learned some new tricks: did you know CSS has counters? Did you know you can set a separate header ID or that you can have slashes in them? ↩︎