Pillar: optimistic-speculative-execution | Date: May 2026
Scope: Alternatives to pessimistic locking for parallel agent coordination: optimistic concurrency control applied to code edits (read freely, validate at commit, rollback on conflict), speculative parallelism from compiler and hardware literature, conflict probability modeling to choose OCC vs pessimistic locking, CRDT-style mergeable data structures for code, software transactional memory (STM) analogs, and when empirical conflict rates make speculative approaches strictly better.
Sources: 10 gathered, consolidated, synthesized.
Critical finding: The choice between optimistic and pessimistic concurrency is not philosophical — it reduces to a single break-even formula: when (conflict_rate × retry_cost) < lock_management_overhead, optimistic concurrency control wins unconditionally. For parallel coding agents working on well-decomposed non-overlapping tasks, conflict rates approach zero, making OCC strictly superior to locking in nearly all realistic configurations.[10][5]
Optimistic Concurrency Control was formalized in 1979 by H.T. Kung and John T. Robinson as a four-phase protocol — begin, modify, validate, commit/rollback — that acquires no locks during execution and defers all conflict detection to commit time.[1][6] Its central bet is that most transactions complete without interference; the lock-acquisition overhead paid on every pessimistic transaction is simply wasted cost when conflicts are rare. This bet is validated by ubiquitous adoption across Git, Redis, CouchDB, DynamoDB, Kubernetes, HTTP ETags, Oracle, PostgreSQL, and MySQL InnoDB — each implementing OCC or its MVCC generalization as the default read/write strategy. PostgreSQL's MVCC guarantee — "readers don't block writers and writers don't block readers" — directly enables the high-concurrency workloads that pessimistic locking cannot achieve without sacrificing throughput.[5]
The decision factor matrix makes the OCC vs. pessimistic boundary concrete across 7 independent variables: conflict frequency, workload type (read-heavy vs. write-heavy), retry cost, transaction duration, consistency requirements, cross-request transaction needs, and deadlock risk.[10] One factor is structurally decisive for multi-agent systems: pessimistic locks cannot be held across HTTP requests or long-duration sessions because they are connection-level mechanisms. This makes OCC the only viable approach for parallel AI agent workflows where each agent session may span minutes to hours of elapsed time.[5] The conflict rate decision matrix consolidates to four cases: low conflict rate → optimistic regardless of retry cost; high conflict rate → pessimistic regardless of retry cost; moderate conflict rate → split on retry cost magnitude.[5][10]
Thread Level Speculation (TLS) — also called Speculative Multithreading — extends OCC principles from database transactions to program execution itself: sequential code sections are parallelized speculatively at runtime, with dependency violations triggering squash-and-rerun.[2][7] The theoretical ceiling is rollback freedom, a safety condition defined at Princeton (PLDI 2010) where static analysis proves that any speculative execution is equivalent to a non-speculative one — eliminating logging, conflict detection, and rollback overhead entirely.[2] However, the practical ceiling is lower: prior TLS compilers and architectures have been "proven highly profitable only in limited cases and achieved little speedup on many real-world applications."[2][7] High misspeculation rates eliminate all gains; each squash discards completed work. MIT's T4 system (ISCA 2020) addressed this by dividing entire programs into fine-grained tasks and eliminating the global synchronization barriers that caused earlier systems to stall.[2]
Software Transactional Memory (STM) applies OCC semantics to in-memory data structures rather than disk-based files or databases — a thread executes reads and writes freely while maintaining a transaction log, then validates atomically at commit or rolls back and retries.
Conflict-Free Replicated Data Types (CRDTs), formally defined in 2011 by Shapiro, Preguiça, Baquero, and Zawirski, represent the most aggressive speculative stance: they mathematically guarantee conflict-free merging by requiring merge functions to satisfy commutativity, associativity, and idempotency — making concurrent updates automatically composable without coordination, validation, or rollback.[8][3] Production deployments validate the approach at scale: League of Legends used CRDT-based Riak to handle 7.5 million concurrent users at 11,000 messages/second; Apple Notes uses CRDTs for offline device synchronization; Redis Enterprise deploys CRDTs for multi-master geographic replication.[8] The Zed code editor (from creators of Atom and Tree-sitter) implements CRDT-based multiplayer editing natively in Rust. In sequence CRDT benchmarks, Yjs is measurably faster than Automerge for text operations due to its plainlist representation versus Automerge's tree structure.[8]
CRDTs carry a hard limit that constrains their use for agent-based code editing: automatic merges at the character or line level can produce programs that compile but are semantically broken.[3] This is not a pathological edge case — it is the default outcome when two agents make logically dependent changes to different files that a CRDT merges independently. CRDT merge rules are hardcoded by algebraic structure, not by program semantics. A CRDT counter cannot model a bank balance because the merge could permit negative values; similarly, a CRDT text merge cannot detect that agent A renamed a function while agent B added a caller using the old name.[3] CRDTs are strictly better than OCC only when automatic merge results are always acceptable — a condition that holds for fine-grained character-level collaborative editing (Zed, Yjs integrations with CodeMirror and Monaco) but rarely holds for multi-file semantic changes.[3]
One implementation hazard cuts across all four optimistic approaches: the validation and commit phases in OCC and STM must be performed as a single atomic operation to prevent a time-of-check to time-of-use (TOCTOU) race condition that can silently corrupt results even in correctly structured implementations.[6] In STM, an analogous hazard exists for inconsistent reads: an optimistically reading transaction may observe mixed old and new values from a concurrent partial write, potentially causing segmentation faults or infinite loops before the abort is detected — a correctness issue even though the transaction will ultimately roll back.[4] Both hazards require explicit architectural attention rather than falling out of the optimistic model automatically.
Practitioners building parallel agent coordination should apply the same hybrid strategy that production databases converge on: MVCC (optimistic) as the default for general agent work across disjoint file regions, with selective pessimistic locks applied only to known hot-spot resources like shared configuration files or build manifests, and CRDT-style merging at fine granularity where code semantics permit automatic resolution.[5][10][3] Task decomposition quality is the primary lever: well-decomposed non-overlapping agent tasks drive conflict rates toward zero, making the OCC bet structurally correct without requiring measurement. Where agent tasks do overlap on shared infrastructure, apply pessimistic locking selectively to those specific resources rather than imposing global sequential ordering. For in-memory agent state coordination (shared queues, status registries), STM provides deadlock-free composability at a 2× overhead cost that is acceptable when I/O can be buffered post-commit. Rollback-free task decomposition — statically provable non-overlap — remains the theoretical ideal, eliminating conflict detection overhead entirely at the cost of stricter upfront planning.
Optimistic Concurrency Control (OCC), also known as optimistic locking, is a non-locking concurrency control method for transactional systems first proposed in 1979 by H.T. Kung and John T. Robinson.[1][6] Its central assumption is that multiple transactions can frequently complete without interfering with each other — so they proceed without acquiring locks, deferring conflict detection until commit time. Transactions use data resources without acquiring locks; before committing, each verifies no other transaction has modified data it read; if conflicting modifications are found, the transaction rolls back and may be restarted.[6]
| Phase | Action | Notes |
|---|---|---|
| 1. Begin | Record a timestamp marking transaction start | Establishes the read baseline[1][6] |
| 2. Modify | Read data values and tentatively write changes without locking | No blocking of other transactions[1] |
| 3. Validate | Check whether other transactions modified data this transaction used | Covers transactions completed after start time; optionally covers still-active transactions[1][6] |
| 4. Commit/Rollback | No conflict → commit; conflict → abort and optionally retry | Validation + commit must be atomic (TOCTOU hazard)[6] |
Key finding: The validation and commit phases must be performed as a single atomic operation to avoid a time-of-check to time-of-use (TOCTOU) race condition — a subtle implementation hazard that can silently corrupt results even in systems that correctly implement the four-phase lifecycle.[6]
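The four-phase lifecycle, including the atomic validate+commit requirement, can be sketched in a few lines of Python. `VersionedStore` and its method names are illustrative inventions, not from any cited system; the point is that the version check and the write happen under one mutex, while the modify phase runs lock-free.

```python
import threading

class VersionedStore:
    """Single-value store illustrating the four-phase OCC lifecycle.
    The validate + commit pair runs under one mutex so no writer can
    slip in between the check and the write (the TOCTOU hazard)."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._commit_mutex = threading.Lock()  # guards validate+commit only

    def begin(self):
        # Phase 1: record the version seen at transaction start.
        return self._version, self._value

    def commit(self, read_version, new_value):
        # Phases 3+4: validate and commit as a single atomic step.
        with self._commit_mutex:
            if self._version != read_version:
                return False          # conflict: caller must retry
            self._value = new_value
            self._version += 1
            return True

def increment_once(store, delta):
    # Phase 2 (modify) happens lock-free; retry on validation failure.
    while True:
        version, value = store.begin()
        if store.commit(version, value + delta):
            return

store = VersionedStore(0)
threads = [threading.Thread(target=increment_once, args=(store, 1))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(store._value)  # 8: every increment eventually commits
```

Note that no lock is held while computing the new value; contention cost is paid only by the losers of a conflicting commit, who retry.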
| Strategy | Direction | What It Checks |
|---|---|---|
| Backward validation | Looks backward in time | Conflicts with already-committed transactions[1] |
| Forward validation | Looks forward | Conflicts with currently active (uncommitted) transactions[1] |
| Contention Level | OCC Behavior | Source |
|---|---|---|
| Low | Completes without lock management overhead; superior throughput | [1][6] |
| High | "The cost of repeatedly restarting transactions hurts performance significantly" | [1] |
Lock-based approaches also carry their own failure mode: "locking can drastically limit effective concurrency even when deadlocks are avoided."[1][6] OCC trades retry cost (paid only when conflicts occur) against locking cost (paid on every transaction regardless) — optimal choice depends entirely on empirical conflict probability.[1]
| System | OCC Mechanism | Domain |
|---|---|---|
| Git | Edit locally, detect conflicts at merge time | Version control[1][6] |
| MediaWiki (Wikipedia) | Page edit OCC | Wiki software[1] |
| Bugzilla | OCC; conflicts called "mid-air collisions" | Bug tracking[1] |
| Redis, CouchDB | OCC model | Databases[1] |
| Elasticsearch | Sequence numbers for document versions | Search[1] |
| DynamoDB | Conditional writes/versioning ("Conditional Updating") | Cloud NoSQL[1] |
| Kubernetes | OCC on resource objects | Orchestration[1] |
| HTTP ETags | Conditional GET/PUT with If-Match; mismatch → 412 Precondition Failed | HTTP protocol[6] |
| Apache Solr | _version_ field | Search[1] |
| Microsoft Entity Framework | Built-in OCC via binary timestamp value | ORM[6] |
| Google App Engine | OCC model | Cloud platform[1] |
| Ruby on Rails / Grails | Built-in OCC support | Web frameworks[1] |
| Oracle, PostgreSQL, MySQL InnoDB | MVCC (readers don't block writers; writers don't block readers) | RDBMS[5] |
Git is the canonical OCC system for source code: users edit their local copy freely, with conflicts detected only at merge time.[1][6] Because Git is distributed, no single user could hold a file lock during editing — concurrent edits are unavoidable by design. When conflicts occur (lines changed by two people simultaneously), the submitter of the most recent update resolves the conflict.[1] Most revision control systems — SVN, Git, Mercurial — use this "merge model," which is structurally identical to OCC: read freely, write locally, detect conflicts only at merge/commit time.[1]
ETags apply OCC principles at the HTTP layer for code asset management:[6]
| Step | Action | OCC Phase Equivalent |
|---|---|---|
| 1 | Client stores ETag header from GET response | Begin (record timestamp/version) |
| 2 | Client modifies resource locally | Modify |
| 3 | Client sends conditional PUT with If-Match: <etag> | Validate |
| 4a | Tag matches → server accepts write | Commit |
| 4b | Tag mismatch → server returns 412 Precondition Failed | Rollback |
Critical implementation constraint: tag validation and data write must be performed as a single atomic operation to avoid race conditions.[6]
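The ETag flow above can be modeled without a real HTTP stack. The class and method names below are illustrative; only the `If-Match` / `412 Precondition Failed` semantics mirror the protocol.

```python
import hashlib

class ETagServer:
    """Toy in-memory resource server implementing If-Match semantics.
    Single-threaded here; a real server must make the tag check and the
    write atomic (a lock or compare-and-swap) per the constraint above."""

    def __init__(self, body):
        self._body = body

    def _etag(self):
        # Content-derived tag; real servers may also use version counters.
        return hashlib.sha256(self._body.encode()).hexdigest()[:12]

    def get(self):
        return self._body, self._etag()           # Begin: client stores ETag

    def put(self, new_body, if_match):
        if if_match != self._etag():
            return 412                            # Rollback: stale ETag
        self._body = new_body
        return 200                                # Commit

server = ETagServer("v1 config")
body, etag = server.get()                    # client A reads
server.put("B's edit", server.get()[1])      # client B commits first
status = server.put(body + " + A's edit", etag)
print(status)  # 412: A's ETag is stale, so A must re-read and retry
```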
In collaborative document/code editing applications where multiple users edit simultaneously, the application tracks document versions; users make edits independently; the system detects conflicts by comparing versions; on conflict, a resolution mechanism is triggered.[6]
Key finding: OCC is a good fit for code coordination when data contention is low — when it is unlikely that multiple agents compete for the same resource simultaneously — and when conflicts are rare enough that the cost of repeating failed requests is acceptable.[6]

See also: Post-Merge Agent Recovery (post-conflict resolution mechanics)
| Approach | Mechanism | Conflict Handling |
|---|---|---|
| Pessimistic Locking | Acquires shared lock on read; blocks other transactions from modifying record until released | Prevents conflicts preemptively[5][10] |
| Optimistic Locking (OCC) | No lock on read; detects conflict at write time via version column in WHERE clause; zero rows updated → OptimisticLockException | Detects and retries[5][10] |
"Since logical clocks are superior to physical clocks when implementing a concurrency control mechanism, a version column is used to capture the read-time row snapshot information. The version column is incremented every time an UPDATE or DELETE statement is executed."[10]
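The version-column pattern from the quote above can be demonstrated with SQLite. The table and column names are illustrative; the key idea is that the UPDATE carries the read-time version in its WHERE clause, so "zero rows updated" is the conflict signal.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
db.execute("INSERT INTO account VALUES (1, 100, 0)")

def withdraw(amount):
    # Read phase: capture the row snapshot via its version.
    balance, version = db.execute(
        "SELECT balance, version FROM account WHERE id = 1").fetchone()
    # Write phase: the WHERE clause validates the read-time version.
    cur = db.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = 1 AND version = ?", (balance - amount, version))
    return cur.rowcount == 1     # False -> OptimisticLockException analog

ok_first = withdraw(30)          # commits: version 0 -> 1

# Simulate a stale writer: read the row, then bump the version
# underneath it before its UPDATE runs.
balance, stale_version = db.execute(
    "SELECT balance, version FROM account WHERE id = 1").fetchone()
db.execute("UPDATE account SET version = version + 1 WHERE id = 1")
cur = db.execute(
    "UPDATE account SET balance = ?, version = version + 1 "
    "WHERE id = 1 AND version = ?", (balance - 10, stale_version))
print(ok_first, cur.rowcount)  # True 0: the stale write matched zero rows
```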
| Locking Strategy | Networking Equivalent | Protocol | Behavior |
|---|---|---|---|
| Pessimistic | Wi-Fi | CSMA/CA — Collision Avoidance | Blocks preemptively before transmitting[5] |
| Optimistic | Ethernet | CSMA/CD — Collision Detection | Transmits optimistically; detects and retries on collision[5] |
| Factor | Choose Pessimistic | Choose Optimistic |
|---|---|---|
| Conflict frequency | High | Low/Rare[10] |
| Workload type | Write-heavy | Read-heavy[10] |
| Retry cost | Very high | Low[10] |
| Transaction duration | Short (locks released quickly) | Long/multi-step[10] |
| Consistency requirements | Strict (financial, compliance) | Moderate (web apps)[10] |
| Cross-request transactions | No | Yes (pessimistic locks can't span HTTP requests)[5] |
| Deadlock risk | Present | None[5] |
| Database System | MVCC Implementation | Key Property |
|---|---|---|
| Oracle | MVCC based on OCC | Readers don't block writers[5] |
| PostgreSQL | MVCC | "Readers don't block writers and writers don't block readers"[5] |
| MySQL InnoDB | MVCC | Versioned row access[5] |
| CockroachDB, TiDB, YugabyteDB | MVCC-based | Distributed OCC[5] |
OCC is uniquely suited for long-running conversations — workflows where a user reads data in one request, edits it over an extended period, and submits the changes in a later request, with a version check at submit time catching concurrent modifications.[5] This pattern maps directly to parallel AI agent workflows; see Section 8 for the full mapping. Pessimistic locks cannot be held across HTTP requests because they are connection-level mechanisms, making OCC the only viable approach for multi-step, long-duration workflows.[5]
Many production systems blend both strategies:[10]

- Optimistic (MVCC) as the default for general reads and writes
- Selective pessimistic locks (`SELECT ... FOR UPDATE`) on known hot-spot rows

Key finding: The fundamental question is how likely conflicts are — the decision factor matrix above captures the key variables across conflict frequency, workload type, retry cost, and consistency requirements. Quantitative break-even analysis is in Section 7.[5][10]

See also: Concurrency Control Theory (OCC theoretical foundations); Deadlock, Starvation, and Fairness (lock-related failure modes)
As processor clock-rate gains stalled in the early 2000s, the computer industry shifted to multi-threaded and multi-core architectures.[7] TLS emerged as a way to exploit these parallel cores even for sequential programs without obvious parallelism. Thread Level Speculation (TLS) — also called Speculative Multithreading or Speculative Parallelization — is a dynamic (runtime) parallelization technique that speculatively executes code sections anticipated for later execution in parallel with normal execution on separate threads.[2][7] TLS uncovers parallelism that static (compile-time) techniques fail to exploit because at compile time thread independence cannot be guaranteed — but is frequently true at runtime.[2] It provides a path to parallelize sequential programs without complex data dependence analysis or explicit synchronization.
| Strategy | Description | Practicality |
|---|---|---|
| Eager execution | Computes all paths simultaneously | Usually impractical — too many paths[7] |
| Predictive execution (TLS) | Runs most likely path in parallel; rolls back on conflict | Most common pattern; production-viable[7] |
| Strategy | Scope of Rollback | Conservatism |
|---|---|---|
| Full restart | Stop entire parallel execution; restart completely | Most conservative[2][7] |
| Partial restart | Stop and restart offending threads and their successors | Moderate[2][7] |
| Targeted rollback | Stop only the offending thread and downstream consumers | Most fine-grained[2][7] |
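The squash-and-rerun cycle can be simulated sequentially in Python. This is a toy model of targeted rollback, not a real TLS runtime: tasks expose explicit read sets and write dictionaries, and a task is re-executed when an earlier task's committed writes overlap its reads.

```python
def speculate(tasks, memory):
    """Execute tasks speculatively from one memory snapshot, then commit
    in program order; squash and re-run any task whose read set overlaps
    locations written by an earlier task (a dependence violation)."""
    squashes = 0
    snapshot = dict(memory)
    # "Parallel" speculative round: every task sees the same stale snapshot.
    results = [task(snapshot) for task in tasks]
    dirty = set()                           # locations committed so far
    for i, (reads, writes) in enumerate(results):
        if reads & dirty:                   # misspeculation detected
            squashes += 1
            reads, writes = tasks[i](memory)  # rerun against real state
        memory.update(writes)               # commit in program order
        dirty |= writes.keys()
    return squashes

# Two tasks: t1 writes x, t2 reads x — so t2 misspeculates and reruns.
def t1(mem): return ({"a"}, {"x": mem["a"] + 1})
def t2(mem): return ({"x"}, {"y": mem["x"] * 10})

memory = {"a": 1, "x": 0, "y": 0}
n = speculate([t1, t2], memory)
print(n, memory["y"])  # 1 20: one squash; t2 reruns with t1's committed x
```

The cost model in the text falls out directly: each squash discards a completed task execution, so high misspeculation rates erase the speedup.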
| Approach | Who Decides Parallelism | Mechanism | Cost |
|---|---|---|---|
| Hardware (dynamic) | Processor at runtime | Cache coherence; version management in caches; detects store/load conflicts across threads[7] | Fast but expensive; requires processor/memory modifications |
| Software (static) | Compiler at compile time | Plans parallel instructions ahead of time | Fails on complex control flow, pointer-heavy code[7] |
| Hybrid (TLS) | Compiler + hardware runtime | Compiler parallelizes loops ignoring ambiguous dependences; hardware detects/enforces at runtime[7] | Best general-purpose approach |
Current parallelizing compilers can only tackle applications with regular access patterns on arrays or affine indices, and data dependencies expressible in linear form.[2] TLS overcomes this restriction for irregular code whose dependences cannot be proven independent at compile time but frequently are independent at runtime.
| System | Origin | Key Innovation |
|---|---|---|
| MiniTLS | First software TLS system | Eager memory data management (in-place updates); novel compact version management for parallelized rollback; recovers faster than existing software eager TLS systems[2] |
| Lector (Lazy inspECTOR) | Academic research | Lightweight inspector thread runs alongside each speculative thread; proactively verifies data dependencies before misspeculation occurs; avoids reactive squashing[2] |
| Rollback-free (Princeton PLDI 2010) | Princeton University | Defines a safety condition (rollback freedom): any speculative execution is equivalent to a non-speculative one, with no logging, conflict detection, or rollback — proven via static analysis[2] |
| T4 (MIT ISCA 2020) | MIT | Divides entire programs into tiny tasks, exposing speculative parallelism from first instruction; addresses prior work's "unselective aborts that wastefully discard work" and global synchronization that stalls all later tasks[2] |
Key finding: Rollback freedom (Princeton PLDI 2010) represents the theoretical ideal of speculative execution: a static analysis safety condition where any speculative execution is equivalent to a non-speculative one, eliminating all runtime logging, conflict detection, and rollback overhead.[2]

See also: Deadlock, Starvation, and Fairness (lock-specific failure modes in non-speculative approaches)
Software Transactional Memory (STM) is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing — an alternative to lock-based synchronization implemented entirely in software. STM is "a technique for simplifying concurrent programming by allowing multiple state-changing operations to be grouped together and performed as a single atomic operation."[4][9]
STM is typically very optimistic: a thread completes modifications to shared memory without regard for what other threads might be doing, recording every read and write in a log. The onus is placed on the reader (not the writer) to verify consistency after completing a transaction.[4]
| Phase | Action |
|---|---|
| Read Phase | Thread executes reads and writes while maintaining a transaction log of all memory accesses[4][9] |
| Commit Phase | Validates whether other threads modified memory accessed during this transaction; success → permanent commit; conflict → abort with full rollback[4][9] |
| Abort/Retry | Conflicting transaction rolls back and re-executes from beginning until successful[4][9] |
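The read/commit/retry cycle in the table above can be sketched as a minimal STM in Python. This is illustrative only: `TVar` and `atomically` borrow their names from the Haskell primitives described later in this section, and a single global commit lock stands in for real commit-time schemes like TL2.

```python
import threading

class TVar:
    """Transactional variable: a value plus a version number."""
    def __init__(self, value):
        self.value, self.version = value, 0

_commit_lock = threading.Lock()   # serializes validate+commit (TOCTOU)

def atomically(tx_fn):
    """Run tx_fn(read, write) against a private log; validate read
    versions at commit, and re-execute from scratch on conflict."""
    while True:
        read_log, write_log = {}, {}
        def read(tv):
            if tv in write_log:               # read-your-own-writes
                return write_log[tv]
            read_log.setdefault(tv, tv.version)
            return tv.value
        def write(tv, val):
            write_log[tv] = val               # buffered, not yet visible
        result = tx_fn(read, write)
        with _commit_lock:
            if all(tv.version == v for tv, v in read_log.items()):
                for tv, val in write_log.items():
                    tv.value, tv.version = val, tv.version + 1
                return result
        # validation failed: abort and retry the whole transaction

a, b = TVar(100), TVar(0)
def move_10(read, write):
    write(a, read(a) - 10)
    write(b, read(b) + 10)
atomically(move_10)
print(a.value, b.value)  # 90 10: both writes commit atomically or not at all
```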
| Mechanism | When Locks Acquired | Key Property | Tradeoff |
|---|---|---|---|
| Encounter-time locking (eager) | During write operations | Values logged in undo log | Earlier conflict detection but may block more[4] |
| Commit-time locking with version clock (TL2) | Commit phase only | "Invisible reads" — reads acquire no locks; global version clock; location version > read-version → immediate abort; avoids quadratic overhead of incremental validation | Lower overhead but conflicts detected later[4] |
"Lock-based programs do not compose: correct fragments may fail when combined." STM enables atomic composition of operations through transaction wrapping.[4]
2005 milestone: Tim Harris, Simon Marlow, Simon Peyton Jones, and Maurice Herlihy described an STM system built on Concurrent Haskell that enables arbitrary atomic operations to be composed into larger atomic operations — a useful concept impossible with lock-based programming.[9]
| Benefit | Description |
|---|---|
| Composability and Modularity | Write concurrent abstractions easily composed with any other STM abstraction without exposing synchronization details[9] |
| Freedom from deadlock | No lock ordering concerns — code contains no locks[9] |
| No lost wakeups | No condition variables means no missed notifications[9] |
| Increased concurrency | Different threads can simultaneously modify disjoint parts of a data structure that a single coarse-grained lock would serialize under a pessimistic approach[9] |
| Dimension | STM Behavior |
|---|---|
| 1–4 processors | Typically performs "no worse than twice as slow" vs. lock-based — due to logging and commit overhead[4][9] |
| Theoretical complexity | O(n) worst-case space and time for n concurrent transactions[4] |
| Starvation risk | Lengthy transactions continuously competing with multiple short transactions can starve[4][9] |
| Design goal | "STM aims to provide the ease of coarse-grained locking with the efficiency of fine-grained locking"[4][9] |
Optimistic reading creates a subtle hazard: incomplete transactions may read inconsistent state — mixing old and new values from partial writes by other threads — potentially causing segmentation faults or infinite loops before abort detection occurs. This is a correctness issue even though the transaction will ultimately abort.[4]
Any operation in a transaction must be idempotent since transactions may retry. Irreversible operations (I/O, network calls) cannot be performed inside transactions. Solution: create buffers that queue up irreversible operations and execute them after the transaction successfully commits.[4]
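The buffer-and-flush solution can be sketched as follows. The names are hypothetical; the point is that the transaction body only appends planned side effects to a local buffer, which is flushed exactly once after a successful commit, so retries never repeat the I/O.

```python
def run_transaction(body, max_retries=10):
    """Retry body until it commits; perform queued I/O only afterwards."""
    for attempt in range(max_retries):
        io_buffer = []                       # queued irreversible operations
        if body(io_buffer.append, attempt):  # True = commit succeeded
            for effect in io_buffer:         # safe to perform I/O now
                print(effect)
            return io_buffer
        # commit failed: discard the buffer along with the tentative writes
    return None

def transfer(emit, attempt):
    emit(f"notify: transfer done (attempt {attempt})")
    return attempt >= 2                      # two simulated conflicts, then commit

flushed = run_transaction(transfer)
print(len(flushed))  # 1: the notification ran once despite two aborted attempts
```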
| Pattern | Description |
|---|---|
| IO After a Transaction | Run IO immediately after atomically completes[9] |
| Retry-based blocking | Use retry to wait for a condition without busy-waiting[9] |
| Alternative actions with orElse | Try one transaction, fall back to another on retry[9] |
| Composable atomic blocks | Combine multiple STM operations into larger atomic operations[9] |
| Language | Implementation | Notes |
|---|---|---|
| Haskell (GHC) | Built-in STM monad; first implemented in GHC 6.4 | Type system enforces transactional safety; primitives: atomically, TVar, retry, orElse, throwSTM, catchSTM[9] |
| Clojure | Built-in, first-class Refs | Validators (per-ref constraint functions), Watchers (change notifications), eClojure extension adds after-commit/on-abort/on-commit + Haskell-inspired retry/orElse[9] |
| Scala | ZIO STM | Functional effect system integration[4][9] |
| Kotlin | arrow-fx-stm | Arrow functional library STM[4][9] |
| Rust | async-stm | Async-aware STM[4][9] |
| OCaml | Kcas | [4][9] |
| Go | Kashmir | [4][9] |
| C, C++, C#, Java, Python | Various libraries | Without language-level mutability constraints, developer must self-enforce transactional discipline[9] |
| Approach | Conflict Detection | Overhead (no conflict) | Overhead (conflict) | Composable |
|---|---|---|---|---|
| Pessimistic locking | Never (blocked preemptively) | Lock acquire/release | Thread waiting | No[4] |
| OCC | At commit | Minimal | Abort + retry | No[4] |
| STM | At commit | Log maintenance | Abort + retry | Yes[4] |
Key finding: STM is essentially OCC applied to in-memory data structures, with the addition of composability and automatic retry semantics — but practical adoption has been limited by performance overhead (up to 2× slower on 1–4 processors), inability to perform I/O inside transactions, and difficulty integrating with legacy code.[4]
A Conflict-free Replicated Data Type (CRDT) is a data structure replicated across multiple computers enabling independent, concurrent updates without coordination. The system automatically resolves inconsistencies, guaranteeing eventual convergence among replicas despite temporary divergence. Formally defined in 2011 by Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski — initially motivated by collaborative text editing and mobile computing.[8] CRDTs "do not assume the use of a single server," enabling true peer-to-peer networks — contrasting with Google Docs, Trello, and Figma, which require all communication to route through a central server.[3]
State-based CRDTs require merge functions satisfying three algebraic properties:[3][8]
| Property | Formula | Effect |
|---|---|---|
| Commutativity | a ⊔ b = b ⊔ a | Order of receipt doesn't matter |
| Associativity | (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c) | Grouping of merges doesn't matter |
| Idempotency | a ⊔ a = a | Duplicate messages are harmless; applying the same update twice yields the same result |
These properties ensure CRDTs remain "invariant under package re-ordering and duplication." The merge function computes the join for any pair of replica states, forming a semilattice where merge = least upper bound. Update functions must be monotone regarding the partial order.[3][8]
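A G-Counter (described in the table below) makes the three properties concrete: per-node slots with elementwise-max merge are commutative, associative, and idempotent by construction. This sketch is illustrative, not any particular library's API.

```python
class GCounter:
    """State-based grow-only counter: one slot per node; merge takes the
    elementwise maximum, forming a join-semilattice."""
    def __init__(self, slots=None):
        self.slots = dict(slots or {})
    def increment(self, node, n=1):
        self.slots[node] = self.slots.get(node, 0) + n   # monotone update
    def merge(self, other):
        keys = self.slots.keys() | other.slots.keys()
        return GCounter({k: max(self.slots.get(k, 0), other.slots.get(k, 0))
                         for k in keys})
    def value(self):
        return sum(self.slots.values())

a, b = GCounter(), GCounter()
a.increment("node-a", 3)
b.increment("node-b", 4)

ab, ba = a.merge(b), b.merge(a)
assert ab.slots == ba.slots               # commutativity: order doesn't matter
assert ab.merge(ab).slots == ab.slots     # idempotency: duplicates harmless
print(ab.value())  # 7: replicas converge without coordination
```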
| Type | What Is Transmitted | Advantage | Disadvantage |
|---|---|---|---|
| State-Based (CvRDT) | Full local state to all replicas on every update | Simple; compatible with gossip protocols | "The entire state of every CRDT must be transmitted eventually to every other replica, which may be costly" — high bandwidth[3][8] |
| Operation-Based (CmRDT) | Update operations (deltas) directly to replicas | Lower bandwidth; smaller payloads | Requires middleware guarantees: operations not dropped/duplicated, delivered in causal order[3][8] |
| Category | Algorithm | Semantics |
|---|---|---|
| Counters | G-Counter | Grow-only; each node maintains its own slot; merge uses maximum values[8] |
| | PN-Counter | Combines two G-Counters (positive + negative) for increment/decrement[8] |
| Sets | G-Set | Additions only; merge is union[8] |
| | 2P-Set | Adds tombstone set for removals; "remove-wins" semantics[8] |
| | LWW-Element-Set | Timestamps; allows reinsertion after removal via timestamp comparison[8] |
| | OR-Set | Unique tags instead of timestamps; removal requires matching all add-tags[8] |
| Sequences (Text/Code) | WOOT | Without Operational Transformation[8] |
| | Treedoc, RGA | Replicated Growable Array[8] |
| | Logoot, LSEQ, LogootSplit | Position-indexed sequences[8] |
| | Yjs | Plainlist instead of tree — industrial optimization; notably faster than Automerge for text operations[8] |
| | Automerge | JSON-like CRDT; Rust core with WASM bindings[3][8] |
| | Loro | Rust + JS; based on Replayable Event Graph; supports rich text, list, map, movable tree[3] |
| Tool | CRDT Technology | Editor Integration |
|---|---|---|
| Zed | CRDT-based multiplayer; open source; written in Rust; from creators of Atom and Tree-sitter | Native code editor[3][8] |
| Yjs | High-performance CRDT framework; modular | CodeMirror, Monaco, Quill, ProseMirror[3][8] |
| Automerge | JSON-like CRDT; Rust + WebAssembly | General collaborative apps[3][8] |
| Teletype (Atom) | CRDT-based real-time collaboration | Atom editor[8] |
| Conclave | LSEQ CRDT | Peer-to-peer research editor[8] |
| crdt.el | CRDT | Emacs[8] |
| Loro | Replayable Event Graph; rich text, list, map, movable tree | General collaborative apps (Rust+JS)[3] |
| Organization | CRDT Implementation | Scale / Use Case |
|---|---|---|
| League of Legends | Riak CRDT | 7.5 million concurrent users, 11,000 messages/second[8] |
| SoundCloud | LWW-element-set CRDT (Roshi) | Event streaming[8] |
| Apple Notes | CRDTs | Syncing offline edits between multiple devices[8] |
| Microsoft Fluid Framework | Server reference implementations + client SDKs | Collaborative applications platform[8] |
| Figma | Server-authoritative per-property last-writer-wins | Collaborative design[8] |
| Redis Enterprise | CRDTs | Multi-master replication across geographic datacenters[3][8] |
| Microsoft Azure CosmosDB | CRDTs or custom merge procedures | Global multi-region writes[3][8] |
| Riak (2013) | Native CRDT support | One of first databases with first-class CRDT support[3][8] |
| AntidoteDB | CRDTs + highly available transactions | Geo-replicated database[3][8] |
Industrial sequence CRDTs, including open-source ones, are known to outperform academic implementations due to optimizations and a more realistic testing methodology.[8] In benchmarks, Yjs is notably faster than Automerge for text operations, reflecting the advantages of its plainlist representation over tree-based approaches.[8]
| Approach | Conflict Handling | Coordination Required | Rollback | Central Server |
|---|---|---|---|---|
| Pessimistic Locking | Prevent preemptively | Upfront lock acquisition | None | Yes[3] |
| OCC (Git model) | Detect + manual resolve | Validate at commit | Yes, on conflict | Optional[1] |
| CRDT | Mathematically impossible | None | None | No[3] |
CRDTs are "a bit like version control with Git, but without having to resolve merge conflicts manually" — the CRDT provides built-in conflict resolution.[8] Unlike Git's OCC model, CRDTs mathematically guarantee automatic conflict resolution at all times. Unlike operational transformation, CRDTs support decentralized peer-to-peer operation without a central server.[8]
Key finding: CRDTs are strictly better than OCC when merge semantics can be defined mathematically and automatic merge results are always acceptable — but for source code coordination this condition rarely holds for complex multi-file changes, where automatic character-level merges can produce programs that compile but are semantically broken.[3]

See also: Post-Merge Agent Recovery (handling semantically incorrect merges)
The optimal choice between optimistic (speculative) and pessimistic (locking) approaches depends entirely on empirical conflict probability in the actual workload.[1][5][10] This is not a theoretical question — it requires measuring actual conflict rates in production or simulation.
The decision boundary:[10]
| Condition | Winner |
|---|---|
| (conflict_rate × retry_cost) < lock_management_overhead | Optimistic (OCC) |
| (conflict_rate × retry_cost) > lock_management_overhead | Pessimistic (locking) |
For parallel coding agents, each transaction = reading + editing a set of files; a conflict = two agents modify the same file. Conflict probability depends on codebase size, number of active agents, and task decomposition quality — well-decomposed non-overlapping tasks drive conflict rates toward zero.[5]
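The decision boundary above reduces to a one-line function. The numeric inputs below are hypothetical illustrations, not measured data; costs must share a unit (e.g. seconds of agent time per transaction).

```python
def choose_strategy(conflict_rate, retry_cost, lock_overhead):
    """Break-even rule: optimistic wins when the expected cost of
    retries is lower than the lock overhead paid on every transaction."""
    return "optimistic" if conflict_rate * retry_cost < lock_overhead \
           else "pessimistic"

# Well-decomposed agents: ~1% conflicts, 30 s to redo an edit,
# vs. 0.5 s of lock bookkeeping per transaction.
print(choose_strategy(0.01, 30.0, 0.5))   # optimistic  (0.3 < 0.5)
# Hot-spot file: 40% conflicts make retries dominate.
print(choose_strategy(0.40, 30.0, 0.5))   # pessimistic (12.0 > 0.5)
```

Because conflict rate multiplies retry cost, even very expensive retries are tolerable when decomposition drives the conflict rate toward zero.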
Three variables determine the break-even point:[10]
| Conflict Rate | Retry Cost | Recommended Strategy |
|---|---|---|
| Low | Any | Optimistic[5][10] |
| Moderate | Low | Optimistic[5][10] |
| Moderate | High | Pessimistic[5][10] |
| High | Any | Pessimistic[5][10] |
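The matrix above is just the break-even inequality evaluated qualitatively. A minimal sketch of the rule as a decision function (names and example thresholds are illustrative, not from the sources):

```python
def choose_strategy(conflict_rate: float, retry_cost: float,
                    lock_overhead: float) -> str:
    """Break-even rule: optimistic wins when the expected cost of
    retrying conflicted transactions is below the lock management
    overhead paid on every transaction."""
    expected_retry_cost = conflict_rate * retry_cost
    return "optimistic" if expected_retry_cost < lock_overhead else "pessimistic"

# Low conflict rate: optimistic even when retries are expensive.
print(choose_strategy(conflict_rate=0.01, retry_cost=30.0, lock_overhead=1.0))
# Moderate conflict rate with high retry cost: pessimistic.
print(choose_strategy(conflict_rate=0.2, retry_cost=30.0, lock_overhead=1.0))
```

Note that all three inputs are empirical quantities; the function is only as good as the measured conflict rate fed into it.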
OCC and speculative approaches are strictly better than locking when conflicts are rare enough that (conflict_rate × retry_cost) stays below the lock management overhead: the expected cost of occasional rollback and retry is then smaller than the lock acquisition cost paid on every transaction.[1][6]
Conversely, when conflict probability is high, "the cost of repeatedly restarting transactions hurts performance significantly."[1]
From TLS literature, speculative parallelism achieves speedup when actual runtime dependencies between speculated tasks are rare, rollback cost is bounded, and idle parallel resources are available to absorb misspeculated work.[2][7]
See Section 8 for the direct mapping to parallel AI agent workflows.
See also: Concurrency Control Theory (theoretical OCC foundations); Deadlock, Starvation, and Fairness (costs of pessimistic strategies under contention)
The four speculative/optimistic approaches covered in this pillar map to distinct agent coordination strategies with different tradeoffs. Choosing among them requires understanding actual conflict rates, merge semantic requirements, and whether coordination happens in memory or on disk.
The long-running conversation OCC pattern maps directly to parallel AI agent workflows:[5]
| HTTP Long-Running Conversation | Parallel AI Agent Equivalent |
|---|---|
| User reads data (first HTTP request) | Agent reads assigned files |
| User edits data over multiple minutes | Agent performs edits over extended period |
| User submits changes (final HTTP request) | Agent commits to shared repository |
| Version check at submit catches concurrent modifications | Version/hash check at commit detects if another agent modified same files |
OCC is the natural fit when agents work on different parts of the codebase most of the time (low conflict probability) and commits are infrequent relative to total work time.[5]
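The version-check-at-commit pattern in the table can be sketched with content hashes standing in for version numbers. This is a toy in-memory repository, not Git's actual machinery; the `Repo` and `OCCConflict` names are invented for illustration.

```python
import hashlib

class OCCConflict(Exception):
    """Raised when validation finds a file changed since it was read."""

def digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

class Repo:
    """Toy shared repository: path -> file content."""
    def __init__(self, files):
        self.files = dict(files)

    def begin(self, paths):
        """Read phase: snapshot content and record version hashes."""
        snapshot = {p: self.files[p] for p in paths}
        versions = {p: digest(c) for p, c in snapshot.items()}
        return snapshot, versions

    def commit(self, edits, versions):
        """Validate phase: every read file must be unchanged; then write."""
        for path, seen in versions.items():
            if digest(self.files[path]) != seen:
                raise OCCConflict(path)
        self.files.update(edits)

repo = Repo({"a.py": "x = 1\n"})
snap, vers = repo.begin(["a.py"])          # agent 1 reads
repo.files["a.py"] = "x = 2\n"             # agent 2 commits first
try:
    repo.commit({"a.py": snap["a.py"] + "y = 3\n"}, vers)
except OCCConflict:
    # rollback: re-read the current version and retry the edit
    snap, vers = repo.begin(["a.py"])
    repo.commit({"a.py": snap["a.py"] + "y = 3\n"}, vers)
print(repo.files["a.py"])  # agent 1's edit re-applied on agent 2's version
```

No lock is held between `begin` and `commit`, so the session can span minutes or hours; all coordination cost is deferred to the validate step.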
CRDTs represent an alternative to both pessimistic locking and OCC for code coordination:[3]
| Dimension | OCC (Git model) | CRDT |
|---|---|---|
| Locks required | No | No[3] |
| Validation/abort cycle | Yes, on conflict | No[3] |
| Merge always possible | No (manual resolution needed) | Yes, deterministic[3] |
| Semantic correctness of merges | Requires human review | Not guaranteed for complex code changes[3] |
| Central server required | Optional | No[3] |
CRDTs are strictly better than OCC when merge semantics can be defined mathematically and automatic merge results are always acceptable. For source code, this condition rarely holds for complex changes but may hold for fine-grained character/line level edits.[3]
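The "merge always possible, deterministic" property can be illustrated with the simplest CRDT, a grow-only set, where merge is set union. This is a standard textbook CRDT, not a code-specific structure; note how it also exhibits the semantic caveat, since both concurrent additions survive the merge whether or not that is what the program meant.

```python
class GSet:
    """Grow-only set CRDT: merge is set union, which is commutative,
    associative, and idempotent, so replicas converge without
    coordination or conflict detection."""
    def __init__(self, items=()):
        self.items = set(items)

    def add(self, item):
        self.items.add(item)

    def merge(self, other: "GSet") -> "GSet":
        return GSet(self.items | other.items)

# Two replicas diverge, then merge deterministically in either order.
a, b = GSet({"line1"}), GSet({"line1"})
a.add("line2-from-agent-A")
b.add("line2-from-agent-B")
assert a.merge(b).items == b.merge(a).items  # order-independent result
```

Both agents' lines end up in the merged set: structurally conflict-free, but nothing checks that the combined result is a semantically valid program.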
For agent systems coordinating through shared state (not files), STM provides composable atomic operations, no deadlock risk, and automatic retry.[4] The buffered I/O pattern (queue irreversible operations, execute after successful commit) maps well to agent actions like file writes or external API calls that cannot themselves be made transactional.[4]
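The buffered I/O pattern can be sketched as a retry loop over a versioned shared cell: irreversible actions are queued during the transaction and executed only after the commit validates. This is an STM-flavored sketch in Python (which has no standard STM library); `Cell` and `atomically` are invented names, and a real STM would track read/write sets across many variables.

```python
import threading

class Cell:
    """Shared state with a version counter (a minimal STM-style variable)."""
    def __init__(self, value):
        self.value, self.version = value, 0
        self.lock = threading.Lock()  # guards only the atomic validate+commit step

def atomically(cell: Cell, transaction):
    """Run `transaction(snapshot, queue_io)` optimistically, retrying on
    conflict; queued irreversible actions run only after a successful commit."""
    while True:
        snapshot, seen = cell.value, cell.version
        buffered_io = []
        new_value = transaction(snapshot, buffered_io.append)
        with cell.lock:
            if cell.version == seen:       # validate: nobody committed meanwhile
                cell.value = new_value     # commit
                cell.version += 1
                break
        # conflict: discard buffered actions and retry against fresh state
    for action in buffered_io:             # safe to run: commit succeeded
        action()

def add_one(value, queue_io):
    queue_io(lambda: print("side effect runs only after commit"))
    return value + 1

counter = Cell(0)
atomically(counter, add_one)
print(counter.value)  # 1
```

Because the queued `print` (standing in for a file write or external API call) never runs inside a transaction that might be aborted, retries are harmless, which is exactly what makes the pattern fit non-idempotent agent actions.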
TLS principles applied to agent scheduling: when dependencies between agent tasks are unknown or hard to prove absent, speculative execution — running likely-independent agents in parallel with rollback on detected conflict — can outperform conservative sequential scheduling when actual conflicts are rare at runtime and rollback cost is bounded.[2] Rollback freedom (Princeton PLDI 2010) represents the ideal: statically proving that no conflicts can occur, eliminating rollback overhead entirely.[2]
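A minimal sketch of this speculate-then-validate scheduling, under simplifying assumptions: each task runs against a private copy of shared state and reports its read and write sets, and commit happens in task order with a re-run (the rollback) for any task whose reads were overwritten by an earlier commit. Real TLS tracks dependencies at hardware or compiler granularity; this just illustrates the control flow.

```python
def run_speculatively(tasks, state):
    """Speculatively execute every task against a snapshot, then commit in
    task order; a task whose read set intersects earlier committed writes
    is squashed and re-run from the committed state."""
    speculated = []
    for task in tasks:
        local = dict(state)                  # private speculative copy
        reads, writes = task(local)
        speculated.append((task, reads, writes, local))
    committed_writes = set()
    for task, reads, writes, local in speculated:
        if reads & committed_writes:         # misspeculation detected
            local = dict(state)              # rollback: restart from committed state
            reads, writes = task(local)
        for key in writes:
            state[key] = local[key]
        committed_writes |= writes
    return state

def task_a(s):
    s["a"] = s["a"] + 1
    return {"a"}, {"a"}                      # (read set, write set)

def task_b(s):
    s["b"] = s["a"] * 10                     # depends on task_a's write
    return {"a"}, {"b"}

state = run_speculatively([task_a, task_b], {"a": 1, "b": 0})
print(state)  # {'a': 2, 'b': 20}: task_b was re-run after task_a committed
```

When the tasks are actually independent, no re-run happens and both speculative results commit as-is; the sequential re-execution is paid only on detected conflicts, mirroring the "rare conflicts, bounded rollback cost" condition.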
| Strategy | Best For | Requires | Worst Case |
|---|---|---|---|
| OCC (Git model) | Low-conflict file editing; long-running agent sessions | Version tracking; atomic validate+commit[5][6] | Wasted work on high-conflict workloads |
| CRDT | Fine-grained character/line edits; offline-first; decentralized | Mathematically defined merge semantics[3] | Semantically invalid merges for complex code changes |
| STM | In-memory shared state coordination; composable agent actions | Idempotent operations; buffered I/O pattern[4] | 2× overhead; I/O cannot be inside transactions |
| TLS / Speculative | Task scheduling where dependencies are unknown; rare conflicts | Rollback mechanism; available parallel resources[2][7] | High misspeculation rates eliminate all speedup gains |
| Pessimistic locking | High-conflict hot spots; strict consistency; critical sections | Lock management; deadlock prevention[10] | Deadlock; blocked concurrency even when no conflicts occur |
Key finding: No single approach dominates across all agent coordination scenarios. The optimal system mirrors what production databases do: MVCC (optimistic) as the default for general work, with selective pessimistic locks applied only to known hot-spot resources, and, where code semantics allow, CRDT-style merging at fine granularity to eliminate conflict detection overhead entirely.[5][10][3]
See also: Post-Merge Agent Recovery; Concurrency Control Theory; Deadlock, Starvation, and Fairness
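The hybrid default described above reduces to a small routing decision per resource. A sketch, with invented names (`HybridCoordinator`, the `schema.sql` hot spot) and the OCC path left as a no-op guard since its validation happens at commit time rather than at edit time:

```python
import threading
import contextlib

class HybridCoordinator:
    """Route each path to a strategy: pessimistic lock for declared
    hot spots, lock-free OCC (validate at commit) for everything else."""
    def __init__(self, hot_spots):
        self.hot_spots = set(hot_spots)
        self.locks = {path: threading.Lock() for path in hot_spots}

    def strategy_for(self, path: str) -> str:
        return "pessimistic" if path in self.hot_spots else "optimistic"

    def edit(self, path: str):
        """Context manager guarding an edit: exclusive for hot spots,
        a no-op for OCC-managed files."""
        if path in self.hot_spots:
            return self.locks[path]
        return contextlib.nullcontext()

coord = HybridCoordinator(hot_spots={"schema.sql"})
with coord.edit("schema.sql"):      # serialized: known high-conflict file
    pass
with coord.edit("feature_x.py"):    # unserialized: conflict checked at commit
    pass
print(coord.strategy_for("schema.sql"), coord.strategy_for("feature_x.py"))
```

The hot-spot set is the empirical knob: it should be populated from measured conflict rates, shrinking toward empty as task decomposition improves.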