Home

Optimistic & Speculative Approaches

Pillar: optimistic-speculative-execution | Date: May 2026
Scope: Alternatives to pessimistic locking for parallel agent coordination: optimistic concurrency control applied to code edits (read freely, validate at commit, rollback on conflict), speculative parallelism from compiler and hardware literature, conflict probability modeling to choose OCC vs pessimistic locking, CRDT-style mergeable data structures for code, software transactional memory (STM) analogs, and when empirical conflict rates make speculative approaches strictly better.
Sources: 10 gathered, consolidated, synthesized.

Executive Summary

Critical finding: The choice between optimistic and pessimistic concurrency is not philosophical — it reduces to a single break-even formula: when (conflict_rate × retry_cost) < lock_management_overhead, optimistic concurrency control wins unconditionally. For parallel coding agents working on well-decomposed non-overlapping tasks, conflict rates approach zero, making OCC strictly superior to locking in nearly all realistic configurations.[10][5]

Optimistic Concurrency Control was formalized in 1979 by H.T. Kung and John T. Robinson as a four-phase protocol — begin, modify, validate, commit/rollback — that acquires no locks during execution and defers all conflict detection to commit time.[1][6] Its central bet is that most transactions complete without interference; the lock-acquisition overhead paid on every pessimistic transaction is simply wasted cost when conflicts are rare. This bet is validated by ubiquitous adoption across Git, Redis, CouchDB, DynamoDB, Kubernetes, HTTP ETags, Oracle, PostgreSQL, and MySQL InnoDB — each implementing OCC or its MVCC generalization as the default read/write strategy. PostgreSQL's MVCC guarantee — "readers don't block writers and writers don't block readers" — directly enables the high-concurrency workloads that pessimistic locking cannot achieve without sacrificing throughput.[5]

The decision factor matrix makes the OCC vs. pessimistic boundary concrete across 7 independent variables: conflict frequency, workload type (read-heavy vs. write-heavy), retry cost, transaction duration, consistency requirements, cross-request transaction needs, and deadlock risk.[10] One factor is structurally decisive for multi-agent systems: pessimistic locks cannot be held across HTTP requests or long-duration sessions because they are connection-level mechanisms. This makes OCC the only viable approach for parallel AI agent workflows where each agent session may span minutes to hours of elapsed time.[5] The conflict rate decision matrix consolidates to four cases: low conflict rate → optimistic regardless of retry cost; high conflict rate → pessimistic regardless of retry cost; moderate conflict rate → split on retry cost magnitude.[5][10]

Thread Level Speculation (TLS) — also called Speculative Multithreading — extends OCC principles from database transactions to program execution itself: sequential code sections are parallelized speculatively at runtime, with dependency violations triggering squash-and-rerun.[2][7] The theoretical ceiling is rollback freedom, a safety condition defined at Princeton (PLDI 2010) where static analysis proves that any speculative execution is equivalent to a non-speculative one — eliminating logging, conflict detection, and rollback overhead entirely.[2] However, the practical ceiling is lower: prior TLS compilers and architectures have been "proven highly profitable only in limited cases and achieved little speedup on many real-world applications."[2][7] High misspeculation rates eliminate all gains; each squash discards completed work. MIT's T4 system (ISCA 2020) addressed this by dividing entire programs into fine-grained tasks and eliminating the global synchronization barriers that caused earlier systems to stall.[2]

Software Transactional Memory (STM) applies OCC semantics to in-memory data structures rather than disk-based files or databases — a thread executes reads and writes freely while maintaining a transaction log, then validates atomically at commit or rolls back and retries.[4][9] STM's critical differentiator from plain OCC is composability: a 2005 milestone by Harris, Marlow, Peyton Jones, and Herlihy demonstrated that arbitrary atomic operations can be composed into larger atomic operations — a property impossible with lock-based programming, where "correct fragments may fail when combined."[9] The performance cost is measured precisely: STM typically performs no worse than twice as slow as lock-based code on 1–4 processors, with O(n) worst-case space and time overhead for n concurrent transactions.[4][9] STM also carries a critical operational constraint: irreversible operations (I/O, network calls, file writes) cannot be performed inside transactions and must be buffered for post-commit execution.[4]

Conflict-Free Replicated Data Types (CRDTs), formally defined in 2011 by Shapiro, Preguiça, Baquero, and Zawirski, represent the most aggressive speculative stance: they mathematically guarantee conflict-free merging by requiring merge functions to satisfy commutativity, associativity, and idempotency — making concurrent updates automatically composable without coordination, validation, or rollback.[8][3] Production deployments validate the approach at scale: League of Legends used CRDT-based Riak to handle 7.5 million concurrent users at 11,000 messages/second; Apple Notes uses CRDTs for offline device synchronization; Redis Enterprise deploys CRDTs for multi-master geographic replication.[8] The Zed code editor (from creators of Atom and Tree-sitter) implements CRDT-based multiplayer editing natively in Rust. In sequence CRDT benchmarks, Yjs is measurably faster than Automerge for text operations due to its plainlist representation versus Automerge's tree structure.[8]

CRDTs carry a hard limit that constrains their use for agent-based code editing: automatic merges at the character or line level can produce programs that compile but are semantically broken.[3] This is not a pathological edge case — it is the default outcome when two agents make logically dependent changes to different files that a CRDT merges independently. CRDT merge rules are hardcoded by algebraic structure, not by program semantics. A CRDT counter cannot model a bank balance because the merge could permit negative values; similarly, a CRDT text merge cannot detect that agent A renamed a function while agent B added a caller using the old name.[3] CRDTs are strictly better than OCC only when automatic merge results are always acceptable — a condition that holds for fine-grained character-level collaborative editing (Zed, Yjs integrations with CodeMirror and Monaco) but rarely holds for multi-file semantic changes.[3]

One implementation hazard cuts across all four optimistic approaches: the validation and commit phases in OCC and STM must be performed as a single atomic operation to prevent a time-of-check to time-of-use (TOCTOU) race condition that can silently corrupt results even in correctly structured implementations.[6] In STM, an analogous hazard exists for inconsistent reads: an optimistically reading transaction may observe mixed old and new values from a concurrent partial write, potentially causing segmentation faults or infinite loops before the abort is detected — a correctness issue even though the transaction will ultimately roll back.[4] Both hazards require explicit architectural attention rather than falling out of the optimistic model automatically.

Practitioners building parallel agent coordination should apply the same hybrid strategy that production databases converge on: MVCC (optimistic) as the default for general agent work across disjoint file regions, with selective pessimistic locks applied only to known hot-spot resources like shared configuration files or build manifests, and CRDT-style merging at fine granularity where code semantics permit automatic resolution.[5][10][3] Task decomposition quality is the primary lever: well-decomposed non-overlapping agent tasks drive conflict rates toward zero, making the OCC bet structurally correct without requiring measurement. Where agent tasks do overlap on shared infrastructure, apply pessimistic locking selectively to those specific resources rather than imposing global sequential ordering. For in-memory agent state coordination (shared queues, status registries), STM provides deadlock-free composability at a 2× overhead cost that is acceptable when I/O can be buffered post-commit. Rollback-free task decomposition — statically provable non-overlap — remains the theoretical ideal, eliminating conflict detection overhead entirely at the cost of stricter upfront planning.



Table of Contents

  1. Optimistic Concurrency Control: Core Mechanics
  2. OCC Applied to Source Code and Version Control
  3. OCC vs. Pessimistic Locking: Decision Framework
  4. Speculative Multithreading / Thread Level Speculation
  5. Software Transactional Memory
  6. CRDTs — Conflict-Free Replicated Data Types
  7. Conflict Probability Modeling: When Speculative Wins
  8. Application to Multi-Agent Code Coordination

Section 1: Optimistic Concurrency Control — Core Mechanics

Optimistic Concurrency Control (OCC), also known as optimistic locking, is a non-locking concurrency control method for transactional systems first proposed in 1979 by H.T. Kung and John T. Robinson.[1][6] Its central assumption is that multiple transactions can frequently complete without interfering with each other, so they proceed without acquiring locks and defer conflict detection to commit time. Before committing, each transaction verifies that no other transaction has modified the data it read; if a conflicting modification is found, the transaction rolls back and may be restarted.[6]

Four Operational Phases

Phase | Action | Notes
1. Begin | Record a timestamp marking transaction start | Establishes the read baseline[1][6]
2. Modify | Read data values and tentatively write changes without locking | No blocking of other transactions[1]
3. Validate | Check whether other transactions modified data this transaction used | Covers transactions completed after start time; optionally covers still-active transactions[1][6]
4. Commit/Rollback | No conflict → commit; conflict → abort and optionally retry | Validation + commit must be atomic (TOCTOU hazard)[6]
Key finding: The validation and commit phases must be performed as a single atomic operation to avoid a time-of-check to time-of-use (TOCTOU) race condition — a subtle implementation hazard that can silently corrupt results even in systems that correctly implement the four-phase lifecycle.[6]
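As a concrete illustration, here is a minimal Python sketch of the four phases over an in-memory versioned store. All names (`VersionedStore`, `occ_transaction`) are illustrative, not drawn from the sources; note that validation and commit run under a single lock, which is precisely the atomicity the TOCTOU warning demands.

```python
import threading

class VersionedStore:
    """In-memory store mapping key -> (value, version). Illustrative only."""
    def __init__(self):
        self._data = {}
        self._commit_lock = threading.Lock()  # serializes validate + commit

    def read(self, key):
        return self._data.get(key, (None, 0))

    def occ_transaction(self, update_fn, max_retries=10):
        for _ in range(max_retries):
            # Phases 1-2: Begin + Modify -- read freely, buffer writes locally.
            read_set, write_set = {}, {}

            def read(key):
                if key not in read_set:
                    read_set[key] = self.read(key)
                return read_set[key][0]

            def write(key, value):
                write_set[key] = value

            update_fn(read, write)

            # Phases 3-4: Validate + Commit as ONE atomic step (avoids TOCTOU).
            with self._commit_lock:
                if all(self.read(k)[1] == ver for k, (_, ver) in read_set.items()):
                    for k, v in write_set.items():
                        _, ver = self.read(k)
                        self._data[k] = (v, ver + 1)
                    return True  # committed
            # Conflict detected: roll back (discard write_set) and retry.
        return False

store = VersionedStore()
store.occ_transaction(lambda read, write: write("x", (read("x") or 0) + 1))
```

If the commit lock were released between the validation check and the write-back, a concurrent committer could slip in between them, which is exactly the silent-corruption hazard described above.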

Two Validation Strategies

Strategy | Direction | What It Checks
Backward validation | Looks backward in time | Conflicts with already-committed transactions[1]
Forward validation | Looks forward | Conflicts with currently active (uncommitted) transactions[1]

Performance Characteristics

Contention Level | OCC Behavior | Source
Low | Completes without lock management overhead; superior throughput | [1][6]
High | "The cost of repeatedly restarting transactions hurts performance significantly" | [1]

Lock-based approaches also carry their own failure mode: "locking can drastically limit effective concurrency even when deadlocks are avoided."[1][6] OCC trades retry cost (paid only when conflicts occur) against locking cost (paid on every transaction regardless) — optimal choice depends entirely on empirical conflict probability.[1]

Real-World OCC Implementations

System | OCC Mechanism | Domain
Git | Edit locally, detect conflicts at merge time | Version control[1][6]
MediaWiki (Wikipedia) | Page edit OCC | Wiki software[1]
Bugzilla | OCC; conflicts called "mid-air collisions" | Bug tracking[1]
Redis, CouchDB | OCC model | Databases[1]
Elasticsearch | Sequence numbers for document versions | Search[1]
DynamoDB | Conditional writes/versioning ("Conditional Updating") | Cloud NoSQL[1]
Kubernetes | OCC on resource objects | Orchestration[1]
HTTP ETags | Conditional GET/PUT with If-Match; mismatch → 412 Precondition Failed | HTTP protocol[6]
Apache Solr | _version_ field | Search[1]
Microsoft Entity Framework | Built-in OCC via binary timestamp value | ORM[6]
Google App Engine | OCC model | Cloud platform[1]
Ruby on Rails / Grails | Built-in OCC support | Web frameworks[1]
Oracle, PostgreSQL, MySQL InnoDB | MVCC (readers don't block writers; writers don't block readers) | RDBMS[5]

Section 2: OCC Applied to Source Code and Version Control

Git is the canonical OCC system for source code: users edit their local copy freely, with conflicts detected only at merge time.[1][6] Because Git is distributed, no single user could hold a file lock during editing — concurrent edits are unavoidable by design. When conflicts occur (lines changed by two people simultaneously), the submitter of the most recent update resolves the conflict.[1] Most revision control systems — SVN, Git, Mercurial — use this "merge model," which is structurally identical to OCC: read freely, write locally, detect conflicts only at merge/commit time.[1]

HTTP ETags as OCC for Code Management

ETags apply OCC principles at the HTTP layer for code asset management:[6]

Step | Action | OCC Phase Equivalent
1 | Client stores ETag header from GET response | Begin (record timestamp/version)
2 | Client modifies resource locally | Modify
3 | Client sends conditional PUT with If-Match: <etag> | Validate
4a | Tag matches → server accepts write | Commit
4b | Tag mismatch → server returns 412 Precondition Failed | Rollback

Critical implementation constraint: tag validation and data write must be performed as a single atomic operation to avoid race conditions.[6]
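The flow above can be simulated end to end. The sketch below models only the server side of If-Match handling with an in-memory store (`ETagStore` is a hypothetical name, not a real library); the tag check and the write happen under one lock, per the constraint just stated.

```python
import threading
import uuid

class ETagStore:
    """Simulated server-side resource store with If-Match handling.
    Illustrative sketch, not a real HTTP server."""
    def __init__(self):
        self._resources = {}          # path -> (body, etag)
        self._lock = threading.Lock() # tag check + write must be one atomic step

    def get(self, path):
        body, etag = self._resources.setdefault(path, ("", self._new_etag()))
        return 200, body, etag

    def put(self, path, body, if_match):
        with self._lock:  # atomic check-and-write: no TOCTOU window
            _, current = self._resources.get(path, (None, None))
            if current is not None and if_match != current:
                return 412, None  # Precondition Failed: client's tag is stale
            new_etag = self._new_etag()
            self._resources[path] = (body, new_etag)
            return 200, new_etag

    @staticmethod
    def _new_etag():
        return uuid.uuid4().hex

server = ETagStore()
_, _, tag = server.get("/config")            # two clients read the same tag
status_a, new_tag = server.put("/config", "A's edit", if_match=tag)
status_b, _ = server.put("/config", "B's edit", if_match=tag)  # stale tag
```

Client A's write succeeds; client B, still holding the old tag, gets 412 and must re-fetch, merge, and retry, which is the rollback phase of OCC.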

Collaborative Code Editing via OCC

In collaborative document/code editing applications where multiple users edit simultaneously, the application tracks document versions; users make edits independently; the system detects conflicts by comparing versions; on conflict, a resolution mechanism is triggered.[6]

Key finding: OCC is a good fit for code coordination when low data contention exists — when it is unlikely that multiple agents compete for the same resource simultaneously — and when conflicts are rare enough that the cost of repeating failed requests is acceptable.[6]
See also: Post-Merge Agent Recovery (post-conflict resolution mechanics)

Section 3: OCC vs. Pessimistic Locking — Decision Framework

Core Definitions

Approach | Mechanism | Conflict Handling
Pessimistic Locking | Acquires shared lock on read; blocks other transactions from modifying record until released | Prevents conflicts preemptively[5][10]
Optimistic Locking (OCC) | No lock on read; detects conflict at write time via version column in WHERE clause; zero rows updated → OptimisticLockException | Detects and retries[5][10]

"Since logical clocks are superior to physical clocks when implementing a concurrency control mechanism, a version column is used to capture the read-time row snapshot information. The version column is incremented every time an UPDATE or DELETE statement is executed."[10]
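The version-column pattern can be sketched with SQLite from the Python standard library. Table and column names here are illustrative; the key move is putting the expected version in the WHERE clause and treating zero updated rows as a conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, content TEXT, version INTEGER)")
conn.execute("INSERT INTO files VALUES (1, 'original', 0)")

def optimistic_update(conn, file_id, new_content, expected_version):
    """Returns True on commit; False signals an optimistic-lock conflict."""
    cur = conn.execute(
        "UPDATE files SET content = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",          # version acts as a logical clock
        (new_content, file_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # zero rows updated -> another writer won the race

# Two agents read version 0; only the first update succeeds.
agent_a = optimistic_update(conn, 1, "edit by A", expected_version=0)
agent_b = optimistic_update(conn, 1, "edit by B", expected_version=0)
```

In an ORM this `rowcount == 0` outcome is typically surfaced as an OptimisticLockException, leaving the retry policy to the caller.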

The Networking Analogy

Locking Strategy | Networking Equivalent | Protocol | Behavior
Pessimistic | Wi-Fi | CSMA/CA (Collision Avoidance) | Blocks preemptively before transmitting[5]
Optimistic | Ethernet | CSMA/CD (Collision Detection) | Transmits optimistically; detects and retries on collision[5]

Full Decision Factor Matrix

Factor | Choose Pessimistic | Choose Optimistic
Conflict frequency | High | Low/Rare[10]
Workload type | Write-heavy | Read-heavy[10]
Retry cost | Very high | Low[10]
Transaction duration | Short (locks released quickly) | Long/multi-step[10]
Consistency requirements | Strict (financial, compliance) | Moderate (web apps)[10]
Cross-request transactions | No | Yes (pessimistic locks can't span HTTP requests)[5]
Deadlock risk | Present | None[5]

MVCC as Industry Standard OCC

Database System | MVCC Implementation | Key Property
Oracle | MVCC based on OCC | Readers don't block writers[5]
PostgreSQL | MVCC | "Readers don't block writers and writers don't block readers"[5]
MySQL InnoDB | MVCC | Versioned row access[5]
CockroachDB, TiDB, YugabyteDB | MVCC-based | Distributed OCC[5]

Long-Running Conversation Pattern (Optimistic Offline Locking)

OCC is uniquely suited for long-running conversations, multi-step workflows whose state spans many requests over an extended period (the optimistic offline locking pattern).[5]

This pattern maps directly to parallel AI agent workflows — see Section 8 for the full mapping.

Pessimistic locking cannot be held across HTTP requests (connection-level locks), making OCC the only viable approach for multi-step, long-duration workflows.[5]

Hybrid Approaches in Production

Many production systems blend both strategies, defaulting to optimistic concurrency and reserving pessimistic locks for operations with known high contention.[10]

Key finding: The fundamental question is how likely conflicts are — the decision factor matrix above captures the key variables across conflict frequency, workload type, retry cost, and consistency requirements. Quantitative break-even analysis is in Section 7.[5][10]
See also: Concurrency Control Theory (OCC theoretical foundations); Deadlock, Starvation, and Fairness (lock-related failure modes)

Section 4: Speculative Multithreading / Thread Level Speculation

As processor clock-rate gains stalled in the early 2000s, the computer industry shifted to multi-threaded and multi-core architectures.[7] TLS emerged as a way to exploit these parallel cores even for sequential programs without obvious parallelism. Thread Level Speculation (TLS) — also called Speculative Multithreading or Speculative Parallelization — is a dynamic (runtime) parallelization technique that speculatively executes code sections anticipated for later execution in parallel with normal execution on separate threads.[2][7] TLS uncovers parallelism that static (compile-time) techniques fail to exploit because at compile time thread independence cannot be guaranteed — but is frequently true at runtime.[2] It provides a path to parallelize sequential programs without complex data dependence analysis or explicit synchronization.

How TLS Works

  1. Code (typically loops or sequential sections) is divided into chunks
  2. Chunks are executed in parallel by different threads speculatively
  3. A monitoring mechanism (hardware or software) detects dependency violations
  4. When violations are detected, squashing and re-execution occurs[2][7]
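The four steps above can be simulated in software under the strong assumption that chunks touch a shared dict of named variables: chunks run speculatively against a snapshot, each chunk's read set is validated in program order, and a violated chunk is squashed and re-executed sequentially. This is a pedagogical sketch, not how hardware TLS is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def run_speculative(chunks, state):
    """chunks: list of fn(read, write) in program order; state: shared dict.
    Returns the number of squashed (re-executed) chunks."""
    snapshot = dict(state)  # all speculation reads from a frozen snapshot

    def speculate(chunk):
        reads, writes = {}, {}
        # Record every read (with the value observed) and buffer every write.
        chunk(lambda k: reads.setdefault(k, snapshot.get(k)),
              writes.__setitem__)
        return reads, writes

    with ThreadPoolExecutor() as pool:          # step 2: parallel execution
        results = list(pool.map(speculate, chunks))

    squashed = 0
    for chunk, (reads, writes) in zip(chunks, results):
        # Step 3: dependency check -- did an earlier commit change what we read?
        if all(state.get(k) == v for k, v in reads.items()):
            state.update(writes)                # speculation succeeded: commit
        else:
            squashed += 1                       # step 4: squash and re-execute
            writes = {}
            chunk(state.get, writes.__setitem__)
            state.update(writes)
    return squashed

state = {"a": 0}
def c1(read, write): write("a", 1)
def c2(read, write): write("b", read("a"))   # depends on c1's write
n_squashed = run_speculative([c1, c2], state)
```

Here `c2` speculatively reads the stale snapshot value of `a`, fails validation once `c1` commits, and is re-run; an independent chunk would commit without any squash, which is the case TLS bets on.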

Speculation Strategies

Strategy | Description | Practicality
Eager execution | Computes all paths simultaneously | Usually impractical — too many paths[7]
Predictive execution (TLS) | Runs most likely path in parallel; rolls back on conflict | Most common pattern; production-viable[7]

Three Rollback Response Strategies

Strategy | Scope of Rollback | Conservatism
Full restart | Stop entire parallel execution; restart completely | Most conservative[2][7]
Partial restart | Stop and restart offending threads and their successors | Moderate[2][7]
Targeted rollback | Stop only the offending thread and downstream consumers | Most fine-grained[2][7]

Hardware vs. Software Approaches

Approach | Who Decides Parallelism | Mechanism | Cost
Hardware (dynamic) | Processor at runtime | Cache coherence; version management in caches; detects store/load conflicts across threads[7] | Fast but expensive; requires processor/memory modifications
Software (static) | Compiler at compile time | Plans parallel instructions ahead of time | Fails on complex control flow, pointer-heavy code[7]
Hybrid (TLS) | Compiler + hardware runtime | Compiler parallelizes loops ignoring ambiguous dependences; hardware detects/enforces at runtime[7] | Best general-purpose approach

Compiler Limitations Addressed by TLS

Current parallelizing compilers can only tackle applications with regular access patterns on arrays or affine indices, and data dependencies expressible in linear form.[2] TLS overcomes this restriction by deferring dependence checking to runtime, where actual rather than merely possible conflicts are observed.

Notable Research Implementations

System | Origin | Key Innovation
MiniTLS | First software TLS system | Eager memory data management (in-place updates); novel compact version management for parallelized rollback; recovers faster than existing software eager TLS systems[2]
Lector (Lazy inspECTOR) | Academic research | Lightweight inspector thread runs alongside each speculative thread; proactively verifies data dependencies before misspeculation occurs; avoids reactive squashing[2]
Rollback-free (Princeton PLDI 2010) | Princeton University | Defines a safety condition (rollback freedom), proven via static analysis: any speculative execution is equivalent to non-speculative execution without logging, conflict detection, or rollback[2]
T4 (MIT ISCA 2020) | MIT | Divides entire programs into tiny tasks, exposing speculative parallelism from the first instruction; addresses prior work's "unselective aborts that wastefully discard work" and global synchronization that stalls all later tasks[2]

Performance Characteristics

Key finding: Rollback freedom (Princeton PLDI 2010) represents the theoretical ideal of speculative execution: a static analysis safety condition where any speculative execution is equivalent to a non-speculative one, eliminating all runtime logging, conflict detection, and rollback overhead entirely.[2]
See also: Deadlock, Starvation, and Fairness (lock-specific failure modes in non-speculative approaches)

Section 5: Software Transactional Memory

Software Transactional Memory (STM) is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing — an alternative to lock-based synchronization implemented entirely in software. STM is "a technique for simplifying concurrent programming by allowing multiple state-changing operations to be grouped together and performed as a single atomic operation."[4][9]

Optimistic Core Mechanism

STM is typically very optimistic: a thread completes modifications to shared memory without regard for what other threads might be doing, recording every read and write in a log. The onus is placed on the reader (not the writer) to verify consistency after completing a transaction.[4]

Phase | Action
Read Phase | Thread executes reads and writes while maintaining a transaction log of all memory accesses[4][9]
Commit Phase | Validates whether other threads modified memory accessed during this transaction; success → permanent commit; conflict → abort with full rollback[4][9]
Abort/Retry | Conflicting transaction rolls back and re-executes from beginning until successful[4][9]
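A minimal sketch of this lifecycle in Python, assuming a single global commit lock rather than TL2's version clock, with per-variable version counters for validation. `TVar` and `atomically` mirror the Haskell names, but everything here is illustrative, not a production STM.

```python
import threading

class TVar:
    """Transactional variable: a value plus a version counter. Illustrative."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()

def atomically(tx_fn):
    while True:  # automatic retry on conflict (the Abort/Retry phase)
        reads, writes = {}, {}

        def read(tvar):
            if tvar in writes:               # read-your-own-writes
                return writes[tvar]
            if tvar not in reads:            # log value + version at first read
                reads[tvar] = (tvar.value, tvar.version)
            return reads[tvar][0]

        def write(tvar, value):
            writes[tvar] = value             # buffered until commit

        result = tx_fn(read, write)          # Read Phase

        with _commit_lock:                   # Commit Phase: validate + commit
            if all(tvar.version == ver for tvar, (_, ver) in reads.items()):
                for tvar, value in writes.items():
                    tvar.value = value
                    tvar.version += 1
                return result
        # Validation failed: discard the log and re-execute from the top.

a, b = TVar(100), TVar(0)

def transfer(read, write):
    amount = 30
    write(a, read(a) - amount)
    write(b, read(b) + amount)
    return amount

moved = atomically(transfer)
```

Because `transfer` is just a function over `read`/`write`, it can be called inside a larger `atomically` block, which is the composability property the next subsection describes.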

Conflict Detection Mechanisms

Mechanism | When Locks Acquired | Key Property | Tradeoff
Encounter-time locking (eager) | During write operations | Values logged in undo log | Earlier conflict detection but may block more[4]
Commit-time locking with version clock (TL2) | Commit phase only | "Invisible reads" — reads acquire no locks; global version clock; location version > read-version → immediate abort; avoids quadratic overhead of incremental validation | Lower overhead but conflicts detected later[4]

Composability — Key Differentiator from Locks

"Lock-based programs do not compose: correct fragments may fail when combined." STM enables atomic composition of operations through transaction wrapping.[4]

2005 milestone: Tim Harris, Simon Marlow, Simon Peyton Jones, and Maurice Herlihy described an STM system built on Concurrent Haskell that enables arbitrary atomic operations to be composed into larger atomic operations — a useful concept impossible with lock-based programming.[9]

Core Benefits vs. Locks

Benefit | Description
Composability and Modularity | Write concurrent abstractions easily composed with any other STM abstraction without exposing synchronization details[9]
Freedom from deadlock | No lock ordering concerns — code contains no locks[9]
No lost wakeups | No condition variables means no missed notifications[9]
Increased concurrency | Different threads can simultaneously modify disjoint parts of a data structure that a pessimistic approach would protect with a single lock[9]

Performance Characteristics

Dimension | STM Behavior
1–4 processors | Typically performs "no worse than twice as slow" vs. lock-based — due to logging and commit overhead[4][9]
Theoretical complexity | O(n) worst-case space and time for n concurrent transactions[4]
Starvation risk | Lengthy transactions continuously competing with multiple short transactions can starve[4][9]
Design goal | "STM aims to provide the ease of coarse-grained locking with the efficiency of fine-grained locking"[4][9]

Critical Implementation Hazard: Inconsistent Reads

Optimistic reading creates a subtle hazard: incomplete transactions may read inconsistent state — mixing old and new values from partial writes by other threads — potentially causing segmentation faults or infinite loops before abort detection occurs. This is a correctness issue even though the transaction will ultimately abort.[4]

Buffered I/O Pattern for Irreversible Operations

Any operation in a transaction must be idempotent since transactions may retry. Irreversible operations (I/O, network calls) cannot be performed inside transactions. Solution: create buffers that queue up irreversible operations and execute them after the transaction successfully commits.[4]
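The pattern reduces to staging side effects in a buffer that is flushed only after the committed attempt. A sketch under illustrative names (`run_transaction_with_io` and `flaky_tx` are invented for this example):

```python
def run_transaction_with_io(attempt_tx, perform_io):
    """attempt_tx(io_buffer) returns True on commit, False to retry.
    Irreversible actions are queued in io_buffer, never executed mid-transaction."""
    while True:
        io_buffer = []                    # side effects staged, not executed
        if attempt_tx(io_buffer):
            for action in io_buffer:
                perform_io(action)        # runs exactly once, post-commit
            return
        # Aborted attempt: io_buffer is simply discarded, so no side effect
        # from a rolled-back transaction ever escapes.

sent_notifications = []
attempts = {"n": 0}

def flaky_tx(io_buffer):
    attempts["n"] += 1
    io_buffer.append(f"notify attempt {attempts['n']}")
    return attempts["n"] >= 3             # first two attempts "conflict" and abort

run_transaction_with_io(flaky_tx, sent_notifications.append)
```

Without the buffer, the notification would have fired three times, once per retry; with it, only the committed attempt's I/O is performed.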

STM Design Patterns

Pattern | Description
IO After a Transaction | Run IO immediately after atomically completes[9]
Retry-based blocking | Use retry to wait for a condition without busy-waiting[9]
Alternative actions with orElse | Try one transaction, fall back to another on retry[9]
Composable atomic blocks | Combine multiple STM operations into larger atomic operations[9]

Language Implementations

Language | Implementation | Notes
Haskell (GHC) | Built-in STM monad; first implemented in GHC 6.4 | Type system enforces transactional safety; primitives: atomically, TVar, retry, orElse, throwSTM, catchSTM[9]
Clojure | Built-in, first-class Refs | Validators (per-ref constraint functions), Watchers (change notifications); eClojure extension adds after-commit/on-abort/on-commit plus Haskell-inspired retry/orElse[9]
Scala | ZIO STM | Functional effect system integration[4][9]
Kotlin | arrow-fx-stm | Arrow functional library STM[4][9]
Rust | async-stm | Async-aware STM[4][9]
OCaml | Kcas | [4][9]
Go | Kashmir | [4][9]
C, C++, C#, Java, Python | Various libraries | Without language-level mutability constraints, developer must self-enforce transactional discipline[9]

STM vs. OCC vs. Pessimistic Locking

Approach | Conflict Detection | Overhead (no conflict) | Overhead (conflict) | Composable
Pessimistic locking | Never (blocked preemptively) | Lock acquire/release | Thread waiting | No[4]
OCC | At commit | Minimal | Abort + retry | No[4]
STM | At commit | Log maintenance | Abort + retry | Yes[4]
Key finding: STM is essentially OCC applied to in-memory data structures, with the addition of composability and automatic retry semantics — but practical adoption has been limited by performance overhead (up to 2× slower on 1–4 processors), inability to perform I/O inside transactions, and difficulty integrating with legacy code.[4]

Section 6: CRDTs — Conflict-Free Replicated Data Types

A Conflict-free Replicated Data Type (CRDT) is a data structure replicated across multiple computers enabling independent, concurrent updates without coordination. The system automatically resolves inconsistencies, guaranteeing eventual convergence among replicas despite temporary divergence. Formally defined in 2011 by Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski — initially motivated by collaborative text editing and mobile computing.[8] CRDTs "do not assume the use of a single server," enabling true peer-to-peer networks — contrasting with Google Docs, Trello, and Figma, which require all communication to route through a central server.[3]

Mathematical Basis for Conflict-Free Merging

State-based CRDTs require merge functions satisfying three algebraic properties:[3][8]

Property | Formula | Effect
Commutativity | a ⊔ b = b ⊔ a | Order of receipt doesn't matter
Associativity | (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c) | Grouping of merges doesn't matter
Idempotency | a ⊔ a = a | Duplicate messages are harmless; applying the same update twice yields the same result

These properties ensure CRDTs remain "invariant under package re-ordering and duplication." The merge function computes the join for any pair of replica states, forming a semilattice where merge = least upper bound. Update functions must be monotone regarding the partial order.[3][8]
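The three properties are easiest to see on the simplest CRDT from the table below, a G-Counter, whose merge (element-wise maximum over per-node slots) is the semilattice join. The class is an illustrative sketch, not a library API.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot;
    merge takes the element-wise maximum, which is the semilattice join."""
    def __init__(self, node_id, slots=None):
        self.node_id = node_id
        self.slots = dict(slots or {})   # node_id -> that node's local count

    def increment(self, n=1):
        self.slots[self.node_id] = self.slots.get(self.node_id, 0) + n

    def merge(self, other):
        keys = self.slots.keys() | other.slots.keys()
        merged = {k: max(self.slots.get(k, 0), other.slots.get(k, 0))
                  for k in keys}
        return GCounter(self.node_id, merged)

    def value(self):
        return sum(self.slots.values())

a, b = GCounter("a"), GCounter("b")
a.increment(2)   # replica a saw 2 events
b.increment(3)   # replica b saw 3 events
```

Because max is commutative, associative, and idempotent, replicas can exchange states in any order, any grouping, and with duplicated messages, and still converge on the same total.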

Two Types of CRDTs

Type | What Is Transmitted | Advantage | Disadvantage
State-Based (CvRDT) | Full local state to all replicas on every update | Simple; compatible with gossip protocols | "The entire state of every CRDT must be transmitted eventually to every other replica, which may be costly" — high bandwidth[3][8]
Operation-Based (CmRDT) | Update operations (deltas) directly to replicas | Lower bandwidth; smaller payloads | Requires middleware guarantees: operations not dropped/duplicated, delivered in causal order[3][8]

Specific CRDT Algorithms

Category | Algorithm | Semantics
Counters | G-Counter | Grow-only; each node maintains its own slot; merge uses maximum values[8]
Counters | PN-Counter | Combines two G-Counters (positive + negative) for increment/decrement[8]
Sets | G-Set | Additions only; merge is union[8]
Sets | 2P-Set | Adds tombstone set for removals; "remove-wins" semantics[8]
Sets | LWW-Element-Set | Timestamps; allows reinsertion after removal via timestamp comparison[8]
Sets | OR-Set | Unique tags instead of timestamps; removal requires matching all add-tags[8]
Sequences (Text/Code) | WOOT | WithOut Operational Transformation[8]
Sequences (Text/Code) | Treedoc, RGA | RGA: Replicated Growable Array[8]
Sequences (Text/Code) | Logoot, LSEQ, LogootSplit | Position-indexed sequences[8]
Sequences (Text/Code) | Yjs | Plain list instead of tree — industrial optimization; notably faster than Automerge for text operations[8]
Sequences (Text/Code) | Automerge | JSON-like CRDT; Rust core with WASM bindings[3][8]
Sequences (Text/Code) | Loro | Rust + JS; based on Replayable Event Graph; supports rich text, list, map, movable tree[3]
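As one worked example from the set CRDTs above, an LWW-Element-Set can be sketched in a few lines. This version resolves an equal-timestamp tie in favor of removal; real implementations must pick and document a tie-breaking rule, and the class itself is illustrative.

```python
class LWWElementSet:
    """Last-writer-wins element set: membership is decided by comparing the
    latest add timestamp against the latest remove timestamp per element."""
    def __init__(self):
        self.adds = {}     # element -> latest add timestamp seen
        self.removes = {}  # element -> latest remove timestamp seen

    def add(self, element, ts):
        self.adds[element] = max(ts, self.adds.get(element, ts))

    def remove(self, element, ts):
        self.removes[element] = max(ts, self.removes.get(element, ts))

    def contains(self, element):
        # Strict > means a tie goes to the remove (remove-wins on equal ts).
        return self.adds.get(element, -1) > self.removes.get(element, -1)

    def merge(self, other):
        out = LWWElementSet()
        for src in (self, other):
            for e, ts in src.adds.items():
                out.add(e, ts)
            for e, ts in src.removes.items():
                out.remove(e, ts)
        return out

r1, r2 = LWWElementSet(), LWWElementSet()
r1.add("x", ts=1)
r2.remove("x", ts=2)
r2.add("x", ts=3)   # reinsertion after removal, permitted by timestamp comparison
```

Keeping only the maximum timestamp per element makes merge idempotent and order-independent, so the two replicas converge no matter which direction they merge.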

Applications to Source Code and Collaborative Code Editing

Tool | CRDT Technology | Editor Integration
Zed | CRDT-based multiplayer; open source; written in Rust; from creators of Atom and Tree-sitter | Native code editor[3][8]
Yjs | High-performance CRDT framework; modular | CodeMirror, Monaco, Quill, ProseMirror[3][8]
Automerge | JSON-like CRDT; Rust + WebAssembly | General collaborative apps[3][8]
Teletype (Atom) | CRDT-based real-time collaboration | Atom editor[8]
Conclave | LSEQ CRDT | Peer-to-peer research editor[8]
crdt.el | CRDT | Emacs[8]
Loro | Replayable Event Graph; rich text, list, map, movable tree | General collaborative apps (Rust+JS)[3]

Industry Deployments at Scale

Organization | CRDT Implementation | Scale / Use Case
League of Legends | Riak CRDT | 7.5 million concurrent users, 11,000 messages/second[8]
SoundCloud | LWW-element-set CRDT (Roshi) | Event streaming[8]
Apple Notes | CRDTs | Syncing offline edits between multiple devices[8]
Microsoft Fluid Framework | Server reference implementations + client SDKs | Collaborative applications platform[8]
Figma | Server-authoritative per-property last-writer-wins | Collaborative design[8]
Redis Enterprise | CRDTs | Multi-master replication across geographic datacenters[3][8]
Microsoft Azure CosmosDB | CRDTs or custom merge procedures | Global multi-region writes[3][8]
Riak (2013) | Native CRDT support | One of first databases with first-class CRDT support[3][8]
AntidoteDB | CRDTs + highly available transactions | Geo-replicated database[3][8]

Performance Considerations

Industrial sequence CRDTs, including open-source ones, are known to outperform academic implementations thanks to engineering optimizations and a more realistic testing methodology.[8] In benchmarks, Yjs is notably faster than Automerge for text operations, reflecting the advantage of its plain-list representation over tree-based approaches.[8]

CRDTs vs. OCC vs. Pessimistic Locking

Approach | Conflict Handling | Coordination Required | Rollback | Central Server
Pessimistic Locking | Prevent preemptively | Upfront lock acquisition | None | Yes[3]
OCC (Git model) | Detect + manual resolve | Validate at commit | Yes, on conflict | Optional[1]
CRDT | Conflicts mathematically impossible | None | None | No[3]

CRDTs are "a bit like version control with Git, but without having to resolve merge conflicts manually" — the CRDT provides built-in conflict resolution.[8] Unlike Git's OCC model, CRDTs mathematically guarantee automatic conflict resolution at all times. Unlike operational transformation, CRDTs support decentralized peer-to-peer operation without a central server.[8]

Limitations of CRDTs for Source Code

  1. Hardcoded merge rules may not match application semantics — e.g., CRDT counters cannot model bank balances (merges could allow negative balances)[3]
  2. Designing correct merge functions is challenging for complex data types[3]
  3. Not for inherently transactional problems — CRDTs are eventual consistency, not serializable consistency[3]
  4. Semantic invalidity: for source code, automatic merges at character/line level may produce syntactically valid but semantically incorrect programs[3]
Key finding: CRDTs are strictly better than OCC when merge semantics can be defined mathematically and automatic merge results are always acceptable — but for source code coordination, this condition rarely holds for complex multi-file changes, where automatic character-level merges can produce programs that compile but are semantically broken.[3]
See also: Post-Merge Agent Recovery (handling semantically incorrect merges)
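
Limitation 4 can be demonstrated in a few lines. The sketch below (illustrative names, not from the sources) performs a naive line-level merge of two non-overlapping agent edits: one agent renames a function while another adds a call to the old name on a different line. The merge succeeds textually and the result compiles, but it fails at runtime.

```python
# Illustrative sketch: a naive line-level merge of two non-overlapping
# edits that is syntactically valid but semantically broken.

base = [
    "def get_total(xs):",
    "    return sum(xs)",
    "result = None",
]
agent_a = [                      # Agent A renames the function
    "def compute_total(xs):",
    "    return sum(xs)",
    "result = None",
]
agent_b = [                      # Agent B calls it by the old name
    "def get_total(xs):",
    "    return sum(xs)",
    "result = get_total([1, 2, 3])",
]

# Line-level three-way merge: take whichever side changed each line.
# The edits touch different lines, so there is no textual conflict.
merged = "\n".join(
    a if a != orig else b for orig, a, b in zip(base, agent_a, agent_b)
)

compile(merged, "<merged>", "exec")   # compiles: syntactically valid
try:
    exec(merged, {})
    runtime_ok = True
except NameError:                     # get_total no longer exists
    runtime_ok = False
assert runtime_ok is False            # merged cleanly, semantically broken
```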

Section 7: Conflict Probability Modeling — When Speculative Approaches Win

The optimal choice between optimistic (speculative) and pessimistic (locking) approaches depends entirely on empirical conflict probability in the actual workload.[1][5][10] This is not a theoretical question — it requires measuring actual conflict rates in production or simulation.

Break-Even Formula

The decision boundary:[10]

Condition | Winner
(conflict_rate × retry_cost) < lock_management_overhead | Optimistic (OCC)
(conflict_rate × retry_cost) > lock_management_overhead | Pessimistic (locking)

For parallel coding agents, each transaction = reading + editing a set of files; a conflict = two agents modify the same file. Conflict probability depends on codebase size, number of active agents, and task decomposition quality — well-decomposed non-overlapping tasks drive conflict rates toward zero.[5]
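
Under a deliberately simplified model (an assumption for illustration, not from the sources), where each of k agents edits one file chosen uniformly at random from n files, collision probability takes the birthday-problem form, which shows how codebase size and decomposition quality drive conflict rates toward zero:

```python
# Illustrative conflict-probability model (simplifying assumption):
# k agents each edit one of n files chosen uniformly at random; a
# conflict occurs when any two agents pick the same file.
# P(conflict) = 1 - (n/n) * ((n-1)/n) * ... * ((n-k+1)/n)

def conflict_probability(n_files: int, k_agents: int) -> float:
    p_no_conflict = 1.0
    for i in range(k_agents):
        p_no_conflict *= (n_files - i) / n_files
    return 1.0 - p_no_conflict

# A larger codebase (or better task decomposition, i.e. effectively
# more distinct edit targets) drives conflict probability toward zero.
assert conflict_probability(1000, 5) < 0.02   # 5 agents, 1000 files
assert conflict_probability(10, 5) > 0.5      # 5 agents, 10 files
```

Real agent workloads are not uniform (hot-spot files are edited far more often), so measured conflict rates remain the ground truth; the model only illustrates the scaling direction.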

Three variables determine the break-even point: conflict rate, retry cost, and lock management overhead.[10]

Conflict Rate Decision Matrix

Conflict Rate | Retry Cost | Recommended Strategy
Low | Any | Optimistic[5][10]
Moderate | Low | Optimistic[5][10]
Moderate | High | Pessimistic[5][10]
High | Any | Pessimistic[5][10]
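
The decision matrix above falls directly out of the break-even formula; a minimal sketch (threshold values in the example are illustrative assumptions):

```python
# Strategy chooser from the break-even formula:
#   expected OCC penalty per txn  = conflict_rate * retry_cost
#   pessimistic penalty per txn   = lock_management_overhead (always paid)

def choose_strategy(conflict_rate: float,
                    retry_cost: float,
                    lock_overhead: float) -> str:
    """Return 'optimistic' when the expected retry cost stays below the
    per-transaction lock-management overhead, else 'pessimistic'."""
    if conflict_rate * retry_cost < lock_overhead:
        return "optimistic"
    return "pessimistic"

# Well-decomposed agent tasks: near-zero conflict rate -> OCC wins,
# even though a retry is 50x more expensive than lock management.
assert choose_strategy(0.01, retry_cost=50.0, lock_overhead=1.0) == "optimistic"
# Hot-spot file touched by most agents -> locking wins.
assert choose_strategy(0.8, retry_cost=50.0, lock_overhead=1.0) == "pessimistic"
```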

Strict Win Conditions for Speculative Approaches

OCC / speculative approaches are strictly better than locking when conflicts are rare enough that the expected cost of validation plus occasional retries falls below the lock-management overhead that pessimistic schemes pay on every transaction, whether or not a conflict ever occurs.[1][6]

Conversely, when conflict probability is high, "the cost of repeatedly restarting transactions hurts performance significantly."[1]

Speculative Parallelism Win Condition (Hardware Context)

From TLS literature, speculative parallelism achieves speedup when runtime dependencies between speculatively parallel tasks rarely materialize and the cost of misspeculation — detecting a violation and re-executing the affected work — remains small relative to the parallelism gained.[2][7]

See Section 8 for the direct mapping to parallel AI agent workflows.

See also: Concurrency Control Theory (theoretical OCC foundations); Deadlock, Starvation, and Fairness (costs of pessimistic strategies under contention)

Section 8: Application to Multi-Agent Code Coordination

The four speculative/optimistic approaches covered in this pillar map to distinct agent coordination strategies with different tradeoffs. Choosing among them requires understanding actual conflict rates, merge semantic requirements, and whether coordination happens in memory or on disk.

OCC as the Natural Model for Agent Coordination

The long-running conversation OCC pattern maps directly to parallel AI agent workflows:[5]

HTTP Long-Running Conversation | Parallel AI Agent Equivalent
User reads data (first HTTP request) | Agent reads assigned files
User edits data over multiple minutes | Agent performs edits over extended period
User submits changes (final HTTP request) | Agent commits to shared repository
Version check at submit catches concurrent modifications | Version/hash check at commit detects if another agent modified same files

OCC is the natural fit when agents work on different parts of the codebase most of the time (low conflict probability) and commits are infrequent relative to total work time.[5]

CRDT as Alternative with Different Tradeoffs

CRDTs represent an alternative to both pessimistic locking and OCC for code coordination:[3]

Dimension | OCC (Git model) | CRDT
Locks required | No | No[3]
Validation/abort cycle | Yes, on conflict | No[3]
Merge always possible | No (manual resolution needed) | Yes, deterministic[3]
Semantic correctness of merges | Requires human review | Not guaranteed for complex code changes[3]
Central server required | Optional | No[3]

CRDTs are strictly better than OCC when merge semantics can be defined mathematically and automatic merge results are always acceptable. For source code, this condition rarely holds for complex changes but may hold for fine-grained character/line level edits.[3]

STM as In-Memory Coordination Analog

For agent systems coordinating through shared state (not files), STM provides composable atomic operations, no deadlock risk, and automatic retry.[4] The buffered I/O pattern (queue irreversible operations, execute after successful commit) maps well to agent actions like file writes or external API calls that cannot themselves be made transactional.[4]
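
A minimal sketch of the buffered-I/O pattern (illustrative, not a real STM library): the transaction records read versions, buffers writes, and queues irreversible actions; the queue is flushed only after validation succeeds, so an aborted attempt leaves no external trace.

```python
# Buffered-I/O STM-analog sketch: queue irreversible side effects
# during the transaction; execute them only after a successful commit.

class Txn:
    def __init__(self):
        self.read_set = {}   # var -> version observed at read time
        self.write_set = {}  # var -> buffered new value
        self.io_queue = []   # deferred irreversible actions

versions = {"x": 0}   # shared state: per-variable version counters
values = {"x": 10}    # shared state: current values
log = []              # stand-in for an external side effect

def txn_read(txn, var):
    txn.read_set[var] = versions[var]
    return txn.write_set.get(var, values[var])  # read-your-own-writes

def txn_write(txn, var, value):
    txn.write_set[var] = value                  # buffered, not applied

def txn_io(txn, action):
    txn.io_queue.append(action)                 # buffered, not executed

def commit(txn):
    # Validate: everything we read must be unchanged since we read it.
    if any(versions[v] != ver for v, ver in txn.read_set.items()):
        return False        # abort: buffered writes and I/O are dropped
    for var, value in txn.write_set.items():
        values[var] = value
        versions[var] += 1
    for action in txn.io_queue:  # flush side effects only on success
        action()
    return True

t = Txn()
txn_write(t, "x", txn_read(t, "x") + 1)
txn_io(t, lambda: log.append("notified"))
assert commit(t) and values["x"] == 11 and log == ["notified"]
```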

Speculative Parallelism as Agent Scheduling Philosophy

TLS principles applied to agent scheduling: when dependencies between agent tasks are unknown or hard to prove absent, speculative execution — running likely-independent agents in parallel with rollback on detected conflict — can outperform conservative sequential scheduling when actual conflicts are rare at runtime and rollback cost is bounded.[2] Rollback freedom (Princeton PLDI 2010) represents the ideal: statically proving that no conflicts can occur, eliminating rollback overhead entirely.[2]
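
The scheduling idea above can be sketched as follows (an illustrative model, not from the sources): tasks run in parallel against a snapshot and report their read and write sets; any task whose reads overlap an earlier task's committed writes is misspeculated and re-executed against the live state.

```python
# Speculative agent-scheduling sketch: run tasks in parallel against a
# snapshot, then commit in order, re-running ("rolling back") only the
# tasks whose reads overlap an earlier task's writes.

from concurrent.futures import ThreadPoolExecutor

def speculate(state, tasks):
    snapshot = dict(state)
    with ThreadPoolExecutor() as pool:
        # Each task returns (read_keys, writes_dict) from the snapshot.
        results = list(pool.map(lambda t: t(snapshot), tasks))

    committed_writes = set()
    for task, (reads, writes) in zip(tasks, results):
        if reads & committed_writes:        # misspeculation detected
            reads, writes = task(state)     # rollback: re-run on live state
        state.update(writes)
        committed_writes |= set(writes)
    return state

def task_a(s):   # independent: bumps "a"
    return {"a"}, {"a": s["a"] + 1}

def task_b(s):   # reads "a", which task_a writes -> will misspeculate
    return {"a"}, {"b": s["a"] * 10}

state = {"a": 1, "b": 0}
speculate(state, [task_a, task_b])
# task_b speculated with the stale snapshot (a == 1), was caught by the
# read/write-set check, and re-ran against the committed state (a == 2).
assert state == {"a": 2, "b": 20}
```

When the read/write-set check never fires — the rollback-freedom ideal — the loop degenerates to pure commits and the speculative run is all speedup, no overhead beyond tracking the sets.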

Strategy Selection Summary

Strategy | Best For | Requires | Worst Case
OCC (Git model) | Low-conflict file editing; long-running agent sessions | Version tracking; atomic validate+commit[5][6] | Wasted work on high-conflict workloads
CRDT | Fine-grained character/line edits; offline-first; decentralized | Mathematically defined merge semantics[3] | Semantically invalid merges for complex code changes
STM | In-memory shared state coordination; composable agent actions | Idempotent operations; buffered I/O pattern[4] | 2× overhead; I/O cannot be inside transactions
TLS / Speculative | Task scheduling where dependencies are unknown; rare conflicts | Rollback mechanism; available parallel resources[2][7] | High misspeculation rates eliminate all speedup gains
Pessimistic locking | High-conflict hot spots; strict consistency; critical sections | Lock management; deadlock prevention[10] | Deadlock; blocked concurrency even when no conflicts occur
Key finding: No single approach dominates across all agent coordination scenarios. The optimal system mirrors what production databases do: MVCC (optimistic) as the default for general work, with selective pessimistic locks applied only to known hot-spot resources — and, where code semantics allow, CRDT-style merging at fine granularity to eliminate conflict detection overhead entirely.[5][10][3]
See also: Post-Merge Agent Recovery; Concurrency Control Theory; Deadlock, Starvation, and Fairness

Sources

  1. Optimistic concurrency control - Wikipedia (retrieved 2026-05-03)
  2. Speculative multithreading - Wikipedia (retrieved 2026-05-03)
  3. About CRDTs — Conflict-free Replicated Data Types (retrieved 2026-05-03)
  4. Software transactional memory - Wikipedia (retrieved 2026-05-03)
  5. Optimistic vs. Pessimistic Locking — Vlad Mihalcea (retrieved 2026-05-03)
  6. Optimistic concurrency control - Wikipedia (retrieved 2026-05-03)
  7. Speculative multithreading - Wikipedia (retrieved 2026-05-03)
  8. Conflict-free replicated data type - Wikipedia (retrieved 2026-05-03)
  9. Software Transactional Memory - Wikipedia (retrieved 2026-05-03)
  10. Optimistic vs. Pessimistic Locking — Vlad Mihalcea (retrieved 2026-05-03)
