Pillar: concurrency-control-theory | Date: May 2026
Scope: Foundational CS theory applied to concurrent code modification: two-phase locking (2PL), multi-version concurrency control (MVCC), optimistic concurrency control (OCC) theoretical foundations, serializability, conflict vs data isolation, vector clocks and happened-before ordering, intent locks, lock compatibility matrices, and transaction isolation levels from database systems.
Sources: 31 gathered, consolidated, synthesized.
Critical gap: A 2013 survey of 18 major SQL and "NewSQL" databases found that only 3 of 18 provide serializability by default, and 8 of 18 — including Oracle and SAP — do not offer full serializability at all. Oracle's "SERIALIZABLE" isolation mode is actually Snapshot Isolation. This is the deployment reality for the 50-year body of theory built to guarantee serializability.[4][14][23]
Concurrency control theory partitions into 3 protocol families: pessimistic (2PL and variants), optimistic (OCC, timestamp ordering), and multi-version (MVCC). The correctness target for all three is serializability — defined in 1976 as a schedule equivalent to some serial execution. Two-Phase Locking, introduced by Eswaran, Gray, Lorie, and Traiger in that same year, dominated database systems for approximately 30 years as the primary serializability mechanism.[1][11] MVCC reached its first commercial implementation in 1984 (VAX Rdb/ELN, Jim Starkey at DEC), and OCC's foundational paper by Kung and Robinson (1981) has accumulated over 1,556 citations — among the most influential papers in database systems theory.[2][13]
Serializability theory distinguishes two correctness notions that are not equivalent. Conflict-serializability — the practical target — requires that a schedule's precedence graph be acyclic; this can be checked in polynomial time. View-serializability is weaker: it admits schedules with "blind writes" that no conflict-serializable execution produces, and determining view-serializability is NP-complete.[4][14] This NP-completeness is the direct reason every practical database system targets conflict-serializability exclusively. The 3 conflict types — Read-Write (non-repeatable read), Write-Read (dirty read), Write-Write (lost update) — map one-to-one onto the S/X lock compatibility matrix, making the matrix a complete mechanical enforcement of conflict-serializable schedules.[28]
The critical distinction between Snapshot Isolation and serializability is the write skew anomaly (A5B). Under SI, two concurrent transactions read overlapping data and write to disjoint parts; neither detects a write-write conflict; both commit; the combined result violates a constraint neither write alone would violate. The canonical example: two bank accounts V1=V2=$100 with constraint V1+V2≥0. T1 withdraws $200 from V1, T2 withdraws $200 from V2; both see the other's initial snapshot and both commit, leaving V1=V2=−$100.[30] SI prevents the 3 ANSI anomalies (dirty reads, fuzzy reads, phantoms) but remains vulnerable to write skew — a gap the 1995 Berenson et al. ACM SIGMOD critique proved is not captured by any ANSI isolation level. Snapshot Isolation is strictly stronger than READ COMMITTED and incomparable to REPEATABLE READ.[8][26]
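The bank-account scenario can be reproduced in a few lines. A minimal Python sketch (illustrative names; SI is modeled as snapshot reads plus a first-committer-wins check on overlapping write sets):

```python
# Write skew (A5B) under Snapshot Isolation: both withdrawals check the
# constraint V1 + V2 >= 0 against their private snapshot, write disjoint
# keys, and therefore pass SI's write-write conflict check at commit.

class SnapshotTxn:
    def __init__(self, db):
        self.snapshot = dict(db)   # consistent snapshot at transaction start
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def withdraw(self, key, amount):
        if self.read("V1") + self.read("V2") - amount >= 0:  # constraint check
            self.writes[key] = self.read(key) - amount
            return True
        return False

def si_commit(txn, db, committed_concurrents):
    # First-committer-wins: abort only on overlapping WRITE sets.
    if any(txn.writes.keys() & other.writes.keys()
           for other in committed_concurrents):
        return False
    db.update(txn.writes)
    return True

db = {"V1": 100, "V2": 100}
t1, t2 = SnapshotTxn(db), SnapshotTxn(db)   # start concurrently
assert t1.withdraw("V1", 200) and t2.withdraw("V2", 200)
assert si_commit(t1, db, [])                # T1 commits first
assert si_commit(t2, db, [t1])              # disjoint writes: SI lets it through
assert db["V1"] + db["V2"] == -200          # constraint V1 + V2 >= 0 violated
```

Neither transaction alone violates the constraint; only the combined schedule does, which is exactly what SI's write-write check cannot see.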
Serializable Snapshot Isolation (SSI), proposed by Cahill, Röhm, and Fekete at ACM SIGMOD 2008 and shipped in PostgreSQL 9.1 in 2011, closes the SI-to-serializability gap without introducing any reader-writer blocking. The detection mechanism reduces full cycle detection to identifying 2-arc "dangerous structures" in the MVCC serialization graph — a necessary condition for non-serializable cycles. This is categorically superior to SS2PL: under SSI, readers never block writers and writers never block readers, deadlocks are impossible, and the false-abort rate remains low (some conservative false positives remain). SSI is the theoretically optimal isolation mechanism for mixed read-write workloads as of 2011.[30]
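A hedged sketch of the dangerous-structure test, using flags instead of a real lock manager: each transaction tracks whether it is the source and/or target of an rw-antidependency with a concurrent transaction, and being both makes it the "pivot" of a potential cycle:

```python
# SSI pivot detection (after Cahill et al. 2008): a transaction with both
# an incoming and an outgoing rw-antidependency edge may sit inside a
# non-serializable cycle, so it is conservatively aborted.

class Txn:
    def __init__(self, name):
        self.name = name
        self.has_in_rw = False    # a concurrent txn read what this txn overwrote
        self.has_out_rw = False   # this txn read what a concurrent txn overwrote

def record_rw_antidependency(reader, writer):
    # reader read a version that writer concurrently overwrote:
    # edge reader --rw--> writer in the serialization graph.
    reader.has_out_rw = True
    writer.has_in_rw = True

def is_dangerous_pivot(txn):
    return txn.has_in_rw and txn.has_out_rw

# Write-skew pattern: T1 reads what T2 overwrites, and vice versa.
t1, t2 = Txn("T1"), Txn("T2")
record_rw_antidependency(t1, t2)   # T1 --rw--> T2
record_rw_antidependency(t2, t1)   # T2 --rw--> T1
assert is_dangerous_pivot(t1) and is_dangerous_pivot(t2)  # abort one of them
```

The two-flag test is why some serializable schedules are falsely aborted: it fires on the necessary condition for a cycle, never on the full cycle itself.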
MVCC's core property — readers never block writers; writers never block readers — comes at a storage cost and a garbage collection burden. Every write creates a new version in the version chain; obsolete versions accumulate until no active transaction can see them. PostgreSQL's VACUUM FREEZE uses a stop-the-world traversal and additionally faces the XID wraparound problem as 32-bit transaction IDs overflow. MySQL InnoDB risks undo log exhaustion under update-intensive workloads.[12] Many production systems mitigate this by combining MVCC for reads with locking for write-write conflicts — capturing non-blocking read properties while retaining write ordering guarantees from 2PL.[21] Google Spanner extends this further: it uses Multiversion Timestamp Ordering (MVTO) with TrueTime commit-wait, achieving external consistency (strict serializability) across globally distributed nodes.[12]
Multiple Granularity Locking (MGL), introduced by Gray, Lorie, and Putzolu at VLDB 1975, resolves the tension between fine-grained concurrency and coarse-grained conflict detection by adding 4 intent modes (IS, IX, SIX, NL) to the basic S/X pair. The key property: placing an IS or IX lock on a parent node reduces coarse-grained conflict detection from O(n) leaf scans to O(1) parent-node checks.[6][16] The full 6-mode compatibility matrix (NL, IS, IX, S, SIX, X) is implemented identically in IBM DB2, Oracle (TM/TX locks), SQL Server (row/page/extent/table/database), MySQL InnoDB, and PostgreSQL — making it the single most universally deployed concurrency primitive in production database systems.[24]
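The O(1) parent-node check falls directly out of the compatibility matrix. A minimal sketch (illustrative resource names; the matrix entries follow Gray, Lorie, and Putzolu):

```python
# Multiple Granularity Locking: intent locks (IS/IX) on ancestors turn the
# coarse-grained conflict check into a single matrix lookup at the parent,
# instead of a scan over every leaf beneath it.

COMPAT = {  # True = the two modes may be held concurrently
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,
    ("IS", "SIX"): True, ("IS", "X"): False,
    ("IX", "IX"): True,  ("IX", "S"): False,  ("IX", "SIX"): False,
    ("IX", "X"): False,
    ("S", "S"): True,    ("S", "SIX"): False, ("S", "X"): False,
    ("SIX", "SIX"): False, ("SIX", "X"): False,
    ("X", "X"): False,
}

def compatible(a, b):
    return COMPAT.get((a, b), COMPAT.get((b, a), False))

class Resource:
    def __init__(self, name):
        self.name, self.held = name, []

    def try_lock(self, mode):
        # O(1) per node: intent locks summarize activity in the subtree below.
        if all(compatible(mode, h) for h in self.held):
            self.held.append(mode)
            return True
        return False

repo, f = Resource("repo"), Resource("repo/main.py")
assert repo.try_lock("IX") and f.try_lock("X")  # writer: IX on parent, X on leaf
assert repo.try_lock("IS")                      # a reader's intent coexists with IX
assert not repo.try_lock("S")                   # whole-repo S blocked by the IX
```

The last line is the payoff: the whole-tree S request is refused by one lookup at the root, without enumerating which file is actually locked.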
Vector clocks provide the minimum logical data structure for complete bidirectional causality detection in distributed systems. Lamport's 1978 scalar clocks establish one-directional implication only: a→b implies C(a)<C(b), but C(a)<C(b) does not imply a→b — Lamport clocks cannot distinguish causal precedence from concurrence. Vector clocks, independently developed by Colin Fidge and Friedemann Mattern both in 1988 (and by at least 6 groups total), provide the bidirectional property: VC(a)<VC(b) if and only if a happened-before b.[5][15] The cost is O(n) space per process, where n is the number of processes. Amazon Dynamo, Riak, and CouchDB use version vectors for distributed conflict detection; Git's DAG structure is equivalent to vector clocks for file history and merge conflict identification.[25]
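The bidirectional property reduces to a componentwise comparison. A minimal sketch (process names illustrative):

```python
# Vector clock comparison: VC(a) < VC(b) iff every component of a is <= the
# matching component of b and the clocks differ; neither direction holding
# means the events are concurrent.

def vc_leq(a, b):
    return all(a.get(p, 0) <= b.get(p, 0) for p in set(a) | set(b))

def happened_before(a, b):
    return vc_leq(a, b) and a != b

def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)

a = {"p1": 2, "p2": 0}
b = {"p1": 3, "p2": 1}   # causally after a
c = {"p1": 1, "p2": 2}   # neither ordering holds against a
assert happened_before(a, b)
assert concurrent(a, c)
```

A Lamport scalar clock could order a and c numerically, but only the vector comparison reveals that they are concurrent and therefore a potential conflict.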
Deadlock theory adds a practical constraint on all locking-based protocols. The 4 Coffman conditions must hold simultaneously for a deadlock to occur; breaking any one prevents it. Conservative 2PL (C2PL) prevents deadlocks by eliminating hold-and-wait — transactions must declare all read-sets and write-sets before execution begins — but this advance-knowledge requirement makes it rarely implemented in practice.[1][29] The two timestamp-based prevention protocols, Wait-Die (non-preemptive) and Wound-Wait (preemptive), both prevent deadlock and starvation by always aborting the younger transaction and restarting it with its original timestamp — ensuring the restarted transaction eventually becomes the oldest active transaction and achieves highest priority. OCC eliminates deadlocks entirely since no locks are held during execution; MVCC reduces but does not eliminate them since write-write conflicts still require locking in most implementations.[29]
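The two prevention policies can be stated as a pair of decision functions over timestamps (lower timestamp = older transaction); a minimal sketch:

```python
# Wait-Die vs Wound-Wait, deciding what happens when a requester wants a
# lock a holder owns. In both policies only the younger transaction is ever
# aborted, and it restarts with its ORIGINAL timestamp, so it eventually
# becomes the oldest active transaction and cannot starve.

def wait_die(requester_ts, holder_ts):
    # Non-preemptive: older requesters wait; younger requesters die.
    return "wait" if requester_ts < holder_ts else "abort_requester"

def wound_wait(requester_ts, holder_ts):
    # Preemptive: older requesters wound (abort) the holder; younger wait.
    return "abort_holder" if requester_ts < holder_ts else "wait"

assert wait_die(1, 5) == "wait"             # old requests from young: wait
assert wait_die(5, 1) == "abort_requester"  # young requests from old: die
assert wound_wait(1, 5) == "abort_holder"   # old wounds the young holder
assert wound_wait(5, 1) == "wait"           # young waits for the old
```

Either rule breaks the circular-wait Coffman condition: all waiting runs in one direction of transaction age, so no wait cycle can form.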
The theoretical foundations of concurrency control are domain-independent. The S/X lock compatibility matrix, two-phase locking discipline, and readers-writers priority policies (first introduced by Courtois, Heymans, and Parnas in 1971) appear identically in OS file locking (IBM pioneered file locking in 1963 for mainframe OS/360), database systems (1976), and distributed key-value stores (2007+). Unix/Linux defaults to advisory locking — non-cooperating processes are not blocked; Windows uses mandatory semantics, with OS enforcement regardless of cooperation. NFS has historically poor lock support across network boundaries, forcing distributed applications to implement application-level coordination.[19][27]
Practitioner implications: For parallel code modification pipelines specifically, the theory resolves to 4 concrete choices.

1. If agents operate on independent files with rare conflicts, OCC (optimistic with validation at commit) delivers maximum throughput with zero deadlock risk — the 1981 Kung-Robinson model maps cleanly onto file-level workspace isolation.
2. If agents share files or overlapping regions, MVCC semantics with SSI detection provide full serializability without blocking — the 2008 Cahill et al. algorithm is the right model, implemented cheaply via version counters per file.
3. For hierarchical project structures (repo → directory → file → region), MGL's intent-lock discipline enables O(1) conflict detection at the directory level without exhaustive file enumeration.
4. In distributed agent pipelines where no shared clock exists, vector clocks are the minimum data structure needed to distinguish concurrent modifications (potential conflicts) from causally ordered ones (safe to chain) — Lamport scalar clocks are insufficient for this purpose.

The industry survey finding — only 3 of 18 databases default to serializability — is a warning: systems that silently provide Snapshot Isolation instead will permit write-skew anomalies that feel correct but violate constraints, exactly the failure mode most damaging in multi-agent code modification where constraint invariants span multiple files.
Classical concurrency control theory partitions protocols into three families based on how they handle the fundamental tension between correctness and throughput. Pessimistic protocols (2PL and its variants) acquire locks before accessing data, paying upfront blocking costs to prevent conflicts. Optimistic protocols (OCC, timestamp ordering) operate freely on private copies or under timestamp discipline, paying a validation or abort cost at commit time. Multi-version protocols (MVCC) retain prior data versions so reads never block writes, at a storage overhead cost.[1][3][2]
The correctness target is serializability — the highest isolation level, defined as a schedule equivalent to some serial execution. For approximately 30 years after its 1976 introduction, Two-Phase Locking was the dominant algorithm for achieving serializability in database systems.[1][11]
Key finding: Timestamp Ordering (T/O) provides the theoretical building blocks for both OCC and MVCC. OCC = T/O with all validation deferred to commit time; MVCC = T/O with multiple versions retained to allow reads to always succeed. Together with 2PL, these three form the complete foundation of classical concurrency control theory.[31]
| Protocol | Year Introduced | Mechanism | Deadlock Risk | Reader/Writer Blocking | Serializability Guarantee |
|---|---|---|---|---|---|
| Two-Phase Locking (2PL) | 1976 (Eswaran, Gray et al.) | Pessimistic locking | Yes | Readers block writers, writers block readers | Conflict-serializable[1] |
| Timestamp Ordering (T/O) | ~1976–1979 | Timestamp-based ordering | No | Aborts on violations | Conflict-serializable (basic), view-serializable (Thomas rule)[9] |
| Optimistic CC (OCC) | 1979/1981 (Kung & Robinson) | Private workspace + validation | No | Never blocks | Conflict-serializable[3][22] |
| MVCC | 1978 (Reed), 1981 (Bernstein & Goodman); 1984 first commercial (VAX Rdb/ELN, Jim Starkey, DEC)[2][12][21] | Multiple data versions | Reduced | Readers never block writers | Snapshot Isolation by default; SSI for full serializability[2][12] |
| Serializable SI (SSI) | 2008 (Cahill et al.) | MVCC + anti-dependency cycle detection | No | No read-write blocking | Full serializability[30] |
Two-Phase Locking is a pessimistic concurrency control method that guarantees conflict-serializability by dividing transaction execution into a growing phase (locks acquired only) and a shrinking phase (locks released only). Introduced in 1976 by K.P. Eswaran, J. Gray, R.A. Lorie, and I. Traiger, it remained the dominant serializability algorithm for approximately 30 years.[1][11][20]
The protocol imposes one rule: "a transaction must never acquire a lock after it has released a lock."[1] The lock point marks the moment all needed locks are acquired; after this point the transaction can only release locks.[1][11] Two fundamental lock types govern access: shared (S) locks, which multiple readers may hold concurrently, and exclusive (X) locks, which a writer must hold alone (see the S/X compatibility matrix later in this document).[1]
| Variant | Lock Release Policy | Conflict-Serializable | Deadlock-Free | Recoverable | Strict |
|---|---|---|---|---|---|
| Basic 2PL | Locks released during shrinking phase | Yes | No | No | No |
| Conservative 2PL (C2PL) | All locks acquired before execution begins | Yes | Yes | Uncertain | Partial |
| Strict 2PL (S2PL) | X locks held until commit/abort; S locks released during shrinking | Yes | No | Yes | Yes |
| Strong Strict 2PL (SS2PL) | All locks held until commit/abort — no shrinking phase in practice | Yes | No | Yes | Yes |
Conservative 2PL prevents deadlocks by eliminating the hold-and-wait condition — transactions must declare all read-sets and write-sets in advance. Rarely implemented in practice due to this advance-knowledge requirement.[1][11][29]
SS2PL (Rigorous 2PL) is the most widely used variant in commercial database management systems. It provides the strongest consistency guarantees while simplifying recovery — a transaction effectively only has a growing phase until completion.[1][11]
Key finding: Any schedule produced under the basic 2PL protocol is conflict serializable. The serialization order of transactions is determined by topological sort of the precedence graph of the schedule. This establishes 2PL as a sufficient condition for serializability — but not a necessary one: serializable schedules exist that no valid 2PL execution produces.[1][11][20]
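The two-phase discipline itself is mechanically checkable. A minimal sketch (illustrative class; a real lock manager would also track lock modes and blocking):

```python
# Enforcing the single 2PL rule: once a transaction releases any lock,
# it enters its shrinking phase and may not acquire another.

class TwoPhaseTxn:
    def __init__(self):
        self.shrinking = False
        self.locks = set()

    def acquire(self, obj):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.locks.add(obj)

    def release(self, obj):
        self.shrinking = True       # the lock point has passed
        self.locks.discard(obj)

t = TwoPhaseTxn()
t.acquire("A"); t.acquire("B")      # growing phase
t.release("A")                      # shrinking phase begins
violation = False
try:
    t.acquire("C")                  # forbidden by the protocol
except RuntimeError:
    violation = True
assert violation and t.locks == {"B"}
```

Strict and strong-strict variants are the degenerate case of this checker where `release` is simply never called before commit.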
MVCC is a non-locking concurrency control method that maintains multiple versions of data items simultaneously. Rather than overwriting data on write, MVCC creates a new version — readers access older committed versions while writers create newer ones. The fundamental property: readers never block writers; writers never block readers.[2][12][21]
| Year | Contributor / Event | Significance |
|---|---|---|
| 1978 | David P. Reed (dissertation) | First description of MVCC principles; claims original authorship[2][12][Reed 1978] |
| 1981 | Phil Bernstein & Nathan Goodman | "Concurrency Control in Distributed Database Systems" — detailed MVCC formalization[2][12] |
| 1984 | VAX Rdb/ELN (Jim Starkey, DEC) | First commercial MVCC implementation; Starkey later developed InterBase (now Firebird)[2][12][21] |
The DBMS maintains a version chain per logical tuple — a linked list of versions sorted by timestamp. Each version carries the write timestamp (WTS) of the transaction that created it and a pointer to its neighbor in the chain.[12][21]
A transaction always reads the latest version whose WTS ≤ TS(Tᵢ) — this is snapshot isolation.[2][12][21]
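The visibility rule is a short walk down the version chain. A minimal sketch (tuple layout illustrative):

```python
# Snapshot read over a version chain: a transaction with timestamp ts sees
# the newest version whose write timestamp does not exceed ts.

def snapshot_read(chain, ts):
    # chain: list of (wts, value) pairs, sorted newest-first
    for wts, value in chain:
        if wts <= ts:
            return value
    raise LookupError("no version visible at this timestamp")

chain = [(30, "v3"), (20, "v2"), (10, "v1")]
assert snapshot_read(chain, 25) == "v2"   # latest version with WTS <= 25
assert snapshot_read(chain, 35) == "v3"
```

The same walk with a historical timestamp is what makes time-travel queries fall out of MVCC for free.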
From Bernstein, Hadzilacos & Goodman (1987):[12]
| Method | Mechanism |
|---|---|
| Multiversion Timestamp Ordering (MVTO) | Timestamps assigned to transactions; reads/writes directed to appropriate versions based on timestamps |
| Multiversion Two-Phase Locking (MV2PL) | Combines version management with 2PL locking discipline |
| Multiversion Mixed Methods | Hybrids combining timestamp ordering and locking |
Most MVCC implementations provide Snapshot Isolation (SI) by default — not full serializability. The write skew anomaly exposes this gap: two concurrent transactions read overlapping data, each writes to a different part, and the combined result violates a constraint that neither write alone would violate. No write-write conflict is detected under SI; both transactions commit.[2][12] Serializable Snapshot Isolation (SSI) was developed in 2008 to close this gap.[30]
Obsolete versions accumulate and must be reclaimed. A version is reclaimable when no active transaction can see it, or when it was created by an aborted transaction.[2][12][21] In practice, reclamation runs as a background process (PostgreSQL's VACUUM, MySQL InnoDB's undo-log purge), gated on the snapshot of the oldest active transaction.
A side benefit: if obsolete versions are retained rather than reclaimed, MVCC supports time-travel queries — querying the database as of a past snapshot.[2][12][21]
| Property | MVCC | 2PL |
|---|---|---|
| Reader-writer conflict | None — readers access old versions | S locks block X lock acquisition |
| Deadlock risk | Reduced (readers hold no locks) | Yes — requires detection/prevention |
| Storage overhead | High — multiple versions per object | Low — single version |
| Default serializability | Snapshot Isolation only | Full conflict-serializability (with SS2PL) |
| GC complexity | Significant operational concern | Not applicable |
| Read-heavy workloads | Excellent | Moderate — S locks allow parallel reads |
| System | MVCC Approach |
|---|---|
| PostgreSQL | Native MVCC; VACUUM for GC; SSI since v9.1[12] |
| MySQL InnoDB | MVCC via undo logs[12] |
| Oracle | Multi-version reads via undo segments; "SERIALIZABLE" mode is actually SI[12][26] |
| Google Spanner | MVTO + TrueTime commit-wait for external consistency (strict serializability)[12] |
| Microsoft Hekaton, SAP HANA, MemSQL, NuoDB | In-memory MVCC implementations[12] |
Key finding: Many modern database systems combine MVCC for reads with locking for write-write conflicts — gaining the non-blocking read properties of MVCC while retaining the write-ordering guarantees of 2PL.[21]
OCC assumes transactions complete without conflict most of the time. Rather than acquiring locks, transactions operate freely on private data copies, then validate at commit time that no conflicts occurred. The approach is "optimistic" in that it relies primarily on transaction rollback as a control mechanism, "hoping" conflicts will not occur.[3][13][22]
| Phase | Action | Locking |
|---|---|---|
| 1. Begin | Record a timestamp marking transaction start | None |
| 2. Read/Modify | Read DB values; write to private workspace (local copies only — no global writes). On first write to an object, copy it; all subsequent writes target the copy. | None |
| 3. Validate | Check whether other transactions have modified data this transaction read or wrote. Transaction number (tn) is assigned at the end of the read phase, not the beginning. | Critical section only |
| 4. Commit / Rollback | If validation succeeds, publish changes. If conflict detected, abort and restart. | Brief write lock |
For transaction Tᵢ to precede Tⱼ in the equivalent serial order, at least one of the following must hold:[3] (1) Tᵢ completes its write phase before Tⱼ begins its read phase; (2) Tᵢ's write set does not intersect Tⱼ's read set, and Tᵢ completes its write phase before Tⱼ begins its write phase; (3) Tᵢ's write set intersects neither Tⱼ's read set nor Tⱼ's write set, and Tᵢ completes its read phase before Tⱼ completes its read phase.
| Method | Conditions Checked | Critical Section Scope | Best For |
|---|---|---|---|
| Backward Validation (serial) | Conditions 1 and 2 — looks back at already-committed transactions | Wraps entire validate + write sequence | Simpler implementation; less scalable for disk-based DBs[3][22] |
| Forward Validation (parallel) | All three conditions — looks ahead at currently active transactions | Wraps only the step to identify concurrent transactions | Higher concurrency environments; more scalable[3][22] |
Low-contention environments: OCC excels — transactions avoid lock management overhead and eliminate waiting periods, producing superior throughput compared to locking approaches.[3][13]
High-contention environments: Performance degrades severely. Per Kung & Robinson: "repeatedly restarting transactions hurts performance significantly." Thrashing occurs when the same transactions repeatedly conflict and abort. Starvation is possible — a transaction with a long read phase must check write sets of all transactions that committed while it was active, potentially requiring unbounded buffer space.[3][22]
Google App Engine Datastore, Apache Solr (via _version_ field), Elasticsearch (document updates), CouchDB (document revisions via _rev fields), MonetDB, Software Transactional Memory (STM) systems, DynamoDB and MongoDB (conditional updates), Mimer SQL, Rails and Grails web frameworks.[13][22]
In web applications, HTTP's stateless architecture makes traditional locking impractical. OCC is implemented via ETags in HTTP headers, hidden form fields with checksums/tokens, and version numbers in database records.[3]
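The version-number variant is a compare-and-set on a version column. A minimal sketch (store layout and names illustrative, standing in for an ETag or hidden form token):

```python
# Web-style OCC: the update succeeds only if the version the client read
# earlier is still current; otherwise the client must re-read and retry.

def try_update(db, key, expected_version, new_value):
    _, version = db[key]
    if version != expected_version:
        return False                       # lost the race: stale version
    db[key] = (new_value, version + 1)
    return True

db = {"doc": ("draft", 1)}
_, v = db["doc"]                           # client reads value and version 1
assert try_update(db, "doc", v, "final")   # succeeds, version becomes 2
assert not try_update(db, "doc", v, "x")   # second writer with version 1 is rejected
```

HTTP's If-Match/ETag mechanism follows the same shape, with the version carried in a header rather than a column.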
Key finding: OCC eliminates deadlocks entirely (no locks held during execution) and avoids all blocking, but at the cost of potential transaction restarts. The 1981 Kung & Robinson paper spawned Snapshot Isolation, Serializable Snapshot Isolation (SSI), and Software Transactional Memory (STM) — the three most important descendants in modern concurrency control.[3][22]
Timestamp Ordering is a concurrency control protocol where the DBMS uses timestamps — rather than locks — to determine the serialization order of transactions. Each transaction Tᵢ receives a unique, monotonically increasing timestamp TS(Tᵢ). Classified as optimistic in that conflicts are assumed rare, but unlike OCC, conflicts trigger immediate abort rather than deferred validation.[9][31]
Each data item Q maintains two timestamps: RTS(Q), the largest timestamp of any transaction that successfully read Q, and WTS(Q), the largest timestamp of any transaction that successfully wrote Q.[9][31]
| Operation | Condition | Action | Rationale |
|---|---|---|---|
| Read(Q) | TS(T) < WTS(Q) | ABORT T | T tries to read a value already overwritten by a later transaction (physically unrealizable) |
| Read(Q) | TS(T) ≥ WTS(Q) | ALLOW READ, set RTS(Q) = max(RTS(Q), TS(T)) | Read is temporally consistent |
| Write(Q) | TS(T) < RTS(Q) | ABORT T | T tries to write a value a later transaction already read |
| Write(Q) | TS(T) < WTS(Q) | ABORT T | T tries to write an outdated value — already overwritten |
| Write(Q) | Otherwise | ALLOW WRITE, set WTS(Q) = TS(T) | Write is temporally consistent |
Important limitation: Every read requires writing a new RTS(Q) value — every read becomes a write operation, creating significant overhead.[9][31]
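The rules in the table above (with the Thomas Write Rule, discussed next, available as a flag) can be sketched directly:

```python
# Basic Timestamp Ordering: per-object RTS/WTS, aborting on any
# physically unrealizable read or write.

def to_read(ts, item):
    if ts < item["wts"]:
        return "abort"                  # value already overwritten by a later txn
    item["rts"] = max(item["rts"], ts)  # every read updates RTS (the overhead)
    return "ok"

def to_write(ts, item, thomas=False):
    if ts < item["rts"]:
        return "abort"                  # a later txn already read the old value
    if ts < item["wts"]:
        return "skip" if thomas else "abort"  # Thomas rule ignores stale writes
    item["wts"] = ts
    return "ok"

q = {"rts": 0, "wts": 0}
assert to_read(5, q) == "ok" and q["rts"] == 5
assert to_write(3, q) == "abort"               # TS(T)=3 < RTS(Q)=5
assert to_write(7, q) == "ok" and q["wts"] == 7
assert to_write(6, q, thomas=True) == "skip"   # outdated write silently dropped
```

Note that `to_read` mutates `item` on every successful call, which is the "every read becomes a write" overhead described above.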
Robert H. Thomas (1979) — "A majority consensus approach to concurrency control for multiple copy databases," ACM TODS 4(2): 180–209 — proposed ignoring outdated writes rather than aborting them:[9]
| Situation (Write Q) | Basic T/O | Thomas Write Rule |
|---|---|---|
| TS(T) < WTS(Q) | Abort T | Ignore the write (skip, continue) — the outdated write is irrelevant |
| TS(T) < RTS(Q) | Abort T | Abort T (unchanged) |
| Otherwise | Allow write | Allow write |
| Serializability type | Conflict-serializable | View-serializable only |
| Concurrency | Lower (unnecessary aborts) | Higher (fewer aborts) |
The Thomas Write Rule trades conflict-serializability for view-serializability, gaining concurrency at the cost of a harder correctness check — determining view-serializability is NP-complete.[9]
To avoid cascading rollbacks, the strict variant delays (rather than aborts) operations that would violate timestamp order, executing them once all conflicting operations of older transactions have committed or aborted.[31]
| Property | 2PL | T/O |
|---|---|---|
| Mechanism | Lock acquisition | Timestamp comparison |
| Deadlock | Possible — detection/prevention needed | Impossible — no locks held |
| Starvation | Less likely | Long-running transactions may starve (repeatedly aborted by newer ones) |
| Overhead | Lock table management | Timestamp tracking; every read writes a new RTS |
| Cascading rollbacks | In basic 2PL | Possible in basic T/O |
| Aborted transactions restart with | Same identity | New (higher) timestamp — the restarted transaction is now serialized after the transaction that caused the abort, so it can read the most recently committed version without a timestamp violation.[9] |
Key finding: T/O provides the theoretical bridge between locking-based and version-based concurrency. MVCC is a multiversion extension of T/O that eliminates most read violations by allowing reads to always find an appropriate older version — transforming aborts into version selects.[9][31]
Serializability is the fundamental correctness criterion for concurrent database transactions: a schedule is serializable if it is equivalent to some serial schedule — i.e., it produces the same result as if transactions executed one at a time. It is the highest level of isolation between transactions.[4][14][23] A schedule (or history) is the ordered list of operations from multiple transactions: reads (r₁[x]), writes (w₁[x]), commits (c₁), and aborts (a₁).[4]
Two operations are in conflict if and only if: (1) they belong to different transactions, (2) at least one is a write, and (3) they access the same object. Conflicting operations are non-commutative — swapping their order changes the outcome.[4][14]
| Conflict Type | Operations | Risk |
|---|---|---|
| Read-Write (RW) | r₁[x], w₂[x] | T2 overwrites value T1 read — non-repeatable read[4] |
| Write-Read (WR) | w₁[x], r₂[x] | T2 reads value written by T1 before T1 commits — dirty read[4] |
| Write-Write (WW) | w₁[x], w₂[x] | T2 overwrites T1's uncommitted write — lost update[14] |
Two schedules are conflict-equivalent if they maintain identical transaction actions and identical ordering of every conflicting operation pair. A schedule is conflict-serializable when its precedence graph is acyclic.[4][14][23]
Precedence graph construction: one node per transaction; directed edge Tᵢ → Tⱼ if Tᵢ has a conflicting operation that precedes one in Tⱼ. A topological sort on the acyclic graph yields the equivalent serial schedule. This is the foundational theorem backing both 2PL correctness proofs and SSI cycle detection.[4][14]
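The construction plus the acyclicity test fit in a short sketch (schedule encoding illustrative):

```python
# Conflict-serializability test: build the precedence graph from a
# schedule, then look for a cycle with depth-first search.

def precedence_edges(schedule):
    # schedule: list of (txn, op, obj) in execution order, op in {"r", "w"}
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))     # conflicting pair: t1 precedes t2
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    def dfs(n):
        color[n] = GRAY
        for m in graph.get(n, ()):
            state = color.get(m, WHITE)
            if state == GRAY or (state == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False
    return any(color.get(n, WHITE) == WHITE and dfs(n) for n in graph)

# r1[x] w2[x] w1[x]: T1 -> T2 (rw) and T2 -> T1 (ww) form a cycle.
bad = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
assert has_cycle(precedence_edges(bad))
good = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]
assert not has_cycle(precedence_edges(good))
```

When the graph is acyclic, a topological sort of it yields the equivalent serial schedule, exactly as the theorem states.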
A weaker equivalence that requires: (1) identical initial value reads, (2) preserved read-from relationships, and (3) matching final write assignments. All conflict-serializable schedules are view-serializable; the converse is false — view-serializable schedules with blind writes exist that are not conflict-serializable. Determining view-serializability is NP-complete, which is why practical systems target conflict-serializability exclusively.[4][14][23]
| Dimension | Containment Order (subset to superset) |
|---|---|
| Serializability | Serial ⊂ Conflict-serializable ⊂ View-serializable ⊂ All schedules[4][14] |
| Recoverability | Serial ⊂ Strict ⊂ Cascadeless (ACA) ⊂ Recoverable ⊂ All schedules[4][14] |
| Class | Definition | Guarantees |
|---|---|---|
| Recoverable | Transactions commit only after all transactions whose changes they read have committed | No orphaned data dependencies[4] |
| Cascadeless (ACA) | Transactions never read data from uncommitted writes (no dirty reads) | Prevents cascading rollbacks[4][14] |
| Strict | A write operation's conflicting operation cannot proceed until the write's transaction commits/aborts | Efficient failure recovery; SS2PL produces strict schedules[4] |
Even combining standard serializability and linearizability can still result in unexpected anomalies such as causal reversal. A stale read — failing to see effects of a completed write — is common in distributed systems. Strict serializability adds real-time ordering requirements to ensure transactions respect wall-clock ordering, eliminating stale reads.[23]
Key finding: In a survey of 18 SQL and "NewSQL" databases, only 3 of 18 provided serializability by default, and 8 — including Oracle and SAP — did not offer full serializability at all. Oracle's "SERIALIZABLE" isolation mode actually provides Snapshot Isolation, not true serializability.[4][14][23]
Primary source: Bailis, P., Fekete, A., Ghodsi, A., Hellerstein, J.M., and Stoica, I. (2013). Highly Available Transactions: Virtues and Limitations. Proceedings of the VLDB Endowment 7(3):181–192.
Isolation is the "I" in ACID. The four ANSI/ISO SQL standard isolation levels trade concurrency for correctness: lower isolation yields higher concurrency and more anomalies; higher isolation yields fewer anomalies with greater resource consumption.[8][17]
| Anomaly | Formal Pattern | Description |
|---|---|---|
| A1 Dirty Read | w1[x] … r2[x] … (a1 and c2 in any order) | T2 reads T1's uncommitted write; if T1 rolls back, T2 has read data that never existed[8] |
| A2 Fuzzy Read | r1[x] … w2[x] … c2 … r1[x] … c1 | T1 reads x twice; T2 modifies and commits between the reads — T1 gets different values[8] |
| A3 Phantom Read | r1[P] … w2[y in P] … c2 … r1[P] … c1 | T1 queries by predicate P; T2 inserts/deletes rows matching P and commits; T1's re-query returns different rows[8] |
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Write Skew |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Possible |
| READ COMMITTED | Prevented | Possible | Possible | Possible |
| CURSOR STABILITY (prevents P4 lost update for cursor operations)[Berenson 1995] | Prevented | Possible | Possible | Possible |
| REPEATABLE READ | Prevented | Prevented | Possible | Possible |
| SNAPSHOT ISOLATION | Prevented | Prevented | Prevented | Possible |
| SERIALIZABLE | Prevented | Prevented | Prevented | Prevented |
| Level | Read Locks | Write Locks | Default In |
|---|---|---|---|
| READ UNCOMMITTED | None | Long (held until commit) | — |
| READ COMMITTED | Short (released after read) | Long | PostgreSQL, Oracle, SQL Server[8][17] |
| REPEATABLE READ | Long on accessed rows; no range/predicate locks | Long | MySQL InnoDB[8][17] |
| SERIALIZABLE | Long + predicate/range locks (prevents phantoms) | Long | — |
Berenson et al. 1995 (ACM SIGMOD) identified phenomena missing from the original ANSI specification:[26][8][17]
| Anomaly | Code | Description |
|---|---|---|
| Dirty Write | P0 | T overwrites a value written by another still-in-flight transaction. Can violate database consistency. Not in original ANSI standard.[26] |
| Lost Update | P4 | T1 reads x=10, T2 reads x=10, T1 writes x=20, T2 writes x=15 (not 25 as intended) — T1's update is lost.[8][17][26] |
| Read Skew | A5A | T1 reads x and y; T2 changes both between the two reads — T1 sees an inconsistent partial update.[17][26] |
| Write Skew | A5B | Two concurrent transactions read overlapping data, write to different parts; combined result violates a constraint neither write alone would violate. Classic example: two on-call doctors each conclude a colleague is on-call and go off-call simultaneously — violating the "at least one on-call" constraint.[8][17][26] |
Authors: Hal Berenson, Philip A. Bernstein, Jim Gray, Jim Melton, Elizabeth J. O'Neil, Patrick E. O'Neil. Published ACM SIGMOD 1995. Key findings:[8][17][26][Berenson 1995]
The paper formalized Cursor Stability (between READ COMMITTED and REPEATABLE READ; prevents lost update for cursor operations) and Snapshot Isolation as explicit levels.[17][26]
Two distinct theoretical frameworks define what it means for a concurrency control protocol to be "correct," and they are not equivalent. Conflict-based isolation, exemplified by 2PL, prevents conflicting operations (read-write, write-read, write-write pairs on the same object from different transactions) from interleaving. A schedule is correct under this framework if and only if its precedence graph is acyclic — i.e., it is conflict-serializable.[4][14] Data-based (anomaly-based) isolation, the framework of the ANSI SQL standard and its critique by Berenson et al. 1995, defines correctness by the absence of specific observable anomalies: dirty reads (A1), fuzzy reads (A2), phantoms (A3), and the additional phenomena (P0, P4, A5A, A5B) not captured by the original ANSI definitions.[8][26]
The two frameworks are not equivalent. Conflict-serializability implies anomaly-freedom: any conflict-serializable schedule is free of all classical ANSI anomalies. But anomaly-freedom does not imply conflict-serializability. The canonical example is Snapshot Isolation: SI prevents dirty reads (A1), fuzzy reads (A2), and phantoms (A3) — satisfying the ANSI SERIALIZABLE anomaly criteria in their strict interpretation — yet SI is not conflict-serializable, because it permits write skew (A5B). Two concurrent transactions can each read overlapping data, write to disjoint parts, and both commit under SI without any write-write conflict being detected; the resulting schedule cannot be produced by any serial execution. This is precisely the gap that motivated Berenson et al.’s 1995 critique — the ANSI definitions, relying implicitly on locking semantics, fail to capture SI as a distinct isolation level — and the subsequent development of Serializable Snapshot Isolation (SSI) in 2008 to restore full conflict-serializability without sacrificing SI’s non-blocking read properties.[8][17][26][30]
Gray et al. 1976 introduced "degrees of consistency," the precursor to SQL isolation levels:[7][18][24]
| Degree | Informal Definition | Locking Protocol | SQL Equivalent |
|---|---|---|---|
| 0 | Writes are serialized against in-progress writes only; uncommitted data may still be overwritten | Short-duration X locks only | Below READ UNCOMMITTED |
| 1 | No dirty writes — a transaction never overwrites another's uncommitted data | Long-duration X locks | READ UNCOMMITTED |
| 2 | No dirty reads (read committed) | Long X locks + short-duration S locks | READ COMMITTED |
| 3 | Repeatable reads (full serializability) | Long X locks + long-duration S locks | SERIALIZABLE |
Key finding: Snapshot Isolation is strictly stronger than READ COMMITTED and incomparable to REPEATABLE READ — SI prevents phantoms (A3), which locking REPEATABLE READ does not, but permits write skew (A5B), which REPEATABLE READ prevents. This incomparability was the central motivation for Berenson et al.'s critique.[8][17][26]
Lock compatibility matrices formalize which lock types can be held simultaneously by different transactions. The matrix directly enforces conflict-serializability: incompatible lock requests block the requesting transaction, preventing out-of-order conflicting operations.[1][28]
| Requested \ Held | S (Shared) | X (Exclusive) |
|---|---|---|
| S (Shared) | Compatible ✓ | Incompatible ✗ |
| X (Exclusive) | Incompatible ✗ | Incompatible ✗ |
Lock acquisition rules (Berkeley CS186 formulation):[28]
If a requested lock is incompatible with a held lock, the requesting transaction waits until all incompatible locks are released.[28]
Upgrading: Conversion from S to X lock. The Lock Manager places upgrade requests at the front of the queue — ahead of new S lock requests — to reduce starvation risk for upgrading transactions.[28]
Downgrading: Conversion from X to S lock; allowed during the shrinking phase.[28]
The three conflict types map directly to incompatibilities in the matrix: Read-Write and Write-Read conflicts are blocked by the S/X incompatibility, and Write-Write conflicts by the X/X incompatibility.[28]
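The S/X matrix and blocking rule above can be sketched as a lookup table — an illustrative fragment, not any specific lock manager's API:

```python
# Hypothetical sketch: the S/X compatibility matrix as a lookup table,
# plus the mapping from each conflict type to the entry that blocks it.
COMPATIBLE = {
    ("S", "S"): True,   # two readers coexist
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def must_wait(requested: str, held: list[str]) -> bool:
    """A requested lock blocks if it is incompatible with ANY held lock."""
    return any(not COMPATIBLE[(requested, h)] for h in held)

# RW conflict (non-repeatable read): reader holds S, writer requests X -> blocks
assert must_wait("X", ["S"])
# WR conflict (dirty read): writer holds X, reader requests S -> blocks
assert must_wait("S", ["X"])
# WW conflict (lost update): X vs X -> blocks
assert must_wait("X", ["X"])
# Two readers coexist
assert not must_wait("S", ["S", "S"])
```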
For hierarchical resource locking, six lock modes are required:[6][16][28]
| Held \ Requested | NL | IS | IX | S | SIX | X |
|---|---|---|---|---|---|---|
| NL | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| IS | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| IX | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| S | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| SIX | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| X | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
Key observations from Gray 1976: IS is the weakest real mode, compatible with every mode except X; IX is compatible only with the intent modes (IS, IX) besides NL; SIX behaves as the conjunction of S and IX and is compatible only with IS; and X conflicts with every mode except NL.[7][18]
Some sources present a 5-mode matrix omitting NL, treating "no lock" as implicit rather than a named mode. Both representations are equivalent.[24]
Key finding: The lock compatibility matrix is the algorithmic heart of 2PL: the three conflict types (RW, WR, WW) that define conflict-serializability map one-to-one onto incompatibilities in the S/X matrix, making the matrix + 2PL protocol a complete mechanical enforcement of conflict-serializable schedules.[28]
Multiple Granularity Locking (MGL) allows transactions to lock data at different levels of a resource hierarchy — from fine-grained individual rows to coarse-grained entire tables. Without MGL, locking a table exclusively would require scanning all rows for existing locks: O(n) per check. With intent locks on the parent node, conflict detection becomes O(1).[6][16][24]
| Granularity | Concurrency | Locking Overhead | Use Case |
|---|---|---|---|
| Fine-grained (individual rows) | High | High | OLTP with many small transactions[6] |
| Coarse-grained (entire table) | Low | Low | Batch operations, DDL[6] |
| Mixed (MGL) | Adaptive | Adaptive | General-purpose DBMSes[16] |
Database resource hierarchy: Database → Files/Tables → Pages → Records/Rows → Fields[6][16]
| Mode | Name | Meaning |
|---|---|---|
| NL | Null Lock | No lock held; compatible with everything[6][16] |
| IS | Intention Shared | Intent to acquire S locks on descendants; allows others to read the node[6][7] |
| IX | Intention Exclusive | Intent to acquire X locks on descendants; blocks S locks on the whole subtree[7][16] |
| S | Shared | Read access to entire subtree; no exclusive locks can exist anywhere in subtree[7][18] |
| SIX | Shared + Intention Exclusive | Read entire subtree; intent to write specific lower nodes — "reads all, writes some"[6][7] |
| X | Exclusive | Exclusive write lock on node and entire subtree; incompatible with all other locks[7][18] |
Acquisition direction: Top-down (root to leaf)
Release direction: Bottom-up (leaf to root) — a node can only be unlocked after all its children are unlocked.[6][7][16][18][24]
| Lock Desired on Node | Required Parent Condition |
|---|---|
| S or IS | At least one parent holds IS or IX[6][7] |
| X, IX, or SIX | At least one parent holds IX or SIX[6][7] |
For DAGs (nodes with multiple parents): before requesting IX, SIX, or X, all immediate parent nodes must hold IX or greater — not just one.[24]
Gray et al. prove that if two lock graphs are compatible, the implicit locks on the leaves are also compatible — the intent-lock hierarchy is provably equivalent to naive leaf-level locking without the O(n) overhead.[24] If too many fine-grained locks accumulate, the DBMS can escalate to a coarser granularity (e.g., row locks → table lock) to reduce overhead.[24]
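The matrix and parent-intent rules above can be combined into a single grant check — a minimal sketch, assuming a tree hierarchy and illustrative names, not any production DBMS:

```python
# Hypothetical MGL sketch: the 6-mode compatibility matrix plus the
# parent-intent rule (IS/IX before S/IS on a child; IX/SIX before
# X/IX/SIX on a child). COMPAT[held][requested] -> compatible?
COMPAT = {
    "NL":  {"NL": 1, "IS": 1, "IX": 1, "S": 1, "SIX": 1, "X": 1},
    "IS":  {"NL": 1, "IS": 1, "IX": 1, "S": 1, "SIX": 1, "X": 0},
    "IX":  {"NL": 1, "IS": 1, "IX": 1, "S": 0, "SIX": 0, "X": 0},
    "S":   {"NL": 1, "IS": 1, "IX": 0, "S": 1, "SIX": 0, "X": 0},
    "SIX": {"NL": 1, "IS": 1, "IX": 0, "S": 0, "SIX": 0, "X": 0},
    "X":   {"NL": 1, "IS": 0, "IX": 0, "S": 0, "SIX": 0, "X": 0},
}
REQUIRED_PARENT = {"S": {"IS", "IX"}, "IS": {"IS", "IX"},
                   "X": {"IX", "SIX"}, "IX": {"IX", "SIX"}, "SIX": {"IX", "SIX"}}

def can_grant(requested, held_on_node, held_on_parent):
    """O(1) check: compatible with locks on this node, parent intent present."""
    if any(not COMPAT[h][requested] for h in held_on_node):
        return False
    return held_on_parent in REQUIRED_PARENT[requested]

# T1 scans a table under S; T2 may still take IS (to read a single row):
assert can_grant("IS", held_on_node=["S"], held_on_parent="IS")
# ...but T2 cannot take IX (intent to write a row) while the table is S-locked:
assert not can_grant("IX", held_on_node=["S"], held_on_parent="IX")
```

The O(1) property comes from checking only the parent's intent mode rather than scanning every leaf under the node.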
| System | Lock Granularity Implementation |
|---|---|
| IBM DB2 | Row-level, page-level, table-level with IS/IX/SIX[6][24] |
| Oracle | TX (transaction) and TM (table) locks with row-level granularity[6][24] |
| SQL Server | Row, page, extent, table, and database locks[6][24] |
| MySQL InnoDB | Row-level locks with table-level intent locks[6][24] |
| PostgreSQL | Intent locks for table/row-level locking[6][24] |
Key finding: The fundamental insight of Multiple Granularity Locking — "announce intentions at coarsest granularity, execute at finest" — reduces coarse-grained conflict detection from O(n) leaf scans to O(1) parent-node checks, making scalable fine-grained locking practical in production database systems.[6][16]
See also: Monorepo Tooling (for practical file-level locking implementations)
In distributed systems with no global clock, ordering events across nodes requires logical time. The foundational insight from Lamport (1978): "It is not important that all processes agree on what the actual time is, but that they agree on the order in which events occur."[5][15][25]
Formalized by Leslie Lamport in "Time, Clocks, and the Ordering of Events in a Distributed System" (CACM 21(7), 1978). The happens-before relation → is defined by:[5][15][25]
- If a and b are events in the same process and a precedes b, then a → b
- If a is the sending of a message and b is its receipt, then a → b
- Transitivity: if a → b and b → c, then a → c
- If neither a → b nor b → a, then a and b are concurrent (denoted a ↔ b or a ∥ b)

A Lamport clock is a numerical counter per process; rules:[5][15][25]
- Before each local event, a process increments its counter
- Every outgoing message carries the sender's counter value
- On receipt, the receiver sets its counter to max(local, received) + 1
Key property: If a → b then C(a) < C(b).
Critical limitation: The converse does NOT hold. If C(a) < C(b), we CANNOT conclude a → b. Lamport clocks cannot detect concurrency — only enforce known ordering.[5][15][25]
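A minimal Lamport clock sketch (illustrative class names; each process increments its counter per event and takes max(local, received) + 1 on message receipt):

```python
# Minimal Lamport clock: a single counter per process.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                 # local event: increment counter
        self.time += 1
        return self.time

    def send(self):                 # attach timestamp to outgoing message
        return self.tick()

    def receive(self, msg_time):    # merge rule: max(local, received) + 1
        self.time = max(self.time, msg_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
t = p.send()        # p's clock: 1
q.receive(t)        # q's clock: 2 -- C(send) < C(receive), as required
assert q.time > t
```

Note the one-way guarantee: `q.time > t` follows from send → receive, but two unrelated events can also end up with ordered counters.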
A vector clock is an array of N logical clocks — one per process in an N-process system. Process Pᵢ maintains vector VCᵢ where VCᵢ[i] counts Pᵢ's own events and VCᵢ[j] records what Pᵢ knows about Pⱼ's event count.[5][15][25]
Algorithm: each process increments its own slot VCᵢ[i] before every local event; outgoing messages carry the sender's full vector; on receipt, the receiver takes the element-wise maximum of its vector and the message's vector, then increments its own slot.[5][15][25]
| Relationship | Condition | Interpretation |
|---|---|---|
| VC1 = VC2 | All elements equal | Same logical instant |
| VC1 < VC2 (VC1 happened-before VC2) | All elements of VC1 ≤ corresponding VC2 elements, at least one strictly less | VC1 causally precedes VC2 |
| VC1 > VC2 | Reverse of above | VC2 causally precedes VC1 |
| Concurrent (VC1 ∥ VC2) | Neither VC1 ≤ VC2 nor VC2 ≤ VC1 | No causal relationship — treat as conflict |
Example: [2,3,1] vs [3,0,0] — first slot of second is greater (3>2) but second slot of first is greater (3>0). Neither dominates → concurrent events, flag as potential conflict.[5][15]
The complete bidirectional property: VC(a) < VC(b) if and only if a happened-before b. This is what Lamport clocks cannot provide — vector clocks enable true conflict detection, not just ordering.[5][15]
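The algorithm and comparison rules above can be sketched as follows — an illustrative fragment (class and function names are assumptions), showing the concurrency detection that Lamport clocks cannot provide:

```python
# Hypothetical vector clock sketch for an N-process system.
class VectorClock:
    def __init__(self, n, i):
        self.v = [0] * n   # one slot per process
        self.i = i         # this process's index

    def tick(self):                   # local event: bump own slot
        self.v[self.i] += 1

    def merge(self, other_v):         # on receive: element-wise max, then tick
        self.v = [max(a, b) for a, b in zip(self.v, other_v)]
        self.tick()

def compare(v1, v2):
    """Return '<', '>', '=', or '||' (concurrent)."""
    le = all(a <= b for a, b in zip(v1, v2))
    ge = all(a >= b for a, b in zip(v1, v2))
    if le and ge:
        return "="
    if le:
        return "<"
    if ge:
        return ">"
    return "||"

# The example from the text: neither vector dominates -> concurrent writes.
assert compare([2, 3, 1], [3, 0, 0]) == "||"
assert compare([1, 2, 0], [1, 3, 0]) == "<"   # causally precedes
```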
| Feature | Lamport Clocks | Vector Clocks |
|---|---|---|
| Structure | Single integer counter | Array of N integers |
| Space complexity | O(1) per process | O(n) per process |
| Causal ordering | Partial (one-directional implication) | Full (bidirectional iff) |
| Concurrency detection | No | Yes |
| Use cases | Total ordering, log sequencing, leader election | Conflict detection, distributed MVCC, CRDTs |
Happened-before provides the formal foundation for detecting write-write conflicts in distributed systems. The partial order defined by happened-before maps directly to the serializability graph concept. In MVCC distributed databases, each write carries its vector; when replicas exchange versions, if neither vector dominates the other, the writes were concurrent — the store keeps both as siblings for application-level merging.[25]
Vector clock size grows linearly with the number of processes — a real overhead in large, dynamic clusters.[5][15] Variants addressing this:
| Variant | Approach |
|---|---|
| Dotted Version Vectors | More space-efficient; address anomalies in standard vector clocks[25] |
| Version Vectors | Simplified form for key-value stores[15] |
| Hybrid Logical Clocks (HLC) | Combine physical and logical time[25] |
| Matrix Clocks | Track knowledge of others' knowledge[15] |
| Plausible Clocks | Reduce storage overhead with approximations[15] |
| Interval Tree Clocks | Support dynamic process creation/deletion[25] |
| Bloom Clocks | Probabilistic compact representation[15] |
Key finding: Vector clocks are the minimum logical data structure that enables complete bidirectional causality detection in distributed systems. Lamport clocks can establish a total order but cannot distinguish causal precedence from concurrence — the distinction that matters for conflict detection in parallel code modification, distributed databases, and CRDT-based collaborative editing.[5][15]
A deadlock occurs when two or more transactions each hold resources the other needs, creating a standstill where none can proceed. Classic example: Transaction A holds Resource 1 and waits for Resource 2; Transaction B holds Resource 2 and waits for Resource 1.[10][29]
A deadlock can only occur when all four Coffman conditions hold simultaneously — breaking any single condition prevents deadlock: (1) mutual exclusion (resources cannot be shared), (2) hold and wait (a transaction holds resources while waiting for more), (3) no preemption (held resources cannot be forcibly taken), and (4) circular wait (a cycle of transactions each waiting on the next).[10][29]
The wait-for graph is a directed graph where nodes are active transactions and an edge Tᵢ → Tⱼ means Tᵢ is waiting for a lock held by Tⱼ. A cycle in the graph indicates a deadlock.[10][29]
Example cycle: T1 waits for T3's lock on X; T3 waits for T2's lock on Y; T2 waits for T1's lock on Z → deadlock.[10]
Limitation: Suitable only for systems where transactions are lightweight with few resource instances. In distributed systems, deadlocks span multiple nodes and require a global wait-for graph with a central coordinator — or distributed detection algorithms.[10]
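Deadlock detection on a wait-for graph reduces to cycle detection; a minimal sketch using depth-first search with node coloring (graph representation is an assumption):

```python
# Sketch: deadlock detection as cycle detection in the wait-for graph.
# Edge Ti -> Tj means "Ti waits for a lock held by Tj".
def has_deadlock(wait_for):
    WHITE, GRAY, BLACK = 0, 1, 2     # unvisited / on current path / done
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:       # back edge -> cycle
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# The example cycle from the text: T1 -> T3 -> T2 -> T1
assert has_deadlock({"T1": ["T3"], "T3": ["T2"], "T2": ["T1"]})
# Shared waiting without a cycle is not a deadlock:
assert not has_deadlock({"T1": ["T2"], "T3": ["T2"], "T2": []})
```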
Victim selection criteria: minimum work done, fewest resources held, fewest remaining operations, lowest rollback cost.[29] Recovery options: abort the victim entirely, roll it back only to a savepoint that breaks the cycle, or abort victims one at a time until no cycle remains.
Both Wait-Die and Wound-Wait assign priorities based on transaction timestamps — older transaction = lower timestamp = higher priority. In both protocols, the younger transaction is always aborted.[10][29]
| Feature | Wait-Die (Non-Preemptive) | Wound-Wait (Preemptive) |
|---|---|---|
| Type | Non-preemptive | Preemptive |
| Older requests Younger's resource | Older waits | Younger is aborted ("wounded"); Older proceeds |
| Younger requests Older's resource | Younger dies (aborts) | Younger waits |
| Who is aborted? | Always younger (the requestor) | Always younger (the holder) |
| Holds resource? | Older never preempted from a resource | Older preempts resources from younger |
Starvation prevention: Rolled-back transactions restart with the same timestamp — not a new one. This means they eventually become the oldest active transaction and receive highest priority, ensuring progress.[29]
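The two schemes reduce to pure decision functions on timestamps — a minimal sketch (function and return-value names are illustrative):

```python
# Lower timestamp = older = higher priority; the younger transaction
# always loses, regardless of scheme.
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: old waits, young dies."""
    return "wait" if requester_ts < holder_ts else "abort_requester"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: old wounds (aborts) young, young waits."""
    return "abort_holder" if requester_ts < holder_ts else "wait"

# Older transaction (ts=1) requests a lock held by younger (ts=5):
assert wait_die(1, 5) == "wait"             # older waits
assert wound_wait(1, 5) == "abort_holder"   # younger is wounded
# Younger (ts=5) requests a lock held by older (ts=1):
assert wait_die(5, 1) == "abort_requester"  # younger dies
assert wound_wait(5, 1) == "wait"           # younger waits
```

On restart the aborted transaction keeps its original timestamp, so it ages into the highest-priority position and cannot starve.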
No-Wait Scheme: If a transaction cannot immediately acquire all needed locks, it aborts and retries — no waiting. Eliminates deadlock by eliminating the hold-and-wait condition entirely.[10]
Conservative 2PL prevents deadlocks by requiring all locks to be acquired before execution begins. If a transaction cannot acquire all needed locks, it waits without holding any — eliminating hold-and-wait. Requires advance declaration of all data access patterns, making it impractical for most database workloads.[10][29]
Deadlock avoidance tests each proposed lock acquisition before granting it. If granting the lock could lead to a future deadlock state, the request is denied — the transaction must wait or abort without holding any conflicting resources.[10] This strategy requires advance knowledge of each transaction’s future resource needs, making it impractical for most database workloads. In practice, detection is generally preferred: allow deadlocks to occur, then detect and resolve them reactively via the wait-for graph.
| Method | Approach | Trade-off |
|---|---|---|
| Edge-chasing algorithms | Each node propagates probe messages when detecting potential deadlock | Distributed, no single point of failure[10] |
| Chandy-Misra-Haas | Classic distributed deadlock detection algorithm | Well-proven; message overhead[10] |
| Timeout-based | Assume long wait = deadlock; abort | Simple; false positives under slow transactions[10] |
| Global wait-for graph | Central coordinator collects local WFGs, checks for global cycles | Complete detection; single point of failure[10] |
| Condition | Definition | Progress |
|---|---|---|
| Deadlock | Processes blocked, waiting for each other's resources | None — permanent standstill[10] |
| Livelock | Processes keep executing but make no progress (e.g., continuously back off and retry simultaneously) | None — OCC thrashing is a common example[10][29] |
| Starvation | A process is repeatedly denied resource access due to priority/scheduling | Indefinitely delayed[10][29] |
Key finding: Both Wait-Die and Wound-Wait prevent both deadlock and starvation — the same timestamp reuse on restart ensures eventual progress. The key difference is whether the older or younger transaction plays the active role: in Wait-Die, old waits and young dies; in Wound-Wait, old wounds (preempts) young, and young waits.[10][29]
See also: Deadlock Starvation Fairness (for deadlock algorithms in practice, including distributed deadlock detection implementations)
Serializable Snapshot Isolation closes the gap between Snapshot Isolation and full serializability without introducing reader-writer blocking. It was proposed in the landmark paper: Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. "Serializable isolation for snapshot databases." SIGMOD '08, pp. 729–738.[30]
Until Cahill et al. 2008, the only ways to prevent write skew under Snapshot Isolation were: (1) modify applications with artificial locking, (2) introduce fake update conflicts, or (3) carefully analyze all transaction pairs — all requiring deep application-level knowledge and fragile maintenance.[30]
Classic Write Skew Example (Cahill et al. 2008): Two bank balances V1 and V2 with constraint V1 + V2 ≥ 0, both starting at $100. T1 reads both balances (total $200) and withdraws $200 from V1; concurrently, T2 reads both balances and withdraws $200 from V2. Each transaction's snapshot shows the constraint satisfied, the write sets are disjoint, and both commit — leaving V1 = V2 = −$100 and V1 + V2 = −$200, violating the constraint.[30]
Why 2PL prevents this: Under 2PL, T1 holds S locks on V1 and V2; T2's attempt to acquire X lock on V2 is blocked by T1's S lock — transactions must serialize. Under SI without SSI, both transactions read from independent snapshots (no locks held), both proceed, both commit.[30]
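The failure mode can be reproduced as a toy simulation of SI's snapshot-and-first-committer-wins semantics — an illustrative sketch, not a real database:

```python
# Toy write-skew simulation: both transactions read a private snapshot,
# write disjoint keys, and first-committer-wins finds no write-write
# overlap -- so both commit, violating the invariant V1 + V2 >= 0.
db = {"V1": 100, "V2": 100}

def run_withdrawal(snapshot, target, amount):
    """Return the write set if the constraint holds on the snapshot."""
    if snapshot["V1"] + snapshot["V2"] - amount >= 0:
        return {target: snapshot[target] - amount}
    return {}

snap1, snap2 = dict(db), dict(db)       # independent snapshots
w1 = run_withdrawal(snap1, "V1", 200)   # sees total 200, withdraws 200
w2 = run_withdrawal(snap2, "V2", 200)   # sees total 200, withdraws 200

assert not (w1.keys() & w2.keys())      # disjoint write sets: no WW conflict
db.update(w1)                           # both commit under plain SI
db.update(w2)
assert db["V1"] + db["V2"] == -200      # invariant V1 + V2 >= 0 violated
```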
SSI detects "dangerous structures" in the MVCC serialization graph: a pair of consecutive read-write antidependency (rw-conflict) edges, T1 →rw T2 →rw T3, between concurrent transactions. Every non-serializable SI execution contains such a structure, so aborting one transaction in each detected pair suffices to guarantee serializability.[30]
The problem is reduced from detecting entire cycles (expensive global graph traversal) to detecting specific two-arc structures (cheap per-transaction tracking). SSI is an optimistic technique: transactions continue in hopes everything will be fine, aborting only when a dangerous structure is detected.[30]
| Property | 2PL (SS2PL) | SSI |
|---|---|---|
| Reader blocking writers | Yes (S lock conflicts with X lock request) | No |
| Writer blocking readers | Yes (X lock conflicts with S lock request) | No |
| Deadlock possible | Yes | No |
| False abort rate | Zero — aborts only for actual conflicts | Low — some false positives (conservative detection) |
| Age | 1976 (Eswaran et al.) | 2008 (Cahill et al.); PostgreSQL 9.1 in 2011 |
| Concurrency under no conflict | Full (no unnecessary blocking) | Full (no blocking at all) |
| Concurrency under conflict | Blocks until conflict resolves | Aborts one transaction in dangerous structure |
(Sources:[30])
Based on Dan Ports and Kevin Grittner (VLDB 2012) extending Cahill et al., PostgreSQL's implementation added read-only optimizations — safe snapshots and DEFERRABLE transactions — plus techniques for bounding the memory consumed by predicate-lock tracking.[30]
Snapshot Isolation was first formally defined in Berenson et al. 1995. SSI builds directly on this foundation — taking SI's favorable read-write non-blocking properties and MVCC storage model, then adding the minimum detection machinery required for full serializability guarantees.[30]
Key finding: SSI is the first practical algorithm to achieve full serializability without introducing any read-write lock conflicts. It is categorically superior to 2PL on read-write contention (neither direction blocks) while being essentially equivalent in the absence of conflicts — making it the theoretically optimal isolation mechanism for mixed workloads as of 2011.[30]
The theoretical foundations of database concurrency control are mirrored at the operating system level, where the same fundamental correctness problems appear in file access coordination.[19][27]
Formalized by Courtois, Heymans, and Parnas in 1971, "Concurrent Control with 'Readers' and 'Writers'" (CACM). The constraint: multiple readers may access a shared resource simultaneously (reads are non-destructive), but no thread may access while a writer is active.[19]
This maps directly to the S/X lock distinction: Readers ↔ Shared (S) locks; Writers ↔ Exclusive (X) locks.[19]
| Policy | Problem Name | Behavior | Starvation Risk |
|---|---|---|---|
| Read-Preferring | First Readers-Writers Problem | New readers always join while a reader is active | Writers — may wait indefinitely under high read load[19] |
| Write-Preferring | Second Readers-Writers Problem | New readers blocked if a writer is waiting | Readers — under heavy write load; generally preferred when data freshness is critical[19] |
| Fair | Third Readers-Writers Problem | No thread allowed to starve; bounded wait time for all | None — more complex to implement[19] |
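A write-preferring policy can be sketched with a condition variable — an illustrative fragment using Python's threading module, not any production lock:

```python
import threading

# Sketch: write-preferring readers-writer lock. New readers are held
# back whenever a writer is active OR waiting, bounding writer starvation.
class WritePreferringRWLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self):
        with self._cond:
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = WritePreferringRWLock()
lock.acquire_read(); lock.acquire_read()    # readers share (S/S compatible)
lock.release_read(); lock.release_read()
lock.acquire_write(); lock.release_write()  # writer excludes everyone
```

Removing the `self._writers_waiting` term from the reader's wait condition turns this into the read-preferring (first) variant, reintroducing writer starvation.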
Read-Copy-Update (RCU): Wait-free for readers. Writers copy data, modify the copy, atomically swap the pointer. Readers see either old or new version but never partial state. Used extensively in the Linux kernel.[19][27]
Seqlock (Linux kernel): Optimized for few writers. Readers note a sequence counter before and after reading; if the counter changed, a write occurred and the read is retried (OCC-style validation). Writers increment counter: odd = writing in progress, even = complete.[19]
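The seqlock validation loop can be sketched in a few lines — illustrative only (the real Linux primitive relies on memory barriers that plain Python cannot express):

```python
import threading

# Sketch of seqlock-style OCC validation: readers retry if the sequence
# counter changed during the read, or was odd (write in progress).
class Seqlock:
    def __init__(self, value):
        self._seq = 0                         # even = stable, odd = writing
        self._value = value
        self._write_lock = threading.Lock()   # writers exclude each other

    def write(self, value):
        with self._write_lock:
            self._seq += 1                    # now odd: readers will retry
            self._value = value
            self._seq += 1                    # even again: write complete

    def read(self):
        while True:
            start = self._seq
            if start % 2:                     # write in progress; retry
                continue
            value = self._value
            if self._seq == start:            # unchanged: read was consistent
                return value

s = Seqlock(10)
s.write(42)
assert s.read() == 42
```

Readers never block writers and take no lock at all, which is exactly the OCC read-validate pattern applied at the memory level.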
IBM pioneered file locking in 1963 for mainframe OS/360 ("exclusive control"). The concept became universal across all major operating systems.[27]
| Primitive | Platform | Semantics | Granularity |
|---|---|---|---|
| `flock()` | Unix/Linux | Advisory; entire files | File[27] |
| `fcntl()` | Unix/Linux | Advisory; byte-range locking | Byte range[27] |
| `lockf()` | POSIX standard | Advisory | Byte range[27] |
| `LockFile` / `LockFileEx` | Windows | Mandatory; OS enforces automatically | Byte range[27] |
| Spinlocks | All platforms | Busy-wait; fast in low-contention | In-memory[27] |
| Semaphores | All platforms | Integer-regulated access count via wait()/signal() atomics | Logical resource[27] |
| Mutexes | All platforms | Binary lock; single-threaded write access | Object/region[27] |
Advisory vs. Mandatory locking: Unix/Linux defaults to advisory locking — processes voluntarily acquire and check locks; non-cooperating processes are not blocked. Windows uses mandatory semantics — the OS enforces locking automatically, blocking conflicting accesses regardless of cooperation.[27]
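Advisory semantics are easy to demonstrate with `flock()` — a Unix-only sketch (file path is a temporary file; a second, independently opened descriptor cannot take the exclusive lock, but only if it asks):

```python
import fcntl
import os
import tempfile

# Create a scratch file to lock.
tmp = tempfile.NamedTemporaryFile(delete=False)
path = tmp.name
tmp.close()

fd1 = os.open(path, os.O_RDWR)
fcntl.flock(fd1, fcntl.LOCK_EX)              # exclusive (X-style) lock granted

fd2 = os.open(path, os.O_RDWR)               # separate open file description
try:
    fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking attempt
    conflicted = False
except BlockingIOError:
    conflicted = True                        # advisory lock held elsewhere

assert conflicted
fcntl.flock(fd1, fcntl.LOCK_UN)              # release; fd2 could now lock
os.close(fd1)
os.close(fd2)
os.remove(path)
```

The lock is advisory: a process that simply calls `os.write()` on the file without first calling `flock()` is never blocked, which is the key contrast with Windows' mandatory semantics.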
A "lock" in file systems may be associated with one or more data blocks, files, directories, volumes, or disk drives. Most OSes support record locking: individual records within a file may be locked independently, enabling higher concurrency for update-intensive file applications.[27]
2PL is applied at the OS level: the scheduler acquires all necessary locks in the growing phase and releases them in the shrinking phase — checking each operation for conflicts with existing locks before granting or delaying.[27]
File system locking implements the same fundamental concepts as database concurrency control: shared vs. exclusive modes, a granularity hierarchy (drive → volume → directory → file → record), and two-phase acquire/release discipline.[27]
NFS portability: NFS (Network File System) has historically poor lock support across network boundaries — a significant operational difference from local file system locking that forces distributed applications to implement application-level coordination.[27]
Key finding: The theoretical foundations of concurrency control are domain-independent: the S/X lock compatibility matrix, two-phase locking discipline, and readers-writers priority policies appear identically in OS file locking (1963), database systems (1976), and distributed key-value stores (2007+). Only the granularity, advisory/mandatory semantics, and network-transparency requirements differ across domains.[19][27]
See also: Monorepo Tooling (for file-level locking in build systems and parallel code modification), Optimistic Speculative Execution (for practical recommendations on OCC vs. locking in code modification pipelines)
The following 31 source documents are the consolidated research corpus for this pillar. Inline citations [1]–[31] link to these entries.