
Classical Concurrency Control Theory

Pillar: concurrency-control-theory | Date: May 2026
Scope: Foundational CS theory applied to concurrent code modification: two-phase locking (2PL), multi-version concurrency control (MVCC), optimistic concurrency control (OCC) theoretical foundations, serializability, conflict vs data isolation, vector clocks and happened-before ordering, intent locks, lock compatibility matrices, and transaction isolation levels from database systems.
Sources: 31 gathered, consolidated, synthesized.

Executive Summary

Critical gap: A 2013 survey of 18 major SQL and "NewSQL" databases found that only 3 of 18 provide serializability by default, and 8 of 18 — including Oracle and SAP — do not offer full serializability at all. Oracle's "SERIALIZABLE" isolation mode is actually Snapshot Isolation. This is the deployment reality for the 50-year body of theory built to guarantee serializability.[4][14][23]

Concurrency control theory partitions into 3 protocol families: pessimistic (2PL and variants), optimistic (OCC, timestamp ordering), and multi-version (MVCC). The correctness target for all three is serializability — defined in 1976 as a schedule equivalent to some serial execution. Two-Phase Locking, introduced by Eswaran, Gray, Lorie, and Traiger in that same year, dominated database systems for approximately 30 years as the primary serializability mechanism.[1][11] MVCC reached its first commercial implementation in 1984 (VAX Rdb/ELN, Jim Starkey at DEC), and OCC's foundational paper by Kung and Robinson (1981) has accumulated over 1,556 citations — among the most influential papers in database systems theory.[2][13]

Serializability theory distinguishes two correctness notions that are not equivalent. Conflict-serializability — the practical target — requires that a schedule's precedence graph be acyclic; this can be checked in polynomial time. View-serializability is weaker: it admits schedules with "blind writes" that no conflict-serializable execution produces, and determining view-serializability is NP-complete.[4][14] This NP-completeness is the direct reason every practical database system targets conflict-serializability exclusively. The 3 conflict types — Read-Write (non-repeatable read), Write-Read (dirty read), Write-Write (lost update) — map one-to-one onto the S/X lock compatibility matrix, making the matrix a complete mechanical enforcement of conflict-serializable schedules.[28]

The critical distinction between Snapshot Isolation and serializability is the write skew anomaly (A5B). Under SI, two concurrent transactions read overlapping data and write to disjoint parts; neither detects a write-write conflict; both commit; the combined result violates a constraint neither write alone would violate. The canonical example: two bank accounts V1=V2=$100 with constraint V1+V2≥0. T1 withdraws $200 from V1, T2 withdraws $200 from V2; both see the other's initial snapshot and both commit, leaving V1=V2=−$100.[30] SI prevents the 3 ANSI anomalies (dirty reads, fuzzy reads, phantoms) but remains vulnerable to write skew — a gap the 1995 Berenson et al. ACM SIGMOD critique proved is not captured by any ANSI isolation level. Snapshot Isolation is strictly stronger than READ COMMITTED and incomparable to REPEATABLE READ.[8][26]

Serializable Snapshot Isolation (SSI), proposed by Cahill, Röhm, and Fekete at ACM SIGMOD 2008 and shipped in PostgreSQL 9.1 in 2011, closes the SI-to-serializability gap without introducing any reader-writer blocking. The detection mechanism reduces full cycle detection to identifying 2-arc "dangerous structures" in the MVCC serialization graph — a necessary condition for non-serializable cycles. This improves on SS2PL across the board: under SSI, readers never block writers, writers never block readers, deadlocks are impossible, and the false-abort rate remains low (some conservative false positives remain). As of its 2011 deployment, SSI stands as the strongest practical isolation mechanism for mixed read-write workloads.[30]

MVCC's core property — readers never block writers; writers never block readers — comes at a storage cost and a garbage collection burden. Every write creates a new version in the version chain; obsolete versions accumulate until no active transaction can see them. PostgreSQL's VACUUM must periodically scan relations to reclaim dead versions and additionally faces the XID wraparound problem as 32-bit transaction IDs overflow. MySQL InnoDB risks undo log exhaustion under update-intensive workloads.[12] Many production systems mitigate this by combining MVCC for reads with locking for write-write conflicts — capturing non-blocking read properties while retaining write ordering guarantees from 2PL.[21] Google Spanner extends this further: it uses Multiversion Timestamp Ordering (MVTO) with TrueTime for external certification, achieving strict serializability across globally distributed nodes.[12]

Multiple Granularity Locking (MGL), introduced by Gray, Lorie, and Putzolu at VLDB 1975, resolves the tension between fine-grained concurrency and coarse-grained conflict detection by adding three intention modes (IS, IX, SIX) and an explicit no-lock mode (NL) to the basic S/X pair. The key property: placing an IS or IX lock on a parent node reduces coarse-grained conflict detection from O(n) leaf scans to O(1) parent-node checks.[6][16] The full 6-mode compatibility matrix (NL, IS, IX, S, SIX, X) is implemented identically in IBM DB2, Oracle (TM/TX locks), SQL Server (row/page/extent/table/database), MySQL InnoDB, and PostgreSQL — making it the single most universally deployed concurrency primitive in production database systems.[24]
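The 6-mode matrix can be encoded directly as a lookup table; a minimal sketch (the `COMPAT` table and `compatible` helper are illustrative names, not any particular DBMS's API):

```python
# Multiple Granularity Locking: the 6-mode compatibility matrix from
# Gray, Lorie & Putzolu (VLDB 1975). A requested mode is grantable on a
# node iff it is compatible with every mode already held on that node.
MODES = ["NL", "IS", "IX", "S", "SIX", "X"]

COMPAT = {
    "NL":  {"NL", "IS", "IX", "S", "SIX", "X"},
    "IS":  {"NL", "IS", "IX", "S", "SIX"},
    "IX":  {"NL", "IS", "IX"},
    "S":   {"NL", "IS", "S"},
    "SIX": {"NL", "IS"},
    "X":   {"NL"},
}

def compatible(held: str, requested: str) -> bool:
    """O(1) conflict check at a single node — the MGL payoff: a table-level
    IX conflicts with a table-level S without scanning any rows."""
    return requested in COMPAT[held]
```

The matrix is symmetric, so a lock manager can store only one triangle of it; the dictionary form above trades space for a direct membership test.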

Vector clocks provide the minimum logical data structure for complete bidirectional causality detection in distributed systems. Lamport's 1978 scalar clocks establish one-directional implication only: a→b implies C(a)<C(b), but C(a)<C(b) does not imply a→b — Lamport clocks cannot distinguish causal precedence from concurrence. Vector clocks, independently developed by Colin Fidge and Friedemann Mattern both in 1988 (and by at least 6 groups total), provide the bidirectional property: VC(a)<VC(b) if and only if a happened-before b.[5][15] The cost is O(n) space per process, where n is the number of processes. Amazon Dynamo, Riak, and CouchDB use version vectors for distributed conflict detection; Git's DAG structure is equivalent to vector clocks for file history and merge conflict identification.[25]
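The bidirectional property is easy to see in code; a minimal vector-clock sketch (function names `vc_tick`, `vc_merge`, `happened_before` are illustrative):

```python
# Vector clocks (Fidge/Mattern 1988): one integer counter per process.
# VC(a) < VC(b) iff a happened-before b; incomparable vectors mean the
# events were concurrent — the case Lamport scalar clocks cannot detect.
def vc_new(n):
    return [0] * n                       # n = number of processes

def vc_tick(vc, i):
    out = list(vc)                       # local event at process i
    out[i] += 1
    return out

def vc_merge(local, received, i):
    # on message receipt: elementwise max of the two vectors, then tick
    merged = [max(a, b) for a, b in zip(local, received)]
    return vc_tick(merged, i)

def happened_before(a, b):
    # a -> b iff a <= b elementwise and a != b
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)
```

The O(n) space cost mentioned above is visible directly: every event carries one counter per participating process.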

Deadlock theory adds a practical constraint on all locking-based protocols. The 4 Coffman conditions must hold simultaneously for a deadlock to occur; breaking any one prevents it. Conservative 2PL (C2PL) prevents deadlocks by eliminating hold-and-wait — transactions must declare all read-sets and write-sets before execution begins — but this advance-knowledge requirement makes it rarely implemented in practice.[1][29] The two timestamp-based prevention protocols, Wait-Die (non-preemptive) and Wound-Wait (preemptive), both prevent deadlock and starvation by always aborting the younger transaction and restarting it with its original timestamp — ensuring the restarted transaction eventually becomes the oldest active transaction and achieves highest priority. OCC eliminates deadlocks entirely since no locks are held during execution; MVCC reduces but does not eliminate them since write-write conflicts still require locking in most implementations.[29]
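Wait-Die and Wound-Wait each reduce to a single timestamp comparison at the moment of a lock conflict; a sketch, assuming smaller timestamp means older transaction:

```python
# Timestamp-based deadlock prevention. Both schemes abort the younger
# transaction in a conflict; a restarted transaction keeps its original
# timestamp, so it eventually becomes the oldest and cannot starve.

def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits; a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "abort_requester"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds (aborts) the lock holder;
    a younger requester waits."""
    return "abort_holder" if requester_ts < holder_ts else "wait"
```

In both schemes, waiting only ever happens in one timestamp direction, so no wait-for cycle can form — which is exactly why deadlock is impossible.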

The theoretical foundations of concurrency control are domain-independent. The S/X lock compatibility matrix, two-phase locking discipline, and readers-writers priority policies (first introduced by Courtois, Heymans, and Parnas in 1971) appear identically in OS file locking (IBM pioneered file locking in 1963 for mainframe OS/360), database systems (1976), and distributed key-value stores (2007+). Unix/Linux defaults to advisory locking — non-cooperating processes are not blocked; Windows uses mandatory semantics, with OS enforcement regardless of cooperation. NFS has historically poor lock support across network boundaries, forcing distributed applications to implement application-level coordination.[19][27]

Practitioner implications: For parallel code modification pipelines specifically, the theory resolves to 4 concrete choices.

  1. Independent files, rare conflicts: OCC (optimistic with validation at commit) delivers maximum throughput with zero deadlock risk — the 1981 Kung-Robinson model maps cleanly onto file-level workspace isolation.
  2. Shared files or overlapping regions: MVCC semantics with SSI detection provide full serializability without blocking — the 2008 Cahill et al. algorithm is the right model, implemented cheaply via version counters per file.
  3. Hierarchical project structures (repo → directory → file → region): MGL's intent-lock discipline enables O(1) conflict detection at directory level without exhaustive file enumeration.
  4. Distributed agent pipelines with no shared clock: vector clocks are the minimum data structure needed to distinguish concurrent modifications (potential conflicts) from causally ordered ones (safe to chain) — Lamport scalar clocks are insufficient for this purpose.

The industry survey finding — only 3 of 18 databases default to serializability — is a warning: systems that silently provide Snapshot Isolation instead will permit write-skew anomalies that feel correct but violate constraints, exactly the failure mode most damaging in multi-agent code modification where constraint invariants span multiple files.
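To illustrate the version-counter-per-file idea, here is a hypothetical sketch of file-level OCC for an agent pipeline — `FileStore` and its methods are invented names for this example, not an existing API:

```python
# Hypothetical sketch: file-level OCC for a code-modification agent.
# Each file carries a version counter; an agent records the versions it
# read, works on private copies, and at commit validates that nothing it
# read has changed — the Kung-Robinson pattern on workspace isolation.
class FileStore:
    def __init__(self):
        self.files = {}              # path -> (version, content)

    def read(self, path):
        return self.files.get(path, (0, ""))

    def commit(self, read_versions, writes):
        """Validate-then-publish. read_versions: path -> version the agent
        saw. writes: path -> new content. Returns False on conflict, in
        which case the caller re-reads and retries (the OCC restart)."""
        for path, seen in read_versions.items():
            if self.files.get(path, (0, ""))[0] != seen:
                return False         # someone committed under us: abort
        for path, content in writes.items():
            ver = self.files.get(path, (0, ""))[0]
            self.files[path] = (ver + 1, content)
        return True
```

With rare conflicts this never blocks and never deadlocks; under heavy overlap it degrades into repeated restarts, matching the OCC contention caveat in Section 4.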



Table of Contents

  1. Overview: The Three Families of Concurrency Control
  2. Two-Phase Locking (2PL)
  3. Multi-Version Concurrency Control (MVCC)
  4. Optimistic Concurrency Control (OCC)
  5. Timestamp Ordering (T/O) Protocol
  6. Serializability Theory
  7. Transaction Isolation Levels and Anomalies
  8. Lock Compatibility Matrices
  9. Multiple Granularity Locking and Intent Locks
  10. Vector Clocks and Happened-Before Ordering
  11. Deadlock Theory, Detection, and Prevention
  12. Serializable Snapshot Isolation (SSI)
  13. OS-Level Concurrency: Readers-Writers and File Locking

Section 1: Overview — The Three Families of Concurrency Control

Classical concurrency control theory partitions protocols into three families based on how they handle the fundamental tension between correctness and throughput. Pessimistic protocols (2PL and its variants) acquire locks before accessing data, paying upfront blocking costs to prevent conflicts. Optimistic protocols (OCC, timestamp ordering) operate freely on private copies or under timestamp discipline, paying a validation or abort cost at commit time. Multi-version protocols (MVCC) retain prior data versions so reads never block writes, at a storage overhead cost.[1][3][2]

The correctness target is serializability — the highest isolation level, defined as a schedule equivalent to some serial execution. For approximately 30 years after its 1976 introduction, Two-Phase Locking was the dominant algorithm for achieving serializability in database systems.[1][11]

Protocol Lineage and Theoretical Relationships

Key finding: Timestamp Ordering (T/O) provides the theoretical building blocks for both OCC and MVCC. OCC = T/O with all validation deferred to commit time; MVCC = T/O with multiple versions retained to allow reads to always succeed. Together with 2PL, these three form the complete foundation of classical concurrency control theory.[31]
Protocol | Year Introduced | Mechanism | Deadlock Risk | Reader/Writer Blocking | Serializability Guarantee
Two-Phase Locking (2PL) | 1976 (Eswaran, Gray et al.) | Pessimistic locking | Yes | Readers block writers, writers block readers | Conflict-serializable[1]
Timestamp Ordering (T/O) | ~1976–1979 | Timestamp-based ordering | No | Aborts on violations | Conflict-serializable (basic); view-serializable (Thomas rule)[9]
Optimistic CC (OCC) | 1979/1981 (Kung & Robinson) | Private workspace + validation | No | Never blocks | Conflict-serializable[3][22]
MVCC | 1978 (Reed); 1981 (Bernstein & Goodman); 1984 first commercial (VAX Rdb/ELN, Jim Starkey, DEC)[2][12][21] | Multiple data versions | Reduced | Readers never block writers | Snapshot Isolation by default; SSI for full serializability[2][12]
Serializable SI (SSI) | 2008 (Cahill et al.) | MVCC + anti-dependency cycle detection | No | No read-write blocking | Full serializability[30]
See also: Optimistic Speculative Execution (for practical OCC vs. locking recommendations), Deadlock Starvation Fairness (for deadlock algorithms in practice)

Section 2: Two-Phase Locking (2PL)

Two-Phase Locking is a pessimistic concurrency control method that guarantees conflict-serializability by dividing transaction execution into a growing phase (locks acquired only) and a shrinking phase (locks released only). Introduced in 1976 by K.P. Eswaran, J. Gray, R.A. Lorie, and I. Traiger, it remained the dominant serializability algorithm for approximately 30 years.[1][11][20]

Core Protocol Mechanics

The protocol imposes one rule: "a transaction must never acquire a lock after it has released a lock."[1] The lock point marks the moment all needed locks are acquired; after this point the transaction can only release locks.[1][11] Two fundamental lock types govern access: shared (S) locks, which permit concurrent readers, and exclusive (X) locks, which grant a single writer sole access.
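The two-phase discipline is essentially a one-bit state machine per transaction; a minimal sketch (the `TwoPhaseTxn` class is illustrative, not a real lock manager):

```python
# Minimal sketch of the 2PL discipline: a transaction object that refuses
# to acquire any lock once it has released one — a growing phase followed
# irreversibly by a shrinking phase.
class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.held = set()          # currently held (object, mode) pairs
        self.shrinking = False     # flips permanently on first release

    def acquire(self, obj, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.held.add((obj, mode))

    def release(self, obj, mode):
        self.shrinking = True      # the lock point has passed
        self.held.discard((obj, mode))
```

Strict and strong-strict 2PL correspond to never calling `release` on X locks (or any locks) until commit — i.e., collapsing the shrinking phase into a single step.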

2PL Protocol Variants

Variant | Lock Release Policy | Conflict-Serializable | Deadlock-Free | Recoverable | Strict
Basic 2PL | Locks released during shrinking phase | Yes | No | No | No
Conservative 2PL (C2PL) | All locks acquired before execution begins | Yes | Yes | Uncertain | Partial
Strict 2PL (S2PL) | X locks held until commit/abort; S locks released during shrinking | Yes | No | Yes | Yes
Strong Strict 2PL (SS2PL) | All locks held until commit/abort — no shrinking phase in practice | Yes | No | Yes | Yes

(Sources:[11][20])

Conservative 2PL prevents deadlocks by eliminating the hold-and-wait condition — transactions must declare all read-sets and write-sets in advance. Rarely implemented in practice due to this advance-knowledge requirement.[1][11][29]

SS2PL (Rigorous 2PL) is the most widely used variant in commercial database management systems. It provides the strongest consistency guarantees while simplifying recovery — a transaction effectively only has a growing phase until completion.[1][11]

Theoretical Correctness

Key finding: Any schedule produced under the basic 2PL protocol is conflict serializable. The serialization order of transactions is determined by topological sort of the precedence graph of the schedule. This establishes 2PL as a sufficient condition for serializability — but not a necessary one: serializable schedules exist that no valid 2PL execution produces.[1][11][20]

Known Limitations

Basic 2PL permits deadlocks (requiring detection or prevention) and, when locks are released before commit, cascading rollbacks; readers and writers also block each other, since S locks block X-lock acquisition — the gap MVCC later closed.[1][11]


Section 3: Multi-Version Concurrency Control (MVCC)

MVCC is a non-locking concurrency control method that maintains multiple versions of data items simultaneously. Rather than overwriting data on write, MVCC creates a new version — readers access older committed versions while writers create newer ones. The fundamental property: readers never block writers; writers never block readers.[2][12][21]

Historical Development

Year | Contributor / Event | Significance
1978 | David P. Reed (dissertation) | First description of MVCC principles; claims original authorship[2][12][Reed 1978]
1981 | Phil Bernstein & Nathan Goodman | "Concurrency Control in Distributed Database Systems" — detailed MVCC formalization[2][12]
1984 | VAX Rdb/ELN (Jim Starkey, DEC) | First commercial MVCC implementation; Starkey later developed InterBase (now Firebird)[2][12][21]

Core Mechanics: Version Chains and Snapshot Isolation

The DBMS maintains a version chain per logical tuple — a linked list of versions sorted by timestamp, each carrying a write timestamp (WTS) recording its creating transaction and a pointer to its neighbor in the chain.[12][21]

A transaction always reads the latest version whose WTS ≤ TS(Tᵢ) — this is snapshot isolation.[2][12][21]
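The version-chain read rule can be sketched in a few lines; the newest-first list layout below is an assumption of this sketch, not a statement about any particular engine:

```python
# Snapshot read over a version chain: return the latest version whose
# write timestamp (WTS) is <= the reader's timestamp TS. Versions are
# kept newest-first, as in an append-to-head linked chain.
def snapshot_read(chain, ts):
    """chain: list of (wts, value) pairs, sorted descending by wts."""
    for wts, value in chain:
        if wts <= ts:
            return value
    return None   # no version is visible to this snapshot

# Three committed versions of one logical tuple:
chain = [(30, "v3"), (20, "v2"), (10, "v1")]
```

A reader with TS=25 sees "v2" no matter how many newer versions writers append — the mechanical reason readers never block writers.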

Academic Classification of MVCC Methods

From Bernstein, Hadzilacos & Goodman (1987):[12]

Method | Mechanism
Multiversion Timestamp Ordering (MVTO) | Timestamps assigned to transactions; reads/writes directed to appropriate versions based on timestamps
Multiversion Two-Phase Locking (MV2PL) | Combines version management with 2PL locking discipline
Multiversion Mixed Methods | Hybrids combining timestamp ordering and locking

The Write Skew Anomaly: MVCC's Key Limitation

Most MVCC implementations provide Snapshot Isolation (SI) by default — not full serializability. The write skew anomaly exposes this gap: two concurrent transactions read overlapping data, each writes to a different part, and the combined result violates a constraint that neither write alone would violate. No write-write conflict is detected under SI; both transactions commit.[2][12] Serializable Snapshot Isolation (SSI) was developed in 2008 to close this gap.[30]
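The anomaly can be reproduced mechanically with the canonical bank example (two accounts with constraint V1 + V2 ≥ 0): two snapshots, disjoint writes, both commits succeed. A minimal simulation sketch:

```python
# Write skew under Snapshot Isolation: each transaction checks the
# constraint against its own (stale) snapshot, writes a different
# account, and commits — no write-write conflict exists, yet the
# combined result violates V1 + V2 >= 0.
db = {"V1": 100, "V2": 100}

snap1 = dict(db)   # T1's snapshot at start
snap2 = dict(db)   # T2's snapshot at start (concurrent with T1)

# T1: constraint check against its snapshot, then withdraw 200 from V1
if snap1["V1"] + snap1["V2"] - 200 >= 0:
    db["V1"] = snap1["V1"] - 200       # commits: writes only V1

# T2: same check against its equally stale snapshot, withdraw from V2
if snap2["V1"] + snap2["V2"] - 200 >= 0:
    db["V2"] = snap2["V2"] - 200       # commits: writes only V2
```

Both guards pass (each snapshot shows a total of 200), both commits touch disjoint keys, and the database ends at V1 = V2 = −100 — exactly the A5B pattern SSI was built to detect.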

Garbage Collection Challenge

Obsolete versions accumulate and must be reclaimed. A version is reclaimable when no active transaction can see it, or when it was created by an aborted transaction.[2][12][21]

A side benefit: without GC, MVCC supports time-travel queries — querying the database as of a past snapshot.[2][12][21]

MVCC vs. 2PL Comparison

Property | MVCC | 2PL
Reader-writer conflict | None — readers access old versions | S locks block X lock acquisition
Deadlock risk | Reduced (readers hold no locks) | Yes — requires detection/prevention
Storage overhead | High — multiple versions per object | Low — single version
Default serializability | Snapshot Isolation only | Full conflict-serializability (with SS2PL)
GC complexity | Significant operational concern | Not applicable
Read-heavy workloads | Excellent | Moderate — S locks allow parallel reads

(Sources:[2][21])

Real-World Implementations

System | MVCC Approach
PostgreSQL | Native MVCC; VACUUM for GC; SSI since v9.1[12]
MySQL InnoDB | MVCC via undo logs[12]
Oracle | Multi-version reads via undo segments; "SERIALIZABLE" mode is actually SI[12][26]
Google Spanner | MVTO + external certification via TrueTime for strict serializability[12]
Microsoft Hekaton, SAP HANA, MemSQL, NuoDB | In-memory MVCC implementations[12]
Key finding: Many modern database systems combine MVCC for reads with locking for write-write conflicts — gaining the non-blocking read properties of MVCC while retaining the write-ordering guarantees of 2PL.[21]

Section 4: Optimistic Concurrency Control (OCC)

OCC assumes transactions complete without conflict most of the time. Rather than acquiring locks, transactions operate freely on private data copies, then validate at commit time that no conflicts occurred. The approach is "optimistic" in that it relies primarily on transaction rollback as a control mechanism, "hoping" conflicts will not occur.[3][13][22]


The Four Phases of OCC

Phase | Action | Locking
1. Begin | Record a timestamp marking transaction start | None
2. Read/Modify | Read DB values; write to private workspace (local copies only — no global writes). On first write to an object, copy it; all subsequent writes target the copy. | None
3. Validate | Check whether other transactions have modified data this transaction read or wrote. Transaction number (tn) is assigned at the end of the read phase, not the beginning. | Critical section only
4. Commit / Rollback | If validation succeeds, publish changes. If a conflict is detected, abort and restart. | Brief write lock

(Sources:[3][13][22])

Kung & Robinson Serializability Conditions

For transaction Tᵢ to precede Tⱼ in the equivalent serial order, at least one of the following must hold:[3]

  1. Tᵢ completes all writes before Tⱼ starts reading — zero overlap
  2. write_set(Tᵢ) ∩ read_set(Tⱼ) = ∅, AND Tᵢ completes writes before Tⱼ starts writes
  3. write_set(Tᵢ) ∩ (read_set(Tⱼ) ∪ write_set(Tⱼ)) = ∅, AND Tᵢ completes reads before Tⱼ completes reads
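Backward validation against conditions 1 and 2 reduces to a set-intersection check; a sketch, assuming transaction numbers (tn) order commit points as in the Kung-Robinson model:

```python
# Backward validation sketch: at commit time, transaction T checks every
# transaction that committed after T started its read phase. Condition 1
# (zero overlap) holds automatically for anything with tn <= T's start;
# condition 2 requires later committers' write sets to miss T's read set.
def backward_validate(t_read_set, t_start_tn, committed):
    """committed: list of (tn, write_set) for committed transactions.
    Returns True if T may commit, False if T must abort and restart."""
    for tn, write_set in committed:
        if tn > t_start_tn and write_set & t_read_set:
            return False   # T read something overwritten mid-flight
    return True
```

Forward validation would instead intersect T's write set with the read sets of currently active transactions — checking all three conditions at the cost of tracking in-flight state.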

Validation Techniques

Method | Conditions Checked | Critical Section Scope | Best For
Backward Validation (serial) | Conditions 1 and 2 — looks back at already-committed transactions | Wraps entire validate + write sequence | Simpler implementation; less scalable for disk-based DBs[3][22]
Forward Validation (parallel) | All three conditions — looks ahead at currently active transactions | Wraps only the step to identify concurrent transactions | Higher-concurrency environments; more scalable[3][22]

Performance Characteristics

Low-contention environments: OCC excels — transactions avoid lock management overhead and eliminate waiting periods, producing superior throughput compared to locking approaches.[3][13]

High-contention environments: Performance degrades severely. Per Kung & Robinson: "repeatedly restarting transactions hurts performance significantly." Thrashing occurs when the same transactions repeatedly conflict and abort. Starvation is possible — a transaction with a long read phase must check write sets of all transactions that committed while it was active, potentially requiring unbounded buffer space.[3][22]

Notable Implementations

Google App Engine Datastore, Apache Solr (via _version_ field), Elasticsearch (document updates), CouchDB (document revisions via _rev fields), MonetDB, Software Transactional Memory (STM) systems, DynamoDB and MongoDB (conditional updates), Mimer SQL, Rails and Grails web frameworks.[13][22]

In web applications, HTTP's stateless architecture makes traditional locking impractical. OCC is implemented via ETags in HTTP headers, hidden form fields with checksums/tokens, and version numbers in database records.[3]

Key finding: OCC eliminates deadlocks entirely (no locks held during execution) and avoids all blocking, but at the cost of potential transaction restarts. The 1981 Kung & Robinson paper spawned Snapshot Isolation, Serializable Snapshot Isolation (SSI), and Software Transactional Memory (STM) — the three most important descendants in modern concurrency control.[3][22]

Section 5: Timestamp Ordering (T/O) Protocol

Timestamp Ordering is a concurrency control protocol where the DBMS uses timestamps — rather than locks — to determine the serialization order of transactions. Each transaction Tᵢ receives a unique, monotonically increasing timestamp TS(Tᵢ). Classified as optimistic in that conflicts are assumed rare, but unlike OCC, conflicts trigger immediate abort rather than deferred validation.[9][31]

Per-Data-Item Timestamps and Protocol Rules

Each data item Q maintains two timestamps: RTS(Q), the largest timestamp of any transaction that has read Q, and WTS(Q), the timestamp of the transaction that wrote Q's current value.[9][31]

Operation | Condition | Action | Rationale
Read(Q) | TS(T) < WTS(Q) | ABORT T | T tries to read a value already overwritten by a later transaction (physically unrealizable)
Read(Q) | TS(T) ≥ WTS(Q) | ALLOW READ; set RTS(Q) = max(RTS(Q), TS(T)) | Read is temporally consistent
Write(Q) | TS(T) < RTS(Q) | ABORT T | T tries to write a value a later transaction already read
Write(Q) | TS(T) < WTS(Q) | ABORT T | T tries to write an outdated value — already overwritten
Write(Q) | Otherwise | ALLOW WRITE; set WTS(Q) = TS(T) | Write is temporally consistent

(Sources:[9][31])

Important limitation: every read may have to update RTS(Q) — turning reads into metadata writes and creating significant overhead.[9][31]

The Thomas Write Rule (1979)

Robert H. Thomas (1979) — "A majority consensus approach to concurrency control for multiple copy databases," ACM TODS 4(2): 180–209 — proposed ignoring outdated writes rather than aborting them:[9]

Situation (Write Q) | Basic T/O | Thomas Write Rule
TS(T) < WTS(Q) | Abort T | Ignore the write (skip, continue) — the outdated write is irrelevant
TS(T) < RTS(Q) | Abort T | Abort T (unchanged)
Otherwise | Allow write | Allow write
Serializability type | Conflict-serializable | View-serializable only
Concurrency | Lower (unnecessary aborts) | Higher (fewer aborts)

The Thomas Write Rule trades conflict-serializability for view-serializability, gaining concurrency at the cost of a harder correctness check — determining view-serializability is NP-complete.[9]
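Both rule tables collapse into two small functions; a sketch using a plain dict per data item (the `rts`/`wts` keys and `thomas` flag are illustrative names for this example):

```python
# Basic Timestamp Ordering plus the optional Thomas Write Rule.
# Each data item tracks RTS (largest reader timestamp) and WTS
# (timestamp of the current value's writer).
def to_read(item, ts):
    if ts < item["wts"]:
        return "abort"        # value already overwritten by the future
    item["rts"] = max(item["rts"], ts)
    return "ok"

def to_write(item, ts, thomas=False):
    if ts < item["rts"]:
        return "abort"        # a later transaction already read Q
    if ts < item["wts"]:
        # Thomas Write Rule: the outdated write would be immediately
        # overwritten in any equivalent serial order, so just skip it
        return "skip" if thomas else "abort"
    item["wts"] = ts
    return "ok"
```

The single `skip` branch is the entire difference between conflict-serializable basic T/O and view-serializable T/O with the Thomas rule.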

Strict Timestamp Ordering

To avoid cascading rollbacks, the strict variant delays (rather than aborts) operations that would violate timestamp order, executing them once all conflicting operations of older transactions have committed or aborted.[31]

T/O vs. 2PL Comparison

Property | 2PL | T/O
Mechanism | Lock acquisition | Timestamp comparison
Deadlock | Possible — detection/prevention needed | Impossible — no locks held
Starvation | Less likely | Long-running transactions may starve (repeatedly aborted by newer ones)
Overhead | Lock table management | Timestamp tracking; every read may update RTS
Cascading rollbacks | In basic 2PL | Possible in basic T/O
Aborted transactions restart with | Same identity | New (higher) timestamp — the restarted transaction is now serialized after the transaction that caused the abort, so it can read the most recently committed version without a timestamp violation[9]

Key finding: T/O provides the theoretical bridge between locking-based and version-based concurrency. MVCC is a multiversion extension of T/O that eliminates most read violations by allowing reads to always find an appropriate older version — transforming aborts into version selects.[9][31]

Section 6: Serializability Theory

Serializability is the fundamental correctness criterion for concurrent database transactions: a schedule is serializable if it is equivalent to some serial schedule — i.e., it produces the same result as if transactions executed one at a time. It is the highest level of isolation between transactions.[4][14][23] A schedule (or history) is the ordered list of operations from multiple transactions: reads (r₁[x]), writes (w₁[x]), commits (c₁), and aborts (a₁).[4]

Conflict Classification

Two operations are in conflict if and only if: (1) they belong to different transactions, (2) at least one is a write, and (3) they access the same object. Conflicting operations are non-commutative — swapping their order changes the outcome.[4][14]

Conflict Type | Operations | Risk
Read-Write (RW) | r₁[x], w₂[x] | T2 overwrites value T1 read — non-repeatable read[4]
Write-Read (WR) | w₁[x], r₂[x] | T2 reads value written by T1 before T1 commits — dirty read[4]
Write-Write (WW) | w₁[x], w₂[x] | T2 overwrites T1's uncommitted write — lost update[14]

Conflict Serializability and the Precedence Graph

Two schedules are conflict-equivalent if they maintain identical transaction actions and identical ordering of every conflicting operation pair. A schedule is conflict-serializable when its precedence graph is acyclic.[4][14][23]

Precedence graph construction: one node per transaction; directed edge Tᵢ → Tⱼ if Tᵢ has a conflicting operation that precedes one in Tⱼ. A topological sort on the acyclic graph yields the equivalent serial schedule. This is the foundational theorem backing both 2PL correctness proofs and SSI cycle detection.[4][14]
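The construction can be sketched directly; a schedule is modeled here as (transaction, operation, object) triples with "r"/"w" operation tags (a representation chosen for this sketch):

```python
# Build the precedence graph of a schedule and test conflict-
# serializability. Two operations conflict when the transactions differ,
# the object matches, and at least one operation is a write.
def precedence_edges(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "w" in (op_i, op_j):
                edges.add((ti, tj))   # Ti's conflicting op precedes Tj's
    return edges

def serial_order(schedule):
    """Topological sort of the precedence graph; returns None if the
    graph is cyclic, i.e. the schedule is not conflict-serializable."""
    remaining = {t for t, _, _ in schedule}
    edges = precedence_edges(schedule)
    order = []
    while remaining:
        free = [t for t in remaining
                if not any((u, t) in edges for u in remaining)]
        if not free:
            return None               # cycle in the precedence graph
        t = min(free)                 # deterministic tie-break
        order.append(t)
        remaining.remove(t)
    return order
```

The acyclicity test runs in polynomial time, which is exactly why conflict-serializability (not NP-complete view-serializability) is the practical target.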

View Serializability

A weaker equivalence that requires: (1) identical initial value reads, (2) preserved read-from relationships, and (3) matching final write assignments. All conflict-serializable schedules are view-serializable; the converse is false — view-serializable schedules with blind writes exist that are not conflict-serializable. Determining view-serializability is NP-complete, which is why practical systems target conflict-serializability exclusively.[4][14][23]

Schedule Containment Hierarchy

Dimension | Containment Order (subset to superset)
Serializability | Serial ⊂ Conflict-serializable ⊂ View-serializable ⊂ All schedules[4][14]
Recoverability | Serial ⊂ Strict ⊂ Cascadeless (ACA) ⊂ Recoverable ⊂ All schedules[4][14]

Recovery Classifications

Class | Definition | Guarantees
Recoverable | Transactions commit only after all transactions whose changes they read have committed | No orphaned data dependencies[4]
Cascadeless (ACA) | Transactions never read data from uncommitted writes (no dirty reads) | Prevents cascading rollbacks[4][14]
Strict | A write's conflicting operations cannot proceed until the writing transaction commits or aborts | Efficient failure recovery; SS2PL produces strict schedules[4]

Methods for Enforcing Conflict Serializability

  1. Two-Phase Locking (2PL) — proven sufficient condition; lock compatibility enforces conflict ordering[14]
  2. Timestamp Ordering (TO) — orders operations by transaction timestamp; aborts violations[14]
  3. Serializable Snapshot Isolation (SSI) — MVCC-based; detects dangerous structures in the dependency graph[14]
  4. Abort cycles — abort any transaction forming a cycle in the precedence graph[14]

Strict Serializability in Distributed Systems

Even combining standard serializability and linearizability can still result in unexpected anomalies such as causal reversal. A stale read — failing to see effects of a completed write — is common in distributed systems. Strict serializability adds real-time ordering requirements to ensure transactions respect wall-clock ordering, eliminating stale reads.[23]

Industry Reality on Serializability Adoption

Key finding: In a survey of 18 SQL and "NewSQL" databases, only 3 of 18 provided serializability by default, and 8 — including Oracle and SAP — did not offer full serializability at all. Oracle's "SERIALIZABLE" isolation mode actually provides Snapshot Isolation, not true serializability.[4][14][23]
Primary source: Bailis, P., Fekete, A., Ghodsi, A., Hellerstein, J.M., and Stoica, I. (2013). Highly Available Transactions: Virtues and Limitations. Proceedings of the VLDB Endowment 7(3):181–192.

Section 7: Transaction Isolation Levels and Anomalies

Isolation is the "I" in ACID. The four ANSI/ISO SQL standard isolation levels trade concurrency for correctness: lower isolation yields higher concurrency and more anomalies; higher isolation yields fewer anomalies with greater resource consumption.[8][17]

ANSI SQL Anomaly Definitions (Formal Notation from Berenson et al. 1995)

Anomaly | Formal Pattern | Description
A1 Dirty Read | w1[x] … r2[x] … (a1 and c2 in any order) | T2 reads T1's uncommitted write; if T1 rolls back, T2 has read data that never existed[8]
A2 Fuzzy Read | r1[x] … w2[x] … c2 … r1[x] … c1 | T1 reads x twice; T2 modifies and commits between the reads — T1 gets different values[8]
A3 Phantom Read | r1[P] … w2[y in P] … c2 … r1[P] … c1 | T1 queries by predicate P; T2 inserts/deletes rows matching P and commits; T1's re-query returns different rows[8]

Anomaly Prevention by Isolation Level

Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Write Skew
READ UNCOMMITTED | Possible | Possible | Possible | Possible
READ COMMITTED | Prevented | Possible | Possible | Possible
CURSOR STABILITY (prevents P4, lost update, for cursor operations[Berenson 1995]) | Prevented | Possible | Possible | Possible
REPEATABLE READ | Prevented | Prevented | Possible | Possible
SNAPSHOT ISOLATION | Prevented | Prevented | Prevented | Possible
SERIALIZABLE | Prevented | Prevented | Prevented | Prevented

(Sources:[8][14][17][23][26])

Locking Implementation of Isolation Levels

Level | Read Locks | Write Locks | Default In
READ UNCOMMITTED | None | Long (held until commit) | —
READ COMMITTED | Short (released after read) | Long | PostgreSQL, Oracle, SQL Server[8][17]
REPEATABLE READ | Long on accessed rows; no range/predicate locks | Long | MySQL InnoDB[8][17]
SERIALIZABLE | Long + predicate/range locks (prevents phantoms) | Long | —

Additional Anomalies Beyond the ANSI Standard

Berenson et al. 1995 (ACM SIGMOD) identified phenomena missing from the original ANSI specification:[26][8][17]

Anomaly | Code | Description
Dirty Write | P0 | T overwrites a value written by another still-in-flight transaction. Can violate database consistency. Not in the original ANSI standard.[26]
Lost Update | P4 | T1 reads x=10, T2 reads x=10, T1 writes x=20, T2 writes x=15 (not 25 as intended) — T1's update is lost.[8][17][26]
Read Skew | A5A | T1 reads x and y; T2 changes both between the two reads — T1 sees an inconsistent partial update.[17][26]
Write Skew | A5B | Two concurrent transactions read overlapping data and write to different parts; the combined result violates a constraint neither write alone would violate. Classic example: two on-call doctors each conclude a colleague is on call and go off call simultaneously — violating the "at least one on-call" constraint.[8][17][26]

The Berenson et al. 1995 Critique

Authors: Hal Berenson, Philip A. Bernstein, Jim Gray, Jim Melton, Elizabeth J. O'Neil, Patrick E. O'Neil. Published ACM SIGMOD 1995. Key findings:[8][17][26][Berenson 1995]

  1. ANSI definitions are ambiguous — strict and broad interpretations both have weaknesses
  2. ANSI definitions are incomplete — miss Dirty Write (P0) entirely
  3. Definitions "rely in subtle ways on an assumption that a locking schema is used" — rendering semantics ill-defined for OCC or MVCC systems
  4. Even under strictest interpretation, ANSI anomalies don't cover all isolation failures
  5. Snapshot Isolation is not captured by any ANSI level
  6. Oracle's "SERIALIZABLE" mode actually provides Snapshot Isolation

The paper formalized Cursor Stability (between READ COMMITTED and REPEATABLE READ; prevents lost update for cursor operations) and Snapshot Isolation as explicit levels.[17][26]

Conflict Isolation vs. Data Isolation

Two distinct theoretical frameworks define what it means for a concurrency control protocol to be "correct," and they are not equivalent. Conflict-based isolation, exemplified by 2PL, prevents conflicting operations (read-write, write-read, write-write pairs on the same object from different transactions) from interleaving. A schedule is correct under this framework if and only if its precedence graph is acyclic — i.e., it is conflict-serializable.[4][14] Data-based (anomaly-based) isolation, the framework of the ANSI SQL standard and its critique by Berenson et al. 1995, defines correctness by the absence of specific observable anomalies: dirty reads (A1), fuzzy reads (A2), phantoms (A3), and the additional phenomena (P0, P4, A5A, A5B) not captured by the original ANSI definitions.[8][26]

The two frameworks are not equivalent. Conflict-serializability implies anomaly-freedom: any conflict-serializable schedule is free of all classical ANSI anomalies. But anomaly-freedom does not imply conflict-serializability. The canonical example is Snapshot Isolation: SI prevents dirty reads (A1), fuzzy reads (A2), and phantoms (A3) — satisfying the ANSI SERIALIZABLE anomaly criteria in their strict interpretation — yet SI is not conflict-serializable, because it permits write skew (A5B). Two concurrent transactions can each read overlapping data, write to disjoint parts, and both commit under SI without any write-write conflict being detected; the resulting schedule cannot be produced by any serial execution. This is precisely the gap that motivated Berenson et al.’s 1995 critique — the ANSI definitions, relying implicitly on locking semantics, fail to capture SI as a distinct isolation level — and the subsequent development of Serializable Snapshot Isolation (SSI) in 2008 to restore full conflict-serializability without sacrificing SI’s non-blocking read properties.[8][17][26][30]
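
The acyclicity test at the heart of conflict-serializability can be sketched in a few lines. This is a minimal illustration, not a production scheduler: the schedule encoding as (transaction, operation, item) tuples and the transaction names are illustrative.

```python
# Sketch: build the precedence graph of a schedule and test it for cycles.
# A schedule is a list of (txn, op, item) tuples in execution order.

def precedence_edges(schedule):
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Conflict: same item, different transactions, at least one write
            if x1 == x2 and t1 != t2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    # Kahn's algorithm: the graph is acyclic iff every node can be
    # removed in topological order
    indeg = {n: 0 for n in nodes}
    for _, dst in edges:
        indeg[dst] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:
        n = frontier.pop()
        seen += 1
        for src, dst in edges:
            if src == n:
                indeg[dst] -= 1
                if indeg[dst] == 0:
                    frontier.append(dst)
    return seen == len(nodes)

# Disjoint items: no conflict edges, trivially serializable
ok = [("T1", "R", "x"), ("T2", "R", "y"), ("T1", "W", "x"), ("T2", "W", "y")]
# Classic lost-update interleaving: T1 and T2 each read-then-write x
bad = [("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x"), ("T2", "W", "x")]
print(is_conflict_serializable(ok))   # True
print(is_conflict_serializable(bad))  # False: T1->T2 and T2->T1 form a cycle
```

The polynomial-time bound mentioned above is visible here: edge construction is quadratic in schedule length and the cycle check is linear in the graph.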

Gray (1976) Degrees of Consistency — Historical Precursor

Gray et al. 1976 introduced "degrees of consistency," the precursor to SQL isolation levels:[7][18][24]

| Degree | Informal Definition | Locking Protocol | SQL Equivalent |
|---|---|---|---|
| 0 | No dirty writes (may write uncommitted data of others) | Short-duration X locks only | Below READ UNCOMMITTED |
| 1 | No uncommitted writes visible to others | Long-duration X locks | READ UNCOMMITTED |
| 2 | No dirty reads (read committed) | Long X locks + short read locks | READ COMMITTED |
| 3 | Repeatable reads (full serializability) | Long X locks + long read locks | SERIALIZABLE |

Key finding: Snapshot Isolation is strictly stronger than READ COMMITTED and incomparable to REPEATABLE READ — it prevents read skew (which REPEATABLE READ at the row level does not) but permits write skew (which SERIALIZABLE prevents). This incomparability was the central motivation for Berenson et al.'s critique.[8][17][26]

Section 8: Lock Compatibility Matrices

Lock compatibility matrices formalize which lock types can be held simultaneously by different transactions. The matrix directly enforces conflict-serializability: incompatible lock requests block the requesting transaction, preventing out-of-order conflicting operations.[1][28]

Basic S/X Lock Compatibility

| Requested \ Held | S (Shared) | X (Exclusive) |
|---|---|---|
| S (Shared) | Compatible ✓ | Incompatible ✗ |
| X (Exclusive) | Incompatible ✗ | Incompatible ✗ |

(Sources:[1][11][28])

Lock acquisition rules (Berkeley CS186 formulation):[28]

  1. Transactions must acquire S lock before reading
  2. Transactions must acquire X lock before writing
  3. Transactions cannot acquire new locks after releasing any lock — this is the 2PL constraint that enforces serializability

If a requested lock is incompatible with a held lock, the requesting transaction waits until all incompatible locks are released.[28]
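
The grant-or-wait rule above can be sketched directly from the matrix. This is an illustrative fragment, not a real lock manager: the function and variable names are invented for the example.

```python
# Sketch: the S/X compatibility check and the rule that an incompatible
# request must wait until conflicting locks are released.

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_locks):
    """held_locks: list of (txn, mode) currently granted on the object."""
    requester, mode = requested
    for txn, held_mode in held_locks:
        if txn != requester and not COMPATIBLE[(mode, held_mode)]:
            return False  # incompatible with another txn's lock: must wait
    return True

held = [("T1", "S"), ("T2", "S")]    # two readers coexist on one object
print(can_grant(("T3", "S"), held))  # True: S is compatible with S
print(can_grant(("T3", "X"), held))  # False: X conflicts with each held S
```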

Lock Upgrades and Downgrades

Upgrading: Conversion from S to X lock. The Lock Manager places upgrade requests at the front of the queue — ahead of new S lock requests — to reduce starvation risk for upgrading transactions.[28]

Downgrading: Conversion from X to S lock; allowed during the shrinking phase.[28]

Connection Between Lock Compatibility and Conflict Serializability

The three conflict types map directly to incompatibilities in the matrix:[28]

  - Read-Write (RW): a held S lock blocks a requested X lock
  - Write-Read (WR): a held X lock blocks a requested S lock
  - Write-Write (WW): a held X lock blocks a requested X lock

Full Six-Mode Compatibility Matrix (with Intention Locks)

For hierarchical resource locking, six lock modes are required:[6][16][28]

| Held \ Requested | NL | IS | IX | S | SIX | X |
|---|---|---|---|---|---|---|
| NL | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| IS | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| IX | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| S | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| SIX | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| X | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |

(✓ = compatible, ✗ = incompatible; this is the standard Gray 1976 matrix.)

(Sources:[6][16][28])

Some sources present a five-mode matrix omitting NL, treating "no lock" as implicit rather than a named mode; the two representations are equivalent.[7][18][24]

Key finding: The lock compatibility matrix is the algorithmic heart of 2PL: the three conflict types (RW, WR, WW) that define conflict-serializability map one-to-one onto incompatibilities in the S/X matrix, making the matrix + 2PL protocol a complete mechanical enforcement of conflict-serializable schedules.[28]

Section 9: Multiple Granularity Locking and Intent Locks

Multiple Granularity Locking (MGL) allows transactions to lock data at different levels of a resource hierarchy — from fine-grained individual rows to coarse-grained entire tables. Without MGL, locking a table exclusively would require scanning all rows for existing locks: O(n) per check. With intent locks on the parent node, conflict detection becomes O(1).[6][16][24]

Foundational Papers

The Core Tension in Lock Granularity

| Granularity | Concurrency | Locking Overhead | Use Case |
|---|---|---|---|
| Fine-grained (individual rows) | High | High | OLTP with many small transactions[6] |
| Coarse-grained (entire table) | Low | Low | Batch operations, DDL[6] |
| Mixed (MGL) | Adaptive | Adaptive | General-purpose DBMSes[16] |

Database resource hierarchy: Database → Files/Tables → Pages → Records/Rows → Fields[6][16]

The Six Lock Modes

| Mode | Name | Meaning |
|---|---|---|
| NL | Null Lock | No lock held; compatible with everything[6][16] |
| IS | Intention Shared | Intent to acquire S locks on descendants; allows others to read the node[6][7] |
| IX | Intention Exclusive | Intent to acquire X locks on descendants; blocks S locks on the whole subtree[7][16] |
| S | Shared | Read access to entire subtree; no exclusive locks can exist anywhere in subtree[7][18] |
| SIX | Shared + Intention Exclusive | Read entire subtree; intent to write specific lower nodes — "reads all, writes some"[6][7] |
| X | Exclusive | Exclusive write lock on node and entire subtree; incompatible with all other locks[7][18] |

Lock Acquisition and Release Protocol

Acquisition direction: Top-down (root to leaf)
Release direction: Bottom-up (leaf to root) — a node can only be unlocked after all its children are unlocked.[6][7][16][18][24]

| Lock Desired on Node | Required Parent Condition |
|---|---|
| S or IS | Parent holds IS or IX[6][7] |
| X, IX, or SIX | Parent holds IX or SIX[6][7] |

For DAGs (nodes with multiple parents): before requesting IX, SIX, or X, all immediate parent nodes must hold IX or greater — not just one.[24]
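
The parent-condition rule can be sketched as a single-level check on a tree-shaped hierarchy. This is a simplified illustration under stated assumptions: it checks only the immediate parent (real implementations hold intents along the whole root-to-leaf path), and the node names are invented for the example.

```python
# Sketch: the MGL acquisition rule — before locking a node, the required
# intention mode must already be held on its parent.

REQUIRED_PARENT = {            # lock desired -> acceptable parent modes
    "S":   {"IS", "IX"},
    "IS":  {"IS", "IX"},
    "X":   {"IX", "SIX"},
    "IX":  {"IX", "SIX"},
    "SIX": {"IX", "SIX"},
}

def may_lock(node, mode, parent_of, locks):
    """locks: dict node -> mode this txn holds; parent_of: child -> parent."""
    parent = parent_of.get(node)
    if parent is None:
        return True            # root: no parent condition applies
    return locks.get(parent) in REQUIRED_PARENT[mode]

parent_of = {"row1": "table", "table": "db"}
strong = {"db": "IX", "table": "IX"}   # intentions announced top-down
weak = {"db": "IS", "table": "IS"}
print(may_lock("row1", "X", parent_of, strong))  # True: parent holds IX
print(may_lock("row1", "S", parent_of, strong))  # True: IX permits S below
print(may_lock("row1", "X", parent_of, weak))    # False: IS is too weak for X
```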

Correctness and Lock Escalation

Gray et al. prove that if two lock graphs are compatible, the implicit locks on the leaves are also compatible — the intent-lock hierarchy is provably equivalent to naive leaf-level locking without the O(n) overhead.[24] If too many fine-grained locks accumulate, the DBMS can escalate to a coarser granularity (e.g., row locks → table lock) to reduce overhead.[24]

Real-World Implementations

| System | Lock Granularity Implementation |
|---|---|
| IBM DB2 | Row-level, page-level, table-level with IS/IX/SIX[6][24] |
| Oracle | TX (transaction) and TM (table) locks with row-level granularity[6][24] |
| SQL Server | Row, page, extent, table, and database locks[6][24] |
| MySQL InnoDB | Row-level locks with table-level intent locks[6][24] |
| PostgreSQL | Intent locks for table/row-level locking[6][24] |

Key finding: The fundamental insight of Multiple Granularity Locking — "announce intentions at coarsest granularity, execute at finest" — reduces coarse-grained conflict detection from O(n) leaf scans to O(1) parent-node checks, making scalable fine-grained locking practical in production database systems.[6][16]
See also: Monorepo Tooling (for practical file-level locking implementations)

Section 10: Vector Clocks and Happened-Before Ordering

In distributed systems with no global clock, ordering events across nodes requires logical time. The foundational insight from Lamport (1978): "It is not important that all processes agree on what the actual time is, but that they agree on the order in which events occur."[5][15][25]

The Happened-Before Relation (Lamport 1978)

Formalized by Leslie Lamport in "Time, Clocks, and the Ordering of Events in a Distributed System" (CACM 21(7), 1978). The happens-before relation → is defined by:[5][15][25]

  1. If a and b are events in the same process and a precedes b, then a → b
  2. If a is the sending of a message and b is its receipt, then a → b
  3. Transitivity: a → b and b → c implies a → c
  4. If neither a → b nor b → a, then a and b are concurrent (denoted a ↔ b or a ∥ b)

Lamport Clocks: Scalar Logical Time

A numerical counter per process; rules:[5][15][25]

  1. Increment local counter before each event
  2. When sending a message, attach current counter value
  3. When receiving a message with timestamp t: set local counter to max(local, t) + 1

Key property: If a → b then C(a) < C(b).
Critical limitation: The converse does NOT hold. If C(a) < C(b), we CANNOT conclude a → b. Lamport clocks cannot detect concurrency — only enforce known ordering.[5][15][25]
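
The three rules above can be sketched in a few lines. The class and the two-process scenario are illustrative, not drawn from any particular implementation.

```python
# Sketch: a Lamport clock following the three rules above.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):            # rule 1: increment before each local event
        self.time += 1
        return self.time

    def send(self):            # rule 2: attach the counter to the message
        return self.tick()

    def receive(self, ts):     # rule 3: set counter to max(local, ts) + 1
        self.time = max(self.time, ts) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
p1.tick()                # p1 local event: time 1
ts = p1.send()           # p1 sends a message stamped 2
p2.receive(ts)           # p2 jumps to max(0, 2) + 1 = 3
print(p2.time)           # 3: the receipt is ordered after the send
```

Note the one-way property in action: p2.time > p1.time here only because a message established the ordering; two unrelated clocks with the same values would tell us nothing.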

Vector Clocks: Complete Causal Tracking

A vector clock is an array of N logical clocks — one per process in an N-process system. Process Pᵢ maintains vector VCᵢ where VCᵢ[i] counts Pᵢ's own events and VCᵢ[j] records what Pᵢ knows about Pⱼ's event count.[5][15][25]

Algorithm:

  1. Initialize all clocks to [0, 0, …, 0]
  2. Internal events: Pᵢ increments VCᵢ[i] before executing
  3. Send message: Pᵢ increments VCᵢ[i]; attaches current VCᵢ as timestamp ts(m)
  4. Receive message: Pⱼ sets VCⱼ[k] ← max(VCⱼ[k], ts(m)[k]) for each k, then increments VCⱼ[j]

Causality Detection via Vector Comparison

| Relationship | Condition | Interpretation |
|---|---|---|
| VC1 = VC2 | All elements equal | Same logical instant |
| VC1 < VC2 | All elements of VC1 ≤ corresponding VC2 elements, at least one strictly less | VC1 causally precedes VC2 (happened-before) |
| VC1 > VC2 | Reverse of above | VC2 causally precedes VC1 |
| Concurrent (VC1 ∥ VC2) | Neither VC1 ≤ VC2 nor VC2 ≤ VC1 | No causal relationship — treat as conflict |

Example: [2,3,1] vs [3,0,0] — first slot of second is greater (3>2) but second slot of first is greater (3>0). Neither dominates → concurrent events, flag as potential conflict.[5][15]

The complete bidirectional property: VC(a) < VC(b) if and only if a happened-before b. This is what Lamport clocks cannot provide — vector clocks enable true conflict detection, not just ordering.[5][15]
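
The update rules and the element-wise comparison can be sketched together. The Process class and the three-process vectors are illustrative; the comparison logic follows the table above.

```python
# Sketch: vector clocks with element-wise causality comparison.

def vc_leq(a, b):
    return all(x <= y for x, y in zip(a, b))

def compare(a, b):
    if a == b:
        return "equal"
    if vc_leq(a, b):
        return "a happened-before b"
    if vc_leq(b, a):
        return "b happened-before a"
    return "concurrent"        # neither dominates: potential conflict

class Process:
    def __init__(self, i, n):
        self.i, self.vc = i, [0] * n

    def event(self):           # internal event: bump own slot
        self.vc[self.i] += 1

    def send(self):            # bump own slot, ship a copy as the timestamp
        self.event()
        return list(self.vc)

    def receive(self, ts):     # element-wise max, then bump own slot
        self.vc = [max(x, y) for x, y in zip(self.vc, ts)]
        self.vc[self.i] += 1

print(compare([2, 3, 1], [3, 0, 0]))  # "concurrent" — the example above
print(compare([1, 0, 0], [2, 1, 0]))  # "a happened-before b"

p0, p1 = Process(0, 2), Process(1, 2)
p0.event()
p1.receive(p0.send())
print(p1.vc)                          # [2, 1]: p0's history is now visible
```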

Lamport vs. Vector Clocks Comparison

| Feature | Lamport Clocks | Vector Clocks |
|---|---|---|
| Structure | Single integer counter | Array of N integers |
| Space complexity | O(1) per process | O(n) per process |
| Causal ordering | Partial (one-directional implication) | Full (bidirectional iff) |
| Concurrency detection | No | Yes |
| Use cases | Total ordering, log sequencing, leader election | Conflict detection, distributed MVCC, CRDTs |

(Sources:[5][15][25])

Historical Development

Applications in Concurrency Control

Relationship to Concurrency Control Theory

Happened-before provides the formal foundation for detecting write-write conflicts in distributed systems. The partial order defined by happened-before maps directly to the serializability graph concept. In MVCC distributed databases, each write carries its vector; when replicas exchange versions, if neither vector dominates the other, the writes were concurrent — the store keeps both as siblings for application-level merging.[25]

Scaling Concerns and Variants

Vector clock size grows linearly with the number of processes — a real overhead in large, dynamic clusters.[5][15] Variants addressing this:

| Variant | Approach |
|---|---|
| Dotted Version Vectors | More space-efficient; address anomalies in standard vector clocks[25] |
| Version Vectors | Simplified form for key-value stores[15] |
| Hybrid Logical Clocks (HLC) | Combine physical and logical time[25] |
| Matrix Clocks | Track knowledge of others' knowledge[15] |
| Plausible Clocks | Reduce storage overhead with approximations[15] |
| Interval Tree Clocks | Support dynamic process creation/deletion[25] |
| Bloom Clocks | Probabilistic compact representation[15] |

Key finding: Vector clocks are the minimum logical data structure that enables complete bidirectional causality detection in distributed systems. Lamport clocks can establish a total order but cannot distinguish causal precedence from concurrence — the distinction that matters for conflict detection in parallel code modification, distributed databases, and CRDT-based collaborative editing.[5][15]

Section 11: Deadlock Theory, Detection, and Prevention

A deadlock occurs when two or more transactions each hold resources the other needs, creating a standstill where none can proceed. Classic example: Transaction A holds Resource 1 and waits for Resource 2; Transaction B holds Resource 2 and waits for Resource 1.[10][29]

The Four Necessary Conditions (Coffman Conditions)

A deadlock can only occur when all four conditions hold simultaneously — breaking any single condition prevents deadlock:[10][29]

  1. Mutual Exclusion: Only one transaction can hold a particular resource at a time
  2. Hold and Wait: Transactions holding resources may request additional resources held by others
  3. No Preemption: Resources cannot be forcibly taken from the holding transaction
  4. Circular Wait: A cycle of transactions exists where each waits for a resource held by the next

Deadlock Detection: Wait-For Graph (WFG)

The wait-for graph is a directed graph where nodes are active transactions and an edge Tᵢ → Tⱼ means Tᵢ is waiting for a lock held by Tⱼ. A cycle in the graph indicates a deadlock.[10][29]

Example cycle: T1 waits for T3's lock on X; T3 waits for T2's lock on Y; T2 waits for T1's lock on Z → deadlock.[10]

Limitation: Suitable only for systems where transactions are lightweight with few resource instances. In distributed systems, deadlocks span multiple nodes and require a global wait-for graph with a central coordinator — or distributed detection algorithms.[10]
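
The cycle check on a wait-for graph is a plain depth-first search. This sketch uses an adjacency-dict encoding and the three-transaction example from the text; names are illustrative.

```python
# Sketch: wait-for-graph deadlock detection. Nodes are transactions; an
# edge Ti -> Tj means Ti waits on a lock held by Tj. A cycle = deadlock.

def has_deadlock(wait_for):
    """wait_for: dict txn -> set of txns it waits on. DFS cycle check."""
    visiting, done = set(), set()

    def dfs(t):
        visiting.add(t)
        for u in wait_for.get(t, ()):
            if u in visiting:
                return True          # back edge: cycle found
            if u not in done and dfs(u):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    return any(dfs(t) for t in wait_for if t not in done)

# The example cycle from the text: T1 -> T3 -> T2 -> T1
print(has_deadlock({"T1": {"T3"}, "T3": {"T2"}, "T2": {"T1"}}))  # True
print(has_deadlock({"T1": {"T3"}, "T3": {"T2"}, "T2": set()}))   # False
```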

Deadlock Resolution After Detection

Victim selection criteria: minimum work done, fewest resources held, fewest remaining operations, lowest rollback cost.[29] Recovery options include total rollback (abort and restart the victim) and partial rollback (roll the victim back only far enough to break the cycle).

Timestamp-Based Deadlock Prevention Protocols

Both Wait-Die and Wound-Wait assign priorities based on transaction timestamps — older transaction = lower timestamp = higher priority. In both protocols, the younger transaction is always aborted.[10][29]

| Feature | Wait-Die (Non-Preemptive) | Wound-Wait (Preemptive) |
|---|---|---|
| Type | Non-preemptive | Preemptive |
| Older requests younger's resource | Older waits | Younger is aborted ("wounded"); older proceeds |
| Younger requests older's resource | Younger dies (aborts) | Younger waits |
| Who is aborted? | Always younger (the requester) | Always younger (the holder) |
| Preemption | Older is never preempted from a resource | Older preempts resources from younger |

(Sources:[10][29])

Starvation prevention: Rolled-back transactions restart with the same timestamp — not a new one. This means they eventually become the oldest active transaction and receive highest priority, ensuring progress.[29]
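
The two decision rules reduce to a timestamp comparison. This toy sketch encodes the table above; the function names and return labels are illustrative (lower timestamp = older = higher priority).

```python
# Sketch: the Wait-Die and Wound-Wait decisions for a lock request.

def wait_die(requester_ts, holder_ts):
    """Non-preemptive: older requesters wait, younger requesters die."""
    return "wait" if requester_ts < holder_ts else "abort (die)"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: older requesters wound the holder, younger ones wait."""
    return "wound holder" if requester_ts < holder_ts else "wait"

# Older T1 (ts=1) requests a lock held by younger T2 (ts=2):
print(wait_die(1, 2))    # "wait": older waits
print(wound_wait(1, 2))  # "wound holder": younger T2 is aborted
# Younger T2 requests a lock held by older T1:
print(wait_die(2, 1))    # "abort (die)": younger dies
print(wound_wait(2, 1))  # "wait": younger waits
```

In both protocols only the younger transaction is ever aborted, which (combined with timestamp reuse on restart) is what rules out starvation.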

No-Wait Scheme: If a transaction cannot immediately acquire all needed locks, it aborts and retries — no waiting. Eliminates deadlock by eliminating the hold-and-wait condition entirely.[10]

Conservative 2PL as Deadlock Prevention

Conservative 2PL prevents deadlocks by requiring all locks to be acquired before execution begins. If a transaction cannot acquire all needed locks, it waits without holding any — eliminating hold-and-wait. Requires advance declaration of all data access patterns, making it impractical for most database workloads.[10][29]

Deadlock Avoidance

Deadlock avoidance tests each proposed lock acquisition before granting it. If granting the lock could lead to a future deadlock state, the request is denied — the transaction must wait or abort without holding any conflicting resources.[10] This strategy requires advance knowledge of each transaction’s future resource needs, making it impractical for most database workloads. In practice, detection is generally preferred: allow deadlocks to occur, then detect and resolve them reactively via the wait-for graph.

Distributed Deadlock Handling

MethodApproachTrade-off
Edge-chasing algorithms Each node propagates probe messages when detecting potential deadlock Distributed, no single point of failure[10]
Chandy-Misra-Haas Classic distributed deadlock detection algorithm Well-proven; message overhead[10]
Timeout-based Assume long wait = deadlock; abort Simple; false positives under slow transactions[10]
Global wait-for graph Central coordinator collects local WFGs, checks for global cycles Complete detection; single point of failure[10]

Deadlock, Livelock, and Starvation Distinguished

| Condition | Definition | Progress |
|---|---|---|
| Deadlock | Processes blocked, waiting for each other's resources | None — permanent standstill[10] |
| Livelock | Processes keep executing but make no progress (e.g., continuously back off and retry simultaneously) | None — OCC thrashing is a common example[10][29] |
| Starvation | A process is repeatedly denied resource access due to priority/scheduling | Indefinitely delayed[10][29] |

Relationship to Concurrency Control Methods

Key finding: Both Wait-Die and Wound-Wait prevent both deadlock and starvation — the same timestamp reuse on restart ensures eventual progress. The key difference is whether the older or younger transaction plays the active role: in Wait-Die, old waits and young dies; in Wound-Wait, old wounds (preempts) young, and young waits.[10][29]
See also: Deadlock Starvation Fairness (for deadlock algorithms in practice, including distributed deadlock detection implementations)

Section 12: Serializable Snapshot Isolation (SSI)

Serializable Snapshot Isolation closes the gap between Snapshot Isolation and full serializability without introducing reader-writer blocking. It was proposed in the landmark paper: Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. "Serializable isolation for snapshot databases." SIGMOD '08, pp. 729–738.[30][Cahill 2008]

The Write Skew Problem That Motivated SSI

Until Cahill et al. 2008, the only ways to prevent write skew under Snapshot Isolation were: (1) modify applications with artificial locking, (2) introduce fake update conflicts, or (3) carefully analyze all transaction pairs — all requiring deep application-level knowledge and fragile maintenance.[30]

Classic Write Skew Example (Cahill et al. 2008): Two bank balances V1 and V2 with constraint V1 + V2 ≥ 0. Both start at $100:[30]

Why 2PL prevents this: Under 2PL, T1 holds S locks on V1 and V2; T2's attempt to acquire X lock on V2 is blocked by T1's S lock — transactions must serialize. Under SI without SSI, both transactions read from independent snapshots (no locks held), both proceed, both commit.[30]
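
The failure mode can be simulated in a few lines. This is a hedged illustration of write skew under SI, not Cahill et al.'s exact schedule: the withdrawal amounts and helper names are invented, but the mechanics (private snapshots, constraint checked per snapshot, disjoint write sets both committing) follow the description above.

```python
# Sketch: write skew under Snapshot Isolation. Both transactions read the
# same committed snapshot, each validates V1 + V2 >= 0 against its own
# snapshot, and each debits a different balance.

db = {"V1": 100, "V2": 100}        # committed state; constraint V1 + V2 >= 0

def withdraw(snapshot, account, amount):
    """Check the constraint against the private snapshot; return the write."""
    if snapshot["V1"] + snapshot["V2"] - amount >= 0:
        return (account, snapshot[account] - amount)
    return None                    # constraint would fail: abort

snap1 = dict(db)                   # T1's snapshot
snap2 = dict(db)                   # T2's concurrent snapshot
w1 = withdraw(snap1, "V1", 200)    # allowed: 200 - 200 >= 0 in T1's view
w2 = withdraw(snap2, "V2", 200)    # also allowed in T2's independent view
for account, value in (w1, w2):    # SI commits both: write sets are disjoint
    db[account] = value
print(db["V1"] + db["V2"])         # -200: the invariant is violated
```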

The SSI Detection Mechanism

SSI detects "dangerous structures" in the MVCC serialization graph: two consecutive read-write antidependency (rw) edges, T1 →rw T2 →rw T3, where the middle ("pivot") transaction both read a version that one concurrent transaction later overwrote and overwrote a version that another concurrent transaction read.[30]

The problem is reduced from detecting entire cycles (expensive global graph traversal) to detecting specific two-arc structures (cheap per-transaction tracking). SSI is an optimistic technique: transactions continue in hopes everything will be fine, aborting only when a dangerous structure is detected.[30]
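
A simplified version of the two-arc check can be sketched as a pivot finder. This is an illustrative, conservative approximation: real SSI tracks rw-antidependencies incrementally per transaction and applies additional commit-time conditions, and the edge encoding here is invented for the example.

```python
# Sketch: flag transactions sitting in a dangerous two-arc structure —
# i.e., having both an incoming and an outgoing rw-antidependency edge.

def find_pivots(rw_edges):
    """rw_edges: set of (Ti, Tj) meaning Ti read what Tj later overwrote."""
    incoming = {dst for _, dst in rw_edges}
    outgoing = {src for src, _ in rw_edges}
    return incoming & outgoing     # pivot candidates: abort one to break it

# T1 ->rw T2 ->rw T3: T2 is the pivot, so SSI aborts one of the three
print(find_pivots({("T1", "T2"), ("T2", "T3")}))  # {'T2'}
print(find_pivots({("T1", "T2")}))                # set(): one arc is safe
```

The cheapness is the point: per transaction this is two boolean flags ("has inbound rw edge", "has outbound rw edge"), not a graph traversal.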

SSI vs. 2PL

| Property | 2PL (SS2PL) | SSI |
|---|---|---|
| Readers block writers | Yes (S lock conflicts with X lock request) | No |
| Writers block readers | Yes (X lock conflicts with S lock request) | No |
| Deadlock possible | Yes | No |
| False abort rate | Zero — aborts only for actual conflicts | Low — some false positives (conservative detection) |
| Age | 1976 (Eswaran et al.) | 2008 (Cahill et al.); PostgreSQL 9.1 in 2011 |
| Concurrency under no conflict | Full (no unnecessary blocking) | Full (no blocking at all) |
| Concurrency under conflict | Blocks until conflict resolves | Aborts one transaction in dangerous structure |

(Sources:[30])

PostgreSQL SSI Implementation (v9.1, 2011)

Based on Dan Ports (VLDB 2012) extending Cahill et al., PostgreSQL's implementation added several refinements beyond the original algorithm.[30]

Limitations of SSI

Formal Foundation

Snapshot Isolation was first formally defined in Berenson et al. 1995. SSI builds directly on this foundation — taking SI's favorable read-write non-blocking properties and MVCC storage model, then adding the minimum detection machinery required for full serializability guarantees.[30]

Key finding: SSI is the first practical algorithm to achieve full serializability without introducing any read-write lock conflicts. It is categorically superior to 2PL on read-write contention (neither direction blocks) while being essentially equivalent in the absence of conflicts — making it the theoretically optimal isolation mechanism for mixed workloads as of 2011.[30]

Section 13: OS-Level Concurrency — Readers-Writers Problem and File Locking

The theoretical foundations of database concurrency control are mirrored at the operating system level, where the same fundamental correctness problems appear in file access coordination.[19][27]

The Readers-Writers Problem

Formalized by Courtois, Heymans, and Parnas in 1971, "Concurrent Control with 'Readers' and 'Writers'" (CACM). The constraint: multiple readers may access a shared resource simultaneously (reads are non-destructive), but no thread may access while a writer is active.[19]

This maps directly to the S/X lock distinction: Readers ↔ Shared (S) locks; Writers ↔ Exclusive (X) locks.[19]

Priority Policies

| Policy | Problem Name | Behavior | Starvation Risk |
|---|---|---|---|
| Read-preferring | First readers-writers problem | New readers always join while a reader is active | Writers — may wait indefinitely under high read load[19] |
| Write-preferring | Second readers-writers problem | New readers blocked if a writer is waiting | Readers — under heavy write load; generally preferred when data freshness is critical[19] |
| Fair | Third readers-writers problem | No thread allowed to starve; bounded wait time for all | None — more complex to implement[19] |
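
A write-preferring readers-writer lock (the "second" problem above) can be sketched with a condition variable. This is a minimal teaching sketch, not a production lock: the class name is invented, and it omits fairness and re-entrancy concerns.

```python
# Sketch: write-preferring RW lock — new readers are held back whenever a
# writer is active OR waiting, so writers cannot starve under read load.

import threading

class WritePreferringRWLock:
    def __init__(self):
        self.cond = threading.Condition()
        self.readers = 0
        self.writer = False
        self.writers_waiting = 0

    def acquire_read(self):
        with self.cond:
            while self.writer or self.writers_waiting > 0:
                self.cond.wait()       # defer to any waiting writer
            self.readers += 1

    def release_read(self):
        with self.cond:
            self.readers -= 1
            if self.readers == 0:
                self.cond.notify_all() # last reader out: wake writers

    def acquire_write(self):
        with self.cond:
            self.writers_waiting += 1
            while self.writer or self.readers > 0:
                self.cond.wait()
            self.writers_waiting -= 1
            self.writer = True         # exclusive: X-style access

    def release_write(self):
        with self.cond:
            self.writer = False
            self.cond.notify_all()

lock = WritePreferringRWLock()
lock.acquire_read(); lock.acquire_read()   # two readers coexist (S-style)
lock.release_read(); lock.release_read()
lock.acquire_write()                        # writer gets exclusive access
lock.release_write()
```

The only difference from a read-preferring lock is the `writers_waiting > 0` term in the reader's wait condition — removing it yields the first readers-writers problem and its writer-starvation risk.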

Advanced OS Solutions

Read-Copy-Update (RCU): Wait-free for readers. Writers copy data, modify the copy, atomically swap the pointer. Readers see either old or new version but never partial state. Used extensively in the Linux kernel.[19][27]

Seqlock (Linux kernel): Optimized for few writers. Readers note a sequence counter before and after reading; if the counter changed, a write occurred and the read is retried (OCC-style validation). Writers increment counter: odd = writing in progress, even = complete.[19]
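
The seqlock validation protocol can be sketched in a few lines. This is a single-threaded illustration of the counter discipline only — a real seqlock relies on memory barriers and atomic counter access that Python does not model.

```python
# Sketch: seqlock-style optimistic reading. Writers bump the counter to odd
# while writing and back to even when done; readers retry if the counter
# was odd or changed during the read (OCC-style validation).

class Seqlock:
    def __init__(self, value):
        self.seq = 0            # even: stable; odd: write in progress
        self.value = value

    def write(self, value):
        self.seq += 1           # odd: signal "writing"
        self.value = value
        self.seq += 1           # even again: write complete

    def read(self):
        while True:
            start = self.seq
            if start % 2:       # write in progress: retry
                continue
            v = self.value
            if self.seq == start:
                return v        # counter unchanged: read was consistent

s = Seqlock("v1")
s.write("v2")
print(s.read())                 # "v2", observed with seq back at even (2)
```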

File Locking: Historical Origins and OS Primitives

IBM pioneered file locking in 1963 for mainframe OS/360 ("exclusive control"). The concept became universal across all major operating systems.[27]

| Primitive | Platform | Semantics | Granularity |
|---|---|---|---|
| flock() | Unix/Linux | Advisory; entire files | File[27] |
| fcntl() | Unix/Linux | Advisory; byte-range locking | Byte range[27] |
| lockf() | POSIX standard | Advisory | Byte range[27] |
| LockFile / LockFileEx | Windows | Mandatory; OS enforces automatically | Byte range[27] |
| Spinlocks | All platforms | Busy-wait; fast in low contention | In-memory[27] |
| Semaphores | All platforms | Integer-regulated access count via wait()/signal() atomics | Logical resource[27] |
| Mutexes | All platforms | Binary lock; single-threaded write access | Object/region[27] |

Advisory vs. Mandatory locking: Unix/Linux defaults to advisory locking — processes voluntarily acquire and check locks; non-cooperating processes are not blocked. Windows uses mandatory semantics — the OS enforces locking automatically, blocking conflicting accesses regardless of cooperation.[27]
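
Advisory locking is easy to demonstrate from Python's `fcntl` module (Unix/Linux only — an assumption of this sketch). Note the S/X parallel: `LOCK_SH` is a shared lock, `LOCK_EX` an exclusive one, and a process that never calls `flock()` is not blocked at all.

```python
# Sketch: advisory flock() locking. Cooperation is voluntary — only
# processes that also call flock() on this file will observe the lock.

import fcntl
import os
import tempfile

path = tempfile.NamedTemporaryFile(delete=False).name

with open(path, "w") as f:
    fcntl.flock(f, fcntl.LOCK_EX)   # exclusive (X-style) advisory lock
    f.write("guarded update\n")
    fcntl.flock(f, fcntl.LOCK_UN)   # release

with open(path) as f:
    fcntl.flock(f, fcntl.LOCK_SH)   # shared (S-style) lock for reading
    content = f.read().strip()
    fcntl.flock(f, fcntl.LOCK_UN)

os.remove(path)
print(content)                      # "guarded update"
```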

File System Granularity

A "lock" in file systems may be associated with one or more data blocks, files, directories, volumes, or disk drives. Most OSes support record locking: individual records within a file may be locked independently, enabling higher concurrency for update-intensive file applications.[27]

2PL in the OS Scheduler Context

2PL is applied at the OS level: the scheduler acquires all necessary locks in the growing phase and releases them in the shrinking phase — checking each operation for conflicts with existing locks before granting or delaying.[27]

Theoretical Unification

File system locking implements the same fundamental concepts as database concurrency control: the shared/exclusive compatibility distinction, a two-phase acquire-then-release discipline, and granularity trade-offs between whole-file and byte-range locks.[27]

NFS portability: NFS (Network File System) has historically poor lock support across network boundaries — a significant operational difference from local file system locking that forces distributed applications to implement application-level coordination.[27]

Key finding: The theoretical foundations of concurrency control are domain-independent: the S/X lock compatibility matrix, two-phase locking discipline, and readers-writers priority policies appear identically in OS file locking (1963), database systems (1976), and distributed key-value stores (2007+). Only the granularity, advisory/mandatory semantics, and network-transparency requirements differ across domains.[19][27]
See also: Monorepo Tooling (for file-level locking in build systems and parallel code modification), Optimistic Speculative Execution (for practical recommendations on OCC vs. locking in code modification pipelines)

References

  1. Eswaran, K.P., Gray, J.N., Lorie, R.A., and Traiger, I.L. (1976). The notions of consistency and predicate locks in a database system. Communications of the ACM 19(11):624–633. doi:10.1145/360363.360369
  2. Gray, J.N., Lorie, R.A., Putzolu, G.R., and Traiger, I.L. (1976). Granularity of locks and degrees of consistency in a shared database. In Proceedings of the IFIP Working Conference on Modelling of Data Base Management Systems, pp. 695–723.
  3. Reed, D.P. (1978). Naming and synchronization in a decentralized computer system. Ph.D. dissertation, Massachusetts Institute of Technology.
  4. Kung, H.T. and Robinson, J.T. (1981). On optimistic methods for concurrency control. ACM Transactions on Database Systems 6(2):213–226. doi:10.1145/319566.319567
  5. Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil, E., and O’Neil, P. (1995). A critique of ANSI SQL isolation levels. In Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 1–10. doi:10.1145/223784.223785
  6. Cahill, M.J., Röhm, U., and Fekete, A.D. (2008). Serializable isolation for snapshot databases. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD ’08), pp. 729–738. doi:10.1145/1376616.1376690
  7. Bailis, P., Fekete, A., Ghodsi, A., Hellerstein, J.M., and Stoica, I. (2013). Highly available transactions: Virtues and limitations. Proceedings of the VLDB Endowment 7(3):181–192. doi:10.14778/2732232.2732237

Sources

The following 31 source documents are the consolidated research corpus for this pillar. Inline citations [1]–[31] link to these entries.

  1. Two-Phase Locking — Wikipedia. https://en.wikipedia.org/wiki/Two-phase_locking
  2. Multiversion Concurrency Control — Wikipedia. https://en.wikipedia.org/wiki/Multiversion_concurrency_control
  3. Optimistic Concurrency Control — Wikipedia. https://en.wikipedia.org/wiki/Optimistic_concurrency_control
  4. Serializability — Wikipedia. https://en.wikipedia.org/wiki/Serializability
  5. Vector Clock — Wikipedia. https://en.wikipedia.org/wiki/Vector_clock
  6. Multiple Granularity Locking — Wikipedia. https://en.wikipedia.org/wiki/Multiple_granularity_locking
  7. Gray, J.N., Lorie, R.A., Putzolu, G.R., and Traiger, I.L. (1976). Granularity of Locks and Degrees of Consistency in a Shared Database. Rendered at https://mwhittaker.github.io/papers/html/gray1976granularity.html
  8. Isolation (database systems) — Wikipedia. https://en.wikipedia.org/wiki/Isolation_(database_systems)
  9. Timestamp-Based Concurrency Control — Wikipedia. https://en.wikipedia.org/wiki/Timestamp-based_concurrency_control
  10. Wait-For Graph — Wikipedia. https://en.wikipedia.org/wiki/Wait-for_graph
  11. Two-Phase Locking — Wikipedia (second corpus copy; adds variants table). https://en.wikipedia.org/wiki/Two-phase_locking
  12. Multiversion Concurrency Control — Wikipedia (second corpus copy; adds academic classification, XID wraparound). https://en.wikipedia.org/wiki/Multiversion_concurrency_control
  13. Optimistic Concurrency Control — Wikipedia (second corpus copy; adds implementations list). https://en.wikipedia.org/wiki/Optimistic_concurrency_control
  14. Database Transaction Schedule (Serializability) — Wikipedia (second corpus copy; adds formal notation). https://en.wikipedia.org/wiki/Serializability
  15. Vector Clock — Wikipedia (second corpus copy; adds algorithm details, extensions). https://en.wikipedia.org/wiki/Vector_clock
  16. Multiple Granularity Locking — Wikipedia (second corpus copy; adds NL mode detail). https://en.wikipedia.org/wiki/Multiple_granularity_locking
  17. Isolation (database systems) — Wikipedia (second corpus copy; adds Cursor Stability, SSI mention). https://en.wikipedia.org/wiki/Isolation_(database_systems)
  18. Gray 1976 — Granularity of Locks (second corpus copy; adds detailed compatibility matrix entries). https://mwhittaker.github.io/papers/html/gray1976granularity.html
  19. Readers–Writers Problem — Wikipedia. https://en.wikipedia.org/wiki/Readers–writers_problem
  20. Two-Phase Locking — Wikipedia (third corpus copy; adds 1976 historical context, 2PL vs 2PC). https://en.wikipedia.org/wiki/Two-phase_locking
  21. Multiversion Concurrency Control — Wikipedia (third corpus copy; adds MVCC vs 2PL comparison). https://en.wikipedia.org/wiki/Multiversion_concurrency_control
  22. Optimistic Concurrency Control — Wikipedia / Kung & Robinson 1981 (third corpus copy; adds transaction number timing, starvation detail). https://en.wikipedia.org/wiki/Optimistic_concurrency_control
  23. Serializability — Wikipedia (third corpus copy; adds distributed databases, strict serializability). https://en.wikipedia.org/wiki/Serializability
  24. Multiple Granularity Locking — Wikipedia / Gray 1975 (third corpus copy; adds DAG generalization, correctness proof, lock escalation). https://en.wikipedia.org/wiki/Multiple_granularity_locking
  25. Vector Clock — Wikipedia (third corpus copy; adds CC theory relationship, DynamoDB/Riak/CouchDB applications, HLC). https://en.wikipedia.org/wiki/Vector_clock
  26. Berenson, H., Bernstein, P., Gray, J., Melton, J., O'Neil, E., and O'Neil, P. (1995). A Critique of ANSI SQL Isolation Levels. ACM SIGMOD 1995. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf
  27. File Locking — Wikipedia. https://en.wikipedia.org/wiki/File_locking
  28. Lock Compatibility Matrix — Berkeley CS186 Course Notes. https://cs186berkeley.net/notes/note12/
  29. Deadlock (Computer Science) — Wikipedia. https://en.wikipedia.org/wiki/Deadlock_(computer_science)
  30. Snapshot Isolation / Serializable Snapshot Isolation — Wikipedia (includes Cahill et al. 2008 content). https://en.wikipedia.org/wiki/Snapshot_isolation
  31. Timestamp-Based Concurrency Control — Wikipedia (second corpus copy; adds strict T/O protocol, T/O vs OCC/MVCC comparison). https://en.wikipedia.org/wiki/Timestamp-based_concurrency_control

