Pillar: lock-design-granularity | Date: May 2026
Scope: The design space for advisory lock granularity applied to code repositories: file-level vs directory-level vs module-level vs semantic-scope locks, hierarchical lock schemes (intention locks escalating from file to directory), adaptive granularity based on observed contention, advisory vs exclusive vs shared-read lock types, scope declaration protocols (how agents announce intent), and lock coarsening vs splitting tradeoffs.
Sources: 42 gathered, consolidated, synthesized.
Core finding: File-level locking with directory-level intention locks (IS/IX) is the validated default for multi-agent code repositories — conflict detection costs O(depth of tree) rather than O(N descendants) when intention locks are used, directly applying Gray's 1976 Multiple Granularity Locking protocol to the repository → module → file → function hierarchy.[31][21]
The Multiple Granularity Locking (MGL) framework, introduced by Gray, Lorie, Putzolu, and Traiger in 1976, resolves the central tension in lock design: a single global lock yields zero concurrency, while field-level locking generates prohibitive per-item overhead on bulk operations.[2][1] MGL's solution — a 6-mode hierarchy (NL / IS / IX / S / SIX / X) applied to a resource tree — evolved through 3 versions. Version 1 required full descendant traversal to detect conflicts; Version 2 introduced IS/IX intention modes reducing detection to O(depth); Version 3 added the SIX (Shared + Intention Exclusive) mode enabling mixed scan-whole/write-specific workloads without taking a full exclusive lock on the parent.[31] The database hierarchy (database → area → file → page → record) maps directly to code: repository → module directory → source file → class → function/line-range. Two agents holding IX on the same directory do not conflict at that level — their actual conflict, if any, surfaces only when each acquires X on a specific file below, which is precisely what enables parallel agents to work within the same module simultaneously.[31]
Empirical granularity benchmarks reveal that the optimal choice is heavily platform-dependent. In a controlled experiment, coarse-grained locking outperformed fine-grained by a 3:1 ratio, with fine-grained failing to benefit reliably from additional threads.[9] Fine-grained lock overhead reached 23% on ARM but only low single-digit percent on x86 — the same code on different hardware architectures produces radically different performance curves. Java Striped Lock (individual hash-bucket locks) yielded roughly 10% extra throughput over a basic single-lock approach.[34] JVM lock coarsening — merging 4 adjacent acquire/release pairs into 1 via loop unrolling — produced a ~4x throughput improvement in benchmarks, demonstrating that eliminating repeated lock round-trips often outweighs any concurrency gain from fine granularity.[13] The implication is direct: measurement on the target platform is mandatory before committing to a granularity level. Analytical prediction is unreliable across architectures.
Lock escalation and de-escalation are the runtime mechanisms for correcting a wrong granularity choice without a full redesign. SQL Server's production-validated thresholds provide the most concrete reference points: escalation triggers at 5,000 row/key locks on a single table reference, or when locks consume 24% of the buffer pool; the system checks thresholds every ~1,250 lock acquisitions rather than on every acquisition to avoid overhead.[42] Escalation converts an IX/IS intent lock on the parent directly to a full X/S table lock, skipping page-level intermediates. The inverse — de-escalation from a coarse bucket lock to per-member locks on conflict (US Patent 6144983) — applies when a single conflicting request arrives, splitting the coarse lock to restore parallelism only where needed.[20] For repository lock managers, the SQL Server pattern translates to: track file lock counts per directory; escalate to a directory lock when an agent holds more than N file locks within that directory, using both count and memory pressure as dual escalation signals.
Adaptive granularity strategies span from simple threshold rules to ML-driven continuous optimization. An IEEE-published algorithm for collaborative authoring systems — structurally identical to multi-agent code editing — dynamically transitions between granularity levels based on real-time contention monitoring, without downtime.[5] A Random Forest-based system documented in TD Commons monitors average lock wait time, transaction abort rates, and per-granularity contention rates, recalibrating thresholds continuously and operating on-the-fly without system restart.[32] Both threshold-based and ML-driven approaches have been independently validated: the consensus finding is that fixed granularity is rarely optimal across mixed workloads, and runtime contention monitoring consistently outperforms static analysis for granularity decisions at scale.
OS-level file locks introduce critical failure modes that invalidate them for multi-agent systems. The most dangerous: closing any file descriptor to a POSIX-locked file (via fcntl) releases all locks held by that process on that file — including locks acquired on different fds. Standard library functions that internally open and close files can silently destroy in-flight agent locks.[24][25] NFS compounds this: many NFS implementations improperly implement fcntl lock semantics, and there is no reliable detection method for whether locking works on a specific NFS mount. Application-level advisory locks — PostgreSQL's pg_advisory_lock(hash_of_filepath), Redis SET NX EX, or etcd — avoid all OS lock API pitfalls, work across network boundaries, and provide connection-scoped or TTL-based crash safety without external cleanup infrastructure.[23] PostgreSQL advisory locks are particularly well-suited to repository lock managers: the file path hash serves as the lock key, enabling per-file granularity without a separate lock record table, and connection-scoped auto-release handles agent crashes without any TTL management overhead.
Readers-writer lock priority policy has a decisive impact on agent throughput for code editing workloads. Read-preferring policies allow new readers to enter while writers queue, risking indefinite writer starvation under continuous analysis-agent load.[11] Write-preferring policies block new readers when a writer is queued, ensuring modifications complete promptly. For repositories where analysis agents (read) and editing agents (write) coexist, write-preferring is the correct default — editing agents cannot be blocked indefinitely by a stream of read-only agents scanning the codebase.[26] A related hazard: read-to-write lock upgrade is a deadlock trap. Two agents both holding read locks, both attempting upgrade, both waiting for the other to release — neither can proceed. The fix is to declare write intent upfront via SIX mode, structurally preventing the upgrade race.[11]
Git worktrees provide branch-level isolation but introduce a repository-wide creation bottleneck. When 3 or more agents are launched in parallel with worktree isolation, concurrent git worktree add commands race for .git/config.lock, causing agent failures with orphaned branches (documented in GitHub Issue #34645).[27][41] The granularity mismatch is structural: the creation-time lock is repository-wide, while the execution-time isolation is fully per-worktree. The correct fix is serializing git worktree add calls with a mutex or sequential queue — execution remains fully parallel once each worktree is created. Beyond the creation race, 6 distinct conflict types require different granularity responses: git index lock errors (per-worktree cleanup), branch checkout conflicts (enforced by Git automatically), package manager lock file conflicts (serialize per worktree), merge conflicts at integration (require semantic scope declaration, not just file locks), stale worktree references (periodic git worktree prune), and build artifact conflicts (isolated node_modules/ and .venv/ per worktree).[28]
File-level locks solve physical correctness but cannot enforce semantic correctness. Two agents holding different files can produce incompatible interfaces — a conflict that only manifests after both complete, at integration time. The FLP Impossibility Theorem applies: no deterministic protocol can ensure all non-faulty agents reach semantic consensus in bounded time in a distributed system with arbitrary delays.[6] With more than (n-1)/3 agents misinterpreting requirements, consensus becomes impossible regardless of model capability.[22] Scope declaration protocols (ACP Agent Cards at /.well-known/agent.json, which validate identity, capability scope, delegation chain, and policy compliance simultaneously before execution) address the communication layer that file locks miss.[37] External validation — automated tests and static analysis as quality gates — converts semantic misinterpretations into detectable failures rather than silent divergence, serving as the practical substitute for semantic lock enforcement.
For practitioners building multi-agent repository systems today: Start with file-level application locks (PostgreSQL advisory or Redis NX) plus directory-level IS/IX intention locks — this delivers O(depth) conflict detection without OS lock API pitfalls. Use write-preferring RW lock policy and declare SIX intent upfront rather than upgrading from read. Serialize git worktree add calls with a simple mutex; execution is already fully parallel. Apply threshold-based escalation (escalate to directory lock when agent holds more than a configured N file locks in the same directory, analogous to SQL Server's 5,000-lock threshold) to cap lock table overhead at scale. Separate topology locks (branch creation, module moves) from content locks (file edits) following the Linux kernel two-tier pattern — structural changes must not block unrelated file modifications. Add non-overlapping task assignment at planning time as a first line of defense: preventing conflicts architecturally is more reliable than detecting them at runtime.
The canonical framework for hierarchical lock granularity originates in a 1976 paper by Gray, Lorie, Putzolu, and Traiger: "Granularity of Locks and Degrees of Consistency in a Shared Data Base."[2] Multiple Granularity Locking (MGL) resolves the fundamental tension between two extremes: a single global lock permits only one transaction at a time regardless of which data it accesses, while field-level locking requires prohibitive per-field lock acquisition overhead on bulk operations.[1][16]
"Granularity refers to the size of the data item on which a lock is applied."[16] The two axes of the tradeoff are directly opposed: small granularity (field-level) yields high concurrency but high overhead; large granularity (file-level or table-level) yields low overhead but low concurrency.[1][2] No fixed granularity optimizes both simultaneously for mixed workloads.
Key finding: MGL resolves the granularity dilemma by allowing transactions to select the granularity level that matches their access pattern — bulk operations acquire coarse locks; fine-grained operations acquire targeted record locks — within a single shared tree hierarchy.[1][31]
Resources organize into a tree structure: database → area/table → file/page → record/row. A lock on any node implicitly locks all descendants in that subtree.[1][3][31] Conflict detection across granularity levels — detecting that a directory-level lock conflicts with a pending file-level lock — costs O(depth of tree) rather than O(N descendants) when intention locks are used.[31][19]
| Version | Innovation | Limitation Addressed |
|---|---|---|
| Version 1[31] | Implicit locking (locking a node locks descendants) | Required full descendant traversal to detect conflicts — O(N descendants) |
| Version 2[31] | Introduced IS/IX intention modes at parent nodes | Reduced conflict detection to O(depth) without traversing descendants |
| Version 3[31] | Added SIX (Shared + Intention Exclusive) mode | Enabled mixed read-whole/write-specific workloads without full X lock |
The database hierarchy maps directly to code repositories: repository → directory/module → file → function/line-range.[18][21] Agents with large-scope work (refactoring an entire module) acquire coarse directory locks; fine-grained agents (patching a single function) lock individual files. The same adaptive logic applies in both domains.
See also: Concurrency Control Theory (theoretical foundations of 2PL and transaction models)

The Gray 1976 framework extends standard Shared (S) and Exclusive (X) locks with three intention modes that signal locking intent at parent hierarchy levels, enabling efficient cross-granularity conflict detection without descendant traversal.[1][2][16]
| Mode | Meaning | Conflicts With |
|---|---|---|
| NL (Null Lock)[1] | Placeholder; no lock held | Nothing |
| IS (Intention Shared)[1][31] | Plans to acquire S lock on one or more descendants | X only |
| IX (Intention Exclusive)[1][31] | Plans to acquire X lock on one or more descendants | S, SIX, X |
| S (Shared)[2][3] | Read access to entire subtree; implicitly S on all descendants | IX, SIX, X |
| SIX (Shared + Intention Exclusive)[31] | Subtree is locked S; X locks may be acquired at lower levels | IX, S, SIX, X |
| X (Exclusive)[2][3] | Exclusive read-write access to entire subtree | IS, IX, S, SIX, X |
| Requested ↓ \ Held → | NL | IS | IX | S | SIX | X |
|---|---|---|---|---|---|---|
| NL | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| IS | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| IX | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| S | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| SIX | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| X | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
(Source: Gray 1976 as reproduced via Wikipedia MGL article)[16][29]
All sources converge on the same seven mandatory rules for correct MGL operation:[3][17][31] (1) observe the compatibility matrix on every acquisition; (2) lock the root of the hierarchy first; (3) hold IS or IX on a node's parent before locking the node in S or IS; (4) hold IX or SIX on the parent before locking the node in X, SIX, or IX; (5) acquire locks root-to-leaf; (6) release locks leaf-to-root — never unlock a node while locks on its children are still held; (7) observe two-phase locking — acquire no lock after the first release.
Two agents both holding IX on the same directory do not conflict at that level — their actual conflict (if any) resolves when they each acquire X locks on specific files below. The directory IX only signals "something exclusive is happening below," not which item.[31] This property is essential for parallel agent work within the same module.
A transaction scanning an entire module (needs S) while modifying specific files (needs X) would otherwise require either S on every file (high overhead) or X on the whole directory (blocking all readers). SIX on the directory resolves both needs: scan concurrent with other readers, writes exclusive at file level.[31]
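The acquisition rules and compatibility matrix above are compact enough to sketch directly. Below is a minimal Python illustration — the `LockManager` and `Mode` names and the in-memory structure are hypothetical, not from any cited system — showing how each acquisition takes IS/IX intention locks down the ancestor chain, so a conflict check touches only O(depth) nodes:

```python
from collections import defaultdict
from enum import Enum
from pathlib import PurePosixPath

class Mode(Enum):
    IS = "IS"    # intention shared
    IX = "IX"    # intention exclusive
    S = "S"      # shared (whole subtree)
    SIX = "SIX"  # shared + intention exclusive
    X = "X"      # exclusive (whole subtree)

# Gray 1976 compatibility matrix: held modes each requested mode can coexist with.
COMPAT = {
    Mode.IS:  {Mode.IS, Mode.IX, Mode.S, Mode.SIX},
    Mode.IX:  {Mode.IS, Mode.IX},
    Mode.S:   {Mode.IS, Mode.S},
    Mode.SIX: {Mode.IS},
    Mode.X:   set(),
}

class LockManager:
    def __init__(self):
        self.held = defaultdict(list)   # node path -> [(agent, mode)]

    def _compatible(self, node, agent, mode):
        return all(a == agent or m in COMPAT[mode] for a, m in self.held[node])

    def acquire(self, agent, path, mode):
        """Lock `path` in `mode`, taking intention locks on every ancestor.

        Only the ancestor chain is inspected — O(depth), never
        O(N descendants) — which is exactly MGL Version 2's contribution.
        """
        parts = PurePosixPath(path).parts
        ancestors = [PurePosixPath(*parts[:i]) for i in range(1, len(parts))]
        intent = Mode.IS if mode in (Mode.IS, Mode.S) else Mode.IX
        plan = [(n, intent) for n in ancestors] + [(PurePosixPath(path), mode)]
        if not all(self._compatible(n, agent, m) for n, m in plan):
            return False                 # conflict: caller queues or backs off
        for node, m in plan:
            self.held[node].append((agent, m))
        return True

lm = LockManager()
assert lm.acquire("agent-1", "repo/auth/login.py", Mode.X)   # IX on repo, repo/auth
assert lm.acquire("agent-2", "repo/auth/token.py", Mode.X)   # IX + IX: compatible
assert not lm.acquire("agent-3", "repo/auth", Mode.S)        # S vs held IX: blocked
```

The final assertion shows the matrix doing its job: the reader is refused at the directory level after checking only two nodes, without visiting any file beneath `repo/auth`.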
Key finding: Despite hierarchical lock ordering, deadlock remains possible with MGL — the protocol streamlines conflict detection across granularity levels but does not eliminate circular wait. Standard 2PL deadlock detection and prevention still apply.[1][3]

See also: Deadlock, Starvation & Fairness (deadlock detection and prevention in hierarchical lock systems)
The granularity selection question is the central empirical and theoretical problem in lock design. Multiple independent sources converge on consistent findings about the performance envelope of each extreme.[9][21][34]
| Property | Coarse-Grained | Fine-Grained |
|---|---|---|
| Concurrency | Low | High |
| Lock overhead | Low | High |
| Implementation complexity | Simple | Complex |
| Deadlock risk | Low | Higher |
| Scalability ceiling | Limited by lock contention | Better with proper tuning |
| Best for | Low-contention, simple systems | High-contention, many concurrent agents |
(Source: AlgoMaster.io Concurrency Interview series)[9][21][34]
| Experiment | Result | Source |
|---|---|---|
| Coarse vs. fine-grained controlled experiment | Coarse-grained outperformed fine-grained by 3:1 ratio; fine-grained did not reliably benefit from additional threads | [9] |
| ARM locking overhead measurement | Fine-grained ran at only ~75% of big-kernel-lock throughput until the BKL hit its performance knee; fine-grained overhead reached 23% on ARM | [9] |
| x86 locking overhead measurement | Low single-digit percent overhead on x86 — significantly more favorable than ARM | [9] |
| Java Striped Lock benchmark | ~10% extra throughput over basic approaches; locks individual hash-bucket stripes rather than entire structure | [34] |
| JVM lock coarsening (4x unroll) | ~4x throughput improvement from merging 4 adjacent lock/unlock pairs into one | [13] |
Key finding: "Theory often loses to practice" — hardware architecture (especially ARM vs. x86), JVM behavior, and OS scheduling make granularity performance difficult to predict analytically. Measurement on the target platform is mandatory.[9]
Throughput impact of coarse granularity: With file-granularity locking, an agent accessing only a few functions still locks the whole file, blocking other agents from accessing unrelated functions in that file. At directory granularity, one agent working on a single file blocks all agents from the entire module.[9][21]
Overhead impact of fine granularity: Lock acquisition and release overhead can dominate at fine granularity. For large read operations, fine-grained locking generates thousands of lock operations — one per file accessed — versus a single coarse lock.[9]
| Granularity Level | Scope | Overhead | Concurrency | Appropriate When |
|---|---|---|---|---|
| Repository-level | Entire codebase | Minimal (1 lock) | None (serial) | Rarely — only for global restructuring |
| Directory-level | Module/package | Low | Between modules | Agents working in distinct modules |
| File-level | Single file | Medium | Between files | Standard granularity — most common in practice[21] |
| Line-range/function-level | Code region | High | Within files | Maximum parallelism; high complexity |
"The reason so many different synchronization objects have been invented is to try to find a sweet spot where you have fine-grained locking so that performance is high, but it is easy enough to program so mistakes can be avoided."[9][34] The optimal strategy for code repositories typically uses file-level locking as the default with directory-level intention locks to prevent structural conflicts.[21]
Advisory locking requires all participating processes to cooperate — every agent must check and respect locks before accessing shared resources. The operating system does not prevent unauthorized access; it relies entirely on application-level cooperation. A single non-cooperative agent can ignore advisory locks and access files freely.[10][24][25]
Mandatory locking enforces access control at the kernel level but is less common in Unix-like systems and has known reliability issues on some platforms.[36] For multi-agent coding systems where all agents are under controlled deployment, advisory locks are appropriate — but any agent that bypasses the lock protocol breaks the entire system.[10]
The readers-writer lock (RW lock, MRSW lock) is the foundational pattern for differentiating read concurrency from write exclusivity.[11][26][38][39]
| Access Mode Pair | Compatible? | Rationale |
|---|---|---|
| Read + Read | ✓ Yes | Multiple readers do not interfere |
| Read + Write | ✗ No | Writer could invalidate in-progress reads |
| Write + Write | ✗ No | Concurrent writes produce undefined state |
| Policy | Behavior | Risk | Best For |
|---|---|---|---|
| Read-preferring[11] | New readers can always enter while writers queue | Writer starvation under continuous reader load | Read-heavy, latency-tolerant writes |
| Write-preferring[11][26] | New readers blocked when a writer is queued | Reader delay, but no starvation | Code editing workloads — writes must complete promptly |
| Fair/In-order[11] | FIFO ordering, no starvation for either | More complex to implement | Balanced mixed workloads |
Write-preferring is generally recommended for code editing workloads — agent modifications must not be indefinitely blocked by concurrent readers (e.g., analysis agents).[11][26]
Converting a read lock to a write lock is a deadlock trap: two agents both holding read locks, both attempting upgrade, both waiting for the other to release — neither can proceed. The solution is to allow only one agent to hold "read with intent to upgrade to write" — structurally identical to the SIX lock mode in MGL.[11]
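Both recommendations — write preference and upfront write intent — are straightforward to encode. A hedged sketch follows (an illustrative class, not any library's API): a condition-variable RW lock that parks new readers whenever a writer is queued, and that deliberately offers no upgrade path — an agent that may write acquires the write lock from the start, mirroring SIX:

```python
import threading

class WritePreferringRWLock:
    """Write-preferring RW lock: new readers block once a writer queues,
    so editing agents cannot be starved by a stream of analysis agents.
    No read-to-write upgrade by design — declare write intent upfront."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self):
        with self._cond:
            # Write preference: hold new readers back while any writer waits.
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # wake a queued writer

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()       # wake writers and parked readers
```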
Windows implements oplocks — an optimistic advisory locking pattern for networked file servers — operating at file level (not byte-range).[15] The protocol: a client requests an oplock when opening a file; the server grants it if no conflicting opens exist; the client then caches reads and writes locally; when another client later requests conflicting access, the server sends an oplock break, the first client flushes its cached state and acknowledges, and only then does the second client's request proceed.
The oplock break mechanism maps directly to lock preemption in multi-agent systems: a higher-priority agent can preempt a lower-priority agent's file lock with appropriate notification and flush protocol.[15]
| Oplock Type | Capabilities |
|---|---|
| Read-Write-Handle[15] | Full local caching + handle tracking across open/close |
| Read-Write[15] | Full local caching without handle tracking |
| Read-Handle[15] | Read caching + handle tracking |
| Read[15] | Read-only caching |
Linux offers two classic advisory lock APIs — POSIX fcntl and BSD flock — plus the newer open file description (OFD) locks (Linux 3.15+), with critically different semantics relevant to multi-threaded agent processes:[24][25][36]
| Feature | POSIX Locks (fcntl) | BSD Locks (flock) | OFD Locks (Linux 3.15+) |
|---|---|---|---|
| Byte-range locking | ✅ Yes | ❌ No (whole-file only) | ✅ Yes |
| NFS support | ✅ Yes | ⚠️ Emulated via fcntl | Linux-local only |
| Deadlock detection | ✅ Yes (EDEADLK) | ❌ No | ✅ Yes |
| Bound to | Process (PID) | File descriptor | Open file description (object) |
| Thread-safe | ❌ No | ✅ Yes | ✅ Yes |
| Atomic mode switch | ✅ Yes | ❌ No (race window) | ✅ Yes |
| POSIX-standardized | ✅ Yes | ✅ Yes | ❌ Linux-specific |
Critical POSIX lock flaw: Closing any file descriptor to a locked file releases all locks held by the process on that file — even locks acquired on different fds. Library functions that internally open and close files silently destroy in-flight agent locks. This makes POSIX fcntl locks dangerous for multi-threaded agent processes that use standard library calls.[24][25]
NFS reliability warning: Many NFS implementations improperly implement fcntl lock semantics. There is no reliable way to detect whether file locking works on a specific NFS mount. Advisory locks on NFS-mounted repositories are unreliable. NFSv4 introduces stateful locking with seqid to prevent retry inconsistencies but is not universally deployed.[10]
Key finding: Application-level advisory locks (PostgreSQL advisory locks, Redis, etcd) avoid all OS lock API pitfalls and work across network boundaries. For most multi-agent repository systems, OS-level file locks are an inferior choice.[24][25]
Three runtime mechanisms exist for dynamically adjusting lock granularity within a running system. Each addresses a different performance pathology.
Lock coarsening merges adjacent synchronized blocks that have no observable operations between them into a single larger critical section, eliminating repeated acquire/release overhead.[13][40]
Repository application: If an agent repeatedly locks/unlocks the same directory for multiple sequential file operations, coarsen to a single directory lock for the entire operation batch.[13]
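A minimal sketch of that coarsening pattern (the per-directory lock table and the batch of file edits are illustrative):

```python
import threading
from contextlib import contextmanager

dir_locks = {"repo/auth": threading.Lock()}   # illustrative per-directory locks

@contextmanager
def coarsened(directory):
    """One directory-lock round-trip held across a whole batch of file
    edits, instead of one acquire/release pair per file."""
    with dir_locks[directory]:
        yield

# Before coarsening: three acquire/release pairs, one per file.
# After: a single coarse lock spanning the batch.
with coarsened("repo/auth"):
    for f in ["login.py", "token.py", "session.py"]:
        pass  # edit f under the single directory lock
```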
Lock splitting breaks one coarse lock into multiple finer-grained locks to increase parallelism when the coarse lock becomes a bottleneck.[13]
Java ConcurrentHashMap (pre-JDK 1.8): Array of Segment objects, each with its own ReentrantLock. A put operation locks only the relevant Segment, leaving all others available concurrently.[13]
| Contention Level | Expected Gain from Splitting |
|---|---|
| Low | Small — limited parallelism to unlock |
| Medium | Best gains — splitting resolves bottleneck without overhead dominating |
| Heavy | Variable — lock overhead itself may become the bottleneck |
(Source: Aleksey Shipilev, JVM Anatomy Quarks)[13]
Repository application: If one broad directory lock is always contended, split by subdirectory or file-type grouping (e.g., separate lock ranges for src/, tests/, config/).
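A small striping sketch in the spirit of Java's Striped Lock (class name and stripe count are illustrative):

```python
import threading
import zlib

class StripedPathLocks:
    """Split one hot repository lock into N independent stripes; file
    paths hash to stripes, so agents on different stripes never contend."""

    def __init__(self, n_stripes=16):
        self._stripes = [threading.Lock() for _ in range(n_stripes)]

    def lock_for(self, path: str) -> threading.Lock:
        # Stable hash -> stripe index; the same path always maps to the same lock.
        return self._stripes[zlib.crc32(path.encode()) % len(self._stripes)]

locks = StripedPathLocks()
with locks.lock_for("src/auth/login.py"):
    pass  # edit the file; agents hashing to other stripes proceed in parallel
```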
JIT compilers remove locks entirely via escape analysis — if the lock object provably cannot escape to other threads, the lock is a no-op and is eliminated. Common with JDK APIs: StringBuffer, Vector, Hashtable.[13][40] Repository application: If an agent has an assigned worktree and provably has exclusive access to a set of files, skip lock acquisition entirely — isolation already guarantees exclusivity.
Lock escalation converts many fine-grained locks into fewer coarse-grained locks when threshold conditions are met, reducing lock table memory consumption and management overhead.[42]
| Threshold Type | Value | Notes |
|---|---|---|
| Per-statement lock count[42] | 5,000 locks on a single table reference | Triggers escalation attempt |
| Memory threshold[42] | 24% of buffer pool consumed by locks | Triggers system-wide escalation |
| Check frequency[42] | Every ~1,250 lock acquisitions | Avoids checking every acquisition |
| Escalation path[42] | Row/key → table lock directly | Skips page-level intermediate |
Escalation process: Attempts to convert the Intent lock (IX/IS) on the parent to a full lock (X/S). If blocked by another transaction, escalation is deferred — it does not wait.[42] SQL Server also supports HoBT (heap/B-tree partition) escalation for partitioned tables as an intermediate between row and table level.[42]
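Translated to a repository lock manager, the count-based trigger might look like the following hedged sketch — the class name, thresholds, and amortized check cadence are illustrative stand-ins for SQL Server's 5,000-lock / ~1,250-acquisition values:

```python
from collections import Counter
from pathlib import PurePosixPath

class EscalatingLockTracker:
    """Count file locks per (agent, directory); suggest escalating to one
    directory lock past a threshold (cf. SQL Server's 5,000-lock trigger)."""

    def __init__(self, per_dir_threshold=64, check_every=128):
        self.per_dir_threshold = per_dir_threshold
        self.check_every = check_every   # amortize checks, like SQL Server's ~1,250
        self._counts = Counter()         # (agent, directory) -> file locks held
        self._acquisitions = 0

    def record_file_lock(self, agent, file_path):
        d = str(PurePosixPath(file_path).parent)
        self._counts[(agent, d)] += 1
        self._acquisitions += 1
        if self._acquisitions % self.check_every:
            return None                  # skip most checks to keep overhead low
        if self._counts[(agent, d)] > self.per_dir_threshold:
            return d                     # caller escalates: one X lock on `d`,
        return None                      # then drops its per-file locks there

tracker = EscalatingLockTracker(per_dir_threshold=2, check_every=1)
for f in ["src/a.py", "src/b.py", "src/c.py"]:
    if (dir_to_escalate := tracker.record_file_lock("agent-1", f)):
        print(f"escalate to directory lock on {dir_to_escalate}")
```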
US Patent 6144983 describes the reverse process — de-escalating a coarse-grained bucket lock into fine-grained member locks when a conflicting request arrives for a specific member.[20] Rather than a static tree hierarchy, locks are assigned to hash buckets based on empirical access patterns. The system learns which files are frequently co-modified and groups them into buckets for efficient coarse locking. When a conflict is detected on a single bucket member, the bucket lock is split into per-member locks to restore parallelism.[20]
Key finding: SQL Server's escalation design insight for repository lock managers: track file lock count per directory. If an agent holds >N file locks within a directory, automatically escalate to a single directory lock to reduce lock table overhead. Use both lock count and memory pressure as dual escalation signals.[42][20]

See also: Concurrency Control Theory (2PL theory underlying escalation compatibility)
Fixed granularity is rarely optimal across all workloads. Four distinct approaches to runtime granularity adaptation have been documented in the literature.
An IEEE-published adaptive-granularity algorithm designed for collaborative authoring systems is structurally analogous to multi-agent code editing.[5] The algorithm monitors contention in real time and transitions between granularity levels dynamically, without downtime.
The document hierarchy used mirrors code hierarchy exactly: Document → Chapter/Section → Paragraph → Sentence maps to Repository → Directory → File → Function.[5]
Two-phase spin-then-block strategy where spin duration is determined by recent success/failure rates and the current lock owner's CPU state.[40] "Adaptive" here means the granularity of wait behavior adjusts based on observed lock hold durations — avoids expensive OS context switches for short critical sections while falling back to blocking for long ones.
A Random Forest-based adaptive system documented in TD Commons continuously monitors workload patterns — average lock wait time, transaction abort rates, and per-granularity contention rates — and selects granularity levels to minimize total cost (wait time + overhead), recalibrating thresholds on-the-fly without a system restart.[32]
Hash-based adaptive escalation (US Patent 6144983), where locks are de-escalated from coarse bucket locks into per-member locks on conflict, is covered in Section 5 (Lock De-escalation).[20]
| Principle | Source |
|---|---|
| Fixed granularity is rarely optimal across all workloads | [5][32] |
| Runtime contention monitoring enables better lock granularity decisions than static analysis | [5][32] |
| Monitoring overhead is offset by reduced lock contention at scale | [32] |
| Both threshold-based (lock count, memory pressure) and ML-based adaptation have been validated independently | [42][32] |
| Access pattern learning (hash buckets) outperforms static tree hierarchies for non-uniform co-access patterns | [20] |
Key finding: The ML-driven adaptive approach (Random Forest on contention metrics) is the most general solution — it does not require pre-defining a hierarchy or static thresholds. However, threshold-based escalation (lock count + memory pressure) is simpler to implement correctly and well-validated in production database systems.[32][42]
The Linux kernel implements a two-tier locking scheme for directory operations, separating content locks from topology locks:[14]

- Content lock (`->i_rwsem`): protects individual directory inodes
- Topology lock (`->s_vfs_rename_mutex`): protects tree topology changes (renames, directory moves)

| Operation | Locks Required | Mode |
|---|---|---|
| Read access[14] | Target directory lock | Shared |
| Object creation[14] | Parent directory lock | Exclusive |
| Object removal[14] | Parent directory + victim | Both Exclusive (sequential) |
| Link creation[14] | Parent directory + source | Both Exclusive |
| Same-directory rename[14] | Parent directory lock | Exclusive, conditional on source/target |
| Cross-directory rename[14] | Filesystem lock + ancestry verification + ancestor-first parent locks + subdirectory locks | All Exclusive |
The Linux kernel prevents lock-order deadlocks through strict hierarchical acquisition: ancestors are locked before descendants, the filesystem-wide lock is taken before any per-directory locks in cross-directory operations, and unrelated same-level directories are locked in a fixed canonical order.[14]
For code repositories, use file path hash or alphabetical ordering as the canonical lock acquisition order when acquiring multiple same-level file locks simultaneously.[14]
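A sketch of that canonical-order rule (the per-path lock table and edit callback are illustrative):

```python
import threading
from collections import defaultdict
from contextlib import ExitStack

_path_locks = defaultdict(threading.Lock)    # one lock per file path

def edit_files_in_order(paths, edit):
    """Acquire several same-level file locks in one canonical order
    (sorted path), so two agents can never circular-wait (AB/BA)."""
    with ExitStack() as stack:
        for p in sorted(set(paths)):         # canonical global order
            stack.enter_context(_path_locks[p])
        edit(paths)

# Both agents lock src/b.py before src/c.py regardless of argument
# order, so the classic two-agent deadlock cannot form.
edit_files_in_order(["src/c.py", "src/b.py"], lambda ps: None)
```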
Cross-directory rename exception: After taking the filesystem lock, verify the common ancestor relationship — this relationship cannot change until the filesystem lock drops, eliminating the specific deadlock scenario for cross-directory moves.[14]
Two-tier principle for code repositories: Separate topology locks (branch/repo structure modifications, directory creation, module moves) from content locks (file modifications within stable directories). These can be held independently, increasing parallel agent throughput.[14]
Git uses file-based locks to protect repository integrity. Three lock files are relevant to parallel agent execution:[41][27]
| Lock File | Scope | Granularity |
|---|---|---|
| `.git/index.lock`[41] | Per-working-directory index | Medium — blocks all git operations on that working tree |
| `.git/config.lock`[27] | Repository-wide configuration | Coarse — blocks all worktree creation/deletion |
| `.git/worktrees/<n>/index.lock`[41] | Per-worktree index | Medium — isolated per worktree after creation |
GitHub Issue #34645 documents a real-world granularity failure in Claude Code: when 3+ agents are launched in parallel with isolation: "worktree", concurrent git worktree add commands race for .git/config.lock, causing agent failures with orphaned branches.[27][41]
Root cause: The config lock is too coarse — it serializes all worktree creation even for agents that will work on entirely independent files. The granularity mismatch is between the creation-time lock (repository-wide) and the execution-time lock (per-worktree).
| Option | Approach | Trade-off |
|---|---|---|
| A (Recommended)[27] | Serialize `git worktree add` with internal mutex/queue | Correct; slight delay in agent startup |
| B[27] | Retry with exponential backoff | Simple; non-deterministic delay; may still fail under heavy load |
| C[27] | Skip worktree isolation for read-only agents | Eliminates contention for read agents; requires write intent declaration |
Key finding: Git worktree provides coarse branch-level isolation (one worktree per branch, enforced by Git) but its creation mechanism introduces a repository-wide config lock bottleneck for parallel agent launch. Worktree creation must be serialized; worktree execution is fully parallel.[41][27]
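A minimal sketch of Option A (the function name and module-level mutex are illustrative; the mutex serializes creation only within one launcher process — a multi-process launcher would need an application-level lock instead):

```python
import subprocess
import threading

_worktree_add_mutex = threading.Lock()   # one process-wide creation lock

def create_worktree(repo_dir, worktree_path, branch):
    """Serialize `git worktree add` — creation races on .git/config.lock;
    execution inside each worktree stays fully parallel afterwards."""
    with _worktree_add_mutex:
        subprocess.run(
            ["git", "-C", repo_dir, "worktree", "add", "-b", branch, worktree_path],
            check=True, capture_output=True, text=True,
        )
```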
When a coding agent is about to modify a shared component, it posts a lock or intent declaration in a shared coordination context: "Coder1: editing AuthModule." Other agents see this and either wait or negotiate.[12]
Contention handling: Exponential backoff — agent waits and doubles the delay on each retry attempt. Scheduling protocols: FIFO, priority queues by task importance, synchronization ordering (Agent B cannot start until Agent A completes).[12]
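A hedged sketch of that backoff loop (`try_acquire` stands in for any non-blocking acquisition attempt; the jitter factor is an assumption, added to avoid synchronized retries):

```python
import random
import time

def acquire_with_backoff(try_acquire, max_attempts=6, base_delay=0.1):
    """Retry a contended lock with exponential backoff plus jitter.
    `try_acquire` is any non-blocking attempt returning True on success."""
    for attempt in range(max_attempts):
        if try_acquire():
            return True
        # Double the delay each retry; jitter prevents thundering-herd retries.
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    return False
```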
The Agent Control Protocol (ACP), published as arxiv preprint (2603.18829v5), defines a formal scope declaration standard via Agent Cards at /.well-known/agent.json — public declarations of agent capabilities, supported modes, authentication requirements, and capability scope.[37]
ACP validates all of the following simultaneously (not sequentially) before execution: agent identity, capability scope, delegation chain, and policy compliance.[37]
ACP messages are tagged using Speech Act Theory: REQUEST, INFORM, PROPOSE, or REFUSE. Agents send intent, not just data. This is structurally analogous to IS/IX intention locks: agents declare intent before acting, and admission control validates scope before execution is permitted.[37]
Non-transitivity of trust: A trusts B, B trusts C does not imply A trusts C. This is relevant for multi-agent file access chains where intermediate agents act as coordinators.[37]
GitHub Copilot's Squad system uses a drop-box pattern for coordination: every architectural choice is appended as a structured block to a versioned decisions.md file. This is advisory locking at the semantic/decision level rather than the file level.[7]
Squad's bet: asynchronous knowledge sharing inside the repository scales better than real-time synchronization.[7] The tradeoff is lower coordination overhead at the cost of potential inconsistency windows during concurrent writes to the decisions file.
Reviewer exclusivity protocol: The reviewer protocol prevents the original author from revising their own rejected work — a different agent must step in. This implements mutual exclusion at "authorship scope" — the original author cannot hold the "revise" lock on their own rejected output.[7]
Task assignment serves as scope declaration — agents implicitly claim files when assigned a task. The shared task list coordinates work through dependency tracking with statuses: pending, in_progress, completed, blocked.[8][33]
| Pattern | Description | Lock Type Equivalent |
|---|---|---|
| "One file, one owner"[8] | Never let two agents edit the same file simultaneously | File-level advisory exclusive lock via assignment |
| Scoped visibility[8] | Each agent only sees files it owns — limits accidental modification | Implicit lock via information isolation rather than explicit enforcement |
| WIP limits[8] | Limits tasks any agent has in-progress simultaneously | Lock coarsening analog — limits concurrency to reduce coordination overhead |
| Spec-scoped tasks[33] | Work boundaries defined at specification time, not expanded at runtime | Scope declaration with hard boundary — IX locks only on declared subtree |
| Protocol | Purpose | Lock System Analog |
|---|---|---|
| MCP (Model Context Protocol)[12] | Standardizes agent access to tools and contextual data | Lock registry lookup interface |
| A2A (Agent-to-Agent) protocol[12] | Governs peer coordination, negotiation, delegation | Lock contention negotiation between agents |
| ACP (Agent Control Protocol)[37] | Admission control via capability scope validation | Scope declaration + conflict detection before execution |
| A2A task state machine[37] | submitted → working → input-required → completed/failed | Lock lifecycle management (acquire → hold → release) |
Byzantine Fault Tolerance constraint: Multi-agent lock protocols tolerate up to (n-1)/3 faulty agents. When agents crash holding locks, dead-agent lock release protocols (TTL, ephemeral node deletion) are essential — not an edge case.[12]
Key finding: "In production systems, coordination failures almost always originate in communication, not reasoning quality. Agents may be individually capable, yet collectively incoherent, because they do not share a common interaction protocol."[22] Scope declaration protocols are the communication layer that prevents this.
For multi-agent systems operating across processes or machines, OS-level file locks are insufficient. Three distributed locking infrastructure options have distinct tradeoff profiles.[23][35]
All distributed lock implementations share the same logical flow: atomically create a uniquely named lock record if absent, hold it while working (renewing a TTL or session as needed), and release it explicitly — or let crash-recovery machinery reclaim it when the holder disappears.[23]
| Mechanism | Acquisition Method | Crash Safety | Consistency | Best For |
|---|---|---|---|---|
| Redis (TTL-based)[23] | `SET lockKey agentID NX EX 30` — NX ensures atomic if-absent creation, EX sets 30 s auto-expiry | TTL auto-expiry | Eventual (single-node) | Single datacenter; existing deployments; lightweight |
| ZooKeeper/etcd[23] | Replicated lock records across cluster; session-ephemeral nodes | Session expiry auto-deletes ephemeral node, notifies waiters immediately | Strong (Raft/ZAB consensus) | Multi-region; leader election; high-stakes coordination |
| PostgreSQL advisory locks[23] | `pg_advisory_lock(hash_of_filepath)` — operates on arbitrary integer IDs | Connection-scoped auto-release on disconnect | Strong (ACID within session) | Monolithic architectures; minimal infrastructure; atomic multi-resource locks |
| DB row-level locks[35] | `SELECT ... FOR UPDATE` — exclusive until commit/rollback | Transaction rollback on disconnect | Strong | Transactional lock acquisition patterns |
| Kubernetes single-instance[23] | Deployment constraint | N/A (operational elimination) | N/A | Eliminating concurrency entirely as an architectural choice |
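The Redis row can be made concrete with a short sketch using redis-py (key name, TTL, and the edit placeholder are illustrative); the Lua compare-and-delete guards against releasing a lock that expired and was re-acquired by another agent:

```python
import uuid
import redis  # redis-py

r = redis.Redis()
lock_key = "lock:src/auth/login.py"
agent_id = str(uuid.uuid4())           # unique owner token per acquisition

# Atomic acquire-if-absent with 30 s auto-expiry (crash safety via TTL).
acquired = r.set(lock_key, agent_id, nx=True, ex=30)

if acquired:
    try:
        pass  # edit the file
    finally:
        # Compare-and-delete: only the current owner's token may release,
        # so a lock that expired and was re-acquired is never clobbered.
        release = r.register_script(
            "if redis.call('get', KEYS[1]) == ARGV[1] "
            "then return redis.call('del', KEYS[1]) else return 0 end"
        )
        release(keys=[lock_key], args=[agent_id])
```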
PostgreSQL advisory lock advantage for repositories: File path hash as lock key enables per-file advisory locks without creating actual lock records in a file table. Connection-scoped auto-release provides agent crash safety without TTL management.[23]
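A hedged sketch of that pattern with psycopg2 (the DSN, hash choice, and file path are illustrative assumptions; `pg_try_advisory_lock` and `pg_advisory_unlock` are real PostgreSQL functions taking a signed 64-bit key):

```python
import hashlib
import psycopg2  # any PostgreSQL driver works the same way

def path_to_lock_key(path: str) -> int:
    """Map a file path to a signed 64-bit key for pg_advisory_lock."""
    digest = hashlib.blake2b(path.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)

conn = psycopg2.connect("dbname=locks")   # hypothetical DSN
with conn.cursor() as cur:
    # Session-scoped lock: auto-released if the agent's connection dies,
    # so no TTL bookkeeping is needed for crash safety.
    cur.execute("SELECT pg_try_advisory_lock(%s)",
                (path_to_lock_key("src/auth/login.py"),))
    if cur.fetchone()[0]:
        pass  # edit the file, then release:
        cur.execute("SELECT pg_advisory_unlock(%s)",
                    (path_to_lock_key("src/auth/login.py"),))
```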
When multiple agents compete for a single lock covering a large resource, the system becomes effectively single-threaded through that lock.[35] Solution: partition locks so each partition is independently lockable — analogous to lock splitting at the infrastructure level rather than the application level.
| Failure Mode | Consequence | Mitigation |
|---|---|---|
| Agent crashes holding lock[23] | Permanent zombie lock blocking all other agents | TTL expiry; ephemeral ZooKeeper nodes; connection-scoped DB locks |
| Clock skew across agents[23] | TTL-based locks may expire prematurely or persist too long | Fencing tokens; logical clocks; use consensus-based systems instead of TTL |
| Network partition[23] | CAP theorem: cannot guarantee both safety and availability during partition | Accept as design constraint; design for graceful degradation, not elimination |
| Circular wait across agents[23] | Deadlock | Consistent global lock ordering: all agents acquire Lock1 before Lock2 |
Key finding: "Try to avoid distributed locking if you can" — validate the necessity of distributed coordination before implementing it. Partitioning, idempotency, and event sourcing can eliminate the need for distributed locks in many scenarios, removing an entire class of failure modes.[35]
File-level locking addresses physical conflicts (two agents writing the same bytes) but not semantic conflicts (two agents holding different files but producing incompatible interfaces).[22][6]
"Semantic locks (locking by API surface, module interface, or dependency graph edges) would address design-level conflicts that file locks miss. Scope declaration protocols must be expressive enough to capture design intent, not just file access."[22]
FLP Impossibility applied to multi-agent coding: The FLP Theorem states that in any distributed system with arbitrary delays and potential crashes, no deterministic protocol can ensure all non-faulty nodes reach consensus in bounded time. File-level locks help with physical correctness but do not solve the fundamental consensus problem for semantic correctness.[6][22]
Byzantine Generals bound: With more than (n-1)/3 agents misinterpreting requirements, consensus becomes impossible regardless of model capability. External validation (tests, static analysis) converts misinterpretations into detectable failures rather than silent semantic divergence.[6]
| Conflict Type | Granularity Level | Lock Approach |
|---|---|---|
| Git index lock errors[28] | Per-worktree (medium) | Already isolated — ensure proper cleanup of stale lock files |
| Branch checkout conflicts[28] | Branch level (coarse) | One-worktree-per-branch enforced by Git automatically |
| Lock file conflicts (package managers)[28] | Per-package-manager (medium) | Serialize or isolate package manager calls per worktree |
| Merge conflicts at integration[28] | Hotspot file (fine) | Semantic scope declaration required; worktree isolation insufficient |
| Stale worktree references[28] | Infrastructure (coarse) | Periodic cleanup: git worktree list + git worktree prune |
| Build artifact conflicts[28] | Build cache (medium) | Isolate build directories per worktree; separate node_modules/, .venv/ |
Hotspot files (routes, type registries, configuration files) are the primary source of merge conflicts — multiple features inevitably touch them. Prevention: designate single owners, merge early and often, use additive-only changes (add to new files rather than modifying shared ones).[28][41]
| Granularity | Mechanism | Scope | Trade-off |
|---|---|---|---|
| Branch-level[41] | Git worktree | Entire branch content | Coarse; prevents all file conflicts within branch; no semantic protection |
| Directory-level[41] | Application lock (IX/X) | Module/package directory | Medium; allows parallel work in different modules |
| File-level[41][8] | Application lock file or advisory lock | Single file | Fine; standard — allows parallel work on different files in same module |
| Task-level[8] | JSON status file | Task ownership scope | Application-defined; implicit rather than enforced |
| Semantic-level[22] | Intent declaration (ACP/Agent Cards) | API surface, module interface, dependency graph edges | Most expressive; complex to implement; no tooling standard yet |
| Git operation[27] | `.git/config.lock` | Repository metadata | Automatic, not configurable; bottleneck during parallel agent launch |
From real-world multi-agent deployment experience:[28][41][8]

- Run `git worktree list` and `git worktree prune` periodically to prevent stale reference accumulation that blocks new worktree creation

Key finding: "A multi-agent coding workspace functions only when coordination is treated as infrastructure: explicit task boundaries, isolated execution, and evidence-based merges. Lock scope design is therefore not merely a performance concern — it is a foundational correctness requirement for any serious multi-agent code repository system."[33]
The MGL database hierarchy translates directly to code repository structures, enabling the full MGL protocol — with all five lock modes and the compatibility matrix — to apply to agent coordination:[18][21][31]
| MGL Level | Database Analog | Code Repository Analog | Typical Lock Mode |
|---|---|---|---|
| Root[1] | Database | Repository root | IS or IX (rarely S or X) |
| Area[1] | Schema/database area | Top-level module / package directory | IS, IX, or SIX |
| File[1] | Table / relation | Source file | S (read) or X (write) |
| Page[1] | Page / block | Class / module section within file | S or X (fine-grained) |
| Record[1] | Row / record | Function / method / line range | X (write target) |
Drawing from the Linux kernel's two-tier design,[14] repository lock systems should separate topology locks (branch creation, directory moves, module restructuring) from content locks (file modifications within stable directories).
This two-tier design ensures that an agent restructuring a module tree does not block other agents modifying files within stable directories, and vice versa.
Not all agents require global context, and broad context can degrade performance.[33]
| Decision | Recommendation | Rationale | Source |
|---|---|---|---|
| Default lock granularity | File-level with directory intention locks (IS/IX) | Standard granularity balancing overhead and concurrency; intention locks enable O(depth) conflict detection | [21][31] |
| Lock type for writes | Write-preferring RW locks | Agent modifications must not be indefinitely blocked by concurrent analysis agents | [11][26] |
| Lock upgrade | Declare write intent upfront (SIX); never upgrade from read to write | Read-to-write upgrade deadlocks two agents both attempting upgrade | [11] |
| OS vs. application locks | Application-level locks (PostgreSQL advisory, Redis, etcd) preferred over OS file locks | POSIX fcntl has silent lock release on any fd close; flock has no byte-range; both fail on NFS | [24][25] |
| Parallel agent launch | Serialize `git worktree add` calls; parallelize execution | `.git/config.lock` is a repo-wide bottleneck during creation; per-worktree execution is fully isolated | [27][41] |
| Coarsening threshold | Escalate to directory lock when agent holds >N file locks in same directory | SQL Server validates: 5,000 row locks triggers table escalation; adapt threshold to repo size | [42] |
| Crash safety | TTL expiry or connection-scoped locks on all lock records | Crashed agents leave permanent zombie locks without automatic cleanup | [23] |
| Semantic conflicts | File-level locks + non-overlapping task assignment + API surface declarations | File locks do not prevent interface incompatibility; semantic locks require explicit scope declaration | [22][8] |
| Topology vs. content | Separate topology locks (module moves) from content locks (file edits) | Linux kernel two-tier design proven in production: prevents structural changes from blocking content work | [14] |
| Adaptive granularity | Start with threshold-based escalation (lock count + memory); add ML-driven adaptation at scale | Threshold-based proven in SQL Server production; ML-driven validated for continuous optimization | [42][32] |
Key finding: The 6-mode MGL protocol (NL/IS/IX/S/SIX/X) with the compatibility matrix and seven protocol rules is directly applicable to multi-agent code repositories, mapping database hierarchy levels to repository → module → file → function. The primary gap between database MGL and repository MGL is the semantic layer: file locks address physical correctness but cannot enforce API surface compatibility across agent boundaries.[1][22][31]

See also: Scope Overlap Detection (detecting when two lock scopes intersect before acquisition); Deadlock, Starvation & Fairness (handling circular wait and writer starvation in lock hierarchies)