Pillar: monorepo-tooling | Date: May 2026
Scope: How large organizations coordinate thousands of engineers in single repos: Google's Piper/CitC virtual filesystem, Meta's Sapling/Eden, trunk-based development practices, CODEOWNERS enforcement, pre-submit validation queues, change impact analysis at Google/Meta/Stripe scale, toolchain coordination (Bazel, Buck remote execution), and ownership map enforcement for parallel work.
Sources: 38 gathered, consolidated, synthesized.
Scale forces pre-investment: Google's TAP pipeline executes more than 4 billion test cases per day across more than 50,000 change submissions—because in a no-branch monorepo with 25,000 engineers, a single bad commit can block the entire company simultaneously.[14][26] That constraint is not an edge case; it is the organizing principle behind every layer of hyperscaler monorepo infrastructure.
Every large-scale monorepo operator independently arrived at virtual or sparse filesystem presentation as the developer-experience foundation. Google's CitC FUSE layer stores fewer than 10 files locally per workspace on average—developers access the full 86 TB Piper repository without running a clone or sync, backed by 10 global data centers over Paxos consensus.[1][15] Meta's EdenFS uses lazy file fetching across Linux (FUSE), macOS (NFSv3), and Windows (ProjectedFS)—one daemon manages multiple concurrent checkouts per developer.[17][2] Microsoft's Windows repo at 300 GB and 3.5 million files was the world's largest Git repository at migration time; VFS for Git reduced clone from 12+ hours to minutes and git status from 10 minutes to 4–5 seconds.[35] By 2023, VFS for Git was in maintenance mode, replaced by Scalar—a no-virtualization approach built directly into Git v2.38 using sparse checkout and partial clone, with no kernel extension required.[24] The convergence: no large-scale monorepo ships full repository contents to developer machines.
Merge queues are the difference between a stable trunk and a permanently broken one. Before Uber deployed SubmitQueue, mainline was green only 52% of the time, and up to 10% of commits required reversion on worst days despite individually passing CI—because concurrent changes combined to produce failures neither triggered alone.[13] SubmitQueue brought mainline availability to 99%+. Its core innovation: a speculation engine with a 97% accurate ML predictor that schedules and runs builds concurrently, and an out-of-order landing mechanism for large-diff changes that proved safe across the full speculation tree—yielding a 74% reduction in wait time for large-diff authors.[13] GitHub's merge queue, reaching GA in 2023 after validation across 30,000+ PRs and 4.5 million CI runs, reduced average deploy wait time by 33% and doubled simultaneous deployment capacity from 15 to 30+ changes.[6][7] Teams implementing merge queues report an average 24% reduction in PR cycle times; without them, concurrent-commit rebase races produce unbounded retry cycles that make trunk health a probabilistic outcome.[20]
Specialized build systems with remote execution eliminate the computational scaling wall. Google's Blaze (operating since 2006) and Meta's Buck2 (rewritten in Rust, open-sourced 2023) treat remote execution as the default code path—local execution is a degenerate case, not the primary mode.[31][4] A shared action result cache means any build action already executed by any engineer returns immediately without re-execution. Uber quantified the compound effect: a Changed Target Calculator feeding only dependency-affected targets to Bazel remote execution cut a 10,000-target CI run from 60 minutes to 10 minutes (83% reduction) and average build time from 45 minutes to 14 minutes (69% reduction), while improving CI resource utilization from 23% to 78%.[33] Meta's Buck2 delivers 2× faster builds than Buck1 in production use, with engineers producing measurably more code as a direct result.[19] Stripe's selective test execution via Bazel reduces per-PR test scope to approximately 5% of the full suite—enabling their 50M-line Ruby codebase to sustain ~1,145 PR merges per day without proportional CI cost growth.[10][23]
Static analysis integrated into code review—not as a separate audit pass—drives behavioral change at commit time. Google's Tricorder platform processes the same >50,000 daily changes as TAP, running multiple analyses per second from 146 analyzers across 30+ languages with an effective false-positive rate below 5%.[22] Approximately 3,000 automated fixes are applied by code authors daily.[22] High-confidence checks are promoted directly into the compiler as build errors—no warnings exist in Google's Java compiler; a finding is either an error or it is suppressed. The acceptance bar for new checks requires: actionable fixes, a false-positive rate below 10%, and demonstrated quality impact before any check enters the platform.[22] This design—results surface in the diff viewer before submission—is why Tricorder achieves author adoption rather than bypass.
Code ownership enforcement introduces its own failure modes at scale. Google's OWNERS model—every directory contains an owner list; changes require owner approval before submission—directly inspired GitHub CODEOWNERS and GitLab's system, but the practice reveals predictable breakdowns at hyperscale.[38] Udemy's monorepo audit found 1 in 8 files had no owning team, meaning edits to those paths required no accountable review—a direct compliance and security gap.[12][25] Tech leads assigned as broad-area owners receive 20–30 review requests per day; changes crossing multiple team boundaries require 3–4 separate owner approvals, with primary-owner unavailability causing PR stalls.[21] The structural outcome: PRs in monorepos take an average of 19 hours to merge, versus 2 hours in polyrepos, with monorepos exhibiting greater variance reflecting tooling maturity differences.[5][29] Ownership mapped to org-chart hierarchies breaks during reorgs; ownership mapped to code boundaries and product areas remains stable through team structure changes.
Large-scale change automation makes previously permanent architectural decisions reversible. Google's Rosie infrastructure shards cross-cutting changes by project and OWNERS file boundaries, routes each shard to the appropriate owner with automatic escalation for non-responsive reviewers, and applies pattern-based auto-approval for conforming changes—enabling thousands of changes to be created, tested, reviewed, and submitted across the full codebase daily.[32] The consequence: symbol renames and library relocations that previously constituted irreversible decisions are now routine operations. Uber applied analogous discipline to deployment: analysis of 500,000 commits revealed that 1.4% impacted more than 100 services simultaneously and 0.3% impacted more than 1,000 services.[33] A tiered CD orchestration system staging from least-critical to most-critical services, halting on cross-service failure signals, reduced deployment safety incidents from 12 per month to 1.2 per month—a 90% reduction.[33]
Autonomous AI agents are already operating inside these systems. Stripe built "Minions"—four specialized agents (reader, writer, tester, reviewer) running concurrently inside their 50M-line Ruby monorepo—to complete development tasks end-to-end without sequential handoffs.[10][23] As of early 2025, this is one of the few publicly documented production deployments of autonomous parallel AI coding agents operating at large-scale monorepo conditions. The prerequisite infrastructure was identical to what enables human-scale coordination: hermetic devboxes, selective test execution, CODEOWNERS-gated review, and merge queues—all constraining and validating agent outputs with the same rigor applied to human commits.
The practical implication is that monorepo coordination is a compounding problem: each missing layer amplifies the cost of the others. The Uber case is instructive—three distinct investments (SubmitQueue, Bazel with CTC, and tiered CD orchestration) each independently produced 70–90% improvements in their respective dimensions. Organizations that adopt partial stacks—merge queues without selective test execution, or CODEOWNERS enforcement without merge queues—report the characteristic failure modes of whichever layer is absent. The sequence matters: virtual filesystem or sparse checkout eliminates the physical scale problem first; a remote-execution build system with shared caching eliminates redundant computation; a merge queue with speculation eliminates trunk instability; pre-submit validation gated on dependency graphs rather than full-suite execution eliminates CI cost growth; and ownership enforcement baked into the submission pipeline closes review accountability gaps. Sapling's foundational design principle—all operations must scale with the files a developer actually uses, not with total repository size, yielding sub-second Smartlog commands regardless of repository growth—captures the common thread: every tool in the stack must decouple developer-perceived performance from aggregate repository scale.[2][16]
Google's Piper is the world's largest active monorepo system: as of 2016 it hosted 86 TB of data and 2 billion lines of code across 9 million source files, with a history of 35 million commits spanning 1 billion total files since 2000.[1][9] 95% of Google's developers—25,000 engineers globally—use Piper as their primary version control.[1][9] The system processes ~16,000 manual commits plus 24,000 automated bot commits daily, while serving billions of read requests per day.[1]
Key finding: Google's CitC model stores an average of fewer than 10 files per developer workspace locally—only modified files—while presenting an illusion of full-repo access via a cloud-backed FUSE filesystem, eliminating the need for any explicit sync or clone operation.[1][15]
| Dimension | Value | Source |
|---|---|---|
| Codebase size | 86 TB; 2 billion lines; 9 million files | [1][9] |
| Repository history | 1 billion files total; 35 million commits since 2000 | [9] |
| Active developer base | 25,000 engineers (95% of Google) | [1][9] |
| Daily manual commits | ~16,000 developer + ~24,000 automated bot | [1] |
| Weekly code change volume (2015) | ~15 million lines; ~250,000 files | [15] |
| Backend storage (original) | Bigtable, later migrated to Spanner | [1] |
| Geographic replication | 10 global data centers | [1][27] |
| Consensus protocol | Paxos for distributed coordination | [1][9] |
| Access control | ~99% of codebase readable by all engineers; <1% access-controlled | [1][15] |
| Code review gate | Mandatory approval via Critique before any submission | [1][15] |
CitC provides the primary developer workflow at Google: a cloud backend coupled with a local FUSE filesystem creates the illusion of a developer's changes overlaid on top of the full repository.[1][9][27] The surrounding toolchain integrates with it as follows:
| Tool | Function | Integration |
|---|---|---|
| Critique | Code review tool | Gates all commits; Tricorder results surface here[27] |
| CodeSearch | Code search and navigation | Browser editing via CitC integration[1] |
| Tricorder | Static analysis platform (>100 analyzers) | Runs on all changes; feeds Critique[22] |
| Rosie | Large-scale refactoring / automated code changes | Shards by OWNERS file; submits thousands/day[32] |
| TAP | Test Automation Platform | Gates presubmit; 4B+ test cases/day[14] |
| Blaze/Bazel | Build system (remote execution) | Maintains global dependency graph with Forge[31] |
| Forge | Remote execution for distributed builds | Distributor → Scheduler → shared action cache[31] |
Google enforces trunk-based development strictly: no personal development branches exist—branches are reserved exclusively for releases.[1][9][27] Feature flags substitute for branches, allowing old and new code to coexist in main without development isolation.[9] The full audit trail is logged: accidentally committed files can be retroactively purged, while the record of operations themselves is preserved.[15]
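To make the flag-for-branch substitution concrete, here is a minimal sketch of flag-gated coexistence on trunk. The flag store and ranking functions are invented for illustration and do not reflect Google's actual flag infrastructure.

```python
# Both code paths ship in the same trunk commit; a config push, not a code
# change, decides which one executes. (Illustrative sketch, not Google's API.)

FLAGS = {"new_ranking": False}   # flipped via config rollout, per environment

def rank_results(results: list[str]) -> list[str]:
    if FLAGS["new_ranking"]:
        return sorted(results, key=len)   # new behavior, dark-launched
    return sorted(results)                # old behavior, still the default

# Rollout is a flag flip; rollback is the same flip in reverse --
# no branch merge or revert commit required.
print(rank_results(["bb", "a", "ccc"]))
```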
| Advantage | Trade-off / Challenge |
|---|---|
| Unified versioning — no dependency conflicts[9] | Significant ongoing effort for code health maintenance[9] |
| Atomic changes across multiple projects in one commit[9] | Code discovery becomes difficult at extreme scale[9] |
| No "diamond dependency problem"[15] | Eliminates "documentation culture" — teams expect others to read code[9] |
| Large-scale refactoring across all projects simultaneously[9] | External teams can depend on implementation details, blocking deprecation[9] |
| Flexible team boundaries via directory structure[9] | Culture side effect: broad code visibility can substitute for proper docs[9] |
Meta's source control infrastructure spans two primary systems: Sapling, a Git-compatible source control client developed over 10 years from a Mercurial fork, and EdenFS, a virtual filesystem designed specifically for massive monorepos.[2][16][28] The repository serves tens of thousands of internal developers, hosting tens of millions of files, commits, and branches—making it infeasible to break into polyrepos.[16][28] Sapling was open-sourced in November 2022.[2]
Key finding: Sapling's core design principle—all operations must scale with the number of files in use by a developer, not with total repository size—produces sub-second Smartlog commands regardless of how large the repository grows.[2][16]
| Feature | Mechanism | Performance Result |
|---|---|---|
| Segmented Changelog[2] | Downloads only high-level commit graph shape (megabytes, not full history) | Commit relationship queries in O(number-of-merges); Smartlog <1 second |
| File history queries[2][16] | Optimized log indexing | O(log n) instead of O(n) |
| Sparse checkouts[2] | Organizational "sparse profiles" auto-update when dependencies change | Developers only materialize files they actually use |
| Watchman integration[16] | File change monitoring accelerates sl status | Status computed without scanning working directory |
| Git LFS[16] | Large file storage offload | Prevents binary bloat in commit history |
| Feature | Commands | Purpose |
|---|---|---|
| Smartlog[2][16] | sl | Shows local commits, remote branches, changed files, outdated commits—filters irrelevant information |
| Error recovery[2][16] | sl undo, sl redo, sl hide, sl unhide, sl undo -i | Full undo/redo history with interactive navigation |
| Commit stacks[2][16] | sl goto, sl amend, sl restack, sl next/prev, sl fold, sl split | First-class stacked commits for incremental review |
| ReviewStack[16][28] | Web interface | Stack-oriented GitHub PR review companion |
| Web interface[16] | sl web | Browser-based repo navigation and editing |
EdenFS is Meta's virtual filesystem for massive monorepos. Its core design is lazy file fetching: file contents are downloaded only when first accessed, never up front.[17][2] A single EdenFS daemon manages multiple simultaneous checkouts—critical for concurrent development workflows where a developer may need to work across multiple branches at once.[2][17][28]
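A minimal sketch of the lazy-fetch contract: directory listings are cheap metadata, while content downloads happen at first read and are cached. `RemoteStore` stands in for a backing server such as Mononoke; all names are illustrative, not EdenFS's actual interfaces.

```python
class RemoteStore:
    """Stands in for the source control server (e.g., Mononoke)."""
    def __init__(self, tree: dict[str, bytes]):
        self._tree = tree
        self.fetches = 0

    def fetch(self, path: str) -> bytes:
        self.fetches += 1                 # each fetch = one network round trip
        return self._tree[path]

class LazyCheckout:
    """Exposes the full tree; materializes file contents only on first read."""
    def __init__(self, store: RemoteStore, paths: list[str]):
        self._store = store
        self._paths = set(paths)          # full listing is cheap metadata
        self._cache: dict[str, bytes] = {}

    def listdir(self) -> list[str]:
        return sorted(self._paths)        # no content download required

    def read(self, path: str) -> bytes:
        if path not in self._cache:       # fetch on demand, then cache
            self._cache[path] = self._store.fetch(path)
        return self._cache[path]

store = RemoteStore({"a.py": b"print('a')", "b.py": b"print('b')"})
repo = LazyCheckout(store, ["a.py", "b.py"])
repo.listdir(); repo.read("a.py"); repo.read("a.py")
assert store.fetches == 1                 # cost tracks files used, not repo size
```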
| Platform | Filesystem Implementation | Notes |
|---|---|---|
| Linux[17] | FUSE (Filesystem in Userspace) | Primary deployment target |
| macOS[17] | FUSE for macOS or NFSv3 | Moving to NFS as Apple deprecates kernel extensions |
| Windows[17] | Microsoft's ProjectedFS | Same projection model as VFS for Git |
Each EdenFS checkout (an "EdenMount") is composed of four subsystems.[17]
EdenFS prioritizes three operations above all others: status determination (computing modified files relative to source control), commit switching (checkout speed), and file change notifications via Watchman to build tools and IDEs.[17] Additionally, EdenFS provides file content hashing without requiring full file reads.[17]
| Component | Role | Relationship |
|---|---|---|
| Mononoke[8] | Highly scalable distributed source control server | Serves Sapling clients |
| EdenFS[17] | Virtual filesystem for repo access | Provides lazy-fetch working directory to Sapling |
| Sapling[2] | Source control client (Git-compatible) | Developer-facing CLI; open-sourced Nov 2022 |
Meta supports two branching workflows: non-mergeable full-repo branching, and mergeable directory branching.[2] Full-repo branching is not viable at Meta's scale for workflows requiring merging: merging full-repo branches creates merge commits with multiple parents, making the commit graph wide and non-linear. Maintaining a linear commit graph is critical for keeping all commit graph operations fast across tens of thousands of users.[2]
See also: Git Worktree Mechanics

In May 2017, Microsoft announced that virtually all Windows engineers use a single Git monorepo under the internal "One Engineering System" initiative.[11][24][35] The Windows monorepo, at ~300 GB storage and 3.5 million files, was the largest Git repository in the world at migration time.[11][24]
Key finding: VFS for Git reduced clone time from 12+ hours to a few minutes, checkout from 2–3 hours to 30 seconds, and status from 10 minutes to 4–5 seconds—without modifying the underlying Git protocol.[35]
| Dimension | VFS for Git (GVFS) | Scalar |
|---|---|---|
| Approach[24][35] | Virtual filesystem layer — OS presents all files as present; downloads on first read | No virtualization — sparse checkout + partial clone |
| Platform support[24][35] | Windows only (macOS port abandoned) | Cross-platform |
| Open source[35] | Released later, Windows-only | Open source from day one |
| Git integration[24] | External virtual filesystem layer | Built into Git v2.38 as native command |
| Current status[35] | Maintenance mode — critical security updates only | Recommended for all new deployments |
| Deployment complexity[24] | Requires kernel extensions / OS-level filesystem drivers | Significantly simpler — no filesystem virtualization |
Repositories are onboarded with the `scalar clone` command. Microsoft also identified edge cases where Git performed unnecessary work on large file sets and contributed fixes upstream; Scalar serves as a proving ground for improvements intended for mainline Git.[24]
Microsoft developed Rush.js, an open-source manager for TypeScript monorepos at scale, used by 250+ developers building Microsoft 365 frontend components.[11]
Microsoft's "One Engineering System" philosophy requires all teams to agree on common standards, enforced via two mechanisms: automation (consistent, predictable rulesets for building, testing, approving contributions) and peer review (social coordination).[11]
See also: Git Worktree Mechanics

Trunk-Based Development (TBD) is a source-control branching model where developers work in short-lived branches or directly in the trunk, with the mainline always in a deployable state.[3][18] A monorepo is TBD at its logical extreme: all source in one trunk, atomic commits, no long-lived parallel branches.[3] The 2024 DORA Accelerate State of DevOps Report (39,000+ professionals surveyed) identifies TBD as a required practice for continuous integration; the 2024 Puppet State of DevOps Report found 81% of top-performing IT teams use continuous delivery practices, yet only 19% of teams reached elite performance levels in 2024.[18][29]
Key finding: In documented case studies, moving to monorepo with trunk-based development enabled teams shipping once per month to ship weekly, and teams resolving 1–2 tickets/day to resolve 6–7 per day.[3]
| Organization | Scale | TBD Practice |
|---|---|---|
| Google[18][29] | 35,000 developers†; 2B+ LOC; ~40,000 commits/day | Single monorepo trunk; no personal branches; branches reserved for releases |
| Meta[29] | Tens of thousands of engineers | Continuous integration with TBD; daily mobile releases via feature flags |
| Uber[29] | 1,000+ commits/day; ~3,000 microservices | All production builds from main; organized by language monorepos |
| Pinterest[18][29] | 1,300+ repositories | 3-year migration into four monorepos paired with TBD |
| Netflix[3] | — | Uses trunk-based development |
† The 35,000 figure[18][29] includes QA automators; sources [1] and [9] cite 25,000 software engineers specifically. The figures derive from different measurement periods and inclusion criteria.[1][18]
A typical PR in a monorepo takes 19 hours to merge, compared to 2 hours in a polyrepo.[5][29][30] Monorepos exhibit greater variability in PR cycle time, reflecting differences in tooling maturity and CODEOWNERS enforcement quality.[5]
| # | Mechanism | How it enables TBD at scale |
|---|---|---|
| 1 | Feature flags[3][18][29] | Incomplete features merged hidden behind flags; Meta ships daily mobile releases this way |
| 2 | Code ownership checks[18][29] | Programmatic review enforcement; a Cloud engineer cannot approve YouTube algorithm changes |
| 3 | Specialized build systems[18][29] | Bazel/Buck/Pants process dependency graphs and rebuild only affected code |
| 4 | Sparse checkouts[3][18] | Google: scripted checkout modification; Sapling: sparse profiles |
| 5 | Stacked PRs + merge queues[18] | Graphite-style stacks break large changes into smaller dependent branches with ordered queuing |
| 6 | Release branches[29] | Cut as snapshots for stabilization; bug fixes cherry-picked; new features excluded |
| 7 | Communication cadence[18] | Regular stand-ups ensure awareness of impactful trunk changes |
Monorepos trade high tooling complexity for low coordination cost; polyrepos trade low tooling complexity for high coordination cost (dependency management, versioning).[11] The threshold at which monorepo advantages outweigh tooling investment typically occurs when cross-repo dependency friction exceeds the cost of investing in build system and merge infrastructure.[11]
Bazel and Buck exist because traditional build tools (make, etc.) cannot handle large-scale, multi-language monorepos.[4][19][31] Google's internal Blaze (Bazel's predecessor) has operated since 2006 on the assumption of remote execution by default; Meta's Buck2 was rebuilt from scratch in Rust with the same philosophy.[4]
Key finding: Google runs millions of builds executing millions of test cases and producing petabytes of build outputs from billions of lines of source code every day—achieved via a shared action result cache where any build action already executed by any Google engineer is returned immediately without re-execution.[31]
| System | Origin | Open-sourced | Language | RE Protocol | Key Design Choice |
|---|---|---|---|---|---|
| Blaze[31] | Google, 2006 | No (internal) | Java | Custom (Forge) | Remote execution by default from inception |
| Bazel[31] | Google (Blaze OSS port) | March 2015 | Java | Custom RE API | Multi-phase execution model |
| Buck (v1)[4] | Meta/Facebook | March 2013 | Java | Custom | Local-first with optional RE |
| Buck2[4][19] | Meta (from-scratch rewrite) | 2023 | Rust | Bazel RE API (OSS) | Remote execution first; single dependency graph |
| Pants[12] | — | Open source | Python | RE supported | Large-scale multi-language monorepos |
| Nx[12][29] | Community / Nrwl | Open source | TypeScript | Distributed cache | TypeScript/JS; DAG for affected targets only |
| Capability | Bazel + RE | Buck2 + RE |
|---|---|---|
| Compile parallelism[31] | Laptop compile runs on 96-core cloud machine | Same — local execution is a special case of remote |
| Test parallelism[31] | Up to 1024× parallel on compute cluster | Thousands of parallel actions with shared cache |
| Action caching[31] | Result returned immediately if any user already built it | Same shared cross-org cache |
| Test caching[4][31] | Identical inputs → second run skipped entirely | Same |
| Compatible RE backends[19] | EngFlow, BuildBarn, BuildBuddy, custom | EngFlow, BuildBarn, BuildBuddy (Bazel RE API) |
Uber's Go monorepo is among the largest Go repositories using Bazel and Gazelle (Bazel's official Go/Protobuf rule generator).[33] By implementing a Changed Target Calculator (CTC) that feeds selective targets to Bazel remote execution across separate machines, Uber achieved:[33]
| Metric | Before | After | Improvement |
|---|---|---|---|
| 10,000 target CI run[33] | 60 minutes | 10 minutes | 83% reduction |
| Average build time[33] | 45 minutes | 14 minutes | 69% reduction |
| CI resource utilization[33] | 23% | 78% | 3.4× efficiency improvement |
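The core of a changed-target calculation is a reverse-dependency traversal: invert the build graph once, then walk outward from the directly edited targets. A minimal sketch with an invented graph (not Uber's actual CTC implementation):

```python
from collections import deque

DEPS = {                       # target -> targets it depends on (illustrative)
    "//app:server": ["//lib:auth", "//lib:log"],
    "//app:worker": ["//lib:log"],
    "//lib:auth":   ["//lib:log"],
    "//lib:log":    [],
}

# Invert edges once: target -> targets that depend on it.
RDEPS: dict[str, list[str]] = {t: [] for t in DEPS}
for target, deps in DEPS.items():
    for dep in deps:
        RDEPS[dep].append(target)

def affected_targets(changed: set[str]) -> set[str]:
    """Everything transitively depending on a changed target must rebuild."""
    seen, queue = set(changed), deque(changed)
    while queue:
        for rdep in RDEPS[queue.popleft()]:
            if rdep not in seen:
                seen.add(rdep)
                queue.append(rdep)
    return seen

# Editing //lib:auth schedules its dependents; //app:worker is skipped entirely.
print(affected_targets({"//lib:auth"}))   # {'//lib:auth', '//app:server'}
```

Only the returned set is handed to remote execution; everything else is served from cache or skipped.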
An emerging approach lifts Bazel's in-memory build graph (Skyframe) into a remote cluster backed by a distributed persistent cache. All graph nodes are stored remotely, making every build incremental—eliminating the "cold build" penalty entirely.[4][31]
Google pioneered the OWNERS file model: every directory contains a file listing responsible reviewers. Changes to a subtree must be approved by an owner of that subtree before submission.[38] This model directly inspired GitHub's CODEOWNERS feature, GitLab's code owner system, and adoption by large open-source projects including Chromium and Kubernetes.[38]
Key finding: Udemy's monorepo audit found 1 in 8 files had no owning team specified in CODEOWNERS—meaning edits to those files had no required review by accountable owners, representing a direct security and compliance gap at scale.[12][25]
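The audit behind such a finding reduces to checking which files match no ownership rule. A minimal sketch, using `fnmatch` as a simplified stand-in for CODEOWNERS' gitignore-style, last-match-wins pattern semantics; patterns and teams are invented:

```python
from fnmatch import fnmatch

CODEOWNERS = [                      # (pattern, owning team), in file order
    ("/payments/*", "@org/payments-team"),
    ("*.tf",        "@org/infra-team"),
]

def owner_for(path: str) -> str | None:
    owner = None
    for pattern, team in CODEOWNERS:
        if fnmatch("/" + path, pattern) or fnmatch(path, pattern):
            owner = team            # last matching rule wins, as on GitHub
    return owner

files = ["payments/charge.py", "deploy/main.tf", "scripts/cleanup.py"]
unowned = [f for f in files if owner_for(f) is None]
print(f"{len(unowned)}/{len(files)} files unowned: {unowned}")
# -> 1/3 files unowned: ['scripts/cleanup.py']  (edits there need no owner review)
```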
| Dimension | Google OWNERS | GitHub CODEOWNERS |
|---|---|---|
| Mechanism[38] | OWNERS file per directory; Gerrit code-owners plugin enforces | Single CODEOWNERS file with glob patterns; enforced by branch protection |
| Review types required[38] | Two distinct approvals: (1) detailed code review for quality, (2) owner approval for appropriateness | One required review from matching owner |
| Access model[38] | Anyone can change any file; owner approval required to submit | PR blocked from merge until all matching owners approve |
| Bot support[38] | Bots can be OWNERS to automate process checks | Bots can be assigned as code owners |
| Build-level enforcement[38] | Bazel/Blaze visibility rules enforce package dependency ownership | Not enforced at build level |
| Challenge | Data Point | Source |
|---|---|---|
| Review load imbalance | Tech leads receive 20–30 review requests daily when assigned as broad-area owners | [21] |
| Cross-team changes | Changes spanning multiple team boundaries require 3–4 separate owner approvals | [21][25] |
| Notification noise | Teams with 10+ members trigger excessive interruptions from auto-assignment | [21] |
| Ownership drift | CODEOWNERS files grow stale as codebase evolves; enforcement becomes inconsistent | [21][25] |
| CODEOWNERS + Merge Queue conflict | Known GitHub bug: CODEOWNERS bypass does NOT work when merge queue is enabled | [5][30] |
| PR stalls | When primary owners are unavailable, PRs block—backup reviewers or round-robin needed | [21] |
| Coverage gaps | 1 in 8 files unowned at Udemy | [12][25] |
| Organization | Scale | Ownership Approach | Outcome |
|---|---|---|---|
| Rippling[12][25] | 700+ engineers; Python monorepo | Programmatic "Service Catalog" — services independent from org chart; one team may own multiple code areas | Per-team metric tracking (test runtime, flakiness), gamified leaderboards, data-driven roadmaps |
| Meta[25][38] | 50,000+ engineers | Phabricator with ownership-based review routing; 2020 research paper on ownership management challenges at scale | "Ownership at Large – Open Problems and Challenges in Ownership Management" (Ahlgren et al.) |
| Google[38] | 25,000+ engineers | OWNERS files + Gerrit plugin + Bazel visibility + Rosie automated shard routing | Global approvers can auto-approve all shards via pattern tooling |
Nx's `enforce-module-boundaries` lint rule prevents unauthorized cross-package imports.[12]

When an OWNER is a bot, machine-readable ownership enables automated process checks: for example, if service B maintains a hard-coded authorized-client list, adding service A requires a review from B's owners—enforcing API access control via the ownership model itself.[38]
Effective ownership strategy goes beyond mapping directories to teams. Practitioners emphasize two points: map ownership to code boundaries and product areas rather than org-chart hierarchies, which break during reorgs; and designate backup reviewers or round-robin rotation so primary-owner unavailability does not stall PRs.[21]
At Google, Meta, and Stripe scale, exhaustive test execution on every commit is computationally infeasible. A change in a widely used shared library can transitively affect hundreds of downstream applications—making selective test execution, dependency graph analysis, and progressive validation pipelines mandatory infrastructure.[8]
Key finding: When a single monorepo has no branches and 100,000+ developers, "the blast radius of a bad check-in is massive — there is a non-zero probability that your check-in will block thousands or tens of thousands of other engineers."[34] This constraint is why Google invests orders of magnitude more in pre-submit validation than Amazon does.
| TAP Metric | Value |
|---|---|
| Unique changes handled daily[14][26] | >50,000 |
| Individual test cases executed daily[14][26] | >4 billion |
| Change submission rate[26] | >1 per second |
| Average presubmit wait time[14][26] | ~11 minutes |
| Presubmit pass → full test pass likelihood[14] | >95% |
| Phase | Timing | Mechanism | Scope |
|---|---|---|---|
| Presubmit[14][26] | Runs during code review loop, before submission | Static build-dependency test selection + ML-driven selection + ML flakiness mitigation | Fast unit test subset; flaky tests excluded; global presubmit for widely-used libraries |
| Post-Submit[14][26] | Asynchronous, after submission | TAP runs all potentially affected tests including large/slow tests; auto-bisects failing batches | Full affected test suite; automatic rollback when culprit identified with high confidence |
A behavioral side effect: the visible difference between triggering 100 vs. 1,000 downstream tests incentivizes engineers to make smaller, more targeted changes, reducing blast radius organically.[26]
Each Google team designates a Build Cop responsible for maintaining the health of their test suite. When TAP's post-submit run detects failures, it automatically bisects failing batches—splitting them and rerunning each change in isolation to identify the culprit. When a culprit is identified with high confidence, TAP supports automatic rollback without requiring manual intervention.[26]
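A toy model of that bisection step: halve the failing batch and rerun until a single change isolates. Real TAP additionally copes with flaky tests and multiple simultaneous culprits; all names here are invented.

```python
def run_tests(changes: list[str], culprit: str) -> bool:
    """Stand-in for a hermetic test run: the batch passes iff the culprit is absent."""
    return culprit not in changes

def bisect(changes: list[str], culprit: str, runs: int = 0) -> tuple[str, int]:
    if len(changes) == 1:
        return changes[0], runs                  # isolated the culprit
    mid = len(changes) // 2
    left = changes[:mid]
    runs += 1
    if not run_tests(left, culprit):             # failure is in the left half
        return bisect(left, culprit, runs)
    return bisect(changes[mid:], culprit, runs)  # otherwise it's in the right

batch = [f"cl/{i}" for i in range(32)]
found, runs = bisect(batch, culprit="cl/13")
print(found, runs)   # cl/13 isolated in log2(32) = 5 reruns instead of 32
```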
The cultural norm reinforcing this system: "Rolling a change back is often the fastest and safest route to fix a build." This rollback-first culture, combined with automated bisection and the designated Build Cop role, means failures are isolated and remediated quickly rather than compounding.[26]
| Case Study | Intervention | Measured Outcome |
|---|---|---|
| Google Assistant[26] | Transition to hermetic presubmit (version-hermetic, no network calls) | 14× runtime reduction; virtually zero flakiness |
| Google Takeout — broken servers[26] | Hermetic presubmit (integrates 90+ Google products) | Prevented 95% of broken servers from bad configuration |
| Google Takeout — deployment failures[26] | Hermetic presubmit | Nightly deployment failures reduced 50% |
| Google Takeout — culprit detection[26] | E2E tests from nightly to every 2 hours | 12× reduction in culprit set size |
| Google Takeout — debug burden[26] | Refactored test suites | 35% reduction in Takeout team debugging involvement |
| Tricorder Metric | Value |
|---|---|
| Code review changes processed daily[22] | >50,000 |
| Analysis rate[22] | Multiple analyses per second |
| Total analyzers (as of Jan 2018)[22] | 146 (125 contributed from outside Tricorder team) |
| Languages supported[22] | 30+ |
| Effective false-positive rate[22] | <5% |
| Automated fixes applied daily[22] | ~3,000 by authors |
Tricorder architecture: microservices model that sends analysis requests to dedicated servers alongside change metadata; servers access source via a FUSE-based filesystem; results surface directly in the Critique diff viewer.[22] The platform enforces two critical impact checks: a warning when a changelist will transitively affect a large percentage of the codebase, and a warning when a changelist needs merging with HEAD.[22]
High-confidence analyses are promoted into compilers as build errors—Error Prone 'ERROR' checks are enabled in Google's Java compiler. Google treats checks as either build-breaking errors or suppressions; no compiler warnings exist.[22]
Tricorder enforces four quality standards before accepting new analyzer checks into the platform:[22]
| Standard | Requirement |
|---|---|
| Understandable outputs | Results must be accessible and meaningful to any engineer, not just the check author |
| Actionable fixes | Each finding must include implementation guidance so the author knows exactly what to change |
| False positive rate | Effective false positive rate must stay below 10% to avoid alert fatigue |
| Demonstrated impact | Check must show significant positive impact on code quality before acceptance into the platform |
Stripe's 50M-line Ruby monorepo (the largest known Ruby codebase) ships approximately 1,145 pull requests per day via GitHub Enterprise Server.[23][36] Their selective test execution system—using Bazel and custom CI infrastructure—runs only ~5% of tests on average per PR, enabling continued scaling of both personnel and codebase without proportional CI cost growth.[10][23]
Google and Amazon represent opposite ends of the CI investment spectrum: Google's single no-branch trunk means one bad check-in can block tens of thousands of engineers, forcing heavy pre-submit validation, whereas Amazon's many-small-repos model bounds each check-in's blast radius and demands far less pre-submit investment.[34]
A related problem Google terms mid-air collisions: two changes modifying completely separate files can still cause a test failure when their effects combine at runtime. Google addresses this via aggressive dependency analysis and selective test execution rather than serializing all submissions—the dependency graph catches indirect interactions that file-level diff analysis misses.[26]
See also: Scope Overlap Detection

Without a merge queue, large monorepo teams face the "merge race": multiple developers complete work simultaneously, all attempt to merge, and broken builds cascade. In monorepos where everything is connected and build times are long, this produces unbounded retry cycles.[6][20][37] Teams implementing automated merge queues report an average 24% reduction in PR cycle times; well-tuned queues are estimated to save ~$750K annually for 20-developer teams (according to Aviator, a merge queue vendor).[20]
Key finding: Before Uber's SubmitQueue, mainline was green only 52% of the time and 10% of commits on worst days required reversion—despite passing CI—due to rebase conflicts. After SubmitQueue, mainlines remained green 99%+ of the time.[13]
| Dimension | Value |
|---|---|
| Engineers served[13] | 4,500+ across global development centers |
| Monorepos covered[13] | 6 major monorepos, 7 programming languages |
| Codebase size[13] | Hundreds of millions of lines of code |
| ML model accuracy[13] | 97% (predicts change success, build time, scheduling) |
| Large diff bypass improvement[13] | 74% improvement in wait time to land code |
| Mainline green rate before[13] | 52% |
| Mainline green rate after[13] | 99%+ |
When a change affects a large repository portion, it conflicts with most subsequent queued items, creating a sequential bottleneck. Uber proved it is safe to land large changes out of queue order when all branches in the speculation tree produce the same outcome, yielding a 74% reduction in wait time to land large-diff changes.[13]
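A minimal sketch of the speculation idea: for a queue of changes, build every land/reject combination of predecessors in parallel, so no verdict forces a serial rebuild. Uber's ML predictor (97% accurate) prunes unlikely branches of this tree; the sketch below naively enumerates all of them.

```python
from itertools import product

def speculation_paths(queue: list[str]) -> list[tuple[str, ...]]:
    """Each queued change is built atop every possible subset of predecessors."""
    paths = []
    for i, change in enumerate(queue):
        for outcome in product([True, False], repeat=i):  # land/reject each predecessor
            base = tuple(c for c, landed in zip(queue, outcome) if landed)
            paths.append(base + (change,))
    return paths

for path in speculation_paths(["A", "B", "C"]):
    print(" -> ".join(path))
# 1 + 2 + 4 = 7 speculative builds cover every outcome. When A's verdict
# arrives, half the tree is discarded -- and the surviving builds are already
# running, so B and C land without waiting for serial rebuilds.
```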
| Milestone | Date | Detail |
|---|---|---|
| Trains introduced[6] | 2016 | Special PRs grouping multiple changes for simultaneous testing |
| Trains recognized as bottleneck[6] | 2020 | Limiting velocity; merge queue project initiated |
| Internal testing[6] | Mid-2021 | Small internal repos begin testing |
| Full production migration[6] | 2023 | Large monorepo + all production service repos; GA released |
| GitHub Merge Queue Metric | Value |
|---|---|
| Engineers using[6] | 500+ |
| PRs per month[6] | 2,500 |
| Deploy time reduction[6] | 33% average wait time reduction |
| Simultaneous deployment capacity[6] | 15 → 30+ changes at once |
| PRs used in development/testing[7] | 30,000+ |
| CI runs during development[7] | 4.5 million |
GitHub's merge queue was built around three explicit design principles.[7]
The queue creates temporary test branches that combine the current main with queued PR changes and runs required checks on those branches before anything merges to main.[37] With many concurrent PRs, branch combinations grow combinatorially (3 PRs → up to 6 branch combinations). Optimization requires a fine-grained model of project-level dependencies—GitHub developed a language-agnostic project-impact-graph.yaml specification to encode which projects affect which others, enabling the merge queue to skip redundant cross-project test combinations.[37]
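A minimal sketch of the pruning decision such a dependency model enables: two queued PRs whose projects cannot interact need no combined test run. The graph contents are invented, and the one-hop impact check is a simplification of what project-impact-graph.yaml expresses.

```python
IMPACTS = {                    # project -> projects it can break (illustrative)
    "shared-lib": {"web", "api"},
    "web":        set(),
    "api":        set(),
}

def impact_set(project: str) -> set[str]:
    return {project} | IMPACTS.get(project, set())

def needs_combined_run(project_a: str, project_b: str) -> bool:
    """Skip the combined test branch when the two PRs cannot interact."""
    return bool(impact_set(project_a) & impact_set(project_b))

print(needs_combined_run("shared-lib", "web"))  # True: test the combination
print(needs_combined_run("web", "api"))         # False: skip the redundant combo
```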
Outschool grew from 20 to 50 engineers in 2021. With 30-minute build+deploy times, queue depth grew to 6–8 developers on average (3 hours wait), occasionally 12—with engineers waking at 6am to reserve slots.[37] After enabling GitLab Merge Trains (up to 20 concurrent pipelines): 400 merges in a 10-hour window, eliminating manual Slack queue coordination.[37]
GitLab CI/CD optimization: parent-child pipelines with rules:changes run pipelines only when specified paths change, preventing full pipeline triggers for unrelated commits.[20]
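The same path-filtering logic modeled in Python rather than GitLab YAML; pipeline names and globs are invented:

```python
from fnmatch import fnmatch

PIPELINES = {                   # child pipeline -> watched path globs
    "frontend": ["web/*"],
    "backend":  ["services/api/*"],
    "infra":    ["terraform/*"],
}

def pipelines_to_run(changed_files: list[str]) -> list[str]:
    """Trigger a child pipeline only if the commit touches a path it watches."""
    return [
        name
        for name, globs in PIPELINES.items()
        if any(fnmatch(f, g) for f in changed_files for g in globs)
    ]

print(pipelines_to_run(["services/api/billing.py", "README.md"]))  # ['backend']
```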
| Platform | Model | Scale Evidence | Key Differentiator |
|---|---|---|---|
| Uber SubmitQueue[13] | Speculation + ML scheduling | 4,500 engineers; 6 monorepos | 97% ML predictor; out-of-order landing for large diffs |
| GitHub Merge Queue[6][37] | Temp branch batching | 30,000+ PRs; 4.5M CI runs in testing | project-impact-graph.yaml for dependency-aware batching |
| GitLab Merge Trains[37] | Sequential simulation | Up to 20 concurrent pipelines | 400 merges/10hr window in real production |
| Graphite[6] | Stacked PRs + batching | — | Dashboard + CI parallelism + stacked diff support |
| Mergify[6] | Rule-based triggers | — | Flexible rule engine for merge conditions |
Merge queues evolved from side-project bots—Bors and Homu, used in the Rust project to normalize the serial-merge pattern—into standard platform features now offered by GitHub, GitLab, and dedicated vendors.[6]
See also: Git Worktree Mechanics

At monorepo scale, cross-cutting refactors—renaming a widely-used symbol, upgrading a shared library, changing an RPC interface—can affect thousands of files and hundreds of services simultaneously. Without automated orchestration, such changes are either avoided (accumulating technical debt) or create catastrophic incidents.[32][33]
Key finding: Google's Rosie infrastructure enables "technical decisions that used to be permanent — names of widely used symbols, locations of popular classes — [to be] now reversible." Thousands of changes are created, tested, reviewed, and submitted across all of Google's codebase per day via Rosie.[32]
Rosie takes a large change (human-authored or automated) and shards it by project and OWNERS file boundaries into independently submittable changes.[32]
| Step | Mechanism | Detail |
|---|---|---|
| 1. Generate[32] | Human or automated tooling | sed, clang tools, custom scripts, or Rosie-native transforms |
| 2. Shard[32][38] | Rosie splits by project/OWNERS boundaries | Owners detection service weights each owner by expected review availability |
| 3. Test[32] | Each shard through independent test pipeline | Runs at lower priority; caps outstanding shards; communicates infrastructure load |
| 4. Review[32] | Owner review or pattern-based auto-approval | Global approvers examine only anomalous cases (conflicts, tooling malfunctions) |
| 5. Submit[32] | Rosie submits each approved shard atomically | Unresponsive owners → additional reviewers added automatically |
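A minimal sketch of the sharding step: group each touched file under its nearest ancestor OWNERS directory, producing one independently reviewable, submittable change per ownership boundary. The layout and helper names are invented, not Rosie's implementation.

```python
from collections import defaultdict
from pathlib import PurePosixPath

OWNERS_DIRS = {"search", "ads", "ads/frontend"}   # directories containing OWNERS

def owning_dir(file_path: str) -> str:
    """Nearest ancestor directory with an OWNERS file wins."""
    for parent in PurePosixPath(file_path).parents:
        if str(parent) in OWNERS_DIRS:
            return str(parent)
    return "<root>"

def shard(files: list[str]) -> dict[str, list[str]]:
    shards = defaultdict(list)
    for f in files:
        shards[owning_dir(f)].append(f)
    return dict(shards)

touched = ["search/index.cc", "ads/frontend/ui.ts", "ads/backend/bid.cc"]
for boundary, files in shard(touched).items():
    print(boundary, "->", files)
# search, ads/frontend, and ads each get an independently submittable shard,
# routed to their own owners for review (or pattern-based auto-approval).
```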
For large consistent refactors, global reviewers configure pattern-based tooling to automatically approve conforming changes. Only anomalous cases (merge conflicts, tooling failures, unexpected patterns) require human review.[32] Global approvers can approve all shards in an LSC (large-scale change) instead of routing to individual directory owners.[38]
Rosie-enabled API migrations follow a three-step pattern: introduce the new API alongside the old, migrate callers via sharded automated changes, and delete the old API once no references remain.[32]
Uber's Go monorepo sees 1,000+ commits per day sourcing ~3,000 microservices. Analysis of 500,000 commits revealed that 1.4% of commits impacted more than 100 services simultaneously and 0.3% impacted more than 1,000 services.[33]
Without intervention, CD pipelines push all these changes to production immediately—a single buggy high-impact commit can break thousands of services in parallel.[33]
| Component | Function |
|---|---|
| State machine[33] | Lightweight, asynchronous; tracks deployment state per commit across all impacted services |
| Periodic jobs[33] | Track deployment outcomes across all affected services; progress gates on success/failure thresholds per stage |
| Service tiering[33] | Tier 0 (most critical) to Tier 5; less critical services deploy first; success at each tier enables progression |
| Failure propagation prevention[33] | Cross-service failure signals halt progression before production blast radius expands |
Result: Deployment safety incidents reduced from 12/month to 1.2/month—a 90% reduction.[33]
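A minimal sketch of the tier progression: deploy least-critical tiers first and halt at the first failure signal, so a bad high-impact commit never reaches Tier 0. Tier contents and the health model are invented.

```python
TIERS = {                       # tier 5 = least critical ... tier 0 = most critical
    5: ["sandbox-svc"],
    3: ["batch-reports"],
    1: ["payments-edge"],
    0: ["trip-core"],
}

def deploy(service: str, healthy: set[str]) -> bool:
    """Stand-in for a real deployment plus post-deploy health checks."""
    print(f"deploying {service}")
    return service in healthy

def tiered_rollout(healthy: set[str]) -> str:
    for tier in sorted(TIERS, reverse=True):      # least critical first
        for service in TIERS[tier]:
            if not deploy(service, healthy):
                return f"halted at tier {tier}: {service} unhealthy"
    return "rolled out to all tiers"

# batch-reports fails its health check, so payments-edge and trip-core
# (the critical tiers) are never touched by the bad commit.
print(tiered_rollout(healthy={"sandbox-svc", "payments-edge", "trip-core"}))
```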
Uber's DevPod allows developers to build in the cloud using faster machines, with all monorepo tooling in a secure, controlled environment. DevPods run as containers on a Kubernetes cluster with necessary tooling and compute resources.[33]
At monorepo scale, individual developer machines cannot practically store or operate on the full repository. Google (CitC), Meta (EdenFS), and Microsoft (Scalar) each solve this with virtual filesystem or sparse checkout approaches. Stripe and Uber represent the cloud devbox model—ephemeral, centrally-provisioned development machines that offload storage and compute to data centers.[10][33][36]
| Era | Model | Coordination Overhead |
|---|---|---|
| Early[36] | Shared EC2 instances coordinated via Slack | "Is anyone else using the API server?" — manual, blocking |
| Middle[36] | Local laptop development | Environment drift; expensive hardware requirements |
| Current[10][36] | Centrally-provisioned ephemeral cloud devboxes | "Pull master and re-create your devbox" resolves most problems |
The `pay` CLI is Stripe's unified developer tool for devbox interactions, test execution, and monorepo workflows.[36]

Stripe built "Minions"—fully autonomous AI coding agents operating inside their 50M-line Ruby monorepo. Four specialized agents (a reader, a writer, a tester, and a reviewer) run concurrently to complete development tasks end-to-end.[10][23]
All four agents run in parallel, producing production-ready code end-to-end without sequential handoffs. As of early 2025, this is one of the few publicly documented production deployments of autonomous parallel AI coding agents in a large-scale monorepo environment.[10][23]
"When a single engineer can run ten tasks at once, the definition of productivity changes permanently."[23]
| Organization | Model | Technology | Key Property |
|---|---|---|---|
| Google[1][9] | Cloud-backed virtual filesystem (CitC) | FUSE + Piper/Spanner | <10 files local on average; full repo accessible instantly |
| Meta[17] | Virtual filesystem (EdenFS) | FUSE/NFS/ProjectedFS + Mononoke | Lazy fetch; multiple concurrent checkouts per daemon |
| Microsoft[24] | Sparse checkout (Scalar) | Git v2.38 built-in | No virtualization layer; partial clone + explicit sparse file list |
| Stripe[10][36] | Ephemeral cloud devboxes | Centrally-provisioned VMs; pay CLI | Centralized maintenance; "recreate devbox" as standard fix |
| Uber[33] | Cloud development containers (DevPod) | Kubernetes-hosted containers | Faster machines; full tooling access; secure controlled environment |
| Organization | Repo Size | Engineers | Daily Commits | CI Scale |
|---|---|---|---|---|
| Google[1][9][14] | 86 TB; 2B LOC; 9M files | 25,000 (95% of Google) | ~40,000 (16K manual + 24K bots) | >4B test cases/day; >50K changes/day via TAP |
| Meta[2][16] | Tens of millions of files | Tens of thousands | Daily mobile releases | Mononoke + EdenFS + Sapling |
| Microsoft[11][24] | ~300 GB; 3.5M files (Windows) | Thousands (Windows division) | — | Scalar + partial clone |
| Stripe[36][23] | 50M LOC (Ruby) | — | ~1,145 PRs merged/day | Bazel + GHE; ~5% tests per PR |
| Uber[13][33] | Hundreds of millions LOC; 6 monorepos | 4,500+ | 1,000+ (Go monorepo alone) | SubmitQueue (97% ML); ~3,000 microservices |
| Layer | Google | Meta | Microsoft | Stripe | Uber |
|---|---|---|---|---|---|
| VCS[1][2][24][36][13] | Piper (custom, Spanner-backed) | Sapling + Mononoke | Git + Scalar | Git + GHE | Git |
| Virtual FS[1][17][24] | CitC (FUSE) | EdenFS (FUSE/NFS/ProjectedFS) | VFS for Git (deprecated) → Scalar | None (ephemeral devboxes) | None (DevPod containers) |
| Build[31][19][10][33] | Blaze + Forge (RE) | Buck2 (Rust, RE-first) | — | Bazel | Bazel + Gazelle |
| Code review[1][25][36] | Critique + OWNERS | Phabricator + ownership routing | GitHub PRs + CODEOWNERS | GHE + CODEOWNERS | GitHub + SubmitQueue |
| Static analysis[22] | Tricorder (146 analyzers) | Custom + Phabricator ownership routing; Ahlgren et al. 2020 (ownership at scale)[25][38] | — | — | — |
| Merge control[6][13] | TAP (pre/post-submit gates) | Custom CI + feature flags | GitHub Merge Queue | GHE CI | SubmitQueue (speculation + ML) |
| LSC automation[32] | Rosie (shard by OWNERS) | — | — | — | Deployment orchestration (tiered) |
Microsoft, Stripe, and Uber static analysis tooling is not documented in the corpus sources; those cells are left blank.
| Intervention | Organization | Before | After | Improvement |
|---|---|---|---|---|
| SubmitQueue adoption[13] | Uber | Mainline green 52% of time; 10% commits reverted on worst days | 99%+ green mainline | Near-elimination of merge failures |
| Merge queue (GitHub)[6] | GitHub | 15 simultaneous deployments | 30+ simultaneous | 33% reduction in average deploy wait time |
| Bazel + CTC (CI)[33] | Uber | 60 min for 10K targets; 45 min avg build | 10 min; 14 min avg | 83% / 69% reduction |
| Hermetic presubmit[26] | Google (Assistant) | Slow, flaky presubmit | 14× faster; near-zero flakiness | 14× runtime reduction |
| Tiered CD orchestration[33] | Uber | 12 deployment safety incidents/month | 1.2 incidents/month | 90% reduction |
| VFS for Git[35] | Microsoft | Clone: 12+ hr; checkout: 2–3 hr; status: 10 min | Clone: minutes; checkout: 30 sec; status: 4–5 sec | Orders of magnitude on all dimensions |
| Selective test execution[10] | Stripe | All tests per PR | ~5% tests per PR | 95% test reduction while maintaining quality |
| Buck2 vs. Buck1[19] | Meta | Buck1 build times | 2× faster | 50% build time reduction |