Monorepo Coordination at Scale

Pillar: monorepo-tooling | Date: May 2026
Scope: How large organizations coordinate thousands of engineers in single repos: Google's Piper/CitC virtual filesystem, Meta's Sapling/Eden, trunk-based development practices, CODEOWNERS enforcement, pre-submit validation queues, change impact analysis at Google/Meta/Stripe scale, toolchain coordination (Bazel, Buck remote execution), and ownership map enforcement for parallel work.
Sources: 38 gathered, consolidated, synthesized.

Executive Summary

Scale forces pre-investment: Google's TAP pipeline executes more than 4 billion test cases per day across more than 50,000 change submissions—because in a no-branch monorepo with 25,000 engineers, a single bad commit can block the entire company simultaneously.[14][26] That constraint is not an edge case; it is the organizing principle behind every layer of hyperscaler monorepo infrastructure.

Every large-scale monorepo operator independently arrived at virtual or sparse filesystem presentation as the developer-experience foundation. Google's CitC FUSE layer stores fewer than 10 files locally per workspace on average—developers access the full 86 TB Piper repository without running a clone or sync, against a backend replicated across 10 global data centers and coordinated via Paxos.[1][15] Meta's EdenFS uses lazy file fetching across Linux (FUSE), macOS (NFSv3), and Windows (ProjectedFS)—one daemon manages multiple concurrent checkouts per developer.[17][2] Microsoft's Windows repo at 300 GB and 3.5 million files was the world's largest Git repository at migration time; VFS for Git reduced clone from 12+ hours to minutes and git status from 10 minutes to 4–5 seconds.[35] By 2023, VFS for Git was in maintenance mode, replaced by Scalar—a no-virtualization approach built directly into Git v2.38 using sparse checkout and partial clone, with no kernel extension required.[24] The convergence: no large-scale monorepo ships full repository contents to developer machines.

Merge queues are the difference between a stable trunk and a permanently broken one. Before Uber deployed SubmitQueue, mainline was green only 52% of the time, and up to 10% of commits required reversion on worst days despite individually passing CI—because concurrent changes combined to produce failures neither triggered alone.[13] SubmitQueue brought mainline availability to 99%+. Its core innovation: a speculation engine with a 97% accurate ML predictor that schedules and runs builds concurrently, and an out-of-order landing mechanism for large-diff changes that proved safe across the full speculation tree—yielding a 74% reduction in wait time for large-diff authors.[13] GitHub's merge queue, reaching GA in 2023 after validation across 30,000+ PRs and 4.5 million CI runs, reduced average deploy wait time by 33% and doubled simultaneous deployment capacity from 15 to 30+ changes.[6][7] Teams implementing merge queues report an average 24% reduction in PR cycle times; without them, concurrent-commit rebase races produce unbounded retry cycles that make trunk health a probabilistic outcome.[20]

Specialized build systems with remote execution eliminate the computational scaling wall. Google's Blaze (operating since 2006) and Meta's Buck2 (rewritten in Rust, open-sourced 2023) treat remote execution as the default code path—local execution is a degenerate case, not the primary mode.[31][4] A shared action result cache means any build action already executed by any engineer returns immediately without re-execution. Uber quantified the compound effect: a Changed Target Calculator feeding only dependency-affected targets to Bazel remote execution cut a 10,000-target CI run from 60 minutes to 10 minutes (83% reduction) and average build time from 45 minutes to 14 minutes (69% reduction), while improving CI resource utilization from 23% to 78%.[33] Meta's Buck2 delivers 2× faster builds than Buck1 in production use, with engineers producing measurably more code as a direct result.[19] Stripe's selective test execution via Bazel reduces per-PR test scope to approximately 5% of the full suite—enabling their 50M-line Ruby codebase to sustain ~1,145 PR merges per day without proportional CI cost growth.[10][23]

Static analysis integrated into code review—not as a separate audit pass—drives behavioral change at commit time. Google's Tricorder platform processes the same >50,000 daily changes as TAP, running multiple analyses per second from 146 analyzers across 30+ languages with an effective false-positive rate below 5%.[22] Approximately 3,000 automated fixes are applied by code authors daily.[22] High-confidence checks are promoted directly into the compiler as build errors—no warnings exist in Google's Java compiler; a finding is either an error or it is suppressed. The acceptance bar for new checks requires: actionable fixes, a false-positive rate below 10%, and demonstrated quality impact before any check enters the platform.[22] This design—results surface in the diff viewer before submission—is why Tricorder achieves author adoption rather than bypass.

Code ownership enforcement introduces its own failure modes at scale. Google's OWNERS model—every directory contains an owner list; changes require owner approval before submission—directly inspired GitHub CODEOWNERS and GitLab's system, but the practice reveals predictable breakdowns at hyperscale.[38] Udemy's monorepo audit found 1 in 8 files had no owning team, meaning edits to those paths required no accountable review—a direct compliance and security gap.[12][25] Tech leads assigned as broad-area owners receive 20–30 review requests per day; changes crossing multiple team boundaries require 3–4 separate owner approvals, with primary-owner unavailability causing PR stalls.[21] The structural outcome: PRs in monorepos take an average of 19 hours to merge, versus 2 hours in polyrepos, with monorepos exhibiting greater variance reflecting tooling maturity differences.[5][29] Ownership mapped to org-chart hierarchies breaks during reorgs; ownership mapped to code boundaries and product areas remains stable through team structure changes.

Large-scale change automation makes previously permanent architectural decisions reversible. Google's Rosie infrastructure shards cross-cutting changes by project and OWNERS file boundaries, routes each shard to the appropriate owner with automatic escalation for non-responsive reviewers, and applies pattern-based auto-approval for conforming changes—enabling thousands of changes to be created, tested, reviewed, and submitted across the full codebase daily.[32] The consequence: symbol renames and library relocations that previously constituted irreversible decisions are now routine operations. Uber applied analogous discipline to deployment: analysis of 500,000 commits revealed that 1.4% impacted more than 100 services simultaneously and 0.3% impacted more than 1,000 services.[33] A tiered CD orchestration system staging from least-critical to most-critical services, halting on cross-service failure signals, reduced deployment safety incidents from 12 per month to 1.2 per month—a 90% reduction.[33]

Autonomous AI agents are already operating inside these systems. Stripe built "Minions"—four specialized agents (reader, writer, tester, reviewer) running concurrently inside their 50M-line Ruby monorepo—to complete development tasks end-to-end without sequential handoffs.[10][23] As of early 2025, this is one of the few publicly documented production deployments of autonomous parallel AI coding agents operating at large-scale monorepo conditions. The prerequisite infrastructure was identical to what enables human-scale coordination: hermetic devboxes, selective test execution, CODEOWNERS-gated review, and merge queues—all constraining and validating agent outputs with the same rigor applied to human commits.

The practical implication is that monorepo coordination is a compounding problem: each missing layer amplifies the cost of the others. The Uber case is instructive—three distinct investments (SubmitQueue, Bazel with CTC, and tiered CD orchestration) each independently produced 70–90% improvements in their respective dimensions. Organizations that adopt partial stacks—merge queues without selective test execution, or CODEOWNERS enforcement without merge queues—report the characteristic failure modes of whichever layer is absent. The sequence matters: virtual filesystem or sparse checkout eliminates the physical scale problem first; a remote-execution build system with shared caching eliminates redundant computation; a merge queue with speculation eliminates trunk instability; pre-submit validation gated on dependency graphs rather than full-suite execution eliminates CI cost growth; and ownership enforcement baked into the submission pipeline closes review accountability gaps. Sapling's foundational design principle—all operations must scale with files in use by a developer, not total repository size, producing sub-second Smartlog commands regardless of how large the repository grows—captures the generalized principle: every tool in the stack must decouple developer-perceived performance from aggregate repository scale.[2][16]



Table of Contents

  1. Google's Piper and Clients in the Cloud (CitC)
  2. Meta's Sapling and EdenFS Virtual Filesystem
  3. Microsoft: VFS for Git and Scalar
  4. Trunk-Based Development at Scale
  5. Build Systems: Bazel, Buck2, and Remote Execution
  6. Code Ownership Enforcement: OWNERS and CODEOWNERS
  7. Pre-Submit Validation and Change Impact Analysis
  8. Merge Queues and Submit Queues
  9. Large-Scale Change Automation and Rollout Control
  10. Developer Environment Tooling
  11. Cross-Organization Comparison

Section 1: Google's Piper and Clients in the Cloud (CitC)

Google's Piper is the world's largest active monorepo system: as of 2016, it hosted 86 TB of data and 2 billion lines of code across 9 million source files, with a total history of 1 billion files and 35 million commits since 2000.[1][9] 95% of Google's developers—25,000 engineers globally—use Piper as their primary version control.[1][9] The system processes ~16,000 manual commits plus ~24,000 automated bot commits daily, while serving billions of read requests per day.[1]

Key finding: Google's CitC model stores an average of fewer than 10 files per developer workspace locally—only modified files—while presenting an illusion of full-repo access via a cloud-backed FUSE filesystem, eliminating the need for any explicit sync or clone operation.[1][15]

Piper Infrastructure Specifications

| Dimension | Value | Source |
| --- | --- | --- |
| Codebase size | 86 TB; 2 billion lines; 9 million files | [1][9] |
| Repository history | 1 billion files total; 35 million commits since 2000 | [9] |
| Active developer base | 25,000 engineers (95% of Google) | [1][9] |
| Daily commit volume | ~16,000 developer + ~24,000 automated bot | [1] |
| Weekly code change volume (2015) | ~15 million lines; ~250,000 files | [15] |
| Backend storage | Originally Bigtable, later migrated to Spanner | [1] |
| Geographic replication | 10 global data centers | [1][27] |
| Consensus protocol | Paxos for distributed coordination | [1][9] |
| Access control | ~99% of codebase readable by all engineers; <1% access-controlled | [1][15] |
| Code review gate | Mandatory approval via Critique before any submission | [1][15] |

Clients in the Cloud (CitC): Virtual Filesystem Model

CitC provides the primary developer workflow at Google: a cloud backend coupled with a local FUSE filesystem that creates an illusion of changes overlaid on top of the full repository.[1][9][27] Key operational properties: only modified files are stored in the workspace (fewer than 10 on average); no clone or sync operation is ever needed; workspace contents live in the cloud, so work is accessible from any machine; and files can be browsed and edited in the browser via CodeSearch integration.[1][15]

Google's Complementary Tool Ecosystem

| Tool | Function | Integration |
| --- | --- | --- |
| Critique | Code review tool | Gates all commits; Tricorder results surface here[27] |
| CodeSearch | Code search and navigation | Browser editing via CitC integration[1] |
| Tricorder | Static analysis platform (>100 analyzers) | Runs on all changes; feeds Critique[22] |
| Rosie | Large-scale refactoring / automated code changes | Shards by OWNERS file; submits thousands/day[32] |
| TAP | Test Automation Platform | Gates presubmit; 4B+ test cases/day[14] |
| Blaze/Bazel | Build system (remote execution) | Maintains global dependency graph with Forge[31] |
| Forge | Remote execution for distributed builds | Distributor → Scheduler → shared action cache[31] |

Development Model Constraints

Google enforces trunk-based development strictly: no personal development branches exist—branches are reserved exclusively for releases.[1][9][27] Feature flags substitute for branches, allowing old and new code to coexist in main without development isolation.[9] Every operation is logged in a full audit trail; accidentally committed files can be retroactively purged from view while the record of operations is preserved.[15]

Documented Monorepo Advantages and Trade-offs

| Advantage | Trade-off / Challenge |
| --- | --- |
| Unified versioning; no dependency conflicts[9] | Significant ongoing effort for code health maintenance[9] |
| Atomic changes across multiple projects in one commit[9] | Code discovery becomes difficult at extreme scale[9] |
| No "diamond dependency problem"[15] | Weakens documentation culture: teams expect others to read code[9] |
| Large-scale refactoring across all projects simultaneously[9] | External teams can depend on implementation details, blocking deprecation[9] |
| Flexible team boundaries via directory structure[9] | Broad code visibility can substitute for proper docs[9] |

See also: Git Worktree Mechanics

Section 2: Meta's Sapling and EdenFS Virtual Filesystem

Meta's source control infrastructure spans two primary systems: Sapling, a Git-compatible source control client developed over 10 years from a Mercurial fork, and EdenFS, a virtual filesystem designed specifically for massive monorepos.[2][16][28] The repository serves tens of thousands of internal developers, hosting tens of millions of files, commits, and branches—making it infeasible to break into polyrepos.[16][28] Sapling was open-sourced in November 2022.[2]

Key finding: Sapling's core design principle—all operations must scale with the number of files in use by a developer, not with total repository size—produces sub-second Smartlog commands regardless of how large the repository grows.[2][16]

Sapling Scalability Architecture

| Feature | Mechanism | Performance Result |
| --- | --- | --- |
| Segmented Changelog[2] | Downloads only high-level commit graph shape (megabytes, not full history) | Commit relationship queries in O(number of merges); Smartlog <1 second |
| File history queries[2][16] | Optimized log indexing | O(log n) instead of O(n) |
| Sparse checkouts[2] | Organizational "sparse profiles" auto-update when dependencies change | Developers only materialize files they actually use |
| Watchman integration[16] | File change monitoring accelerates sl status | Status computed without scanning working directory |
| Git LFS[16] | Large file storage offload | Prevents binary bloat in commit history |

Sapling Developer UX Features

| Feature | Commands | Purpose |
| --- | --- | --- |
| Smartlog[2][16] | sl | Shows local commits, remote branches, changed files, outdated commits; filters irrelevant information |
| Error recovery[2][16] | sl undo, sl redo, sl hide, sl unhide, sl undo -i | Full undo/redo history with interactive navigation |
| Commit stacks[2][16] | sl goto, sl amend, sl restack, sl next/prev, sl fold, sl split | First-class stacked commits for incremental review |
| ReviewStack[16][28] | Web interface | Stack-oriented GitHub PR review companion |
| Web interface[16] | sl web | Browser-based repo navigation and editing |

EdenFS Virtual Filesystem Architecture

EdenFS is Meta's virtual filesystem for massive monorepos. Its core design is lazy file fetching: only files actually accessed are downloaded; other files are fetched on-demand.[17][2] A single EdenFS daemon manages multiple simultaneous checkouts—critical for concurrent development workflows where a developer may need to work across multiple branches at once.[2][17][28]

| Platform | Filesystem Implementation | Notes |
| --- | --- | --- |
| Linux[17] | FUSE (Filesystem in Userspace) | Primary deployment target |
| macOS[17] | FUSE for macOS or NFSv3 | Moving to NFS as Apple deprecates kernel extensions |
| Windows[17] | Microsoft's ProjectedFS | Same projection model as VFS for Git |

EdenMount Per-Checkout Components

Each EdenFS checkout (EdenMount) consists of four subsystems.[17]

EdenFS prioritizes three operations above all others: status determination (computing modified files relative to source control), commit switching (checkout speed), and file change notifications via Watchman to build tools and IDEs.[17] Additionally, EdenFS provides file content hashing without requiring full file reads.[17]
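
To make the lazy-fetch contract concrete, here is a small Python model of a virtual checkout. It is a sketch only: RemoteStore and VirtualCheckout are hypothetical names, not EdenFS internals. Reads materialize content on first access; hash queries are answered from server-side metadata without fetching, mirroring the property described above.

```python
# Conceptual model of lazy fetching; illustrative, not Meta's implementation.
import hashlib

class RemoteStore:
    """Stands in for the source control backend (e.g., Mononoke)."""
    def __init__(self, files: dict[str, bytes]):
        self._files = files

    def fetch(self, path: str) -> bytes:
        return self._files[path]

    def content_hash(self, path: str) -> str:
        # The server already knows every blob's hash, so clients can
        # answer hash queries without downloading file contents.
        return hashlib.sha1(self._files[path]).hexdigest()

class VirtualCheckout:
    def __init__(self, store: RemoteStore):
        self._store = store
        self._materialized: dict[str, bytes] = {}  # fetched on demand

    def read(self, path: str) -> bytes:
        # Only the first access pays the network cost.
        if path not in self._materialized:
            self._materialized[path] = self._store.fetch(path)
        return self._materialized[path]

    def hash(self, path: str) -> str:
        # Hash queries never force a full fetch.
        if path in self._materialized:
            return hashlib.sha1(self._materialized[path]).hexdigest()
        return self._store.content_hash(path)
```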

Meta's Monorepo Infrastructure Stack

| Component | Role | Relationship |
| --- | --- | --- |
| Mononoke[8] | Highly scalable distributed source control server | Serves Sapling clients |
| EdenFS[17] | Virtual filesystem for repo access | Provides lazy-fetch working directory to Sapling |
| Sapling[2] | Source control client (Git-compatible) | Developer-facing CLI; open-sourced Nov 2022 |

Branching Strategy Constraints

Meta supports two branching workflows: non-mergeable full-repo branching, and mergeable directory branching.[2] Full-repo branching is not viable at Meta's scale for workflows requiring merging: merging full-repo branches creates merge commits with multiple parents, making the commit graph wide and non-linear. Maintaining a linear commit graph is critical for keeping all commit graph operations fast across tens of thousands of users.[2]

See also: Git Worktree Mechanics

Section 3: Microsoft: VFS for Git and Scalar

In May 2017, Microsoft announced that virtually all Windows engineers use a single Git monorepo under the internal "One Engineering System" initiative.[11][24][35] The Windows monorepo, at ~300 GB storage and 3.5 million files, was the largest Git repository in the world at migration time.[11][24]

Key finding: VFS for Git reduced clone time from 12+ hours to a few minutes, checkout from 2–3 hours to 30 seconds, and status from 10 minutes to 4–5 seconds—without modifying the underlying Git protocol.[35]

VFS for Git vs. Scalar: Feature and Strategy Comparison

| Dimension | VFS for Git (GVFS) | Scalar |
| --- | --- | --- |
| Approach[24][35] | Virtual filesystem layer: OS presents all files as present; downloads on first read | No virtualization: sparse checkout + partial clone |
| Platform support[24][35] | Windows only (macOS port abandoned) | Cross-platform |
| Open source[35] | Released later, Windows-only | Open source from day one |
| Git integration[24] | External virtual filesystem layer | Built into Git v2.38 as native command |
| Current status[35] | Maintenance mode (critical security updates only) | Recommended for all new deployments |
| Deployment complexity[24] | Requires kernel extensions / OS-level filesystem drivers | Significantly simpler; no filesystem virtualization |

Scalar Key Features (via scalar clone)

Scalar configures a repository for scale without any virtualization layer: partial clone defers blob downloads, cone-mode sparse checkout limits the working tree to directories the developer selects, and scheduled background maintenance keeps prefetch data and the commit-graph fresh.[24]

Microsoft also identified edge cases where Git performed unnecessary work on large file sets and contributed fixes upstream; Scalar serves as a proving ground for improvements intended for mainline Git.[24]

Rush.js: JavaScript Monorepo Manager

Microsoft developed Rush.js, an open-source monorepo manager used by 250+ developers building Microsoft 365 frontend components; it manages TypeScript monorepos at scale.[11]

Microsoft's Core Coordination Principle

Microsoft's "One Engineering System" philosophy requires all teams to agree on common standards, enforced via two mechanisms: automation (consistent, predictable rulesets for building, testing, approving contributions) and peer review (social coordination).[11]

See also: Git Worktree Mechanics

Section 4: Trunk-Based Development at Scale

Trunk-Based Development (TBD) is a source-control branching model in which developers work in short-lived branches or directly in the trunk, with the mainline always in a deployable state.[3][18] A monorepo is TBD at its logical extreme: all source in one trunk, atomic commits, no long-lived parallel branches.[3] The 2024 DORA Accelerate State of DevOps Report (39,000+ professionals surveyed) identifies TBD as a required practice for continuous integration. The 2024 Puppet State of DevOps Report found that 81% of top-performing IT teams use continuous delivery practices, yet only 19% of teams reached elite performance levels in 2024.[18][29]

Key finding: In documented case studies, moving to monorepo with trunk-based development enabled teams shipping once per month to ship weekly, and teams resolving 1–2 tickets/day to resolve 6–7 per day.[3]

Industry Adoption at Hyperscale

| Organization | Scale | TBD Practice |
| --- | --- | --- |
| Google[18][29] | 35,000 developers†; 2B+ LOC; ~40,000 commits/day | Single monorepo trunk; no personal branches; branches reserved for releases |
| Meta[29] | Tens of thousands of engineers | Continuous integration with TBD; daily mobile releases via feature flags |
| Uber[29] | 1,000+ commits/day; ~3,000 microservices | All production builds from main; organized by language monorepos |
| Pinterest[18][29] | 1,300+ repositories | 3-year migration into four monorepos paired with TBD |
| Netflix[3] | n/a | Uses trunk-based development |

† The figure from [18][29] includes QA automators; [1][9] cite 25,000 software engineers specifically. The figures derive from different measurement periods and inclusion criteria.

PR Cycle Time: Monorepo vs. Polyrepo

A typical PR in a monorepo takes 19 hours to merge, compared to 2 hours in a polyrepo.[5][29][30] Monorepos exhibit greater variability in PR cycle time, reflecting differences in tooling maturity and CODEOWNERS enforcement quality.[5]

Seven Coordination Mechanisms for Large Trunk-Based Teams

| # | Mechanism | How it enables TBD at scale |
| --- | --- | --- |
| 1 | Feature flags[3][18][29] | Incomplete features merge hidden behind flags (see the sketch below); Meta ships daily mobile releases this way |
| 2 | Code ownership checks[18][29] | Programmatic review enforcement; a Cloud engineer cannot approve YouTube algorithm changes |
| 3 | Specialized build systems[18][29] | Bazel/Buck/Pants process dependency graphs and rebuild only affected code |
| 4 | Sparse checkouts[3][18] | Google: scripted checkout modification; Sapling: sparse profiles |
| 5 | Stacked PRs + merge queues[18] | Graphite-style stacks break large changes into smaller dependent branches with ordered queuing |
| 6 | Release branches[29] | Cut as snapshots for stabilization; bug fixes cherry-picked; new features excluded |
| 7 | Communication cadence[18] | Regular stand-ups ensure awareness of impactful trunk changes |
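
As a concrete illustration of mechanism 1, the sketch below shows a minimal feature-flag gate; the flag name and pricing functions are invented for the example. Both code paths live in trunk, and a configuration value, not a branch, selects between them, so incomplete work can merge continuously while shipping dark.

```python
# Minimal feature-flag gate; names are illustrative, not any real system.
FLAGS = {"checkout.use_new_pricing": False}  # flipped via config, no redeploy

def legacy_pricing(order):
    return sum(item["price"] for item in order)

def new_pricing(order):  # incomplete work can ship dark behind the flag
    return sum(item["price"] * (1 - item.get("discount", 0)) for item in order)

def price_order(order):
    if FLAGS["checkout.use_new_pricing"]:
        return new_pricing(order)
    return legacy_pricing(order)  # the old path remains the default

print(price_order([{"price": 10.0, "discount": 0.2}]))  # 10.0 until flag flips
```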

Monorepo vs. Polyrepo Trade-off

Monorepos trade high tooling complexity for low coordination cost; polyrepos trade low tooling complexity for high coordination cost (dependency management, versioning).[11] The threshold at which monorepo advantages outweigh tooling investment typically occurs when cross-repo dependency friction exceeds the cost of investing in build system and merge infrastructure.[11]


Section 5: Build Systems: Bazel, Buck2, and Remote Execution

Bazel and Buck exist because traditional build tools (make, etc.) cannot handle large-scale, multi-language monorepos.[4][19][31] Google's internal Blaze (Bazel's predecessor) has operated since 2006 on the assumption of remote execution by default; Meta's Buck2 was rebuilt from scratch in Rust with the same philosophy.[4]

Key finding: Google runs millions of builds executing millions of test cases and producing petabytes of build outputs from billions of lines of source code every day—achieved via a shared action result cache where any build action already executed by any Google engineer is returned immediately without re-execution.[31]

Build System Genealogy and Specifications

| System | Origin | Open-Sourced | Language | RE Protocol | Key Design Choice |
| --- | --- | --- | --- | --- | --- |
| Blaze[31] | Google, 2006 | No (internal) | Java | Custom (Forge) | Remote execution by default from inception |
| Bazel[31] | Google (Blaze OSS port) | March 2015 | Java | Custom RE API | Multi-phase execution model |
| Buck (v1)[4] | Meta/Facebook | March 2013 | Java | Custom | Local-first with optional RE |
| Buck2[4][19] | Meta (from-scratch rewrite) | 2023 | Rust | Bazel RE API (OSS) | Remote execution first; single dependency graph |
| Pants[12] | Twitter | Open source | Python | RE supported | Large-scale multi-language monorepos |
| Nx[12][29] | Community / Nrwl | Open source | TypeScript | Distributed cache | TypeScript/JS; DAG for affected targets only |

Remote Execution Capabilities

| Capability | Bazel + RE | Buck2 + RE |
| --- | --- | --- |
| Compile parallelism[31] | Laptop compile runs on a 96-core cloud machine | Same; local execution is a special case of remote |
| Test parallelism[31] | Up to 1024× parallel on a compute cluster | Thousands of parallel actions with shared cache |
| Action caching[31] | Result returned immediately if any user already built it | Same shared cross-org cache |
| Test caching[4][31] | Identical inputs → second run skipped entirely | Same |
| Compatible RE backends[19] | EngFlow, BuildBarn, BuildBuddy, custom | EngFlow, BuildBarn, BuildBuddy (Bazel RE API) |

Buck2 Architectural Improvements over Buck1

Buck2 is a ground-up rewrite of Buck1 in Rust. It replaces Buck1's multi-phase execution with a single incremental dependency graph, treats remote execution and the shared action cache as the default code path rather than an add-on, and delivers 2× faster builds in production use at Meta.[4][19]

Uber: Bazel + Gazelle Performance Results

Uber's Go monorepo is among the largest Go repositories using Bazel and Gazelle (Bazel's official Go/Protobuf rule generator).[33] By implementing a Changed Target Calculator (CTC) that feeds selective targets to Bazel remote execution across separate machines, Uber achieved:[33]

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| 10,000-target CI run[33] | 60 minutes | 10 minutes | 83% reduction |
| Average build time[33] | 45 minutes | 14 minutes | 69% reduction |
| CI resource utilization[33] | 23% | 78% | 3.4× efficiency improvement |

Future Direction: Bonanza (Distributed Graph Evaluation)

An emerging approach lifts Bazel's in-memory build graph (Skyframe) into a remote cluster backed by a distributed persistent cache. All graph nodes are stored remotely, making every build incremental—eliminating the "cold build" penalty entirely.[4][31]


Section 6: Code Ownership Enforcement: OWNERS and CODEOWNERS

Google pioneered the OWNERS file model: every directory contains a file listing responsible reviewers. Changes to a subtree must be approved by an owner of that subtree before submission.[38] This model directly inspired GitHub's CODEOWNERS feature, GitLab's code owner system, and adoption by large open-source projects including Chromium and Kubernetes.[38]

Key finding: Udemy's monorepo audit found 1 in 8 files had no owning team specified in CODEOWNERS—meaning edits to those files had no required review by accountable owners, representing a direct security and compliance gap at scale.[12][25]

Google OWNERS Model vs. GitHub CODEOWNERS

| Dimension | Google OWNERS | GitHub CODEOWNERS |
| --- | --- | --- |
| Mechanism[38] | OWNERS file per directory; Gerrit code-owners plugin enforces | Single CODEOWNERS file with glob patterns; enforced by branch protection |
| Review types required[38] | Two distinct approvals: (1) detailed code review for quality, (2) owner approval for appropriateness | One required review from matching owner |
| Access model[38] | Anyone can change any file; owner approval required to submit | PR blocked from merge until all matching owners approve |
| Bot support[38] | Bots can be OWNERS to automate process checks | Bots can be assigned as code owners |
| Build-level enforcement[38] | Bazel/Blaze visibility rules enforce package dependency ownership | Not enforced at build level |

Known Challenges at Scale

| Challenge | Data Point | Source |
| --- | --- | --- |
| Review load imbalance | Tech leads receive 20–30 review requests daily when assigned as broad-area owners | [21] |
| Cross-team changes | Changes spanning multiple team boundaries require 3–4 separate owner approvals | [21][25] |
| Notification noise | Teams with 10+ members trigger excessive interruptions from auto-assignment | [21] |
| Ownership drift | CODEOWNERS files grow stale as the codebase evolves; enforcement becomes inconsistent | [21][25] |
| CODEOWNERS + merge queue conflict | Known GitHub bug: CODEOWNERS bypass does NOT work when merge queue is enabled | [5][30] |
| PR stalls | When primary owners are unavailable, PRs block; backup reviewers or round-robin needed | [21] |
| Coverage gaps | 1 in 8 files unowned at Udemy | [12][25] |
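
Coverage gaps like Udemy's can be detected mechanically: match every tracked file against the CODEOWNERS rules and report the files no rule matches. The sketch below approximates GitHub's gitignore-style, last-match-wins semantics with fnmatch; it is illustrative, not a faithful reimplementation, and the patterns and paths are invented.

```python
# Approximate CODEOWNERS coverage check: report files matched by no rule.
# Real CODEOWNERS uses gitignore-style patterns; fnmatch is a rough stand-in.
from fnmatch import fnmatch

def parse_codeowners(text: str) -> list[tuple[str, list[str]]]:
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        rules.append((pattern.lstrip("/"), owners))
    return rules

def owners_for(path: str, rules) -> list[str]:
    matched: list[str] = []
    for pattern, owners in rules:
        # Treat directory patterns as prefix matches, others as globs.
        if path.startswith(pattern.rstrip("*/") + "/") or fnmatch(path, pattern):
            matched = owners  # last matching rule wins
    return matched

codeowners = """
/payments/  @org/payments-team
*.sql       @org/data-infra
"""
files = ["payments/api.py", "search/rank.py", "migrations/001.sql"]
rules = parse_codeowners(codeowners)
unowned = [f for f in files if not owners_for(f, rules)]
print(unowned)  # -> ['search/rank.py']  (a coverage gap)
```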

Real-World Ownership Programs at Scale

| Organization | Scale | Ownership Approach | Outcome |
| --- | --- | --- | --- |
| Rippling[12][25] | 700+ engineers; Python monorepo | Programmatic "Service Catalog": services independent from org chart; one team may own multiple code areas | Per-team metric tracking (test runtime, flakiness), gamified leaderboards, data-driven roadmaps |
| Meta[25][38] | 50,000+ engineers | Phabricator with ownership-based review routing; 2020 research paper on ownership management challenges at scale | "Ownership at Large – Open Problems and Challenges in Ownership Management" (Ahlgren et al.) |
| Google[38] | 25,000+ engineers | OWNERS files + Gerrit plugin + Bazel visibility + Rosie automated shard routing | Global approvers can auto-approve all shards via pattern tooling |

Automated Governance Mechanisms

Bots as Code Owners

When an OWNER is a bot, machine-readable ownership enables automated process checks: for example, if service B maintains a hard-coded authorized-client list, adding service A requires a review from B's owners—enforcing API access control via the ownership model itself.[38]

Strategic Ownership Principles

Effective ownership strategy goes beyond mapping directories to teams. Two key observations from practitioners: first, map ownership to code boundaries and product areas rather than the org chart, because org-chart-based ownership breaks during every reorg while code-boundary ownership remains stable; second, pair every ownership rule with an availability fallback (backup owners or round-robin assignment) so PRs do not stall when a primary owner is unavailable.[21]

See also: Scope Overlap Detection

Section 7: Pre-Submit Validation and Change Impact Analysis

At Google, Meta, and Stripe scale, exhaustive test execution on every commit is computationally infeasible. A change in a widely used shared library can transitively affect hundreds of downstream applications—making selective test execution, dependency graph analysis, and progressive validation pipelines mandatory infrastructure.[8]

Key finding: When a single monorepo has no branches and 100,000+ developers, "the blast radius of a bad check-in is massive — there is a non-zero probability that your check-in will block thousands or tens of thousands of other engineers."[34] This constraint is why Google invests orders of magnitude more in pre-submit validation than Amazon does.

Google TAP: Scale and Performance Benchmarks

| TAP Metric | Value |
| --- | --- |
| Unique changes handled daily[14][26] | >50,000 |
| Individual test cases executed daily[14][26] | >4 billion |
| Change submission rate[26] | >1 per second |
| Average presubmit wait time[14][26] | ~11 minutes |
| Presubmit pass → full test pass likelihood[14] | >95% |

Google's Two-Phase Testing Model

| Phase | Timing | Mechanism | Scope |
| --- | --- | --- | --- |
| Presubmit[14][26] | Runs during the code review loop, before submission | Static build-dependency test selection + ML-driven selection + ML flakiness mitigation | Fast unit test subset; flaky tests excluded; global presubmit for widely used libraries |
| Post-submit[14][26] | Asynchronous, after submission | TAP runs all potentially affected tests, including large/slow tests; auto-bisects failing batches | Full affected test suite; automatic rollback when culprit identified with high confidence |

A behavioral side effect: the visible difference between triggering 100 vs. 1,000 downstream tests incentivizes engineers to make smaller, more targeted changes, reducing blast radius organically.[26]

Google Build Cop Model

Each Google team designates a Build Cop responsible for maintaining the health of their test suite. When TAP's post-submit run detects failures, it automatically bisects failing batches—splitting them and rerunning each change in isolation to identify the culprit. When a culprit is identified with high confidence, TAP supports automatic rollback without requiring manual intervention.[26]

The cultural norm reinforcing this system: "Rolling a change back is often the fastest and safest route to fix a build." This rollback-first culture, combined with automated bisection and the designated Build Cop role, means failures are isolated and remediated quickly rather than compounding.[26]
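
The bisection step reduces to a binary search over the ordered batch: if the first half of the batch already fails, the culprit is in the first half; otherwise it is in the second. The sketch below is an illustrative model assuming a single culprit per batch; find_culprit and the fails callback are hypothetical names, not TAP's code.

```python
# Sketch of culprit-finding by batch bisection, in the spirit of TAP's
# post-submit flow (not Google's code). `fails(batch)` stands in for
# running the affected test suite against that prefix of the batch.
def find_culprit(changes: list[str], fails) -> str:
    """Return the first change whose inclusion makes the batch fail.

    Assumes the whole batch fails and contains a single culprit;
    fails(batch) returns True when the tests fail against that batch.
    """
    lo, hi = 0, len(changes)          # culprit index lies in [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(changes[:mid]):      # failure already present in first half
            hi = mid
        else:                         # culprit must be in the second half
            lo = mid
    return changes[lo]

# Example: batch of five commits where "c3" broke the build.
batch = ["c1", "c2", "c3", "c4", "c5"]
print(find_culprit(batch, lambda b: "c3" in b))  # -> c3
```

With one culprit in a batch of n changes, this isolates it in O(log n) test runs instead of n.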

Hermetic Testing Results at Google

| Case Study | Intervention | Measured Outcome |
| --- | --- | --- |
| Google Assistant[26] | Transition to hermetic presubmit (version-hermetic, no network calls) | 14× runtime reduction; virtually zero flakiness |
| Google Takeout: broken servers[26] | Hermetic presubmit (integrates 90+ Google products) | Prevented 95% of broken servers from bad configuration |
| Google Takeout: deployment failures[26] | Hermetic presubmit | Nightly deployment failures reduced 50% |
| Google Takeout: culprit detection[26] | E2E tests from nightly to every 2 hours | 12× reduction in culprit set size |
| Google Takeout: debug burden[26] | Refactored test suites | 35% reduction in Takeout team debugging involvement |

Tricorder: Google's Static Analysis Platform

| Tricorder Metric | Value |
| --- | --- |
| Code review changes processed daily[22] | >50,000 |
| Analysis rate[22] | Multiple analyses per second |
| Total analyzers (as of Jan 2018)[22] | 146 (125 contributed from outside the Tricorder team) |
| Languages supported[22] | 30+ |
| Effective false-positive rate[22] | <5% |
| Automated fixes applied daily[22] | ~3,000, applied by authors |

Tricorder architecture: microservices model that sends analysis requests to dedicated servers alongside change metadata; servers access source via a FUSE-based filesystem; results surface directly in the Critique diff viewer.[22] The platform enforces two critical impact checks: a warning when a changelist will transitively affect a large percentage of the codebase, and a warning when a changelist needs merging with HEAD.[22]

High-confidence analyses are promoted into compilers as build errors—Error Prone 'ERROR' checks are enabled in Google's Java compiler. Google treats checks as either build-breaking errors or suppressions; no compiler warnings exist.[22]

Quality Standards for New Tricorder Checks

Tricorder enforces four quality standards before accepting new analyzer checks into the platform:[22]

| Standard | Requirement |
| --- | --- |
| Understandable outputs | Results must be accessible and meaningful to any engineer, not just the check author |
| Actionable fixes | Each finding must include implementation guidance so the author knows exactly what to change |
| False-positive rate | Effective false-positive rate must stay below 10% to avoid alert fatigue |
| Demonstrated impact | Check must show significant positive impact on code quality before acceptance into the platform |

Stripe's Selective Test Execution

Stripe's 50M-line Ruby monorepo (the largest known Ruby codebase) ships approximately 1,145 pull requests per day via GitHub Enterprise Server.[23][36] Their selective test execution system—using Bazel and custom CI infrastructure—runs only ~5% of tests on average per PR, enabling continued scaling of both personnel and codebase without proportional CI cost growth.[10][23]

Google vs. Amazon CI Philosophy

These two hyperscalers represent opposite ends of the CI investment spectrum: Google's single no-branch monorepo concentrates the blast radius of every check-in, so it invests orders of magnitude more in pre-submit validation; an organization whose repository and service boundaries isolate failures can afford to defer more validation until after deployment.[34]

Three Core Mechanisms for Monorepo Impact Analysis

  1. Affected-test detection: Only run tests for packages whose dependency graph intersects the change—if A changes and B, C depend on A, run A+B+C tests; D, E, F are excluded.[8]
  2. Test sharding: Distribute the affected test set across parallel workers.[8]
  3. Build caching: Skip all unchanged packages entirely using action result cache.[8]
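
Mechanism 1 is a reachability query over the reverse dependency graph. Below is a minimal sketch using the A/B/C example from the list; it is illustrative, not any vendor's implementation.

```python
# Sketch of affected-target detection over a reverse dependency graph
# (illustrative; not any vendor's implementation).
from collections import deque

# deps[X] = packages X depends on; here B and C build on A.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": [], "E": [], "F": []}

def affected(changed: set[str], deps: dict[str, list[str]]) -> set[str]:
    # Invert the edges: rdeps[A] = packages that depend on A.
    rdeps: dict[str, list[str]] = {p: [] for p in deps}
    for pkg, ds in deps.items():
        for d in ds:
            rdeps[d].append(pkg)
    # Breadth-first walk from the changed packages over reverse edges.
    result, queue = set(changed), deque(changed)
    while queue:
        for dependent in rdeps[queue.popleft()]:
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return result

print(sorted(affected({"A"}, deps)))  # ['A', 'B', 'C']; D, E, F are excluded
```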

A related problem Google terms mid-air collisions: two changes modifying completely separate files can still cause a test failure when their effects combine at runtime. Google addresses this via aggressive dependency analysis and selective test execution rather than serializing all submissions—the dependency graph catches indirect interactions that file-level diff analysis misses.[26]
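
A toy reproduction of a mid-air collision, with invented file contents: change 1 renames a function and updates every caller visible to it; change 2, written concurrently, adds a new caller of the old name. Each change is green against the main it branched from; only the merged tree is broken.

```python
# Toy mid-air collision (hypothetical contents): each change passes CI
# against the main it branched from; their combination is broken.
main = {"util.py": "def fetch(url): ...", "app.py": "fetch('/a')"}
change1 = {"util.py": "def fetch_url(url): ...", "app.py": "fetch_url('/a')"}
change2 = {"worker.py": "fetch('/b')"}  # new caller of the old name

def callers_resolve(tree: dict[str, str]) -> bool:
    # Crude check: every called name must have a matching definition.
    defined = {line.split("(")[0].removeprefix("def ")
               for line in tree.values() if line.startswith("def ")}
    called = {line.split("(")[0]
              for line in tree.values() if not line.startswith("def ")}
    return called <= defined

print(callers_resolve({**main, **change1}))             # True: CI green
print(callers_resolve({**main, **change2}))             # True: CI green
print(callers_resolve({**main, **change1, **change2}))  # False: collision
```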

See also: Scope Overlap Detection

Section 8: Merge Queues and Submit Queues

Without a merge queue, large monorepo teams face the "merge race": multiple developers complete work simultaneously, all attempt to merge, and broken builds cascade. In monorepos where everything is connected and build times are long, this produces unbounded retry cycles.[6][20][37] Teams implementing automated merge queues report an average 24% reduction in PR cycle times; well-tuned queues are estimated to save ~$750K annually for 20-developer teams (according to Aviator, a merge queue vendor).[20]

Key finding: Before Uber's SubmitQueue, mainline was green only 52% of the time, and on the worst days up to 10% of commits required reversion despite individually passing CI, because concurrent changes conflicted after rebase. After SubmitQueue, mainline remained green 99%+ of the time.[13]

Uber SubmitQueue: Architecture and Scale

| Dimension | Value |
| --- | --- |
| Engineers served[13] | 4,500+ across global development centers |
| Monorepos covered[13] | 6 major monorepos, 7 programming languages |
| Codebase size[13] | Hundreds of millions of lines of code |
| ML model accuracy[13] | 97% (predicts change success, build time, scheduling) |
| Large diff bypass improvement[13] | 74% improvement in wait time to land code |
| Mainline green rate before[13] | 52% |
| Mainline green rate after[13] | 99%+ |

SubmitQueue Core Components

SubmitQueue combines three components: a speculation engine that builds likely future states of the mainline concurrently; a 97%-accurate ML predictor that estimates each change's success probability and build time to drive scheduling; and an out-of-order landing path for large diffs, applied when every branch of the speculation tree produces the same outcome.[13]

Large Diff Problem and Solution

When a change affects a large portion of the repository, it conflicts with most subsequent queued items, creating a sequential bottleneck. Uber proved it is safe to land large changes out of queue order when all branches in the speculation tree produce the same outcome, yielding a 74% reduction in wait time to land large-diff changes.[13]
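
SubmitQueue's speculation can be modeled as a binary tree over the queue: each pending change either lands or is rejected, each root-to-leaf path is one possible future mainline, and a success-probability predictor decides which paths to build first. The sketch below is illustrative; the probabilities, names, and scheduling policy are invented, not Uber's algorithm.

```python
# Sketch of speculative build scheduling in the spirit of SubmitQueue.
# 2**n paths exist for n queued changes; the predictor prunes in practice.
from itertools import product

def speculation_paths(changes: list[str]):
    # Each outcome tuple fixes land/reject for every queued change.
    for outcome in product([True, False], repeat=len(changes)):
        landed = [c for c, ok in zip(changes, outcome) if ok]
        yield outcome, landed

def schedule(changes, p_success):
    """Order speculative builds by how likely their path is to be needed."""
    def path_prob(outcome):
        prob = 1.0
        for change, ok in zip(changes, outcome):
            prob *= p_success[change] if ok else (1 - p_success[change])
        return prob
    paths = sorted(speculation_paths(changes),
                   key=lambda t: path_prob(t[0]), reverse=True)
    return [landed for _, landed in paths]

# A high-accuracy predictor sends most capacity to the all-land path.
p = {"c1": 0.97, "c2": 0.90, "c3": 0.99}
for build in schedule(["c1", "c2", "c3"], p)[:3]:
    print("build main +", build)
```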

GitHub Merge Queue: Timeline and Scale Results

| Milestone | Date | Detail |
| --- | --- | --- |
| Trains introduced[6] | 2016 | Special PRs grouping multiple changes for simultaneous testing |
| Trains recognized as bottleneck[6] | 2020 | Limiting velocity; merge queue project initiated |
| Internal testing[6] | Mid-2021 | Small internal repos begin testing |
| Full production migration[6] | 2023 | Large monorepo + all production service repos; GA released |

| GitHub Merge Queue Metric | Value |
| --- | --- |
| Engineers using[6] | 500+ |
| PRs per month[6] | 2,500 |
| Deploy time reduction[6] | 33% average wait time reduction |
| Simultaneous deployment capacity[6] | 15 → 30+ changes at once |
| PRs used in development/testing[7] | 30,000+ |
| CI runs during development[7] | 4.5 million |

Design Principles

GitHub's merge queue was built around three explicit design principles.[7]

GitHub Merge Queue Technical Design

The queue creates temporary test branches that combine the current main with queued PR changes, and runs required checks on those branches before anything merges to main.[37] With many concurrent PRs, branch combinations grow combinatorially (3 PRs → up to 6 branch combinations). Optimization requires a fine-grained model of project-level dependencies: GitHub developed a language-agnostic project-impact-graph.yaml specification to encode which projects affect which others, enabling the merge queue to skip redundant cross-project test combinations.[37]
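
The pruning this enables can be sketched as an intersection test over impact closures: if nothing touched by one PR can affect anything touched by another, testing their combination is redundant. The graph, format, and function names below are invented approximations, not the project-impact-graph.yaml spec.

```python
# Sketch of dependency-aware batch pruning (illustrative approximation).
impact = {  # project -> downstream projects it can affect
    "auth": ["web", "api"],
    "api": ["web"],
    "web": [],
    "docs": [],
}

def impact_closure(project: str) -> set[str]:
    # Everything transitively reachable from the project, itself included.
    seen, stack = {project}, [project]
    while stack:
        for nxt in impact[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def needs_combined_run(pr_a_projects, pr_b_projects) -> bool:
    """Two queued PRs need a combined test branch only if the impact
    closures of their touched projects overlap."""
    blast_a = set().union(*(impact_closure(p) for p in pr_a_projects))
    blast_b = set().union(*(impact_closure(p) for p in pr_b_projects))
    return bool(blast_a & blast_b)

print(needs_combined_run({"auth"}, {"docs"}))  # False: skip the combination
print(needs_combined_run({"auth"}, {"api"}))   # True: both can affect web
```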

GitLab Merge Trains: Outschool Case Study

Outschool grew from 20 to 50 engineers in 2021. With 30-minute build+deploy times, queue depth grew to 6–8 developers on average (3 hours wait), occasionally 12—with engineers waking at 6am to reserve slots.[37] After enabling GitLab Merge Trains (up to 20 concurrent pipelines): 400 merges in a 10-hour window, eliminating manual Slack queue coordination.[37]

GitLab CI/CD optimization: parent-child pipelines with rules:changes run pipelines only when specified paths change, preventing full pipeline triggers for unrelated commits.[20]

Merge Queue Platform Comparison

| Platform | Model | Scale Evidence | Key Differentiator |
| --- | --- | --- | --- |
| Uber SubmitQueue[13] | Speculation + ML scheduling | 4,500 engineers; 6 monorepos | 97% ML predictor; out-of-order landing for large diffs |
| GitHub Merge Queue[6][37] | Temp branch batching | 30,000+ PRs; 4.5M CI runs in testing | project-impact-graph.yaml for dependency-aware batching |
| GitLab Merge Trains[37] | Sequential simulation | Up to 20 concurrent pipelines | 400 merges/10 hr window in real production |
| Graphite[6] | Stacked PRs + batching | n/a | Dashboard + CI parallelism + stacked diff support |
| Mergify[6] | Rule-based triggers | n/a | Flexible rule engine for merge conditions |

Merge queues evolved from side-project bots—Bors and Homu, used in the Rust project to normalize the serial-merge pattern—into standard platform features now offered by GitHub, GitLab, and dedicated vendors.[6]

See also: Git Worktree Mechanics

Section 9: Large-Scale Change Automation and Rollout Control

At monorepo scale, cross-cutting refactors—renaming a widely-used symbol, upgrading a shared library, changing an RPC interface—can affect thousands of files and hundreds of services simultaneously. Without automated orchestration, such changes are either avoided (accumulating technical debt) or create catastrophic incidents.[32][33]

Key finding: Google's Rosie infrastructure enables "technical decisions that used to be permanent — names of widely used symbols, locations of popular classes — [to be] now reversible." Thousands of changes are created, tested, reviewed, and submitted across all of Google's codebase per day via Rosie.[32]

Google Rosie: LSC Workflow

Rosie takes a large change (human-authored or automated) and shards it by project and OWNERS file boundaries into independently submittable changes.[32]

| Step | Mechanism | Detail |
| --- | --- | --- |
| 1. Generate[32] | Human or automated tooling | sed, clang tools, custom scripts, or Rosie-native transforms |
| 2. Shard[32][38] | Rosie splits by project/OWNERS boundaries | Owners detection service weights each owner by expected review availability |
| 3. Test[32] | Each shard through an independent test pipeline | Runs at lower priority; caps outstanding shards; communicates infrastructure load |
| 4. Review[32] | Owner review or pattern-based auto-approval | Global approvers examine only anomalous cases (conflicts, tooling malfunctions) |
| 5. Submit[32] | Rosie submits each approved shard atomically | Unresponsive owners → additional reviewers added automatically |
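
Step 2 can be approximated in a few lines: walk each changed file up to its nearest ancestor directory containing an OWNERS file and group files by that directory, yielding one independently reviewable shard per owner boundary. Directory names below are invented; this is a sketch, not Rosie.

```python
# Sketch of sharding a large-scale change by OWNERS boundaries
# (illustrative; not Google's implementation).
from collections import defaultdict
from pathlib import PurePosixPath

owners_dirs = {"", "base", "maps", "maps/render"}  # dirs containing OWNERS

def owning_dir(path: str) -> str:
    """Walk up from the file to the nearest directory with an OWNERS file."""
    cur = PurePosixPath(path).parent
    while str(cur) != ".":
        if str(cur) in owners_dirs:
            return str(cur)
        cur = cur.parent
    return ""  # falls back to the repo-root OWNERS

def shard(changed_files: list[str]) -> dict[str, list[str]]:
    shards = defaultdict(list)
    for f in changed_files:
        shards[owning_dir(f)].append(f)
    return dict(shards)

change = ["base/strings.cc", "maps/render/tiles.cc", "maps/geo.cc", "README.md"]
for owner_dir, files in shard(change).items():
    print(f"shard[{owner_dir or '/'}] -> {files}")  # one reviewable CL each
```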

Pattern-Based Automated Review

For large consistent refactors, global reviewers configure pattern-based tooling to automatically approve conforming changes. Only anomalous cases (merge conflicts, tooling failures, unexpected patterns) require human review.[32] Global approvers can approve all shards in an LSC instead of routing to individual directory owners.[38]

API Migration Workflow

Rosie-enabled API migrations follow a three-step pattern:[32]

  1. Add support for new API with conditional compilation alongside old API
  2. Gradually migrate all callsites from old API to new via Rosie shards
  3. Delete old API and all inactive calls once all callsites migrated

Uber: Cross-Cutting Deployment Orchestration

Uber's Go monorepo sees 1,000+ commits per day feeding ~3,000 microservices. Analysis of 500,000 commits revealed that 1.4% of commits impacted more than 100 services simultaneously, and 0.3% impacted more than 1,000 services.[33]

Without intervention, CD pipelines push all these changes to production immediately—a single buggy high-impact commit can break thousands of services in parallel.[33]

Deployment Orchestration Architecture

| Component | Function |
| --- | --- |
| State machine[33] | Lightweight, asynchronous; tracks deployment state per commit across all impacted services |
| Periodic jobs[33] | Track deployment outcomes across all affected services; progress gates on success/failure thresholds per stage |
| Service tiering[33] | Tier 0 (most critical) to Tier 5; less critical services deploy first; success at each tier enables progression |
| Failure propagation prevention[33] | Cross-service failure signals halt progression before the production blast radius expands |

Result: Deployment safety incidents reduced from 12/month to 1.2/month—a 90% reduction.[33]
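
A minimal model of the tiered progression, with invented tiers, thresholds, and deploy callback (not Uber's actual policy): deploy the least critical tier first, measure failures, and halt before promoting the commit to more critical tiers.

```python
# Sketch of tiered deployment progression; names and numbers are invented.
def rollout(commit: str, tiers: dict[int, list[str]], deploy,
            max_failure_rate: float = 0.05) -> bool:
    """Deploy least-critical tiers first; halt before blast radius grows."""
    for tier in sorted(tiers, reverse=True):      # Tier 5 first, Tier 0 last
        services = tiers[tier]
        failures = sum(not deploy(commit, s) for s in services)
        if services and failures / len(services) > max_failure_rate:
            print(f"halting at tier {tier}: {failures}/{len(services)} failed")
            return False                          # stop; alert the author
    return True                                   # commit reached Tier 0

tiers = {0: ["payments"], 3: ["notifications"], 5: ["internal-dashboards"]}
ok = rollout("abc123", tiers, deploy=lambda c, s: s != "notifications")
print("fully deployed" if ok else "rolled out partially, then halted")
```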

Uber DevPod: Cloud-Based Development

Uber's DevPod allows developers to build in the cloud using faster machines, with all monorepo tooling in a secure, controlled environment. DevPods run as containers on a Kubernetes cluster with necessary tooling and compute resources.[33]


Section 10: Developer Environment Tooling

At monorepo scale, individual developer machines cannot practically store or operate on the full repository. Google (CitC), Meta (EdenFS), and Microsoft (Scalar) each solve this with virtual filesystem or sparse checkout approaches. Stripe and Uber represent the cloud devbox model—ephemeral, centrally-provisioned development machines that offload storage and compute to data centers.[10][33][36]

Stripe Developer Environment Evolution

| Era | Model | Coordination Overhead |
| --- | --- | --- |
| Early[36] | Shared EC2 instances coordinated via Slack | "Is anyone else using the API server?"; manual, blocking |
| Middle[36] | Local laptop development | Environment drift; expensive hardware requirements |
| Current[10][36] | Centrally-provisioned ephemeral cloud devboxes | "Pull master and re-create your devbox" resolves most problems |

Stripe Monorepo Stats

Stripe's Ruby monorepo spans 50 million lines of code (the largest known Ruby codebase) and merges ~1,145 PRs per day through GitHub Enterprise Server, with Bazel-based selective execution running only ~5% of the test suite per PR.[23][36]

Stripe Tooling Details

Developer workflows run through the internal pay CLI against centrally provisioned ephemeral devboxes; "pull master and re-create your devbox" is the standard remedy for most environment problems.[10][36]

Stripe Minions: Autonomous AI Parallel Development

Stripe built "Minions"—fully autonomous AI coding agents operating inside their 50M-line Ruby monorepo. Four specialized agents run concurrently to complete development tasks end-to-end:[10][23]

All four agents run in parallel, producing production-ready code end-to-end without sequential handoffs. As of early 2025, this is one of the few publicly documented production deployments of autonomous parallel AI coding agents in a large-scale monorepo environment.[10][23]

"When a single engineer can run ten tasks at once, the definition of productivity changes permanently."[23]

Developer Environment Models at Scale: Comparison

| Organization | Model | Technology | Key Property |
| --- | --- | --- | --- |
| Google[1][9] | Cloud-backed virtual filesystem (CitC) | FUSE + Piper/Spanner | <10 files local on average; full repo accessible instantly |
| Meta[17] | Virtual filesystem (EdenFS) | FUSE/NFS/ProjectedFS + Mononoke | Lazy fetch; multiple concurrent checkouts per daemon |
| Microsoft[24] | Sparse checkout (Scalar) | Git v2.38 built-in | No virtualization layer; partial clone + explicit sparse file list |
| Stripe[10][36] | Ephemeral cloud devboxes | Centrally-provisioned VMs; pay CLI | Centralized maintenance; "recreate devbox" as standard fix |
| Uber[33] | Cloud development containers (DevPod) | Kubernetes-hosted containers | Faster machines; full tooling access; secure controlled environment |

Section 11: Cross-Organization Comparison

Monorepo Scale Benchmarks

| Organization | Repo Size | Engineers | Daily Commits | CI Scale |
| --- | --- | --- | --- | --- |
| Google[1][9][14] | 86 TB; 2B LOC; 9M files | 25,000 (95% of Google) | ~40,000 (16K manual + 24K bots) | >4B test cases/day; >50K changes/day via TAP |
| Meta[2][16] | Tens of millions of files | Tens of thousands | Daily mobile releases | Mononoke + EdenFS + Sapling |
| Microsoft[11][24] | ~300 GB; 3.5M files (Windows) | Thousands (Windows division) | n/a | Scalar + partial clone |
| Stripe[36][23] | 50M LOC (Ruby) | n/a | ~1,145 PRs merged/day | Bazel + GHE; ~5% tests per PR |
| Uber[13][33] | Hundreds of millions LOC; 6 monorepos | 4,500+ | 1,000+ (Go monorepo alone) | SubmitQueue (97% ML); ~3,000 microservices |

Toolchain Stack by Organization

| Layer | Google | Meta | Microsoft | Stripe | Uber |
| --- | --- | --- | --- | --- | --- |
| VCS[1][2][24][36][13] | Piper (custom, Spanner-backed) | Sapling + Mononoke | Git + Scalar | Git + GHE | Git |
| Virtual FS[1][17][24] | CitC (FUSE) | EdenFS (FUSE/NFS/ProjectedFS) | VFS for Git (deprecated) → Scalar | None (ephemeral devboxes) | None (DevPod containers) |
| Build[31][19][10][33] | Blaze + Forge (RE) | Buck2 (Rust, RE-first) | n/a | Bazel | Bazel + Gazelle |
| Code review[1][25][36] | Critique + OWNERS | Phabricator + ownership routing | GitHub PRs + CODEOWNERS | GHE + CODEOWNERS | GitHub + SubmitQueue |
| Static analysis[22][25][38] | Tricorder (146 analyzers) | Custom + Phabricator ownership routing; Ahlgren et al. 2020 (ownership at scale) | n/a | n/a | n/a |
| Merge control[6][13] | TAP (pre/post-submit gates) | Custom CI + feature flags | GitHub Merge Queue | GHE CI | SubmitQueue (speculation + ML) |
| LSC automation[32][33] | Rosie (shard by OWNERS) | n/a | n/a | n/a | Deployment orchestration (tiered) |

Cells marked n/a reflect tooling not documented in the corpus sources.

Key Performance Outcomes

| Intervention | Organization | Before | After | Improvement |
| --- | --- | --- | --- | --- |
| SubmitQueue adoption[13] | Uber | Mainline green 52% of time; 10% of commits reverted on worst days | 99%+ green mainline | Near-elimination of merge failures |
| Merge queue (GitHub)[6] | GitHub | 15 simultaneous deployments | 30+ simultaneous | 33% reduction in average deploy wait time |
| Bazel + CTC (CI)[33] | Uber | 60 min for 10K targets; 45 min avg build | 10 min; 14 min avg | 83% / 69% reduction |
| Hermetic presubmit[26] | Google (Assistant) | Slow, flaky presubmit | 14× faster; near-zero flakiness | 14× runtime reduction |
| Tiered CD orchestration[33] | Uber | 12 deployment safety incidents/month | 1.2 incidents/month | 90% reduction |
| VFS for Git[35] | Microsoft | Clone: 12+ hr; checkout: 2–3 hr; status: 10 min | Clone: minutes; checkout: 30 sec; status: 4–5 sec | Orders of magnitude on all dimensions |
| Selective test execution[10] | Stripe | All tests per PR | ~5% tests per PR | 95% test reduction while maintaining quality |
| Buck2 vs. Buck1[19] | Meta | Buck1 build times | 2× faster | 50% build time reduction |

Sources

  1. Piper (source control system) - Wikipedia (retrieved 2026-05-03)
  2. Sapling: Source control that's user-friendly and scalable - Engineering at Meta (retrieved 2026-05-03)
  3. Monorepos - Trunk Based Development (retrieved 2026-05-03)
  4. Why Buck2 | Buck2 (retrieved 2026-05-03)
  5. Code Reviews at Scale: CODEOWNERS & GitHub Actions Guide - Aviator (retrieved 2026-05-03)
  6. How GitHub uses merge queue to ship hundreds of changes every day - The GitHub Blog (retrieved 2026-05-03)
  7. How GitHub uses merge queue to ship hundreds of changes every day - The GitHub Blog (retrieved 2026-05-03)
  8. The Rise of Test Impact Analysis - Martin Fowler (retrieved 2026-05-03)
  9. Google Monorepo Paper - Why Google Stores Billions of Lines of Code in a Single Repository (retrieved 2026-05-03)
  10. Selective Test Execution at Stripe: Fast CI for a 50M-line Ruby monorepo - Stripe Dev Blog (retrieved 2026-05-03)
  11. How Microsoft's Git fork scales for massive monorepos - InfoWorld (retrieved 2026-05-03)
  12. Code Ownership Patterns in Polyrepo vs Monorepo Architectures - CrashBytes (retrieved 2026-05-03)
  13. Bypassing Large Diffs in SubmitQueue - Uber Engineering Blog (retrieved 2026-05-03)
  14. Software Engineering at Google - Continuous Integration (Chapter 23) (retrieved 2026-05-03)
  15. Piper (source control system) - Wikipedia (retrieved 2026-05-03)
  16. Sapling: Source control that's user-friendly and scalable - Engineering at Meta (retrieved 2026-05-03)
  17. EdenFS Architecture Overview - facebook/sapling GitHub (retrieved 2026-05-03)
  18. Monorepos - Trunk Based Development (retrieved 2026-05-03)
  19. Build faster with Buck2: Our open source build system - Engineering at Meta (retrieved 2026-05-03)
  20. Merge Queues for Large Monorepos - Aviator (retrieved 2026-05-03)
  21. Code Reviews at Scale: CODEOWNERS & GitHub Actions Guide - Aviator (retrieved 2026-05-03)
  22. Tricorder: Google's Static Analysis Platform - Software Engineering at Google (Abseil) (retrieved 2026-05-03)
  23. Selective Test Execution at Stripe: Fast CI for a 50M-line Ruby monorepo - Stripe Dev Blog (retrieved 2026-05-03)
  24. The Story of Scalar - The GitHub Blog (retrieved 2026-05-03)
  25. Code Ownership Patterns in Polyrepo vs Monorepo Architectures - CrashBytes (retrieved 2026-05-03)
  26. Continuous Integration at Google - Software Engineering at Google (Abseil) (retrieved 2026-05-03)
  27. Piper (source control system) - Wikipedia (retrieved 2026-05-03)
  28. Sapling: Source control that's user-friendly and scalable - Engineering at Meta (retrieved 2026-05-03)
  29. Monorepos - Trunk Based Development (retrieved 2026-05-03)
  30. Code Reviews at Scale: CODEOWNERS & GitHub Actions Guide (retrieved 2026-05-03)
  31. Distributed Builds - Bazel and Buck2 Remote Execution (retrieved 2026-05-03)
  32. Large-Scale Changes - Google Software Engineering Book (Abseil) (retrieved 2026-05-03)
  33. Controlling the Rollout of Large-Scale Monorepo Changes - Uber Engineering Blog (retrieved 2026-05-03)
  34. Pre-Submit Validation and Change Impact Analysis in Monorepos - Google vs Amazon Approaches (retrieved 2026-05-03)
  35. Virtual File System for Git (VFSForGit/GVFS) - Wikipedia (retrieved 2026-05-03)
  36. Stripe's Monorepo Developer Environment - Made of Bugs (retrieved 2026-05-03)
  37. How GitHub uses merge queue to ship hundreds of changes every day - GitHub Blog (retrieved 2026-05-03)
  38. Engineering with Code Ownership - Dan Palmer (retrieved 2026-05-03)
