Pillar: scope-overlap-detection | Date: May 2026
Scope: Static and dynamic analysis techniques for predicting whether two agents' intended changes will overlap BEFORE they start work: file dependency graphs, call graph analysis, module boundary inference, package/import graph overlap, change blast radius estimation, semantic similarity of task descriptions, and ownership map lookup as a pre-flight check.
Sources: 32 gathered, consolidated, synthesized.
Critical finding: Production multi-agent LLM systems show failure rates between 41% and 86.7%, and nearly 79% of those failures originate from specification and coordination issues — not model capability limitations.[18][28] Pre-flight conflict detection directly addresses the dominant failure mode, making it the highest-leverage investment in any parallel coding agent system.
Microsoft's ConE system, deployed in March 2020 across 234 repositories, provides the most rigorous production evidence for pre-flight conflict detection at scale. Over 26,000 pull requests, ConE generated 775 recommendations about conflicting changes and was rated useful in over 70% of cases. More tellingly, over 90% of the 48 interviewed developers intended to keep using it daily — a retention rate that decisively separates it from the technically comparable blast-radius.dev project, which was abandoned due to insufficient adoption despite sound engineering.[10] ConE achieves this by avoiding deep semantic analysis entirely: its Extent of Overlap (EOO) metric is a lightweight file-level scalar conflict score, deliberately fast and scalable. The deployment contrast with blast-radius.dev (voluntary external tool, no reported retention, now discontinued) establishes that deployment model — mandated internal integration versus voluntary adoption — predicts success more reliably than technical sophistication.
Static semantic analysis delivers a recall of 0.60 against dynamic analysis's 0.14 on the same benchmark — a fourfold improvement — while running at a median of 17.8 seconds, two to three orders of magnitude faster than prior information flow approaches.[1] The technique, evaluated across 99 experimental units from 54 merge scenarios in 39 projects, uses four algorithms — interprocedural data flow, confluence, override assignment, and program dependence graph analysis — applied to merged code annotated with per-contributor metadata. The F1 of 0.50 comes with a precision of only 0.43, with refactoring changes accounting for 14 of the 26 false positives and deleted lines for 7 of the 13 false negatives. Pre-flight application is direct: annotate planned modification zones with agent metadata and run the four analyses against the current codebase before any agent writes code. If data flow or confluence paths connect two planned zones, conflict is predicted upfront.[22]
For call graph-based impact analysis, the counterintuitive finding from a study of 10 open-source Java projects with approximately 17,000 mutants is that the most basic algorithm — Class Hierarchy Analysis (CHA) — gives the best precision-recall tradeoff for impact prediction.[2] More sophisticated pointer analysis (SPARK) improves completeness but raises false positive rates in conflict prediction, making it counterproductive for pre-flight gates. Practical implication: transitive CHA closure from an agent's target files defines a reliable blast radius estimate. The intersection of two agents' transitive closures defines the predicted conflict zone. Teams should start with CHA rather than investing in expensive whole-program pointer analysis, reserving SPARK for post-flag deep analysis of confirmed high-risk pairs only.[23]
Task description similarity analysis using SBERT (Sentence-BERT) embeddings achieves F1 scores of 87.1% to 92.3% for detecting intent overlap across three of four benchmark datasets — outperforming GPT-4o, Llama-3, Claude Sonnet 3.5, and Gemini-1.5 in domain-specific settings.[4][32] The two-phase S3CDA algorithm computes cosine similarity on SBERT vectors, then validates high-similarity pairs through entity extraction (actors, actions, objects, resources) with an overlap ratio threshold. For maximum recall, the unsupervised variant UnSupCDA achieves 100% recall across most datasets at the cost of lower precision. This semantic layer is orthogonal to dependency graph analysis: high similarity without file overlap signals hidden future conflict; file overlap without semantic similarity may be incidental co-location. Running SBERT similarity in milliseconds against all pending task pairs provides a Layer 0 gate that filters candidates before triggering any structural analysis.
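The Layer 0 gate described here reduces to a cosine threshold over task embedding vectors. A minimal sketch, assuming embeddings have already been produced by an SBERT model (e.g. via the sentence-transformers library); the 0.7 threshold and task IDs are illustrative choices, not values from the study:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def intent_overlap_pairs(task_vectors, threshold=0.7):
    """Flag task pairs whose embeddings exceed the similarity threshold.

    task_vectors: task ID -> embedding vector. In practice the vectors
    come from an SBERT model; here any numeric vectors will do.
    """
    ids = sorted(task_vectors)
    return [(a, b)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
            if cosine(task_vectors[a], task_vectors[b]) >= threshold]
```

Because the pairwise scan is quadratic only in the number of *pending* tasks (typically small), this check stays in the millisecond range even before any structural analysis runs.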
Code ownership maps expose a hidden divergence problem with direct implications for agent task assignment: across studied systems, only 0–40% of developers are commonly identified by both commit-based and line-based ownership methods (correlation: 0.24–0.65), and 79% of individual code owners were NOT among the top 100 most frequent committers.[3][29] Using commit frequency as an ownership proxy therefore misses the majority of actual owners. Commit-based metrics — the proportion of a file's commits made by a developer — appear as the highest-ranked predictor in 97% of correctly predicted defective files, making commit-based ownership the better signal for conflict risk, while line-based ownership is better suited to authorship attribution. Files with no clear commit-based owner, or with many minor contributors, carry disproportionately elevated conflict risk. The CODEOWNERS pre-flight check runs in O(1) per agent pair: parse CODEOWNERS, map each agent's target files to owners, flag multi-agent ownership collisions before dispatch.[15]
Machine learning on git history delivers the strongest predictive accuracy of any single technique: across 744 open-source GitHub repositories in 7 programming languages — the largest merge conflict prediction study published to date — a Random Forest model combining social and technical features achieves 0.92 accuracy and 1.00 recall.[21] The social features reveal non-obvious patterns: top contributors at the project level cause more conflicts, and occasional contributors at the merge-scenario level also cause more conflicts. When the same developer is both a top project contributor and an occasional contributor in a given merge scenario, conflict probability reaches 32.31%. Cross-layer changes (e.g., spanning MVC layers) are significantly more conflict-prone than same-layer changes, and long-lived branches cause disproportionately more conflicts. Applied to agents: these features — file ownership, task scope across architectural layers, agent task frequency patterns — are all computable pre-dispatch from git history with no code execution required.
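The two social features can be computed from plain commit-author lists. A hedged sketch; the 90th-percentile "top contributor" cutoff and the at-most-two-commits "occasional" cutoff are illustrative parameter choices, not values from the paper:

```python
from collections import Counter

def social_risk(project_authors, branch_authors, author,
                top_quantile=0.9, occasional_max=2):
    """Flag the high-risk combination from the study: a top contributor at
    project level who is only an occasional contributor in this merge
    scenario.

    Inputs are one author name per commit. The quantile cutoff and the
    'occasional' threshold are illustrative, not values from the study.
    """
    project = Counter(project_authors)
    counts = sorted(project.values())
    cutoff = counts[int(top_quantile * (len(counts) - 1))]
    is_top = project[author] >= cutoff
    is_occasional = Counter(branch_authors)[author] <= occasional_max
    return is_top and is_occasional
```

For agents, `author` becomes an agent identity and `branch_authors` the commit log of its worktree; the same counts feed a trained classifier alongside structural features.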
RIPPLE (ICSE 2026) bridges natural language task intent and concrete file impact prediction: in 86% of commits, at least one co-changing location is structurally or semantically dependent on the seed edit (Hit@K metric), and RIPPLE's F1 improves over existing change impact analysis baselines by 39.7% to 380.8%.[27] Its two-phase design — a recall-focused expansion combining evolutionary coupling (commit co-change history) and dependence coupling, followed by a precision-focused LLM planner that reasons over dependence clusters — converts a natural language task description directly into an expanded file/function impact set. Evolutionary coupling independently identifies impacted locations in 21% of commits that structural dependence analysis alone misses entirely. For two parallel agents, computing the intersection of their RIPPLE-expanded impact sets provides a conflict zone prediction before any code is written.
Practitioners building parallel coding agent systems should implement a five-layer pre-flight pipeline in which each layer filters candidates before the next:

- Layer 0: file-level EOO and CODEOWNERS checks (milliseconds); direct conflicts are routed to sequential execution immediately.
- Layer 1: SBERT cosine similarity across all non-blocked task pairs (seconds); flags intent overlap.
- Layer 2: import/dependency graph and CHA call graph traversal for structurally flagged pairs; computes blast radius intersections.
- Layer 3: static semantic analysis (data flow / confluence algorithms) or RIPPLE for pairs with structural overlap; confirms and localizes the conflict.
- Layer 4: a Random Forest model trained on git history, continuously scoring all pairs for probabilistic risk.

The critical architectural constraint is that the pipeline must complete before agents are dispatched — not during or after. ConE's production success with a deliberately lightweight Layer 0 alone (70%+ usefulness, 90%+ retention) suggests that even partial coverage dramatically improves outcomes, making incremental deployment viable: start with file-level overlap and ownership checks, measure false negative rates, then add structural layers based on observed miss patterns in real workloads.
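The layered structure can be sketched as a cheap-to-expensive filter chain. The layer implementations below are toy stand-ins (set intersection for Layer 0, a similarity threshold for Layer 1); a real system would call EOO/CODEOWNERS checks, SBERT, graph traversal, semantic analysis, and an ML scorer in their place:

```python
def preflight(pair, layers):
    """Run checks cheapest-first; stop at the first definitive verdict.

    Each layer returns "conflict", "clear", or None (inconclusive:
    fall through to the next, more expensive layer).
    """
    for name, check in layers:
        verdict = check(pair)
        if verdict is not None:
            return name, verdict
    return "exhausted", "clear"  # no layer objected: dispatch in parallel

# Toy layers; the dict keys (files_a, files_b, sim) are illustrative.
TOY_LAYERS = [
    ("L0-file-overlap",
     lambda p: "conflict" if p["files_a"] & p["files_b"] else None),
    ("L1-intent-similarity",
     lambda p: "conflict" if p["sim"] >= 0.85 else "clear"),
]
```

The key property this skeleton preserves is that every verdict is reached before dispatch, and each escalation is only paid for pairs the cheaper layers could not decide.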
Two structurally distinct conflict classes appear consistently across the literature, and any pre-flight detection system must handle both. Direct conflicts arise from concurrent changes to the same file, function, or line — detectable via file-level overlap in O(n) time. Indirect conflicts arise when changes to different code areas interact through logical or semantic dependencies: one agent changes an API that another agent's code depends on, even when there is zero textual overlap between their changes.[8][19]
| Class | Definition | Detection Mechanism Required | Example |
|---|---|---|---|
| Direct | Same artifact modified concurrently by two agents | File/symbol overlap check (O(n)) | Both agents edit auth.py simultaneously |
| Indirect (API-induced) | "Changes in one artifact affecting concurrent changes in another artifact"[8] | Dependency graph traversal; call graph analysis | Agent A changes authenticate() signature; Agent B calls it from an unrelated file |
| Semantic (silent) | Merged code compiles but exhibits unintended runtime behavior due to interacting changes with no textual overlap | Data flow / program dependence graph analysis | Agent A's conditional check interferes with Agent B's duplicate removal logic after merge |
Key finding: "The earlier a conflict is detected, the easier it is to resolve." The Palantír workspace awareness system (IEEE TSE 2012) demonstrated experimentally that users given real-time workspace awareness detected conflicts earlier, resolved a larger number of conflicts, and self-coordinated more effectively than control groups without such signals.[8][19]
Simple file overlap detection catches only direct conflicts. Call graph and dependency graph traversal are required for indirect conflict detection. Both layers are necessary for a complete pre-flight check.[8] In multi-agent coding practice, the pattern that mitigates the most conflicts is a mandatory plan approval before implementation workflow: agents write plans specifying files they intend to modify, a lead reviews for overlap, and then approves or rejects before any code is written — catching collision at the intent layer rather than the diff layer.[30]
Traditional version control is pull-based: agents learn of others' changes only when they perform their own VCS operations. Palantír inverts this to push-based: continuously sharing workspace events across all agents, yielding "a more complete, accurate, and up-to-date picture of parallel activity."[8][19] The incremental query only requests events relevant to artifacts present in the local workspace, avoiding information overload.
For indirect conflict awareness specifically, Palantír transmits API differences of ongoing changes across workspaces. Each workspace uses a local cache of dependencies to calculate the impact of remote API changes and determines if local changes create new indirect conflicts — moving detection from merge-time to work-time.[8] Mapped to AI agents: each agent is a "workspace," and the coordinator monitors API changes across agent worktrees.
See also: Lock Design Granularity (post-detection coordination strategies)

Standard textual merge tools fail on a specific class of integration failures: "textual merge tools aren't able to detect incompatible changes that occur in areas of the code separated by at least a single line."[1][12] Merged code may compile successfully but exhibit unintended runtime behavior due to interference between concurrent changes — what the literature terms dynamic semantic conflicts.
A technique evaluated on 99 experimental units from 54 merge scenarios across 39 projects analyzes merged code annotated with developer-specific metadata using four lightweight static analyses:[1][12][22]
| Algorithm | Mechanism | Conflict Detected |
|---|---|---|
| Interprocedural Data Flow (DF) | Sparse Value Flow Analysis (SVFA) to detect interprocedural data flow paths between contributors' code | Def-use relationships where one agent's state modification affects another's state usage across method boundaries |
| Interprocedural Confluence (CF) | Identifies situations where separate changes flow to a common statement | Two agents modify different state elements that converge at a common statement, affecting behavior despite no direct data flow between their changes |
| Interprocedural Override Assignment (OA) | Tracks state update sequences across contributors | One agent's state updates overridden by another's — prevents behavior preservation during integration |
| Program Dependence Graph (PDG) | Analyzes control and data dependencies between instructions | One agent's changes influence execution of another's modifications through control flow relationships |
| Metric | Static Analysis (this technique) | Dynamic Analysis (baseline) |
|---|---|---|
| Precision | 0.43 | Higher (but lower recall) |
| Recall | 0.60 | 0.14 |
| F1 Score | 0.50 | <0.30 (estimated) |
| Median Execution Time | 17.8 seconds | Hours (information flow analysis) |
The technique significantly outperforms dynamic analysis in recall (0.60 vs. 0.14) while running 2–3 orders of magnitude faster than prior information flow approaches.[1]
| Error Type | Count | Primary Cause | Secondary Causes |
|---|---|---|---|
| False Positives | 26 cases | Refactoring changes (14 cases) — extract method refactorings create unnecessary annotations | Harmless code insertions |
| False Negatives | 13 cases | Deleted lines (7 cases) — invisible in merged version, undetectable by analysis | Interface implementation limits; exception handling; recursive method limits; Java reflection; native methods |
Key finding: The authors recommend combining static analysis with refactoring detection tools to reduce false positives. The approach shows particular promise when computational efficiency is critical — 17.8-second median execution time makes it viable as a pre-dispatch pre-flight gate.[1][22]
The data flow and confluence analyses can be applied to target code regions before agents start work by annotating planned modification zones with agent metadata, then running the four analyses against the current codebase. If DF or CF paths connect the two planned modification zones, conflict is predicted before any agent writes a single line.[22]
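A toy stand-in for the data flow (DF) and confluence (CF) checks, modeling the analysis as reachability over a def-use/dependence edge graph rather than real Sparse Value Flow Analysis; the node names and graph encoding are assumptions for illustration:

```python
from collections import deque

def reachable(graph, starts):
    """All nodes reachable from `starts` along dependence edges."""
    seen, queue = set(starts), deque(starts)
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

def predict_conflict(graph, zone_a, zone_b):
    """Return "data-flow" if one agent's planned zone reaches the other's,
    "confluence" if both zones flow into a common downstream statement,
    else None (no conflict predicted)."""
    ra, rb = reachable(graph, zone_a), reachable(graph, zone_b)
    if ra & set(zone_b) or rb & set(zone_a):
        return "data-flow"
    if (ra - set(zone_a)) & (rb - set(zone_b)):
        return "confluence"
    return None
```

The distinction matters for routing: a data-flow hit means the zones interact directly, while a confluence hit means independent changes converge on shared downstream state.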
A large-scale empirical study (Musco, Monperrus, Preux, Software Quality Journal 2017) evaluated four call graph construction algorithms across 10 open-source Java projects with approximately 17,000 mutants.[2][13][23]
| Algorithm | Approach | Precision | Recall | Speed | Best For |
|---|---|---|---|---|---|
| CHA (Class Hierarchy Analysis) | Considers all potential call targets in the class hierarchy | Low | High | Fastest | Pre-flight blast radius — wide net, fast execution |
| RTA (Rapid Type Analysis) | Improves CHA by tracking instantiated types | Medium | High | Fast | Refined impact sets where instantiation is tracked |
| VTA (Variable Type Analysis) | Considers types of variables at call sites | Medium-high | Medium-high | Medium | Moderate precision requirements |
| SPARK | Pointer analysis — most complete | Highest | Highest | Slowest | Post-flag deep analysis of suspected high-risk overlap |
Key finding: "The most basic call graph gives the best trade-off between precision and recall for impact prediction." Counterintuitively, increased graph sophistication improves completeness but not overall effectiveness — more edges increase recall at the cost of precision (more false positives in conflict prediction).[2][23]
Practical implication for agent conflict detection: simple CHA-level call graph traversal from target files outward provides a reliable blast radius estimate. The transitive closure of the call graph defines the set of potentially impacted files/functions. The intersection of two agents' transitive closures defines the predicted conflict zone. Teams should start with CHA rather than investing in expensive whole-program pointer analysis.[23]
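The transitive-closure intersection can be sketched directly; the call graph here is a plain adjacency dict standing in for a CHA-constructed graph, with function names as illustrative node labels:

```python
def blast_radius(call_graph, targets):
    """Transitive closure of the (CHA-style) call graph from the targets."""
    seen, stack = set(targets), list(targets)
    while stack:
        fn = stack.pop()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

def conflict_zone(call_graph, targets_a, targets_b):
    """Predicted conflict zone: intersection of both agents' blast radii."""
    return blast_radius(call_graph, targets_a) & blast_radius(call_graph, targets_b)
```

An empty intersection clears the pair for parallel dispatch; a non-empty one names exactly the functions worth escalating to SPARK-level analysis.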
| Type | Approach | Precision | Recall | Availability |
|---|---|---|---|---|
| Static | Over-approximates potential callers/callees (especially virtual dispatch) | Lower | Higher | At dispatch time (no execution required) |
| Dynamic | Only captures actually-observed call paths during execution | Higher | Lower | Requires prior execution traces |
For pre-flight conflict detection, static call graphs are the only option — dynamic graphs require execution that hasn't happened yet. Static over-approximation is acceptable because false-positive conflicts (flagging safe pairs as risky) are preferable to false-negative conflicts (missing real collisions).[13]
Program slicing computes the set of statements that may affect values at a point of interest.[5] Two directions are relevant:
| Slice Type | Direction | Use Case | Pre-Flight Application |
|---|---|---|---|
| Backward slice | Find all statements that could affect a variable's value at a given point | Causation analysis | Identify what an agent's change depends on |
| Forward slice | Find all statements affected by a variable's current value | Ripple effect prediction | Primary tool: map which files will be affected by a planned modification |
Pre-flight conflict detection using slices: compute the forward slice from each agent's planned modification points, then intersect the resulting statement sets; a non-empty intersection predicts interference before either agent writes code.[5]
NS-Slicer uses pre-trained language models (GraphCodeBERT, CodeBERT) for static program slicing, achieving F1-score of 97.41% for backward slices and 95.82% for forward slices on partial code — useful specifically for in-progress agent tasks where full context is unavailable.[5]
| Granularity | Speed | Precision | Recall | Recommended Use |
|---|---|---|---|---|
| File level | Fastest | Low (high false positives) | Highest | Initial pre-flight gate — cast wide net |
| Function level | Medium | Medium | High | Secondary check for flagged file pairs |
| Line level | Slowest | Highest | Medium | Deep analysis for confirmed conflict zones |
The Affected Slice Graph (ASG) metric — Affected Component Coupling (ACC) — directly ranks conflict risk: higher ACC values correspond to higher fault-proneness in the affected component.[5]
Hudson River Trading (HRT) built a complete import dependency graph for a Python codebase of millions of lines to address "code tangling" — overlapping dependency cycles causing more than 30-second import overhead.[11]
| Pipeline Step | Tool | Output |
|---|---|---|
| Parse imports | Python `ast` module | All `import X` and `from X import Y` declarations |
| Build directed graph | NetworkX | Nodes = modules; edges = import relationships (importer → imported) |
| Parallelize parsing | `concurrent.futures` | Thousands of modules processed simultaneously |
| Transitive analysis | NetworkX graph algorithms | Transitive dependency closure, critical edge identification |
| Automated refactoring | `libcst` | CST-based import restructuring |
Application to agent conflict detection: build the import graph once, then intersect the transitive dependency closures of each agent's target modules; any shared module is a predicted conflict zone before dispatch.[11]
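A minimal version of the parsing step using only the stdlib `ast` module; the HRT pipeline additionally uses NetworkX for transitive analysis and `concurrent.futures` for parallelism, both omitted here:

```python
import ast

def module_imports(name, source):
    """Collect imported module names from one module's source text."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module)
    return name, deps

def build_import_graph(sources):
    """sources: module name -> source text. Returns importer -> imported set."""
    return dict(module_imports(name, src) for name, src in sources.items())
```

The resulting adjacency dict feeds directly into the transitive-closure and intersection routines used elsewhere in this pipeline.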
In monorepos without enforced boundaries, any library can import from any other: "Changes in one package can ripple across dozens or even hundreds of interconnected modules."[14]
| Tool | Mechanism | Agent Pre-Flight Relevance |
|---|---|---|
| Nx | Tag-based module boundary enforcement via `@nx/enforce-module-boundaries` ESLint rule; analyzes import statements across the monorepo | Boundary violations by an agent's planned imports are detectable before code is written |
| Dependency Cruiser | Framework-agnostic dependency rule validation; identifies circular dependencies, orphans, shared code with single consumers | Run against agent's planned dependency additions to detect rule violations upfront |
| Sheriff | TypeScript module boundary enforcement at folder level without project.json tags | Lightweight enforcement for TypeScript monorepos |
Key finding: Re-exports introduce implicit dependencies — downstream code becomes coupled to the transitive closure of a library. Changes in any modules a library depends on require rebuilding dependent apps, making re-export chains a multiplier on blast radius.[14]
CodeRAG builds dependency graphs using Tree-Sitter for LLM-based code querying, storing relationships in Neo4j for graph traversal queries across the full dependency structure.[9][26]
| Pipeline Step | Action |
|---|---|
| 1. Repository traverse | Clone repo; traverse all files; identify language via extension mapping |
| 2. AST extraction | Extract structural nodes via Tree-Sitter DFS per language: type, position, text, function/class calls, inheritance, imports |
| 3. Intra-file edges | Call name → function ID resolution within a file |
| 4. Inter-file edges | Import metadata → source file → exported module (cross-file dependency) |
| 5. Vector embedding | Google Gemini embeddings per node; stored alongside graph structure in Neo4j |
The combination of vector similarity (semantic overlap) and graph traversal (structural overlap) provides two complementary conflict signals.[9] Language support covers JavaScript, TypeScript, JSX, TSX, and Python; extensible to 100+ languages via the Tree-Sitter grammar ecosystem.[26]
An ecosystem-agnostic framework for detecting dependency conflicts in heterogeneous development environments provides complexity benchmarks for choosing the right algorithm at different stages of a pre-flight pipeline.[20]
| Technique | What It Detects | Complexity | Pre-Flight Stage |
|---|---|---|---|
| Change Overlapping | Same graph node touched by two changes | O(n) | Layer 1 — immediate gate |
| Constraint Violation | Incompatible version requirements | O(n²) | Layer 2 — dependency version check |
| Pattern Matching | Known anti-patterns (diamond deps, circular) | O(n log n) | Layer 2 — structural anti-pattern scan |
| CDA / Critical Pairs | All potential conflicts in minimal context | Exponential worst case | Layer 3 — only for flagged pairs |
| Graph Embedding + ML | Probabilistic conflict risk from graph structure | Training cost amortized; inference fast | Layer 2 — probabilistic scoring |
| GNN + LLM | Semantic + structural conflict detection | High, but parallelizable | Layer 3 — deep analysis for high-risk pairs |
Graph embedding approaches use Node2Vec and GraphSAGE models to encode structural and contextual features into vector spaces; supervised classifiers trained on known conflict instances predict probable future conflicts, with risk scores correlated against Git version histories.[20]
See also: Monorepo Tooling (Nx, Turborepo, Bazel boundary enforcement)

Blast radius in software engineering is "the potential impact that a change or failure in a system or service can have on other interconnected systems or services."[7][17] It has two components: direct impact (systems immediately affected) and indirect impact (systems affected as a result of disruption to directly impacted systems). Factors influencing blast radius size include system complexity, interconnectivity level, nature of the change (interface changes carry larger blast radius than implementation-only changes), and system resilience.[7]
The pre-flight blast radius approach converts conflict detection into a computationally tractable form:[7][17]
| Analysis Type | What It Measures | Output for Pre-Flight |
|---|---|---|
| Create/maintain dependency map | Module-level interdependencies | Graph queryable for immediate overlap |
| Chain analysis | Multi-hop propagation paths through dependency chains | Transitive blast radius (not just direct neighbors) |
| Centrality ranking | In-degree and out-degree of modules | High-centrality modules = highest-risk overlap zones; deprioritize for concurrent assignment |
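Degree-based centrality over the same adjacency-dict representation; a sketch using raw in-degree plus out-degree, not any specific tool's centrality metric:

```python
def centrality_rank(graph):
    """Rank modules by in-degree + out-degree of the dependency graph.

    High-ranking modules are the riskiest zones for concurrent assignment
    and should be deprioritized when partitioning agent tasks.
    """
    out_deg = {n: len(deps) for n, deps in graph.items()}
    in_deg = {}
    for deps in graph.values():
        for dep in deps:
            in_deg[dep] = in_deg.get(dep, 0) + 1
    nodes = set(out_deg) | set(in_deg)
    return sorted(nodes,
                  key=lambda n: out_deg.get(n, 0) + in_deg.get(n, 0),
                  reverse=True)
```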
Port's AI calculates blast radius pre-deployment through a five-step automated pipeline.[7][17]
The blast-radius.dev tool implemented a dedicated PR-level detection pipeline:[24]
| Stage | Action |
|---|---|
| Diff Parsing | Examine proposed modifications in pull requests |
| Change Identification | Detect API and schema modifications |
| Dependency Mapping | Map identified changes to related code across services |
| Notification | Post impact summary as PR comment |
Adoption warning: The blast-radius.dev project is no longer active. The creator concluded that while the cross-service impact problem exists, "teams didn't prioritize this type of analysis at the time" — establishing an adoption barrier despite technical soundness.[24] This contrasts sharply with Microsoft's ConE (Section 10), which achieved 90%+ user retention, suggesting that internal mandated tooling succeeds where voluntary external tooling does not.
Augment Code's AI-powered microservices impact analysis integrates four primary data sources to capture hidden dependencies invisible to static analysis alone:[25]
| Data Source | What It Captures | Hidden Dependency Type |
|---|---|---|
| Static analysis | Explicit imports and function calls | Direct structural dependencies |
| Distributed tracing | Actual runtime request paths | HTTP calls embedded in strings; dynamic service discovery |
| CI/CD logs | Deployment patterns and co-deployment history | Deployment coupling invisible in code |
| API specifications | Service contracts and schemas | Message queue topic subscriptions; config template references |
Scale: Context Engine processes 400,000+ file codebases without chunking, using models supporting 128,000 tokens of context, and rebuilds its dependency model within seconds of a branch push.[25] Teams report up to 70% reduction in impact analysis time.[25]
| Pattern | Mechanism | Agent Application |
|---|---|---|
| Bulkhead Pattern | Compartmentalize modules so failure/change in one doesn't cascade | Assign agents to isolated module bulkheads; cross-bulkhead tasks flagged |
| Incremental Changes | Break changes into smaller steps with intermediate verification | Decompose large agent tasks to reduce individual blast radius |
| Module Isolation | Minimize cross-domain dependencies in design | Pre-partition agent task assignments to aligned module boundaries |
Key finding: Testing overly large blast radii leads to bloated, inefficient test suites that provide little real feedback. Test architecture should match decoupled software architecture — and agent task partitioning should match both.[7]
Tree-sitter is a parser generator and incremental parsing library that builds concrete syntax trees (CSTs) with full source fidelity. Its key property for conflict detection is incremental parsing: sharing unmodified tree nodes between versions enables fast re-parse when code changes (enabling source parsing on every keystroke in editors), making it viable as a live pre-flight analysis layer.[6][16]
| Capability | Mechanism | Pre-Flight Use |
|---|---|---|
| Language-agnostic AST | 13–19+ language grammars with uniform node/edge shapes | Single dependency graph query across polyglot codebases |
| Incremental parsing | Shared unmodified tree nodes between versions | Real-time conflict re-evaluation as agent plans evolve |
| Structural querying | S-expression patterns to extract specific code structures | Extract function names, call sites, import statements for graph construction |
| Precise line/column mapping | All nodes mapped to exact source positions | Identifies which source position each dependency edge originates from |
Critical limitation: Building dependency graphs with Tree-Sitter requires language-specific work — developers must write language-specific queries producing common captures, or hand-write AST traversers per language.[6][16]
The AFT toolkit (cortexkit) is built on top of Tree-Sitter's CSTs. Every operation addresses code by what it is — function, class, call site, symbol — not by where it sits in a file.[16] This directly addresses the root cause of line-number-based conflicts: "AI coding agents are fast, but their interaction with code is often blunt. The typical pattern: read an entire file to find one function, construct a diff from memory, apply it by line number — burning tokens on context noise, with edits that break when the file changes."[6]
| AFT Feature | Function | Pre-Flight Conflict Detection Role |
|---|---|---|
| Git Conflict Viewer | Shows all merge conflicts across repo in one call with line-numbered regions | Post-detection inventory; identifies residual direct conflicts |
| Symbol Resolution | Address code by name, not line number | Stable cross-agent references that don't break when files change |
| Call Graph Generation | Follow callers/callees across the workspace | Compute transitive impact set for any planned modification |
| Diff by Symbol | Generate and apply diffs at the semantic symbol level | Enables symbol-level locking (lower false-positive rate than file-level locking) |
The AST-based import graph construction pipeline queries each language's `import_statement` / `import_declaration` nodes and resolves them into graph edges.[16]

"AST beats regex": early approaches used pattern matching on source text; Tree-Sitter gives the real dependency graph, not approximations that fail on multi-line imports, aliased imports, or conditional imports.[16]
AST-level conflict detection mechanism: register edits via `tree.edit` calls before reparsing; incremental parsing then produces the updated AST without a full re-parse.[6]

Two-agent symbol conflict check: both agents query the same dependency graph to check for overlapping symbols. If both plan to modify the same function or class, conflict is detected pre-flight. Symbol-based locking reduces false positive conflict rates compared to file-based locking.[6]
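One way to realize symbol-level locking is a claim registry keyed on (file, symbol) pairs; this class and its method names are an illustrative sketch, not AFT's actual API:

```python
class SymbolLocks:
    """Claim registry keyed on (file, symbol): two agents may edit different
    symbols in the same file, but not the same symbol."""

    def __init__(self):
        self._owners = {}  # (file, symbol) -> agent id

    def claim(self, agent, file, symbol):
        """Return True if the agent now holds the lock, False on conflict."""
        holder = self._owners.setdefault((file, symbol), agent)
        return holder == agent

    def release(self, agent, file, symbol):
        if self._owners.get((file, symbol)) == agent:
            del self._owners[(file, symbol)]
```

Note how a file-level lock would have rejected the second claim below even though the two symbols never interact; symbol granularity preserves that safe parallelism.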
Architecture analyzers built on Tree-Sitter AST detect these patterns, which indicate zones of elevated parallel agent conflict risk:[16]
| Anti-Pattern | Detection Method | Risk Implication |
|---|---|---|
| God Classes | Classes with method count or responsibility count above threshold | Multiple agents likely to need changes in the same class |
| Circular Dependencies | A imports B imports A (graph cycle detection) | Change anywhere in cycle affects all participants |
| Leaky Abstractions | Internal implementation details in public interface | Interface changes cascade unexpectedly through callers |
| Spaghetti Modules | High bidirectional coupling; no clear layer boundaries | Blast radius estimation becomes unreliable |
Key finding: Symbol-level locking (AFT's approach) reduces false-positive conflict rates compared to file-level locking because two agents can safely edit different functions in the same file. File-level locks block this safe parallelism unnecessarily.[6][16]
Ownership maps provide a lightweight O(1)-per-agent-pair pre-flight check that catches the most common single-file edit collisions before any dependency graph analysis is required. Files with a single clear owner can be claimed exclusively; files with shared or disputed ownership are higher-risk zones warranting deeper analysis.[3][29][15]
| Method | Calculation | Best Use Case |
|---|---|---|
| Commit-based | Proportion of commits by developer relative to total commits for a file; "the more frequent the code changes made by a developer to a file, the higher ownership value"[3] | Quality improvement, bug-fixing, conflict prevention — commit-based metrics appear as highest-ranked in 97% of correctly predicted defective files |
| Line-based | Percentage of code lines authored by developer relative to total file lines | Accountability, authorship, IP attribution — provides broader developer identification |
Critical divergence finding: Only 0–40% of developers are commonly identified by both methods across studied systems. Correlation between methods ranges from 0.24–0.65. Importantly, 79% of individual code owners were NOT among the top 100 most frequent committers — significant divergence between declared and contribution-based ownership.[3][29]
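The two ownership calculations from the table above can be sketched as simple frequency counts; the input shapes (a list of commit authors for a file, a per-line author list as `git blame` would yield) are illustrative assumptions:

```python
from collections import Counter

def commit_ownership(commit_authors: list[str]) -> dict[str, float]:
    """Commit-based: each developer's share of a file's total commits."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    return {dev: n / total for dev, n in counts.items()}

def line_ownership(line_authors: list[str]) -> dict[str, float]:
    """Line-based: each developer's share of the file's current lines."""
    counts = Counter(line_authors)
    total = sum(counts.values())
    return {dev: n / total for dev, n in counts.items()}

# The metrics can diverge sharply, as the studies report: a developer may
# dominate commits while authoring few of the surviving lines.
commits = ["alice", "alice", "alice", "bob"]
lines = ["bob"] * 90 + ["alice"] * 10
assert commit_ownership(commits)["alice"] == 0.75
assert line_ownership(lines)["bob"] == 0.9
```

A pre-flight system should therefore compute both and treat disagreement between them as its own risk signal.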
CODEOWNERS is a configuration file mapping files/folders to responsible owners (teams or individuals). When a pull request touches those paths, GitHub/GitLab automatically requests review from the listed owners.[15] It encodes four types of organizational information.[15]
"In larger projects with multiple codeowners, merge conflicts can arise when different codeowners make changes to the same file simultaneously." Overlapping ownership rules lead to conflicts.[15]
The pre-flight ownership check itself is O(1) per agent pair: look up the owner of each file both agents plan to touch and flag files with shared or disputed ownership.[3][29][15] The metrics below gauge how trustworthy the underlying ownership map is:
| Metric | Definition | Pre-Flight Relevance |
|---|---|---|
| CODEOWNERS Coverage | % of codebase files mentioned in CODEOWNERS | Low coverage = blind spots for ownership-based conflict detection |
| Modularization Progress | % of files mapped into modules | Higher modularization = more reliable boundary-based agent partitioning |
| Confidence Score | % of files with engineers with significant hands-on experience | Low confidence = unreliable ownership data for conflict prediction |
| Lost Knowledge | Files not modified in a long time | Staleness indicator — historical ownership may no longer reflect current understanding |
Key finding: High minor-contributor count correlates with higher defect rates, and making changes to a depending component without coordinating with the owner increases likelihood of faults.[29] This extends directly to AI agents: agents assigned to files without clear ownership incur significantly higher conflict risk.
GitHub/GitLab CODEOWNERS integrates with Static Analysis Security Testing (SAST) triage by mapping ownership to file structures — automatically assigning suppression ownership. The same mechanism is applicable to conflict triage: file → owner → responsible agent → automatic conflict escalation routing.[29]
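The per-pair ownership gate described in this section can be sketched as follows. The routing labels (`block`, `escalate`) and the CODEOWNERS-style `owners` mapping shape are illustrative assumptions:

```python
def ownership_gate(files_a: set[str], files_b: set[str],
                   owners: dict[str, set[str]]) -> dict[str, set[str]]:
    """Pre-flight check for one agent pair. Classify every file both agents
    plan to touch: a single clear owner means the file can be claimed
    exclusively (serialize the edits); shared, disputed, or unknown
    ownership is a higher-risk zone warranting deeper analysis."""
    report = {"block": set(), "escalate": set()}
    for path in files_a & files_b:
        who = owners.get(path, set())
        if len(who) == 1:
            report["block"].add(path)      # claim exclusively, route sequentially
        else:
            report["escalate"].add(path)   # trigger dependency-graph analysis
    return report

owners = {"api/routes.py": {"team-api"},
          "shared/utils.py": {"team-api", "team-infra"}}
r = ownership_gate({"api/routes.py", "shared/utils.py"},
                   {"shared/utils.py", "api/routes.py"}, owners)
assert r["block"] == {"api/routes.py"}
assert r["escalate"] == {"shared/utils.py"}
```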
See also: Monorepo Tooling (CODEOWNERS at enterprise scale).

Dependency graph analysis detects structural overlap; semantic similarity analysis detects intent overlap — two agents heading toward the same logical territory even when their initial file lists don't yet intersect. These are complementary signals: high semantic similarity without file overlap may indicate a hidden future conflict; file overlap without semantic similarity may be incidental co-location rather than true conflict.
S3CDA (Supervised Semantic Similarity-based Conflict Detection Algorithm) was designed to automatically detect conflicts in software requirements — directly mappable to detecting when two AI agents have overlapping task intents.[4][32]
| Embedding Method | Mechanism | Relative Performance |
|---|---|---|
| TFIDF | Frequency-based term weighting | Best on OpenCoss dataset; weakest on semantic tasks |
| USE (Universal Sentence Encoder) | Pre-trained 512-dimensional vectors | Best on UAV dataset (92.3% F1) |
| SBERT | Semantic-aware embeddings capturing contextual meaning | Best performer overall — recommended default |
| SBERT-TFIDF | Hybrid: semantic + frequency signals | Best on WorldVista (87.1% F1) |
Similarity formula: cos(r₁,r₂) = r₁·r₂ / (‖r₁‖ ‖r₂‖), ranging from -1 (dissimilar) to 1 (identical). Optimal thresholds determined via ROC curves per dataset.[32]
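The formula is straightforward to implement. The toy 3-dimensional vectors below stand in for SBERT's real sentence embeddings (a hypothetical substitution for illustration; production use would call an embedding model):

```python
import math

def cosine(r1: list[float], r2: list[float]) -> float:
    """cos(r1, r2) = r1.r2 / (|r1| |r2|), ranging from -1 to 1."""
    dot = sum(x * y for x, y in zip(r1, r2))
    norm = (math.sqrt(sum(x * x for x in r1))
            * math.sqrt(sum(y * y for y in r2)))
    return dot / norm

# Toy embeddings for three hypothetical agent task descriptions:
t1 = [0.9, 0.1, 0.0]   # "add retry logic to the payment client"
t2 = [0.8, 0.2, 0.1]   # "make the payment client retry on failure"
t3 = [0.0, 0.1, 0.9]   # "update the README badges"
assert cosine(t1, t2) > 0.9              # same intent -> high similarity
assert cosine(t1, t3) < 0.3              # unrelated -> low similarity
assert abs(cosine(t1, t1) - 1.0) < 1e-9  # identical text -> 1.0
```

Pairs above the per-dataset ROC-derived threshold would proceed to Phase II entity extraction.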
High-similarity candidate pairs enter Phase II for entity extraction and overlap ratio calculation:[32]
| Entity Extraction Method | Structure | Entities Extracted |
|---|---|---|
| POS Tagging | Actor + Action + Object + Resource | Nouns and verbs from requirement/task text |
| Software-specific NER (S-NER) | Transformer-based extraction | Actors, actions, objects, properties, metrics, operators |
Overlap ratio computed against m=5 most similar candidates; if ratio exceeds threshold T₀=1.0, pair enters the final conflict set.[32]
| Dataset | Best Embedding | F1-Score |
|---|---|---|
| PURE | SBERT | 89.6% |
| UAV | USE | 92.3% |
| WorldVista | SBERT-TFIDF | 87.1% |
| OpenCoss | TFIDF | 57.0% |
LLM comparison: S3CDA consistently outperforms GPT-4o, Llama-3, Sonnet-3.5, and Gemini-1.5 in domain-specific settings. LLMs show promise on general datasets but fall short in specialized domains.[4][32] For high-recall requirements, the unsupervised variant UnSupCDA achieves 100% recall across most datasets at the cost of lower precision.[32]
The mapping to coding agent tasks is direct: treat each agent's task description as a requirement, embed it, and flag high-similarity pairs as candidate intent conflicts.[32]
This provides a lightweight, fast pre-flight screen before any dependency graph analysis — task description similarity check can run in milliseconds and serves as a Layer 0 gate before triggering more expensive structural analysis.[32]
Production multi-agent LLM systems show failure rates between 41% and 86.7%, with nearly 79% of failures originating from specification and coordination issues, not model capability limitations. The root cause is Semantic Intent Divergence: cooperating LLM agents develop inconsistent interpretations of shared objectives due to siloed context.[18][28] SCF addresses this failure mode with six components:[28]
| Component | Function |
|---|---|
| Process Context Layer | Establishes shared operational semantics across all agents |
| Semantic Intent Graph | Formal graph representation of agent intentions |
| Conflict Detection Engine | Real-time identification of contradictory, contention-based, and causally invalid intent combinations |
| Consensus Resolution Protocol | Policy-authority-temporal hierarchy for dispute resolution |
| Drift Monitor | Detects gradual semantic divergence over time |
| Process-Aware Governance Integration | Enforces organizational policy compliance |
| Category | Definition | Detection Mechanism |
|---|---|---|
| Contradictory | Agent intents directly oppose each other | Semantic Intent Graph polarity analysis |
| Contention-based | Agents compete for the same resource/file/function | Resource node conflict in Semantic Intent Graph |
| Causally invalid | An agent's intent violates causal dependencies established by another | Process model valid transition verification |
| Metric | SCF | Best Baseline |
|---|---|---|
| Workflow completion rate | 100% | 25.1% |
| Semantic conflict detection rate | 65.2% | N/A (not reported) |
| Detection precision | 27.9% | N/A |
| Protocol compatibility | MCP and A2A | — |
SCF also defines a Semantic Alignment Score (SAS) per agent pair, combining: (1) overlap between each agent's entity state model, (2) consistency of planned actions with the process model's valid transitions, and (3) divergence between agent confidence levels and historical base rates. SAS provides a scalar conflict risk indicator that complements the binary conflict categories.[28]
Key finding: The 65.2% detection rate at 27.9% precision for purely semantic (task-description-level) conflict detection establishes the baseline for the semantic layer alone. Combining semantic intent analysis with dependency graph analysis should significantly improve both numbers — the two methods are orthogonal and complementary.[28]
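A minimal sketch of the SAS combination described above. The weighted-sum form and the weights themselves are illustrative assumptions, not values from the SCF paper; only the three components come from the source:

```python
def semantic_alignment_score(entity_overlap: float,
                             transition_consistency: float,
                             confidence_divergence: float,
                             weights: tuple = (0.4, 0.4, 0.2)) -> float:
    """Scalar alignment indicator for one agent pair, combining:
    (1) overlap between the agents' entity state models,
    (2) consistency of planned actions with the process model's valid
        transitions, and
    (3) divergence between confidence levels and historical base rates
        (counted against alignment).
    All inputs normalized to [0, 1]; lower SAS = higher conflict risk."""
    w1, w2, w3 = weights
    return (w1 * entity_overlap
            + w2 * transition_consistency
            + w3 * (1.0 - confidence_divergence))

aligned = semantic_alignment_score(0.9, 0.95, 0.05)
diverging = semantic_alignment_score(0.2, 0.4, 0.7)
assert aligned > diverging
assert 0.0 <= diverging <= aligned <= 1.0
```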
A large-scale study (Springer Empirical Software Engineering) evaluated machine learning on git-history features for binary merge conflict prediction across 744 open-source GitHub repositories spanning 7 programming languages — described as the largest merge conflict prediction study to date.[21]
| Feature Category | Features | Key Finding |
|---|---|---|
| Technical — Structural | Relation of modularity (MVC layers) to conflict frequency | Cross-layer changes are significantly more conflict-prone than same-layer changes |
| Technical — Size | Size of code changes (lines added/deleted) | Larger changes correlate with more conflicts |
| Technical — Timing | Branch age; timing of code changes | Long-lived branches cause disproportionately more conflicts |
| Social — Role | Developer roles and contribution patterns | Top contributors at project level cause more conflicts |
| Social — Pattern | Contributor frequency at merge-scenario level | Occasional contributors at merge level cause more conflicts |
| Social — Combined | Top project contributor + occasional merge contributor simultaneously | 32.31% conflict probability for this specific combination |
| Model Type | Features Used | Accuracy | Recall |
|---|---|---|---|
| Technical only | Structural, size, timing | ~0.80 | ~0.85 |
| Social only | Role, pattern, combined | ~0.75 | ~0.90 |
| Combined (best) | Social + technical, Random Forest | 0.92 | 1.00 |
Class imbalance note: Merge conflict data from git history is highly imbalanced (far more non-conflicting merges than conflicting ones). Handling it requires SMOTE oversampling, Random Forest ensembles, or class-weighted training.[21]
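The class-weighting option can be sketched with the standard balanced-weight heuristic (n_samples / (n_classes * n_class_samples), the same formula behind scikit-learn's `class_weight="balanced"`); the 95/5 split below is illustrative:

```python
from collections import Counter

def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Weight each class inversely to its frequency so the rare
    conflicting-merge class contributes as much to the loss as the
    dominant non-conflicting class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# 95% non-conflicting merges (0), 5% conflicting (1):
labels = [0] * 95 + [1] * 5
w = balanced_class_weights(labels)
assert abs(w[1] / w[0] - 19.0) < 1e-9  # conflict class weighted 19x heavier
```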
The same source also covers a neural program merge framework based on token-level three-way differencing and a multi-input BERT variant.[21]
Facebook developed an ML system that shifts test selection from "which tests could be affected?" to "what is the probability this test will catch a regression?"[31]
| Component | Method | Role |
|---|---|---|
| Build dependency analysis | All tests transitively depending on modified code | Candidate set generation |
| ML probability scoring | Gradient-boosted decision trees | Estimate likelihood each test detects a regression |
| Graph distance | Distance in build dependency graph between changed units and tests | Key feature: empirically, changed code and failing tests have small graph distance |
Production results: Detects 99.9% of regressions while running only 1/3 of all dependent tests; requires 95%+ prediction accuracy; achieved 2x testing infrastructure efficiency gains.[31]
Key finding: Facebook's approach is directly transferable to pre-flight conflict prediction: instead of predicting test failure probability, predict conflict probability for two agent task pairs. Features: build dependency graph distance between target files, commit co-change history, owner overlap, semantic similarity. Training data: historical parallel development sessions where conflicts occurred vs. not. Facebook's results demonstrate this pattern is production-proven at massive scale.[31]
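The graph-distance feature central to Facebook's model can be sketched as a BFS over the build dependency graph; reusing it for agents means scoring how structurally close two agents' target units are (small distance = higher conflict risk). The adjacency-dict representation is an assumption for illustration:

```python
from collections import deque

def graph_distance(dep_graph: dict[str, set[str]],
                   start: str, goal: str) -> int:
    """Shortest directed path length from `start` to `goal` in the build
    dependency graph; returns -1 if `goal` is unreachable."""
    if start == goal:
        return 0
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in dep_graph.get(node, set()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return -1

deps = {"app": {"billing", "auth"}, "billing": {"db"}, "auth": {"db"}}
assert graph_distance(deps, "app", "db") == 2
assert graph_distance(deps, "billing", "auth") == -1  # no directed path
```

In a conflict predictor, this distance would be one feature alongside co-change history, owner overlap, and semantic similarity.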
RIPPLE ("From Seed to Scope: Reasoning to Identify Change Impact Sets," Yadavally and Nguyen, ICSE 2026) addresses the precision-recall tradeoff in change impact analysis with a two-phase design.[27]
| Phase | Focus | Method | Output |
|---|---|---|---|
| Phase 1 — Seed-to-Scope | Recall-focused | Combines evolutionary coupling (commit history) + dependence coupling (structural/semantic); progressively expands impact set from seed edit | Wide-net candidate impact set |
| Phase 2 — Plan-Then-Predict | Precision-focused | Planner LLM produces change plan via Chain-of-Thought; Reasoner LLM performs impact estimation per dependence cluster (localized to mitigate hallucinations) | Precision-filtered impact set aligned with change intent |
| Metric | Value | Interpretation |
|---|---|---|
| Hit@K | 86% | In 86% of commits, at least one co-changing location is structurally/semantically dependent on the seed edit |
| F1-score improvement | 39.7%–380.8% over baselines | Versus existing top-down and bottom-up CIA approaches |
| Evolutionary coupling unique contribution | 21% of commits | In 21% of commits, evolutionary coupling identifies locations that dependence coupling alone misses |
The application to multi-agent pre-flight is direct: treat each agent's planned change as a seed edit, expand it into an impact set, and intersect the resulting sets before dispatch.[27]
RIPPLE's key bridge: natural language intent → dependence-expanded impact set transforms task descriptions into concrete file sets comparable before any agent starts working.[27]
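The evolutionary-coupling half of Phase 1 can be sketched from commit history alone: files that historically co-changed with the seed become candidate impact-set members. The representation (each commit as a set of touched files) and the `min_support` cutoff are illustrative assumptions, not RIPPLE's actual algorithm:

```python
from collections import Counter

def cochange_neighbors(history: list[set[str]], seed: str,
                       min_support: int = 2) -> set[str]:
    """Recall-focused wide net: files co-changed with `seed` in at least
    `min_support` historical commits join the candidate impact set."""
    counts = Counter()
    for commit_files in history:
        if seed in commit_files:
            counts.update(commit_files - {seed})
    return {f for f, n in counts.items() if n >= min_support}

history = [
    {"api.py", "schema.py", "docs.md"},
    {"api.py", "schema.py"},
    {"api.py", "client.py"},
    {"readme.md"},
]
assert cochange_neighbors(history, "api.py") == {"schema.py"}
```

RIPPLE's Phase 2 would then filter this wide net with LLM reasoning over the change intent.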
ConE is a production-deployed concurrent edit detection service at Microsoft (ACM TOSEM 2022), deployed March 2020 onwards across 234 repositories.[10]
Empirical foundation: Files concurrently edited in different pull requests are more likely to introduce bugs — established from half a year of changes across 6 large Microsoft repositories, each with 1,000+ monthly PRs.[10]
| Metric | Definition | Design Decision |
|---|---|---|
| Extent of Overlap (EOO) | Percentage value representing overlap between two PRs active at the same time; measures file-level overlap | Scalar conflict potential score (not binary); deliberately lightweight — avoids time-consuming deep semantic analysis |
| Rarely Concurrently Edited (RCE) Files | Files infrequently modified together with other files; historical co-edit frequency as prior | Concurrent edits to RCE files = special warning signal; files always edited together = expected concurrent edits (low alert) |
| Metric | Value |
|---|---|
| Repositories covered | 234 across different product lines at Microsoft |
| Pull requests evaluated | 26,000 |
| Recommendations made | 775 about conflicting changes |
| Rated useful by developers | Over 70% (554 cases) |
| Users intending to keep daily use | Over 90% of 48 interviewed users |
| Patent | WO2022031338A1 (listed on Google Patents) |
Key finding: ConE deliberately avoids deep semantic analysis in favor of fast, scalable overlap estimation — and this is presented as the right tradeoff, not a limitation. Production validation at 70%+ usefulness confirms file-overlap-based conflict prediction is practically valuable at scale, without requiring call graph traversal or semantic analysis.[10]
| Insight | Agent System Application |
|---|---|
| EOO is directly applicable | Agent tasks ≈ PRs; EOO of planned file modifications provides pre-dispatch conflict score |
| RCE concept | Historical co-edit data is a strong prior — files rarely co-modified are higher-risk when two agents plan concurrent modification |
| Lightweight heuristics beat precision | Fast, scalable overlap estimation is the right tradeoff for pre-flight checks in high-velocity systems |
| Adaptive thresholds | Different codebases have different expected overlap patterns — adaptive thresholds prevent alert fatigue |
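An EOO-style pre-dispatch score can be sketched as below. The Jaccard-style percentage is an illustrative stand-in, since the paper's exact EOO formula is not reproduced here; only the file-level, scalar, deliberately lightweight character comes from the source:

```python
def extent_of_overlap(files_a: set[str], files_b: set[str]) -> float:
    """File-level overlap between two concurrently active change sets
    (agent tasks ~ PRs), as a percentage: 0 = disjoint, 100 = identical."""
    if not files_a or not files_b:
        return 0.0
    shared = files_a & files_b
    return 100.0 * len(shared) / len(files_a | files_b)

pr1 = {"src/cart.py", "src/pricing.py", "tests/test_cart.py"}
pr2 = {"src/pricing.py", "src/tax.py"}
assert extent_of_overlap(pr1, pr2) == 25.0  # 1 shared of 4 distinct files
assert extent_of_overlap(pr1, set()) == 0.0
```

In a full system this scalar would be compared against a per-repository adaptive threshold, and boosted when the shared files are RCE (rarely concurrently edited) files.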
| System | Deployment Model | User Retention | Outcome |
|---|---|---|---|
| ConE (Microsoft) | Internally mandated; integrated into Azure DevOps workflows | 90%+ intended daily use | Production success; patented |
| Blast-radius.dev | External tool requiring voluntary adoption | N/A (project discontinued) | Technically sound but no market adoption |
The contrast suggests that deployment model (mandated internal integration vs. voluntary external tool) determines adoption success more than technical merit for conflict detection tooling.[10][24]
Palantír (2012, IEEE TSE) remains the foundational reference for real-time parallel development conflict detection. Its push-based workspace awareness architecture — where API diffs are transmitted across workspaces as work progresses — directly models the architecture needed for multi-agent pre-flight systems.[8][19]
By end of 2025, approximately 85% of developers regularly used AI tools for coding, still mostly single-agent; multi-agent coordination became the new frontier in early 2026.[30] The key upfront conflict-detection pattern that emerged in this period is mandatory plan approval before implementation: agents write plans specifying files they intend to modify, a lead agent reviews for overlap, and approves or rejects before any code is written — catching collision at the intent layer rather than the diff layer.[30]
No single technique covers the full conflict space. A complete pre-flight system is a layered pipeline where cheap, fast techniques filter the candidate space before expensive, precise techniques are applied only to flagged pairs.
| Method | Conflict Type Detected | Precision | Recall | Latency | Requires Codebase Execution | Source |
|---|---|---|---|---|---|---|
| File-level overlap (EOO) | Direct only | Medium | High (for direct) | Milliseconds | No | [10] |
| Ownership map check | Direct + organizational | Medium | Medium | Milliseconds | No | [3][15] |
| Semantic task similarity (SBERT) | Intent overlap | Medium (87–92% F1) | High (100% for UnSupCDA) | Seconds | No | [4][32] |
| Import/dependency graph | Direct + indirect (structural) | Medium-high | High | Seconds–minutes | No | [11][9] |
| CHA call graph traversal | Direct + indirect (call paths) | Low-medium (over-approx) | Highest | Seconds–minutes | No | [2][13] |
| Program slicing (forward) | Direct + data flow ripple | Medium-high | High (97% F1 for NS-Slicer) | Seconds–minutes | No | [5] |
| ML from git history (Random Forest) | All types (probabilistic) | N/R (Accuracy: 0.92) | 1.00 | Milliseconds (inference) | No (requires training) | [21] |
| Static semantic analysis (4 algorithms) | Semantic (data flow / confluence) | 0.43 | 0.60 (vs. 0.14 dynamic) | 17.8s median | No | [1][12] |
| RIPPLE intent-aware CIA | All types (intent + structure) | High (86% Hit@K) | High (+39–381% vs. baselines) | LLM inference time | No | [27] |
| SCF semantic intent graph | Contradictory + contention + causal | 27.9% | 65.2% | Real-time | No | [28] |
| Layer | Technique | Trigger | Action on Flag |
|---|---|---|---|
| Layer 0 — Instant | File-level overlap (EOO) + CODEOWNERS check | At task dispatch time | Immediate sequential routing for direct conflicts |
| Layer 1 — Fast | Semantic task similarity (SBERT cosine) | All non-blocked pairs from Layer 0 | Flag intent-similar pairs for deeper analysis |
| Layer 2 — Structural | Import/dependency graph traversal + CHA call graph | Pairs flagged by Layer 1 | Compute blast radius intersection; flag overlapping sets |
| Layer 3 — Semantic | Static semantic analysis (DF/CF/OA/PDG) or RIPPLE | Pairs with structural overlap from Layer 2 | Confirm semantic conflict; generate specific conflict report |
| Layer 4 — Historical | ML from git history (Random Forest) + RCE scoring | Continuous scoring of all pairs | Probabilistic conflict risk score for routing decisions |
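The cheap front of the pipeline (Layers 0 and 1) can be sketched as a single gate; the routing labels, task shape, and similarity threshold are illustrative assumptions:

```python
def preflight_gate(task_a: dict, task_b: dict,
                   similarity: float, sim_threshold: float = 0.8) -> tuple:
    """Route one agent pair before any expensive structural analysis.
    `task_*` carry a planned `files` set; `similarity` is a precomputed
    task-description cosine score (Layer 1 input)."""
    shared = set(task_a["files"]) & set(task_b["files"])
    if shared:                        # Layer 0: direct file overlap
        return ("serialize", shared)  # route sequentially, no parallel dispatch
    if similarity >= sim_threshold:   # Layer 1: intent overlap, no file overlap
        return ("analyze", None)      # escalate to Layer 2 structural analysis
    return ("parallel", None)         # safe to dispatch concurrently

a = {"files": {"auth/login.py"}}
b = {"files": {"auth/session.py"}}
assert preflight_gate(a, b, similarity=0.91) == ("analyze", None)
assert preflight_gate(a, {"files": {"auth/login.py"}},
                      similarity=0.1) == ("serialize", {"auth/login.py"})
assert preflight_gate(a, b, similarity=0.2) == ("parallel", None)
```

Layers 2 through 4 would run only on pairs the gate escalates, preserving the sub-second budget for the common case.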
| Gap | Description | Evidence of Gap |
|---|---|---|
| Agent-native pre-flight benchmarks | All evaluated systems target human developer workflows (PRs, branches). No published benchmarks for AI agent-specific pre-flight conflict detection with agent think-time, task description length, or agent velocity as variables. | All primary sources (Palantír, ConE, S3CDA) use human developer datasets |
| Dynamic language dependency accuracy | Import graph analysis and call graph construction are inherently more imprecise for dynamically typed languages (Python, JavaScript). The static analysis technique explicitly excludes Java reflection and native methods. No dynamic language benchmark published. | [1][22] limitation sections |
| Real-time update cost at agent velocity | Augment Code rebuilds its dependency model "within seconds of a branch push," but at agent velocity (dozens of parallel agents committing continuously), the cost and consistency of real-time dependency graph updates have not been studied. | [25] reports reconstruction time but not under concurrent write load |
| SCF precision at scale | SCF achieves 65.2% detection at 27.9% precision across 600 runs on AutoGen/CrewAI/LangGraph. Performance at 10x+ agent count, with heterogeneous task descriptions, is not characterized. | [28] limited to 600 runs |
| Ownership map freshness | Commit-based and line-based ownership calculations are point-in-time snapshots. No literature addresses how frequently ownership must be recalculated for rapidly evolving AI-augmented codebases. | [3][29] report static calculations only |
| Cross-language blast radius | Polyglot codebases (e.g., Python backend + TypeScript frontend + Go services) require cross-language dependency edges for accurate blast radius. No published system handles this end-to-end. | [20] addresses the problem conceptually but no evaluated implementation |
Key finding: The field has strong theoretical foundations and production-proven components (ConE, Palantír, S3CDA, RIPPLE) but lacks end-to-end evaluation of any layered pre-flight pipeline specifically designed for AI agent systems. The biggest open problem is not technique efficacy — it is integration: combining semantic, structural, and historical signals into a single sub-second gate without creating a bottleneck that eliminates the velocity gains from parallelism.[10][28][27]