The repo map replaces flat file tree listings with a graph-ranked, context-adaptive view. Read the full story for the motivation behind this approach.

How it works

1. Index phase (startup)

Walk the file tree (respecting .gitignore, max depth 10). For each source file:
  1. Parse with tree-sitter to extract symbols (functions, classes, interfaces, types, enums) and their signatures
  2. Extract import statements and identifier references
  3. Build cross-file edges: if file A references a symbol exported by file B, create an edge A to B
  4. Store everything in SQLite (in-memory for speed, on-disk for persistence)
Indexing is incremental — files are re-indexed only when their mtime changes.
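
Step 3 of the index phase can be sketched as follows. This is a minimal illustration, not the actual implementation: the `IndexedFile` shape and the `buildEdges` function are hypothetical names, and it assumes symbol extraction has already produced a list of exported names and referenced identifiers per file.

```typescript
// Hypothetical shape of one indexed file; the real extraction output may differ.
interface IndexedFile {
  path: string;
  exports: string[];    // names of exported symbols
  references: string[]; // identifiers referenced in the file
}

// Build "A -> B" edges: file A references a symbol that file B exports.
// The weight counts how many distinct exported symbols of B are referenced by A.
function buildEdges(files: IndexedFile[]): Map<string, number> {
  // Map each exported name to the files that export it.
  const exporters = new Map<string, string[]>();
  for (const f of files) {
    for (const sym of f.exports) {
      const list = exporters.get(sym) ?? [];
      list.push(f.path);
      exporters.set(sym, list);
    }
  }

  const edges = new Map<string, number>(); // key: "source -> target"
  for (const f of files) {
    // Deduplicate references so repeated uses of one symbol count once.
    for (const ref of Array.from(new Set(f.references))) {
      for (const target of exporters.get(ref) ?? []) {
        if (target === f.path) continue; // skip self-references
        const key = `${f.path} -> ${target}`;
        edges.set(key, (edges.get(key) ?? 0) + 1);
      }
    }
  }
  return edges;
}
```

Matching by bare name is a simplification: two files exporting the same name would both receive an edge, which is why a real indexer would also consult the import statements from step 2.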

2. Graph phase

Run PageRank (20 iterations, damping factor 0.85) over the file-to-file edge graph. Files imported by many other files score higher. The algorithm uses a personalization vector that can be biased toward specific files.
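
A minimal sketch of personalized PageRank by power iteration, using the 20 iterations and 0.85 damping factor stated above. The function name, the input shapes, and the handling of files with no outgoing edges (their rank is redistributed along the restart vector) are assumptions for illustration, not the actual implementation.

```typescript
// Personalized PageRank by power iteration.
// `edges` maps a source file to the files it imports (hypothetical input shape).
function pageRank(
  nodes: string[],
  edges: Map<string, string[]>,
  personalization: Map<string, number> = new Map<string, number>(),
  iterations = 20,
  damping = 0.85,
): Map<string, number> {
  const n = nodes.length;
  if (n === 0) return new Map<string, number>();

  // Normalize the restart vector; fall back to uniform when no bias is given.
  let total = 0;
  for (const node of nodes) total += personalization.get(node) ?? 0;
  const restart = new Map<string, number>();
  for (const node of nodes) {
    restart.set(node, total > 0 ? (personalization.get(node) ?? 0) / total : 1 / n);
  }

  let rank = new Map<string, number>();
  for (const node of nodes) rank.set(node, 1 / n);

  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    for (const node of nodes) next.set(node, 0);

    let dangling = 0; // rank held by files with no outgoing edges
    for (const node of nodes) {
      const r = rank.get(node) ?? 0;
      const targets = (edges.get(node) ?? []).filter((t) => next.has(t));
      if (targets.length === 0) {
        dangling += r;
        continue;
      }
      for (const t of targets) next.set(t, (next.get(t) ?? 0) + r / targets.length);
    }

    // Teleport mass and dangling mass both follow the restart vector.
    for (const node of nodes) {
      const p = restart.get(node) ?? 0;
      next.set(node, (1 - damping) * p + damping * ((next.get(node) ?? 0) + dangling * p));
    }
    rank = next;
  }
  return rank;
}
```

Because an edge points from the importing file to the imported file, rank flows toward files that many others depend on. Passing a non-uniform `personalization` map shifts the restart probability toward the listed files.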

3. Co-change phase

Parse git log --name-only for the last 300 commits. For each commit that touches 2-20 files, record all pairwise file combinations. Commits with >20 files are filtered as noise (refactors, mass renames). This captures implicit coupling that the import graph misses — files that are always edited together even without direct imports.
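
The pair counting can be sketched like this. It assumes the `git log --name-only` output has already been parsed into one list of file paths per commit; `countCoChanges` and the `a|b` key format are hypothetical choices for the example.

```typescript
// Count co-changes from a list of commits, each given as the files it touched.
// Commits outside the 2-20 file range are skipped as noise.
function countCoChanges(
  commits: string[][],
  minFiles = 2,
  maxFiles = 20,
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const touched of commits) {
    // Deduplicate and sort so that (a, b) and (b, a) share one key.
    const files = Array.from(new Set(touched)).sort();
    if (files.length < minFiles || files.length > maxFiles) continue;
    for (let i = 0; i < files.length; i++) {
      for (let j = i + 1; j < files.length; j++) {
        const key = `${files[i]}|${files[j]}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  return counts;
}
```

A 20-file commit contributes 190 pairs, so the upper bound also keeps the table from growing quadratically on bulk changes.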

4. Ranking phase (per-turn)

PageRank with personalized restart vector:
  • Edited files: 5x base weight
  • Mentioned files (tool reads, grep hits): 3x base weight
  • Active editor file: 2x base weight
  • Co-change partners of context files: proportional to co-change count (capped at 2x)
Post-hoc signals (things PageRank can’t capture):
  • FTS match on conversation terms: +0.5 score
  • Graph neighbor of any context file: +1.0 score
  • Co-change partner of any context file: +min(count/5, 3.0) score
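
The weights above can be expressed as two small functions. This is a sketch under stated assumptions: the names are hypothetical, a base weight of 1 is assumed, and it assumes the personalization multipliers do not stack (the largest applicable one wins), which the description above does not specify. The co-change contribution to the restart vector is omitted because its proportionality constant is not given.

```typescript
// Hypothetical per-file context flags for one turn.
interface ContextFlags {
  edited: boolean;
  mentioned: boolean;    // tool reads, grep hits
  activeEditor: boolean;
}

// Restart-vector weight for a file. ASSUMPTION: multipliers do not stack.
function restartWeight(flags: ContextFlags, baseWeight = 1): number {
  let multiplier = 1;
  if (flags.activeEditor) multiplier = Math.max(multiplier, 2);
  if (flags.mentioned) multiplier = Math.max(multiplier, 3);
  if (flags.edited) multiplier = Math.max(multiplier, 5);
  return baseWeight * multiplier;
}

// Signals applied after PageRank has run.
interface PostHocSignals {
  ftsMatch: boolean;      // a symbol in the file matches a conversation term
  graphNeighbor: boolean; // the file shares an edge with a context file
  coChangeCount: number;  // co-change count with context files, 0 if none
}

// Additive boost from the post-hoc signals listed above.
function postHocBoost(s: PostHocSignals): number {
  let boost = 0;
  if (s.ftsMatch) boost += 0.5;
  if (s.graphNeighbor) boost += 1.0;
  if (s.coChangeCount > 0) boost += Math.min(s.coChangeCount / 5, 3.0);
  return boost;
}
```

With these numbers the co-change boost saturates at 15 shared commits, so a long-lived pair of files cannot dominate the other signals.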

5. Rendering phase

Binary search to maximize the number of file blocks that fit within the token budget. Each block shows:
src/core/agents/agent-bus.ts [R:12]
  +AgentBus -- Shared coordination bus for parallel subagent communication
  +acquireFileRead -- Lock-free file read with cache and waiter pattern
  +SharedCache -- Pre-seeded cache for warm agent starts
   FileCacheEntry
  • + = exported symbol
  • [R:12] = blast radius (12 files import this one)
  • [NEW] = file appeared since last render
  • Descriptions are semantic summaries from the LLM
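
The binary search over the number of blocks can be sketched as follows, assuming the blocks are already sorted by rank. `estimateTokens` is a placeholder heuristic (roughly four characters per token), not a real tokenizer, and `fitBlocks` is a hypothetical name.

```typescript
// Placeholder token estimate; a real implementation would use the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Return the longest prefix of `blocks` whose rendered size fits within `budget`.
// The search is valid because the cost of a prefix never decreases as it grows.
function fitBlocks(blocks: string[], budget: number): string[] {
  let lo = 0;             // largest prefix length known to fit
  let hi = blocks.length; // largest prefix length that might fit
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    const cost = estimateTokens(blocks.slice(0, mid).join("\n"));
    if (cost <= budget) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return blocks.slice(0, lo);
}
```

Searching on the prefix length needs only O(log n) renders instead of re-measuring after every appended block.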

6. Semantic summaries

After the initial scan, top symbols (by PageRank) are batched to a fast LLM:
Prompt: "One-line summary of what this symbol does."
Input: { name: "AgentBus", kind: "class", signature: "class AgentBus { ... }" }
Output: "Shared coordination bus for parallel subagent communication"
Summaries are cached in SQLite keyed by (symbol_id, file_mtime). When a file is edited, its mtime changes and summaries are regenerated.
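
The mtime-keyed invalidation can be illustrated with an in-memory stand-in for the SQLite table. `SummaryCache` is a hypothetical class written for this example; the lookup logic is the point, not the storage.

```typescript
interface CachedSummary {
  summary: string;
  fileMtime: number;
}

// In-memory stand-in for the summary table, keyed by symbol id.
class SummaryCache {
  private entries = new Map<number, CachedSummary>();

  // Return the summary only if it was generated for the file's current mtime.
  get(symbolId: number, fileMtime: number): string | undefined {
    const hit = this.entries.get(symbolId);
    return hit !== undefined && hit.fileMtime === fileMtime ? hit.summary : undefined;
  }

  set(symbolId: number, fileMtime: number, summary: string): void {
    this.entries.set(symbolId, { summary, fileMtime });
  }
}
```

A miss after an edit means the symbol is queued for the next LLM batch; the stale row is simply overwritten by `set`.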

Budget dynamics

The repo map's token budget shrinks as the conversation grows:
| State | Budget | Rationale |
| --- | --- | --- |
| Start of conversation | 2,500 tokens | AI needs maximum orientation |
| Mid conversation (~50K tokens) | ~2,000 tokens | Context established, less map needed |
| Late conversation (~100K+ tokens) | 1,500 tokens | Save space for actual work |
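
One function that reproduces the three data points above is a linear ramp from 2,500 tokens down to a floor of 1,500 at 100K conversation tokens. The linear shape is an assumption (only the three points are documented), and `repoMapBudget` is a hypothetical name.

```typescript
// Token budget for the repo map as a function of conversation length.
// ASSUMPTION: linear interpolation between the documented endpoints.
function repoMapBudget(conversationTokens: number): number {
  const MAX_BUDGET = 2500; // start of conversation
  const MIN_BUDGET = 1500; // late conversation
  const FLOOR_AT = 100000; // conversation size at which the floor is reached
  const t = Math.min(Math.max(conversationTokens, 0), FLOOR_AT) / FLOOR_AT;
  return Math.round(MAX_BUDGET - t * (MAX_BUDGET - MIN_BUDGET));
}
```

Under this assumption a 50K-token conversation gets exactly 2,000 tokens, which agrees with the mid-conversation figure.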

Real-time updates

Tool calls edit_file / write_file
    |
emitFileEdited(absPath)
    |
ContextManager.onFileChanged(absPath)
    |
RepoMap.onFileChanged(absPath)
    |
Mark file dirty -> debounced re-index (500ms)
    |
Re-extract symbols + edges -> recompute PageRank
    |
Clear repo map render cache
    |
Next system prompt gets updated ranking
emitFileRead(absPath) feeds into trackMentionedFile(), which boosts the file in the next PageRank personalization without re-indexing.
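
The dirty-marking and 500 ms debounce step can be sketched as below. `DebouncedReindexer` and its `flush` method are hypothetical; the sketch only shows how a burst of edits collapses into a single re-index call.

```typescript
// Collects changed paths and re-indexes them once the edits go quiet.
class DebouncedReindexer {
  private readonly reindex: (paths: string[]) => void;
  private readonly delayMs: number;
  private dirty = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | undefined;

  constructor(reindex: (paths: string[]) => void, delayMs = 500) {
    this.reindex = reindex;
    this.delayMs = delayMs;
  }

  // Mark a file dirty and restart the quiet period.
  onFileChanged(absPath: string): void {
    this.dirty.add(absPath);
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Re-index all dirty files now; also called by the timer.
  flush(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    const paths = Array.from(this.dirty);
    this.dirty.clear();
    if (paths.length > 0) this.reindex(paths);
  }
}
```

Keeping the dirty paths in a set means a file edited several times within the window is re-parsed only once.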

Schema

CREATE TABLE files (
  id INTEGER PRIMARY KEY,
  path TEXT UNIQUE NOT NULL,
  mtime_ms REAL NOT NULL,
  language TEXT NOT NULL,
  line_count INTEGER NOT NULL DEFAULT 0,
  symbol_count INTEGER NOT NULL DEFAULT 0,
  pagerank REAL NOT NULL DEFAULT 0
);

CREATE TABLE symbols (
  id INTEGER PRIMARY KEY,
  file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
  name TEXT NOT NULL,
  kind TEXT NOT NULL,
  line INTEGER NOT NULL,
  signature TEXT,
  is_exported INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE edges (
  source_file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
  target_file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
  weight REAL NOT NULL DEFAULT 1.0,
  PRIMARY KEY (source_file_id, target_file_id)
);

CREATE TABLE cochanges (
  file_id_a INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
  file_id_b INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
  count INTEGER NOT NULL DEFAULT 1,
  PRIMARY KEY (file_id_a, file_id_b)
);

CREATE TABLE semantic_summaries (
  symbol_id INTEGER PRIMARY KEY REFERENCES symbols(id) ON DELETE CASCADE,
  summary TEXT NOT NULL,
  file_mtime REAL NOT NULL
);

-- FTS5 for conversation-term matching
CREATE VIRTUAL TABLE symbols_fts USING fts5(name, content=symbols, content_rowid=id);
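
Conversation terms need to be turned into an FTS5 query string before they can be matched against `symbols_fts`. The helper below is a hypothetical sketch: it quotes each term as an FTS5 string (doubling embedded double quotes) and joins the terms with `OR`. How the real system tokenizes the conversation is not described here.

```typescript
// Build an FTS5 MATCH expression from free-form conversation terms.
function buildFtsQuery(terms: string[]): string {
  const quoted = terms
    .map((t) => t.trim())
    .filter((t) => t.length > 0)
    // FTS5 strings are double-quoted; an embedded quote is escaped by doubling it.
    .map((t) => `"${t.replace(/"/g, '""')}"`);
  // Drop duplicate terms while keeping first-seen order.
  return Array.from(new Set(quoted)).join(" OR ");
}
```

The result would be bound as a parameter, for example in `SELECT rowid FROM symbols_fts WHERE symbols_fts MATCH ?`, rather than concatenated into the SQL text.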

Comparison

| Feature | SoulForge | Aider | Claude Code |
| --- | --- | --- | --- |
| Index method | tree-sitter AST | tree-sitter AST | None |
| Ranking | PageRank + personalization | Graph ranking | N/A |
| Context adaptation | Per-turn personalization vector | Dynamic sizing | N/A |
| Git co-change | Yes (300 commits) | No | No |
| Semantic summaries | LLM-generated, cached by mtime | No | No |
| FTS on symbols | Yes (SQLite FTS5) | No | No |
| Real-time updates | Debounced re-index on edit | Per-turn rescan | N/A |
| Blast radius tags | Yes ([R:N]) | No | No |
| Clone detection | AST shape hash + MinHash | No | No |
| Dead code detection | Unused exports (dead vs unnecessary) | No | No |

See also: Code intelligence for the LSP and tree-sitter fallback chain that powers symbol extraction.

Language support

Convention-based visibility detection for 33 languages:
| Convention | Languages | Rule |
| --- | --- | --- |
| Name-based | Go | Capitalized = public |
| Name-based | Python, Dart | No _ prefix = public |
| Name-based | Elisp | No -- prefix = public |
| Keyword-based | Rust, Zig | pub keyword |
| Keyword-based | Java, Kotlin, Swift, C#, Scala | Not private = public |
| Keyword-based | PHP | Not private/protected = public |
| Keyword-based | Elixir | def (not defp) = public |
| Keyword-based | Solidity | public/external or contract/event/struct |
| File-based | C, C++, Objective-C | Header file (.h/.hpp/.hh) = public |
| Default public | Ruby, Lua, Bash, OCaml, ReScript, TLA+ | All top-level symbols treated as public |

Identifier extraction covers three naming families:
  • camelCase + PascalCase: TypeScript, JavaScript, Go, Rust, Java, Kotlin, Swift, C#, Dart, Scala, Objective-C, Solidity
  • snake_case + PascalCase: Python, Ruby, Elixir, PHP, C, C++, Zig, Lua, Bash, OCaml, ReScript
  • Hyphenated: Elisp
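
The name-based and default-public visibility rules from the table above are simple enough to sketch directly. `isPublicByName` is a hypothetical helper covering only a few of the listed languages; keyword-based and file-based languages need the declaration text or the file path, so the sketch returns `undefined` for them.

```typescript
// Visibility from the symbol name alone, for languages where that is enough.
// Returns undefined when the language needs keyword or file information instead.
function isPublicByName(language: string, symbolName: string): boolean | undefined {
  switch (language) {
    case "go":
      // ASCII-only check for brevity; Go itself accepts any Unicode uppercase letter.
      return /^[A-Z]/.test(symbolName);
    case "python":
    case "dart":
      return !symbolName.startsWith("_");
    case "ruby":
    case "lua":
    case "bash":
      return true; // default-public: every top-level symbol is treated as public
    default:
      return undefined;
  }
}
```

For example, `isPublicByName("go", "AgentBus")` is `true` while `isPublicByName("python", "_helper")` is `false`.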

Monorepo support (partial)

The repo map indexes files within the working directory. In monorepo setups:
  • PageRank, blast radius, and dependency edges only span files within cwd
  • Cross-package imports resolve as external dependencies, not internal edges
  • Co-change analysis works across the full git history regardless of package boundaries
  • The project tool handles workspace discovery separately via project(action: "list")