How it works
1. Index phase (startup)
Walk the file tree (respecting .gitignore, max depth 10). For each source file:
- Parse with tree-sitter to extract symbols (functions, classes, interfaces, types, enums) and their signatures
- Extract import statements and identifier references
- Build cross-file edges: if file A references a symbol exported by file B, create an edge A to B
- Store everything in SQLite (in-memory for speed, on-disk for persistence)
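The edge-building step above can be sketched as follows. This is a minimal illustration, not the project's implementation: it matches references to exports purely by symbol name, and the `build_edges` function and the sample file names are hypothetical.

```python
from collections import defaultdict


def build_edges(exports, references):
    """Return (src, dst) edges where src references a symbol exported by dst.

    exports:    {file: set of exported symbol names}
    references: {file: set of identifiers referenced in that file}
    Name-only matching is a simplification made for this sketch.
    """
    # Map each symbol name to the files that export it.
    exporters = defaultdict(set)
    for path, symbols in exports.items():
        for name in symbols:
            exporters[name].add(path)

    edges = set()
    for src, names in references.items():
        for name in names:
            for dst in exporters.get(name, ()):
                if dst != src:  # a file referencing its own export is not an edge
                    edges.add((src, dst))
    return edges


# Hypothetical three-file project.
exports = {"auth.ts": {"login", "Session"}, "db.ts": {"query"}, "app.ts": set()}
references = {
    "app.ts": {"login", "query"},
    "auth.ts": {"query", "login"},
    "db.ts": set(),
}
edges = build_edges(exports, references)
```

Here `app.ts` gains edges to both `auth.ts` and `db.ts`, and `auth.ts` gains one to `db.ts`.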
2. Graph phase
Run PageRank (20 iterations, damping factor 0.85) over the file-to-file edge graph. Files imported by many other files score higher. The algorithm uses a personalization vector that can be biased toward specific files.
3. Co-change phase
Parse git log --name-only for the last 300 commits. For each commit that touches 2-20 files, record all pairwise file combinations. Commits with >20 files are filtered as noise (refactors, mass renames).
This captures implicit coupling that the import graph misses — files that are always edited together even without direct imports.
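A rough sketch of the pair counting described above. It assumes the log was produced with something like `git log --name-only --pretty=format:@@%H -n 300`, so that each commit starts with a line beginning with `@@`; that delimiter, the function names, and the sample log are choices made for this example only.

```python
from collections import Counter
from itertools import combinations

MARKER = "@@"  # hypothetical commit delimiter chosen for this sketch


def parse_commits(log_text):
    """Split `git log --name-only` output into one file list per commit."""
    commits, current = [], None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(MARKER):
            current = []
            commits.append(current)
        elif line and current is not None:
            current.append(line)
    return commits


def co_change_counts(commits, min_files=2, max_files=20):
    """Count how often each unordered file pair appears in the same commit."""
    counts = Counter()
    for files in commits:
        unique = sorted(set(files))
        # Skip single-file commits and noisy mass changes.
        if min_files <= len(unique) <= max_files:
            counts.update(combinations(unique, 2))
    return counts


log_text = """@@a1
src/a.ts
src/b.ts

@@b2
src/a.ts
src/b.ts
src/c.ts

@@c3
README.md
"""
counts = co_change_counts(parse_commits(log_text))
```

Sorting the files before pairing keeps each pair in one canonical order, so `(a, b)` and `(b, a)` are never counted separately.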
4. Ranking phase (per-turn)
PageRank with personalized restart vector:
- Edited files: 5x base weight
- Mentioned files (tool reads, grep hits): 3x base weight
- Active editor file: 2x base weight
- Co-change partners of context files: proportional to co-change count (capped at 2x)
- FTS match on conversation terms: +0.5 score
- Graph neighbor of any context file: +1.0 score
- Co-change partner of any context file: +min(count/5, 3.0) score
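The ranking step can be sketched as below, under several assumptions: multipliers scale a base restart weight of 1.0, a file in several roles keeps its largest multiplier, and the "+" items are treated as additive bonuses computed separately from the PageRank score. How the real system combines these is not stated here, and all function names are hypothetical.

```python
EDITED, MENTIONED, ACTIVE = 5.0, 3.0, 2.0  # multipliers from the list above


def restart_weights(edited=(), mentioned=(), active=None):
    """Restart weights per file; the largest applicable multiplier wins (assumption)."""
    weights = {}
    for path in mentioned:
        weights[path] = max(weights.get(path, 1.0), MENTIONED)
    for path in edited:
        weights[path] = max(weights.get(path, 1.0), EDITED)
    if active is not None:
        weights[active] = max(weights.get(active, 1.0), ACTIVE)
    return weights


def personalized_pagerank(edges, weights, iterations=20, damping=0.85):
    """edges: (src, dst) pairs where src imports dst. Unlisted files get weight 1.0."""
    nodes = sorted({n for e in edges for n in e} | set(weights))
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    # Normalize the restart weights into a probability distribution.
    raw = {n: weights.get(n, 1.0) for n in nodes}
    total = sum(raw.values())
    restart = {n: w / total for n, w in raw.items()}

    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for dst in out[n]:
                    nxt[dst] += share
            else:
                # File with no outgoing edges: spread its rank along the restart vector.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank


def post_hoc_bonus(fts_match, graph_neighbor, co_change_count):
    """Additive score adjustments, read from the '+' items in the list above."""
    bonus = 0.5 if fts_match else 0.0
    if graph_neighbor:
        bonus += 1.0
    return bonus + min(co_change_count / 5, 3.0)


edges = [("app.ts", "auth.ts"), ("app.ts", "db.ts"),
         ("auth.ts", "db.ts"), ("cli.ts", "util.ts")]
base = personalized_pagerank(edges, {})
biased = personalized_pagerank(edges, restart_weights(edited=["cli.ts"]))
```

Biasing the restart vector toward `cli.ts` raises the score of `util.ts`, the file it imports, without touching the graph itself.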
5. Rendering phase
Binary search to maximize the number of file blocks that fit within the token budget. Each block shows:
- `+` = exported symbol
- `[R:12]` = blast radius (12 files import this one)
- `[NEW]` = file appeared since last render
- Descriptions are semantic summaries from the LLM
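One way to picture the budget fit: if blocks are rendered in rank order and each has a known token cost, the cumulative cost is non-decreasing, so a binary search finds the largest prefix that fits. The sketch below assumes precomputed per-block token counts; the function name and the numbers are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def blocks_that_fit(block_tokens, budget):
    """Return the largest k such that the first k blocks fit within the budget.

    block_tokens: token cost of each file block, in rank order (non-negative).
    """
    # prefix[i] is the total cost of rendering blocks 0..i.
    prefix = list(accumulate(block_tokens))
    # bisect_right performs the binary search: it counts prefixes <= budget.
    return bisect_right(prefix, budget)


costs = [400, 300, 900, 250, 700]  # hypothetical token counts per block
k_small = blocks_that_fit(costs, 1500)
k_large = blocks_that_fit(costs, 2500)
```

With these costs, a 1,500-token budget fits two blocks and a 2,500-token budget fits four.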
6. Semantic summaries
After the initial scan, top symbols (by PageRank) are batched to a fast LLM. Summaries are cached by (symbol_id, file_mtime). When a file is edited, its mtime changes and summaries are regenerated.
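A minimal sketch of mtime-keyed caching, assuming an in-memory dictionary and a `summarize` callable standing in for the LLM call. The `SummaryCache` class and `fake_llm` are hypothetical, and the actual storage layer may differ.

```python
class SummaryCache:
    """Cache summaries under (symbol_id, file_mtime) keys."""

    def __init__(self, summarize):
        self._summarize = summarize  # stand-in for the LLM request
        self._entries = {}           # (symbol_id, file_mtime) -> summary

    def get(self, symbol_id, file_mtime):
        key = (symbol_id, file_mtime)
        if key not in self._entries:
            # New symbol or changed mtime: the old key no longer matches.
            self._entries[key] = self._summarize(symbol_id)
        return self._entries[key]


calls = []


def fake_llm(symbol_id):
    calls.append(symbol_id)
    return f"summary of {symbol_id}"


cache = SummaryCache(fake_llm)
first = cache.get("auth.ts:login", 100.0)
again = cache.get("auth.ts:login", 100.0)   # same mtime: served from cache
fresh = cache.get("auth.ts:login", 200.0)   # file edited: regenerated
```

Because the mtime is part of the key, invalidation needs no extra bookkeeping: an edited file simply misses the cache.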
Budget dynamics
The repo map’s token budget scales inversely with conversation length:
| State | Budget | Rationale |
|---|---|---|
| Start of conversation | 2,500 tokens | AI needs maximum orientation |
| Mid conversation (~50K tokens) | ~2,000 tokens | Context established, less map needed |
| Late conversation (~100K+ tokens) | 1,500 tokens | Save space for actual work |
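The three rows above are consistent with a linear ramp from 2,500 tokens down to a 1,500-token floor at 100K conversation tokens. The sketch below assumes that linear shape, which the table alone does not confirm; the function and parameter names are hypothetical.

```python
def repo_map_budget(conversation_tokens, max_budget=2500, min_budget=1500,
                    floor_at=100_000):
    """Interpolate the repo map budget linearly, clamped to [min_budget, max_budget]."""
    # Fraction of the way from an empty conversation to the floor point.
    frac = min(max(conversation_tokens, 0) / floor_at, 1.0)
    return round(max_budget - frac * (max_budget - min_budget))


start = repo_map_budget(0)
mid = repo_map_budget(50_000)
late = repo_map_budget(250_000)
```

Under this assumption the midpoint lands on 2,000 tokens, matching the approximate figure in the table.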
Real-time updates
emitFileRead(absPath) feeds into trackMentionedFile(), which boosts the file in the next PageRank personalization without re-indexing.
Schema
Comparison
| Feature | SoulForge | Aider | Claude Code |
|---|---|---|---|
| Index method | tree-sitter AST | tree-sitter AST | None |
| Ranking | PageRank + personalization | Graph ranking | N/A |
| Context adaptation | Per-turn personalization vector | Dynamic sizing | N/A |
| Git co-change | Yes (300 commits) | No | No |
| Semantic summaries | LLM-generated, cached by mtime | No | No |
| FTS on symbols | Yes (SQLite FTS5) | No | No |
| Real-time updates | Debounced re-index on edit | Per-turn rescan | N/A |
| Blast radius tags | Yes ([R:N]) | No | No |
| Clone detection | AST shape hash + MinHash | No | No |
| Dead code detection | Unused exports (dead vs unnecessary) | No | No |
Language support
Convention-based visibility detection for 33 languages:
| Convention | Languages | Rule |
|---|---|---|
| Name-based | Go | Capitalized = public |
| Name-based | Python, Dart | No _ prefix = public |
| Name-based | Elisp | No -- prefix = public |
| Keyword-based | Rust, Zig | pub keyword |
| Keyword-based | Java, Kotlin, Swift, C#, Scala | Not private = public |
| Keyword-based | PHP | Not private/protected = public |
| Keyword-based | Elixir | def (not defp) = public |
| Keyword-based | Solidity | public/external or contract/event/struct |
| File-based | C, C++, Objective-C | Header file (.h/.hpp/.hh) = public |
| Default public | Ruby, Lua, Bash, OCaml, ReScript, TLA+ | All top-level symbols treated as public |
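Several of the rules in the table can be expressed as one dispatch function. The sketch below is illustrative only: the `is_public` signature, the `modifiers` tuple, and the `path` argument are hypothetical, and the Elisp and Solidity rows are left out, so those languages fall through to the default here.

```python
def is_public(language, name, modifiers=(), path=""):
    """Apply a subset of the table's visibility conventions to one symbol."""
    lang = language.lower()
    if lang == "go":
        return name[:1].isupper()            # capitalized = public
    if lang in ("python", "dart"):
        return not name.startswith("_")      # no _ prefix = public
    if lang in ("rust", "zig"):
        return "pub" in modifiers            # pub keyword
    if lang in ("java", "kotlin", "swift", "c#", "scala"):
        return "private" not in modifiers    # not private = public
    if lang == "php":
        return not ({"private", "protected"} & set(modifiers))
    if lang == "elixir":
        return "def" in modifiers            # def is public, defp is not
    if lang in ("c", "c++", "objective-c"):
        return path.endswith((".h", ".hpp", ".hh"))  # declared in a header
    return True  # default-public languages such as Ruby, Lua, Bash


checks = [
    is_public("Go", "Serve"),
    is_public("Go", "serve"),
    is_public("Python", "_helper"),
    is_public("Rust", "parse", modifiers=("pub",)),
    is_public("C", "init", path="src/api.c"),
]
```

Passing modifiers and the file path explicitly keeps the function independent of any particular parser output.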
- camelCase + PascalCase: TypeScript, JavaScript, Go, Rust, Java, Kotlin, Swift, C#, Dart, Scala, Objective-C, Solidity
- snake_case + PascalCase: Python, Ruby, Elixir, PHP, C, C++, Zig, Lua, Bash, OCaml, ReScript
- Hyphenated: Elisp
Monorepo support (partial)
The repo map indexes files within the working directory. In monorepo setups:
- PageRank, blast radius, and dependency edges only span files within cwd
- Cross-package imports resolve as external dependencies, not internal edges
- Co-change analysis works across the full git history regardless of package boundaries
- The project tool handles workspace discovery separately via project(action: "list")