SoulForge supports two compaction strategies for managing long conversations. When context usage exceeds a threshold, older messages are compacted to free space while preserving critical information.

Strategies

V2 (incremental extraction) maintains a WorkingStateManager that extracts structured state as the conversation happens, rather than in a batch at compaction time.

Extracted deterministically (zero LLM cost):
  • Files — tracked from read/edit/write tool calls with action details
  • Failures — extracted from error results
  • Tool results — rolling window of shell/grep/project outputs
  • Task — set from first user message
Extracted via regex (zero LLM cost):
  • Decisions — patterns like “I’ll use…”, “decided to…”, “because…”
  • Discoveries — patterns like “found that…”, “the issue was…”
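The regex pass might look something like the following sketch; the patterns and function names here are illustrative assumptions, not SoulForge's actual source:

```typescript
// Illustrative regex extractors for decisions and discoveries.
// Patterns and names are assumptions, not SoulForge's real code.
const DECISION_PATTERNS = [
  /\bI'll use\b.*$/im,
  /\bdecided to\b.*$/im,
  /\bbecause\b.*$/im,
];
const DISCOVERY_PATTERNS = [
  /\bfound that\b.*$/im,
  /\bthe issue was\b.*$/im,
];

function matchAll(text: string, patterns: RegExp[]): string[] {
  const hits: string[] = [];
  for (const p of patterns) {
    const m = text.match(p);
    if (m) hits.push(m[0].trim()); // keep at most one hit per pattern
  }
  return hits;
}

function extractFromAssistantText(
  text: string,
): { decisions: string[]; discoveries: string[] } {
  return {
    decisions: matchAll(text, DECISION_PATTERNS),
    discoveries: matchAll(text, DISCOVERY_PATTERNS),
  };
}
```

Because the patterns run on every assistant message as it arrives, the cost stays zero regardless of conversation length.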
On compaction:
  1. Serializes the pre-built structured state into markdown
  2. Optionally runs a cheap LLM gap-fill pass (2048 tokens max) that only outputs what’s missing
  3. Replaces old messages in the same way as V1
Cost: rule-based extraction during the conversation is free; the optional gap-fill costs ~2k tokens, versus ~8k for V1's full summarization.

Configuration

```json
{
  "compaction": {
    "strategy": "v2",
    "triggerThreshold": 0.7,
    "resetThreshold": 0.4,
    "keepRecent": 4,
    "maxToolResults": 30,
    "llmExtraction": true
  }
}
```
All fields are optional. Omitting compaction or strategy defaults to V2.
| Field | Default | Description |
|---|---|---|
| `strategy` | `"v2"` | `"v2"` (incremental extraction) or `"v1"` (LLM summarization) |
| `triggerThreshold` | `0.7` | Auto-compact at this fraction of context usage |
| `resetThreshold` | `0.4` | Compaction re-arms only once usage falls back below this fraction (hysteresis, prevents oscillation) |
| `keepRecent` | `4` | Number of recent messages preserved verbatim |
| `maxToolResults` | `30` | Rolling window size for tool result slots (V2 only) |
| `llmExtraction` | `true` | Enable the cheap LLM gap-fill on compact (V2 only) |
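The `triggerThreshold`/`resetThreshold` pair is a standard hysteresis gate. A minimal sketch of how that decision could work (class and method names are illustrative, not SoulForge's API):

```typescript
// Hysteresis sketch: compact when usage crosses triggerThreshold,
// and only re-arm once usage falls back below resetThreshold.
// Names and structure are assumptions, not SoulForge's actual code.
interface CompactionConfig {
  triggerThreshold: number; // e.g. 0.7
  resetThreshold: number;   // e.g. 0.4
}

class CompactionGate {
  private armed = true;

  constructor(private cfg: CompactionConfig) {}

  /** usage is context utilization in [0, 1]; returns true when a compaction should run. */
  shouldCompact(usage: number): boolean {
    if (this.armed && usage >= this.cfg.triggerThreshold) {
      this.armed = false; // don't fire again until usage drops
      return true;
    }
    if (!this.armed && usage <= this.cfg.resetThreshold) {
      this.armed = true; // usage fell far enough; re-arm
    }
    return false;
  }
}
```

Without the reset band, usage hovering near 0.7 right after a compaction could re-trigger on every message.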

Live toggle

Use /compaction to switch strategies. The change takes effect immediately — switching to V2 starts extraction on the next message, switching to V1 drops the working state entirely.

Dedicated model

Both strategies use the task router’s compact slot:
```json
{
  "taskRouter": {
    "compact": "google/gemini-2.0-flash"
  }
}
```
For V2, only the gap-fill pass uses this model. For V1, the full summarization uses it.

Data flow (V2)

```text
User message ------------------> extractFromUserMessage()  --> WSM.task
Tool call (read/edit/shell) ----> extractFromToolCall()     --> WSM.files, WSM.toolResults
Tool result (success/error) ----> extractFromToolResult()   --> WSM.toolResults, WSM.failures
Assistant text -----------------> extractFromAssistantMessage() -> WSM.decisions, WSM.discoveries
                                                                    |
Context > threshold --> buildV2Summary() --> serialize WSM          |
                              |               + optional gap-fill <-+
                              v
                    [summary msg] + [ack msg] + [N recent msgs]
```
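The routing above can be sketched as a single dispatch over conversation events. The event shapes and field names here are assumed for illustration; SoulForge's real types live in `src/core/compaction/types.ts`:

```typescript
// Illustrative event dispatch into a working state; shapes are assumptions.
type ConversationEvent =
  | { kind: "user"; text: string }
  | { kind: "toolCall"; tool: string; args: Record<string, unknown> }
  | { kind: "toolResult"; tool: string; ok: boolean; output: string }
  | { kind: "assistant"; text: string };

interface WorkingState {
  task?: string;
  files: string[];
  failures: string[];
  toolResults: string[];
}

function applyEvent(ws: WorkingState, ev: ConversationEvent): void {
  switch (ev.kind) {
    case "user":
      if (!ws.task) ws.task = ev.text; // task comes from the first user message
      break;
    case "toolCall":
      // track files touched by read/edit/write calls
      if (["read", "edit", "write"].includes(ev.tool) && typeof ev.args.path === "string") {
        ws.files.push(ev.args.path);
      }
      break;
    case "toolResult":
      if (!ev.ok) ws.failures.push(`${ev.tool}: ${ev.output}`);
      else ws.toolResults.push(ev.output);
      break;
    case "assistant":
      // regex decision/discovery extraction would run here
      break;
  }
}
```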

Real-world example

Before compaction

| Metric | Value |
|---|---|
| Core messages | 34 |
| Prompt tokens | 4,517,349 |
| Cache read tokens | 2,740,557 (60.6% hit rate) |
| Context utilization | 6% |

After V2 compaction

| Metric | Value |
|---|---|
| Core messages | 5 |
| Prompt tokens | 7,539 |
| Gap-fill tokens | 0 (WSM had 15+ slots, skipped) |
| Context utilization | 4% |
34 messages to 5. The compaction cost zero tokens — no LLM call at all.

What V2 produced

```markdown
## Task
(all user requests concatenated)

## User Requirements
- fix all issues
- run tests, lint, typecheck format and commit
- ...9 items total

## Files Touched
- `tsconfig.json`: read; edited (x2)
- `src/core/tools/web-search.ts`: read (x4); analyzed (x3); edited
- `README.md`: read (x2); edited
- `src/core/tools/project.ts`: read (x9); grep; edited

## Tool Results
- **soul_analyze**: Top 20 identifiers by cross-file reference count...
- **rename_symbol**: Renamed across 5 files [lsp], verified zero remaining
- **project**: typecheck passed, lint passed, 2080 tests passed

## Errors & Failures
- project: typecheck failed -- TS5090 non-relative paths
- project: lint failed -- formatter would have printed different content
```

Gap-fill threshold

When 15 or more slots are populated across all categories (task, plan, files, decisions, failures, discoveries, environment, toolResults, userRequirements, assistantNotes), the LLM gap-fill is skipped entirely. Sessions with fewer tool calls trigger the 2K-token gap-fill to capture reasoning from prose.
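The skip decision reduces to a count over populated slots. A sketch, assuming a simple slot map (the shape and function names are illustrative, not SoulForge's actual types):

```typescript
// Count populated slots across all categories and skip the LLM gap-fill
// when the rule-based state is already rich enough. Illustrative only.
type SlotMap = Record<string, string[] | string | undefined>;

const GAP_FILL_SKIP_THRESHOLD = 15;

function countSlots(state: SlotMap): number {
  let n = 0;
  for (const v of Object.values(state)) {
    if (Array.isArray(v)) n += v.length; // each list entry is one slot
    else if (v) n += 1;                  // a populated scalar is one slot
  }
  return n;
}

function needsGapFill(state: SlotMap): boolean {
  return countSlots(state) < GAP_FILL_SKIP_THRESHOLD;
}
```

In the example above, the 15+ populated slots meant `needsGapFill` was false and the compaction ran without any LLM call.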

V1 comparison

The same compaction with V1 would have:
  • Sent all 34 messages to an LLM for summarization
  • Cost ~8K output tokens
  • Taken 5-15 seconds of latency
  • Produced a prose summary that captures reasoning better but loses structured data
V2’s tradeoff: Zero cost, instant, structured data preserved, but reasoning chains truncated. For mechanical coding sessions (fix/edit/test cycles), V2 is strictly better. For design-heavy sessions where the “why” matters more, V1’s LLM summarization may retain more nuance.

Visual indicators

  • ContextBar: Shows v2:N (slot count) when V2 is active
  • ContextBar: Shows compacting spinner during active compaction
  • InputBox: Shows “Compacting context…” status during compaction
  • System message: Reports strategy used and before/after context percentages

Architecture

```text
src/core/compaction/
+-- types.ts           -- WorkingState, CompactionConfig, slot types
+-- working-state.ts   -- WorkingStateManager class
+-- extractor.ts       -- Rule-based extractors for tool calls and messages
+-- summarize.ts       -- buildV2Summary() with optional LLM gap-fill
+-- index.ts           -- barrel exports
```