Strategies
- V2 — Incremental extraction (default)
- V1 — LLM batch summarization
V2 maintains a WorkingStateManager that extracts structured state as the conversation happens, not in a batch at compaction time.

Extracted deterministically (zero LLM cost):
- Files — tracked from read/edit/write tool calls with action details
- Failures — extracted from error results
- Tool results — rolling window of shell/grep/project outputs
- Task — set from first user message
- Decisions — patterns like “I’ll use…”, “decided to…”, “because…”
- Discoveries — patterns like “found that…”, “the issue was…”
At compaction time, V2:
- Serializes the pre-built structured state into markdown
- Optionally runs a cheap LLM gap-fill pass (2048 tokens max) that only outputs what’s missing
- Same message replacement as V1
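As a rough sketch of the deterministic extraction described above — the type and function names here are assumptions for illustration, not the real WorkingStateManager API — the decision/discovery patterns can be captured with plain regexes at zero LLM cost:

```typescript
// Illustrative sketch (names are assumptions): pattern-based extraction
// of decisions and discoveries from assistant prose, no LLM involved.
interface WorkingState {
  task?: string;
  decisions: string[];
  discoveries: string[];
  failures: string[];
}

// Patterns mirror the ones listed above ("I'll use…", "found that…", etc.)
const DECISION_RE = /(?:I'll use|decided to|because)[^.]*\./gi;
const DISCOVERY_RE = /(?:found that|the issue was)[^.]*\./gi;

function extractFromMessage(state: WorkingState, text: string): void {
  for (const m of text.match(DECISION_RE) ?? []) state.decisions.push(m.trim());
  for (const m of text.match(DISCOVERY_RE) ?? []) state.discoveries.push(m.trim());
}
```

Because extraction runs per message as the conversation happens, compaction only has to serialize state that already exists.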
Configuration
The compaction `strategy` defaults to `"v2"`.
| Field | Default | Description |
|---|---|---|
| `strategy` | `"v2"` | `"v2"` (incremental extraction) or `"v1"` (LLM summarization) |
| `triggerThreshold` | `0.7` | Auto-compact at this fraction of context usage |
| `resetThreshold` | `0.4` | Hysteresis reset to prevent oscillation |
| `keepRecent` | `4` | Number of recent messages to preserve verbatim |
| `maxToolResults` | `30` | Rolling window size for tool result slots (V2 only) |
| `llmExtraction` | `true` | Enable cheap LLM gap-fill on compact (V2 only) |
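The trigger/reset pair implements hysteresis: once a compaction fires, another cannot fire until utilization drops back below `resetThreshold`. A minimal sketch, assuming the field names from the table (the surrounding function is hypothetical):

```typescript
// Config shape matching the table above; defaults as documented.
interface CompactionConfig {
  strategy: "v1" | "v2";
  triggerThreshold: number; // compact when utilization rises past this
  resetThreshold: number;   // re-arm only after utilization falls below this
  keepRecent: number;
  maxToolResults: number;
  llmExtraction: boolean;
}

const defaults: CompactionConfig = {
  strategy: "v2",
  triggerThreshold: 0.7,
  resetThreshold: 0.4,
  keepRecent: 4,
  maxToolResults: 30,
  llmExtraction: true,
};

// Hysteresis check: `armed` flips off after a compaction and only flips
// back on once utilization has fallen below resetThreshold, preventing
// trigger/compact/trigger oscillation near the threshold.
function shouldCompact(
  utilization: number,
  armed: boolean,
  cfg: CompactionConfig = defaults
): { compact: boolean; armed: boolean } {
  if (armed && utilization >= cfg.triggerThreshold) return { compact: true, armed: false };
  if (!armed && utilization < cfg.resetThreshold) return { compact: false, armed: true };
  return { compact: false, armed };
}
```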
Live toggle
Use `/compaction` to switch strategies. The change takes effect immediately — switching to V2 starts extraction on the next message, switching to V1 drops the working state entirely.
Dedicated model
Both strategies use the task router's `compact` slot.
Data flow (V2)
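A hedged end-to-end sketch of the V2 flow (the function and type names are assumptions): state is extracted incrementally as messages arrive, then compaction serializes it to markdown, optionally gap-fills, and replaces all but the most recent messages.

```typescript
interface Message { role: "user" | "assistant"; text: string }

function compactV2(messages: Message[], keepRecent = 4): Message[] {
  // 1. Incremental extraction normally happens per message as the
  //    conversation runs; folded into one pass here for illustration.
  const stateLines = messages
    .filter(m => m.role === "assistant" && /decided to|found that/i.test(m.text))
    .map(m => `- ${m.text}`);
  // 2. Serialize the pre-built structured state into a markdown message.
  const summary: Message = {
    role: "user",
    text: `# Working state\n${stateLines.join("\n")}`,
  };
  // 3. (The optional cheap LLM gap-fill pass would run here.)
  // 4. Same message replacement as V1: summary + recent messages verbatim.
  return [summary, ...messages.slice(-keepRecent)];
}
```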
Real-world example
Session with 10 user turns (fix/edit/test cycle)
Before compaction
| Metric | Value |
|---|---|
| Core messages | 34 |
| Prompt tokens | 4,517,349 |
| Cache read tokens | 2,740,557 (60.6% hit rate) |
| Context utilization | 6% |
After V2 compaction
| Metric | Value |
|---|---|
| Core messages | 5 |
| Prompt tokens | 7,539 |
| Gap-fill tokens | 0 (WSM had 15+ slots, skipped) |
| Context utilization | 4% |
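The "skipped" gap-fill row above reflects the slot-count gate: with 15 or more populated slots the LLM pass is unnecessary. A minimal sketch of that check, assuming a simple slot map (the function name is hypothetical):

```typescript
// Slots grouped by category (files, decisions, failures, …).
type Slots = Record<string, string[]>;

// Gap-fill only runs when fewer than `threshold` slots are populated
// across all categories; 15+ slots means the state is rich enough.
function needsGapFill(slots: Slots, threshold = 15): boolean {
  const count = Object.values(slots).reduce((n, v) => n + v.length, 0);
  return count < threshold;
}
```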
What V2 produced
Gap-fill threshold
When 15 or more slots are populated across all categories (task, plan, files, decisions, failures, discoveries, environment, toolResults, userRequirements, assistantNotes), the LLM gap-fill is skipped entirely. Sessions with fewer tool calls trigger the 2K-token gap-fill to capture reasoning from prose.

V1 comparison
The same compaction with V1 would have:
- Sent all 34 messages to an LLM for summarization
- Cost ~8K output tokens
- Taken 5-15 seconds of latency
- Produced a prose summary that captures reasoning better but loses structured data
V2’s tradeoff: Zero cost, instant, structured data preserved, but reasoning chains truncated. For mechanical coding sessions (fix/edit/test cycles), V2 is strictly better. For design-heavy sessions where the “why” matters more, V1’s LLM summarization may retain more nuance.
Visual indicators
- ContextBar: Shows `v2:N` (slot count) when V2 is active
- ContextBar: Shows a compacting spinner during active compaction
- InputBox: Shows “Compacting context…” status during compaction
- System message: Reports strategy used and before/after context percentages