When a provider is overloaded, rate-limited, or returning 5xx, SoulForge can switch to a backup model and keep streaming. The fallback chain is per-model: each model in use can have its own ordered list of alternates.
Open the picker
Or edit the config
How it cycles
- Transient error retries first. `maxTransientRetries` (default 3) attempts on the active model with exponential backoff + jitter.
- Swap. Budget exhausted → next model in the chain. A `Switched to fallback model: X` system message lands in chat. The header and active-model header sync to the new model.
- Cycle. When the chain is exhausted, the loop returns to the primary and starts over. Capped at 3 cycles, then the original error is rethrown with `cause`.
- User abort wins. Ctrl+X always exits the loop immediately, regardless of where you are in the chain.
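The four steps above can be sketched as a nested loop. This is a minimal illustration, not SoulForge's actual implementation: `runWithFallback`, `LoopOptions`, `isAborted`, `onSwap`, and `sleep` are hypothetical names, and the 25% jitter factor is an assumption. Only `maxTransientRetries`, `baseDelayMs`, the 3-cycle cap, and the `cause` on the final error come from the text.

```typescript
// Hypothetical sketch of the retry -> swap -> cycle loop described above.
type Attempt<T> = (model: string) => Promise<T>;

interface LoopOptions {
  maxTransientRetries: number; // attempts per model before swapping
  maxCycles: number; // full passes over the chain (the docs cap this at 3)
  baseDelayMs: number; // first retry delay, doubled on each retry
  isTransient: (err: unknown) => boolean;
  isAborted: () => boolean; // stand-in for the Ctrl+X abort signal
  onSwap?: (model: string) => void; // e.g. post the "Switched to fallback model" message
  sleep?: (ms: number) => Promise<void>; // injectable so the loop can be tested without waiting
}

async function runWithFallback<T>(
  chain: string[], // primary first, then fallbacks in order
  attempt: Attempt<T>,
  opts: LoopOptions,
): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let originalError: unknown;

  for (let cycle = 0; cycle < opts.maxCycles; cycle++) {
    for (let i = 0; i < chain.length; i++) {
      const model = chain[i];
      if (cycle > 0 || i > 0) opts.onSwap?.(model);

      for (let tries = 0; tries < opts.maxTransientRetries; tries++) {
        // User abort wins: checked before every attempt.
        if (opts.isAborted()) throw new Error("aborted by user");
        try {
          return await attempt(model);
        } catch (err) {
          originalError ??= err;
          // Permanent errors surface immediately and never consume the chain.
          if (!opts.isTransient(err)) throw err;
          const delay = opts.baseDelayMs * 2 ** tries;
          await sleep(delay + Math.random() * delay * 0.25); // assumed jitter factor
        }
      }
    }
  }
  // Chain exhausted on every cycle: surface the first error as the cause.
  throw Object.assign(new Error("all models in the fallback chain failed"), {
    cause: originalError,
  });
}
```

With a two-model chain and a primary that always returns `529 overloaded`, this sketch tries the primary three times, then succeeds on the backup.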
What counts as transient
Matched against the error message: overloaded, 429, 529, 503, 502, rate limit, too many requests, timeout, fetch failed, econnreset, econnrefused, enotfound, socket hang up, premature close, stream error/closed, connection error/reset/refused/closed, aborted.*connection.
403 is only treated as transient when the response body also contains overloaded or rate — a genuine auth-rejection 403 will not burn the retry budget.
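As a rough illustration of that classification, the sketch below transcribes the listed terms into regular expressions. The function name `isTransient`, its signature, the case-insensitive matching, and the word boundaries around status codes are all assumptions; the real matcher may differ in each of these details.

```typescript
// Hypothetical classifier; patterns are transcribed from the list above.
const TRANSIENT_PATTERNS: RegExp[] = [
  /overloaded/i,
  /\b(429|529|503|502)\b/,
  /rate limit/i,
  /too many requests/i,
  /timeout/i,
  /fetch failed/i,
  /econnreset/i,
  /econnrefused/i,
  /enotfound/i,
  /socket hang up/i,
  /premature close/i,
  /stream (error|closed)/i,
  /connection (error|reset|refused|closed)/i,
  /aborted.*connection/i,
];

function isTransient(message: string, responseBody: string = ""): boolean {
  // A 403 only counts as transient when the body points at overload or rate limiting.
  if (/\b403\b/.test(message)) {
    return /overloaded|rate/i.test(responseBody);
  }
  return TRANSIENT_PATTERNS.some((pattern) => pattern.test(message));
}
```

Under these assumptions, `isTransient("403 Forbidden", "invalid api key")` is false while `isTransient("403 Forbidden", "rate limit exceeded")` is true, so an authentication failure never consumes the retry budget.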
Tune the budgets
- `maxTransientRetries`: retries per model before swapping to the next fallback.
- `maxStallRetries`: retries when the stream stalls (watchdog only; opt-in).
- `baseDelayMs`: first retry delay; doubles each attempt with jitter. Clamped to 250–60000 ms.

`maxAttempts` is still accepted as a default for both retry counts.
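To make the delay schedule concrete, here is a small sketch of how `baseDelayMs` could translate into per-attempt waits. It assumes the 250–60000 ms clamp applies to the configured base value and that jitter adds up to 25% on top of the exponential delay; neither detail is stated in the text, and `retryDelayMs` is a hypothetical helper.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based).
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

function retryDelayMs(
  baseDelayMs: number,
  attempt: number,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const base = clamp(baseDelayMs, 250, 60000); // assumed: the clamp applies to the base
  const exponential = base * 2 ** attempt; // doubles on each attempt
  const jitter = exponential * 0.25 * random(); // assumed: up to +25% jitter
  return Math.round(exponential + jitter);
}
```

With a base of 1000 ms and no jitter, this gives 1000, 2000, and 4000 ms for the first three retries; a base of 10 ms is raised to 250 ms by the clamp.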
When fallback shines
- Anthropic overload. Sonnet 4.6 returns `529 overloaded`. Chain to Opus 4.6 or to a different gateway.
- One key rate-limited. First-party Anthropic + LLM Gateway with separate quotas.
- Provider-specific outage. Mix providers (`anthropic/…` + `openai/…` + `google/…`) so a single-vendor incident doesn’t end your session.
What it won’t do
- Cost-based switching. Fallback is failure-driven, not budget-driven. Use Task Router slots for that.
- Stall detection. Stalls (no chunks, no abort) hit the watchdog path, which retries on the same model. Fallback only triggers on thrown transient errors.
- Permanent errors. 401, malformed request, model not found, real 403 — these surface immediately. The chain is for transient failures only.

