Your agent failed at step 30. rewind fix diagnosed and repaired it in 12 seconds.
My agent failed at step 5. Again.
I opened the trace, scrolled through the context windows, stared at the token counts, and eventually figured out that the model hallucinated because the search API returned stale data due to a rate limit. Took me 20 minutes. Then I changed the prompt, re-ran the entire agent, and… different failure, different step. Non-deterministic LLMs are fun like that.
The debugging loop for AI agents is brutal: read the trace, guess what went wrong, change something, re-run the whole thing, compare manually. Every iteration costs tokens, time, and patience.
So I built a command that does this entire loop automatically.
rewind fix
rewind fix latest
One command. Rewind loads the failed session, extracts the full context around the failure, and sends it to an LLM for diagnosis. The LLM returns a structured result: what went wrong, which step failed, and what to change.
⏪ Diagnosing session "research-agent-demo" (5 steps)...
Failure: Step 5 — llmcall (gpt-4o) — error
Error: HALLUCINATION: Agent used stale 2019 projection as current fact
Root cause: The agent relied on outdated data due to a search API rate
limit, leading to incorrect population figures.
Suggested fix: retry_step
Confidence: high
To apply this fix automatically:
rewind fix latest --apply
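That readout is a rendering of the structured result the diagnosis LLM returns. A minimal sketch of what that structure might look like as a Python type; the fields mirror the CLI output above, but the names and types are my assumptions, not rewind's actual schema:

```python
# Hypothetical shape of the diagnosis result. Only the fields visible
# in the CLI output above are grounded; names and types are assumed.
from dataclasses import dataclass
from typing import Literal

@dataclass
class Diagnosis:
    failed_step: int          # e.g. 5
    step_kind: str            # e.g. "llmcall (gpt-4o)"
    error: str                # e.g. "HALLUCINATION: Agent used stale ..."
    root_cause: str           # prose explanation from the diagnosis LLM
    suggested_fix: str        # e.g. "retry_step"
    confidence: Literal["low", "medium", "high"]
```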
The fix loop — automated
The diagnosis is useful on its own. But the real power is --apply:
rewind fix latest --apply --yes --command "python agent.py"
This does everything:
- Diagnoses the failure with an LLM
- Forks the timeline at the step before the failure
- Starts the proxy with request rewrites (model swap, system prompt injection, etc.)
- Runs your agent against the patched proxy
- Reports the savings: tokens, cost, and time
Steps before the fork are served from cache (0 tokens, 0ms). Only the steps after the fork hit the LLM.
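Sketched in Python, the loop looks roughly like this. Every `rewind_*` helper is a hypothetical stand-in for illustration; the real tool drives all of this from the CLI, not a Python API:

```python
import os, subprocess, time

def fix_and_replay(session_id: str, agent_cmd: list[str]) -> None:
    # Illustrative outline of --apply; the rewind_* helpers are
    # hypothetical stand-ins, not a real rewind Python API.
    diagnosis = rewind_diagnose(session_id)                 # 1. LLM diagnosis
    fork_id = rewind_fork(session_id,                       # 2. fork the timeline
                          before=diagnosis.failed_step)     #    at the prior step
    proxy_url = rewind_proxy(fork_id,                       # 3. proxy with request
                             fix=diagnosis.suggested_fix)   #    rewrites enabled
    start = time.monotonic()
    subprocess.run(agent_cmd, check=True,                   # 4. run the agent
                   env={**os.environ,                       #    against the
                        "OPENAI_BASE_URL": proxy_url})      #    patched proxy
    rewind_report(fork_id,                                  # 5. report savings
                  wall_time=time.monotonic() - start)
```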
Five fix types
The diagnosis LLM can suggest any of these:
| Fix | When it helps |
|---|---|
| Model swap | Context window overflow, capability mismatch — switch to a better model |
| System prompt injection | Missing guardrails — add instructions to prevent hallucination |
| Temperature adjustment | Too much randomness — reduce temperature for consistency |
| Retry | Non-deterministic failure — same request, different roll of the dice |
| No fix | Agent code issue — the LLM parameters aren’t the problem |
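Internally, each fix type has to reduce to a concrete change to the outgoing request. A hedged guess at that mapping; of these names, only swap_model and retry_step appear in rewind's own output, and the rest are assumed:

```python
# Hypothetical mapping from a suggested fix type to request-level
# overrides; names other than swap_model/retry_step are guesses.
def fix_to_overrides(fix_type: str, arg: str | None = None) -> dict:
    if fix_type == "swap_model":
        return {"model": arg}                  # e.g. "gpt-4o"
    if fix_type == "inject_system_prompt":
        return {"system_prompt": arg}          # guardrail instructions
    if fix_type == "adjust_temperature":
        return {"temperature": float(arg)}     # e.g. 0.2 for consistency
    # retry_step re-sends the same request unchanged; "no fix" means
    # the problem is in the agent code, so nothing is overridden.
    return {}
```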
Skip the AI — test your own theory
If you already know what to try, skip the diagnosis entirely:
rewind fix latest --hypothesis "swap_model:gpt-4o" --apply --yes --command "python agent.py"
This directly forks and replays with your specified fix — no LLM call for diagnosis, no waiting. Power users love this.
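The swap_model:gpt-4o argument suggests a simple type:argument convention. A sketch of how such a string could be parsed; the convention beyond this one documented example is my assumption:

```python
def parse_hypothesis(raw: str) -> tuple[str, str | None]:
    # "swap_model:gpt-4o" -> ("swap_model", "gpt-4o"); a bare fix type
    # such as "retry_step" would carry no argument. Inferred from the
    # single documented example above, so treat the shape as a guess.
    fix_type, _, arg = raw.partition(":")
    return fix_type, (arg or None)

assert parse_hypothesis("swap_model:gpt-4o") == ("swap_model", "gpt-4o")
```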
How it works under the hood
The proxy already sits between your agent and the LLM provider. In --apply mode, it does something new: request rewriting. After the fork point, the proxy modifies the outgoing request body before forwarding it upstream.
For a model swap, it changes the model field. For system prompt injection, it prepends a system message (OpenAI format) or sets the system field (Anthropic format, covering both the plain string form and the array form used with prompt caching). For temperature, it overrides the parameter.
The rewrite only applies to steps after the fork. Cached steps are served verbatim.
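In code, the rewrite can be a pure function over the outgoing JSON body. A minimal sketch under the assumptions above, with the provider passed in explicitly; this is not rewind's actual implementation:

```python
# Sketch of post-fork request rewriting; not rewind's actual code.
def rewrite_request(body: dict, fix: dict, provider: str) -> dict:
    body = dict(body)  # shallow copy so cached originals stay untouched
    if "model" in fix:                              # model swap
        body["model"] = fix["model"]
    if "system_prompt" in fix:                      # system prompt injection
        if provider == "openai":                    # prepend a system message
            body["messages"] = ([{"role": "system",
                                  "content": fix["system_prompt"]}]
                                + body.get("messages", []))
        elif isinstance(body.get("system"), list):  # anthropic array form
            body["system"] = ([{"type": "text",     # (prompt caching)
                                "text": fix["system_prompt"]}]
                              + body["system"])
        else:                                       # anthropic string form
            body["system"] = fix["system_prompt"]
    if "temperature" in fix:                        # temperature override
        body["temperature"] = fix["temperature"]
    return body
```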
Try it
pip install rewind-agent[openai]
rewind demo # seed a session with a known failure
rewind fix latest # diagnose it
rewind fix latest --apply # fork + replay with the fix
No API keys needed for the demo. The diagnosis call uses your own OPENAI_API_KEY (gpt-4o-mini by default, ~$0.03 per diagnosis).
Full documentation: docs/fix.md