Open source · MIT licensed · Single binary

Agent broke at step 30?
Fix step 30.

Not steps 1 through 29 again. Fork at the failure, replay with the fix, prove it works – all on your machine, never in someone else's cloud.

pip install rewind-agent

Then run rewind demo && rewind inspect latest – no API keys needed.

The difference

Other tools show what happened. Rewind lets you change it.

Langfuse / LangSmith / Helicone

  • View traces after the run
  • Re-run the entire agent to test a fix
  • Manual diagnosis — read logs, guess what went wrong
  • No timeline branching
  • Manual before/after comparison
  • Separate tools for tracing vs. evals

Rewind

  • View traces, then change them
  • Fork at the failure, replay just the fix
  • AI diagnosis — rewind fix finds the root cause for you
  • Branch at any step, cached at 0 tokens
  • LLM-as-judge scores diffs automatically
  • Tracing, debugging, and evals in one data model

Local-first · open source

Your agents. Your data. Your machine.

Every competing tool requires a cloud account, an API key, and trusting a third party with your prompts. Rewind runs entirely on your laptop.

Your data never leaves your machine

Every prompt, response, and context window stays in a local SQLite database. No cloud account. No telemetry. No data processing agreements to sign.

Fully open source, MIT licensed

Read every line. Fork it. Audit it. Contribute to it. No vendor lock-in, no license gating, no feature paywalls. The code is the product.

Single binary, zero dependencies

One statically-linked Rust binary. No Docker, no Postgres, no Redis, no cloud account. Works offline, on air-gapped networks, behind any firewall.

Free forever, no usage limits

No per-seat pricing. No eval run quotas. No metered API calls. Record as many sessions as your disk can hold. The infrastructure cost is zero.

Rewind vs. cloud platforms

                     Rewind           Others
Data residency       Your machine     Their cloud
API key for infra    Not needed       Required
Works offline        Yes              No
Source code          MIT on GitHub    Closed
Cost                 Free             $49–199+/mo
Dependencies         Zero             Cloud + DB

Already using Langfuse or LangSmith? You don't have to choose. Import their traces into Rewind for local debugging, or export Rewind sessions back to your team dashboard.

3 simple steps

From first install to fixed agent in minutes

One line of code. Zero config. No API keys needed.

agent.py
import rewind_agent
import openai
rewind_agent.init() # that's it
client = openai.OpenAI()
client.chat.completions.create(...)
1. Record

Add one line to your Python agent. Every LLM call, tool invocation, and context window is captured automatically. Or use the HTTP proxy for any language.
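For the proxy route, the usual pattern is to point any OpenAI-compatible client at the proxy's local address. A sketch of that pattern; the port, path, and filename below are placeholders, not Rewind's documented defaults:

agent_via_proxy.py
import openai

# Hypothetical proxy address: use whatever the rewind proxy prints on startup.
# The API key is read from OPENAI_API_KEY and forwarded to the real provider.
client = openai.OpenAI(
    base_url="http://127.0.0.1:4800/v1",  # placeholder, not a documented default
)

# Requests flow through the local proxy, which records each step
# before forwarding them upstream.
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
)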

rewind inspect latest
Session: support-agent Steps: 8
🧠 gpt-4o 320ms 156↓
🔧 web_search 45ms
🧠 gpt-4o 650ms 280↓
🧠 gpt-4o ERROR
2. Inspect

See the exact context window at every step. Every message, system prompt, and tool response the model saw. TUI, Web Dashboard, or CLI.

rewind fix latest
⏪ Diagnosing "my-agent" (12 steps)
Step 8 hallucination
Root cause: stale data from rate-limited API
Fix: retry_step high confidence
→ rewind fix latest --apply
3. Fix

rewind fix diagnoses the failure with AI, suggests a fix, and optionally forks + replays with the patch applied. One command from broken to proven.

Chrome DevTools for AI agents

Record, inspect, fork, replay, diff, evaluate. All on the same timeline. The only tool where debugging, tracing, and evals share the same data model.

Fork & Replay

Branch at any step. Fix your code, replay from failure. Steps before the fork are cached (0 tokens, 0ms). No other tool does this.

[Screenshot: activity timeline with multi-agent swim lanes, duration bars, and step detail]

AI Diagnosis & Repair

One command finds the root cause, suggests a fix (model swap, system prompt, temperature, retry), and verifies it works via fork + replay. Powered by rewind fix.

[Screenshot: timeline diff of original vs. fixed fork with the divergence point]

Prove the Fix

Score original vs. forked timelines with LLM-as-judge. Correctness, coherence, safety scored automatically. Ship with evidence, not hope.

[Screenshot: error detail view with evaluation scores across timelines]

Session Sharing

Generate a self-contained HTML file. Open in any browser, share via Slack or email. No install, no login, works offline.

[Screenshot: self-contained HTML share viewer open in a browser]

Instant Replay

Identical requests served from cache at 0 tokens, 0ms. Run the same agent 10 times. Only the first hits the LLM.
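Under the hood this is content-addressed caching. A minimal sketch of the idea (illustrative only, not Rewind's actual internals): identical request payloads hash to the same key, so pre-fork steps come back from local storage instead of the provider.

replay_cache_sketch.py
import hashlib
import json

cache: dict[str, str] = {}  # Rewind persists to SQLite; a dict suffices for the sketch

def cache_key(request: dict) -> str:
    # Canonicalize so identical payloads (model, messages, params) hash identically.
    return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

def complete(request: dict, call_llm) -> str:
    key = cache_key(request)
    if key in cache:              # replayed step: 0 tokens, ~0 ms
        return cache[key]
    response = call_llm(request)  # only steps after the fork hit the LLM
    cache[key] = response
    return response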

Import & Debug

Import traces from Langfuse, Datadog, or any OTel backend. Fork at the failure, replay locally, export the fix back.

Regression Testing

Turn any session into a baseline. Check step types, models, tool calls, and token drift. 3-line GitHub Action.

Snapshots

Checkpoint your workspace before an agent runs. Restore in one command if it breaks something. No git required.

Web Dashboard

Browser-based session explorer with activity timeline (swim lanes), step list, context viewer, multi-metric axis, visual diff, and eval dashboard — all embedded in the binary.

SQL Explorer

Run ad-hoc SQL against the Rewind database. Token usage by model, cost estimation, session analytics. Read-only, safe to explore.
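Because the store is plain SQLite, any client can query it too. In the sketch below, the database path and the table and column names are illustrative guesses, not Rewind's documented schema:

token_report.py
import sqlite3

# Read-only connection; "rewind.db", "steps", "model", and "tokens"
# are assumptions for illustration, not the documented layout.
db = sqlite3.connect("file:rewind.db?mode=ro", uri=True)
for model, tokens in db.execute(
    "SELECT model, SUM(tokens) FROM steps GROUP BY model"
):
    print(f"{model}: {tokens} tokens")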

Multi-agent tracing

See every agent, tool call, and handoff

Hierarchical span tree and activity timeline for multi-agent workflows. Each agent gets its own swim lane with duration bars. Auto-captures agent boundaries and handoffs from OpenAI Agents SDK. Thread view for multi-turn conversations.

  • Automatic agent boundaries: OpenAI Agents SDK and Pydantic AI detected on init()
  • Manual @span() decorator: group any code into named spans for custom agent hierarchies (see the sketch below)
  • Thread view: multi-turn conversations grouped by thread with agent attribution

rewind inspect latest --spans
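A sketch of how the @span() decorator presumably reads in practice. The import path and the name argument are assumptions based on the list above, not confirmed API:

spans_example.py
# Assumption: span is exported from the rewind_agent package
# and accepts a span name as its argument.
from rewind_agent import span

@span("research-agent")
def research(query: str) -> str:
    # Everything called in here is grouped into one named span
    # (its own swim lane in the activity timeline).
    ...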
IDE integration

Every Claude Code session. Automatically captured.

One command installs hooks. Every prompt, tool call, and file edit is recorded. No code changes needed. Also works with Cursor and Windsurf via MCP.

Integration flow

  1. Claude Code / Cursor / Windsurf: your IDE session runs normally
  2. Hooks + MCP Server: automatic capture, zero instrumentation
     (session lifecycle, tool calls, user prompts, file edits, subagent trees, token usage)
  3. Rewind Dashboard: inspect, fork, replay, diff, evaluate

26 MCP Tools

Debug without leaving your editor

Ask your AI assistant to inspect a failed session, diff two timelines, or run an eval suite. All from within your IDE, no context switching.

Query recordings · View span trees · Browse threads · Diff timelines · Create baselines · Run evaluations
terminal
# One-command setup
$ rewind hooks install

# Start the dashboard
$ rewind web --port 4800

# Sessions appear automatically
# at http://127.0.0.1:4800
26 MCP tools · 0 config needed · 3 IDEs supported
Evaluation platform

Ship with confidence. Catch regressions in CI.

Create datasets, run your agent against them, score with 7 evaluator types, compare experiments side-by-side. Block merges when quality drops.

rewind eval
Datasets · Experiments · Compare

booking-tests v2
12 examples · 3 evaluators (Exact, Schema, Judge) · 2m ago · 95%

  Book a table for 2       0.95
  Cancel reservation       0.92
  Change date to Friday    0.78
  Refund my order          0.41

Score: 95.0% | Threshold: 90% | PASS

7 Evaluator Types

  • Exact match (deterministic)
  • Contains (deterministic)
  • Regex (deterministic)
  • JSON schema (deterministic)
  • Tool use match (deterministic)
  • Custom (user-defined)
  • LLM-as-judge (AI)
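The Custom row implies user-defined scoring. One plausible shape, shown as an assumption rather than Rewind's confirmed contract, is a callable that returns a 0-1 score per example:

custom_evaluator.py
# Hypothetical evaluator signature: the exact contract isn't shown on this page.
def cites_refund_policy(output: str, expected: str) -> float:
    # 1.0 if the agent grounded its answer in the refund policy, else 0.0.
    return 1.0 if "refund policy" in output.lower() else 0.0

# Passed alongside built-ins, as in test_agent.py below:
# evaluators=["exact_match", cites_refund_policy]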

Built for CI/CD

  • CI-ready with --fail-below thresholds
  • GitHub Action for regression testing
  • Side-by-side experiment comparison
  • Per-example regression detection
  • JSON output for dashboard ingestion
  • Versioned datasets with deduplication
test_agent.py
import rewind_agent

# my_agent and llm_judge are defined elsewhere in your test module.
result = rewind_agent.evaluate(
    dataset="booking-tests",
    target_fn=my_agent,                     # your agent's entry point
    evaluators=["exact_match", llm_judge],  # built-ins by name, judges as objects
    fail_below=0.9,                         # fail the run below 90% for CI
)

See it in action

A full debug cycle in under 60 seconds. Record, inspect, fork, diff.

No API keys

The demo runs with built-in sample data. No OpenAI key, no cloud account, nothing to configure.

Cached steps = free

Watch the replay counter. Steps before the fork point return instantly from cache. Zero tokens, zero cost.

Everything local

All data in a SQLite database on your machine. The entire tool is one binary. Nothing phones home.

Try it yourself, no API keys needed:

rewind demo && rewind inspect latest

Works with your stack

Already using Langfuse, LangSmith, or Datadog? You don't have to choose. Rewind works alongside them.

LLM Providers

  • OpenAI (GPT-4o, o1, o3): streaming + non-streaming
  • Anthropic (Claude 3.5/4): streaming + non-streaming
  • AWS Bedrock: non-streaming
  • Any OpenAI-compatible endpoint (Ollama, vLLM, LiteLLM): streaming + non-streaming

Agent Frameworks

  • OpenAI Agents SDK (native)
  • Pydantic AI (native)
  • LangGraph (wrapper)
  • CrewAI (wrapper)
  • Any framework via HTTP proxy: Autogen, smolagents, custom code, any language

Observability stack

Langfuse · LangSmith · Datadog · Grafana Tempo · Jaeger

Import traces in, export sessions out, or dual-ship to both.

Start debugging in 30 seconds

One install. One line of code. Zero config.

terminal
$ pip install rewind-agent

# Add one line to your agent:
import rewind_agent
rewind_agent.init()

# Run your agent, then inspect the session:
$ rewind inspect latest

# Or try the built-in demo (no API keys):
$ rewind demo && rewind inspect latest