Open source · MIT licensed · Single binary

Agent broke at step 30?
Fix step 30.

Not steps 1 through 29 again. Fork at the failure, replay with the fix, prove it works – all on your machine, never in someone else's cloud.

pip install rewind-agent

Then run rewind demo && rewind inspect latest – no API keys needed.

The difference

Other tools show what happened. Rewind lets you change it.

Langfuse / LangSmith / Helicone

  • View traces after the run
  • Re-run the entire agent to test a fix
  • Manual diagnosis — read logs, guess what went wrong
  • No timeline branching
  • Manual before/after comparison
  • Separate tools for tracing vs. evals

Rewind

  • View traces, then change them
  • Fork at the failure, replay just the fix
  • AI diagnosis — rewind fix finds the root cause for you
  • Branch at any step, cached at 0 tokens
  • LLM-as-judge scores diffs automatically
  • Tracing, debugging, and evals in one data model

Local-first · open source

Your agents. Your data. Your machine.

Every competing tool requires a cloud account, an API key, and trusting a third party with your prompts. Rewind runs entirely on your laptop.

Your data never leaves your machine

Every prompt, response, and context window stays in a local SQLite database. No cloud account. No telemetry. No data processing agreements to sign.

Fully open source, MIT licensed

Read every line. Fork it. Audit it. Contribute to it. No vendor lock-in, no license gating, no feature paywalls. The code is the product.

Single binary, zero dependencies

One statically-linked Rust binary. No Docker, no Postgres, no Redis, no cloud account. Works offline, on air-gapped networks, behind any firewall.

Free forever, no usage limits

No per-seat pricing. No eval run quotas. No metered API calls. Record as many sessions as your disk can hold. The infrastructure cost is zero.

Rewind vs. cloud platforms

                     Rewind           Others
Data residency       Your machine     Their cloud
API key for infra    Not needed       Required
Works offline        Yes              No
Source code          MIT on GitHub    Closed
Cost                 Free             $49–199+/mo
Dependencies         Zero             Cloud + DB

Already using Langfuse or LangSmith? You don't have to choose. Import their traces into Rewind for local debugging, or export Rewind sessions back to your team dashboard.

3 simple steps

From first install to fixed agent in minutes

One line of code. Zero config. No API keys needed.

agent.py
import rewind_agent
import openai
rewind_agent.init() # that's it
client = openai.OpenAI()
client.chat.completions.create(...)
1. Record

Add one line to your Python agent. Every LLM call, tool invocation, and context window is captured automatically. Or use the HTTP proxy for any language.
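For the proxy route, the usual pattern is to point any OpenAI-compatible client at the proxy's local address. A sketch of that pattern; the port, path, and filename below are placeholders, not Rewind's documented defaults:

agent_via_proxy.py
import openai

# Hypothetical proxy address: use whatever the rewind proxy prints on startup.
# The API key is read from OPENAI_API_KEY and forwarded to the real provider.
client = openai.OpenAI(
    base_url="http://127.0.0.1:4800/v1",  # placeholder, not a documented default
)

# Requests flow through the local proxy, which records each step
# before forwarding them upstream.
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
)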

rewind inspect latest
Session: support-agent Steps: 8
🧠 gpt-4o 320ms 156↓
🔧 web_search 45ms
🧠 gpt-4o 650ms 280↓
🧠 gpt-4o ERROR
2. Inspect

See the exact context window at every step. Every message, system prompt, and tool response the model saw. TUI, Web Dashboard, or CLI.

rewind fix latest
⏪ Diagnosing "my-agent" (12 steps)
Step 8 hallucination
Root cause: stale data from rate-limited API
Fix: retry_step high confidence
→ rewind fix latest --apply
3. Fix

rewind fix diagnoses the failure with AI, suggests a fix, and optionally forks + replays with the patch applied. One command from broken to proven.

Chrome DevTools for AI agents

Record, inspect, fork, replay, diff, evaluate. All on the same timeline. The only tool where debugging, tracing, and evals share the same data model.

Fork & Replay

Branch at any step. Fix your code, replay from failure. Steps before the fork are cached (0 tokens, 0ms). No other tool does this.

[Screenshot: activity timeline with multi-agent swim lanes, duration bars, and step detail]

AI Diagnosis & Repair

One command finds the root cause, suggests a fix (model swap, system prompt, temperature, retry), and verifies it works via fork + replay. Powered by rewind fix.

[Screenshot: timeline diff of original vs. fixed fork with the divergence point]

Prove the Fix

Score original vs. forked timelines with LLM-as-judge. Correctness, coherence, safety scored automatically. Ship with evidence, not hope.

[Screenshot: error detail view with evaluation scores across timelines]

Session Sharing

Generate a self-contained HTML file. Open in any browser, share via Slack or email. No install, no login, works offline.

[Screenshot: self-contained HTML share viewer open in a browser]

Instant Replay

Identical requests served from cache at 0 tokens, 0ms. Run the same agent 10 times. Only the first hits the LLM.
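Under the hood this is content-addressed caching. A minimal sketch of the idea (illustrative only, not Rewind's actual internals): identical request payloads hash to the same key, so pre-fork steps come back from local storage instead of the provider.

replay_cache_sketch.py
import hashlib
import json

cache: dict[str, str] = {}  # Rewind persists to SQLite; a dict suffices for the sketch

def cache_key(request: dict) -> str:
    # Canonicalize so identical payloads (model, messages, params) hash identically.
    return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

def complete(request: dict, call_llm) -> str:
    key = cache_key(request)
    if key in cache:              # replayed step: 0 tokens, ~0 ms
        return cache[key]
    response = call_llm(request)  # only steps after the fork hit the LLM
    cache[key] = response
    return response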

Import & Debug

Import traces from Langfuse, Datadog, or any OTel backend. Fork at the failure, replay locally, export the fix back.

Regression Testing

Turn any session into a baseline. Check step types, models, tool calls, and token drift. 3-line GitHub Action.

Snapshots

Checkpoint your workspace before an agent runs. Restore in one command if it breaks something. No git required.

Web Dashboard

Browser-based session explorer with activity timeline (swim lanes), step list, context viewer, multi-metric axis, visual diff, and eval dashboard — all embedded in the binary.

SQL Explorer

Run ad-hoc SQL against the Rewind database. Token usage by model, cost estimation, session analytics. Read-only, safe to explore.
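Because the store is plain SQLite, any client can query it too. In the sketch below, the database path and the table and column names are illustrative guesses, not Rewind's documented schema:

token_report.py
import sqlite3

# Read-only connection; "rewind.db", "steps", "model", and "tokens"
# are assumptions for illustration, not the documented layout.
db = sqlite3.connect("file:rewind.db?mode=ro", uri=True)
for model, tokens in db.execute(
    "SELECT model, SUM(tokens) FROM steps GROUP BY model"
):
    print(f"{model}: {tokens} tokens")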

Multi-agent tracing

See every agent, tool call, and handoff

Hierarchical span tree and activity timeline for multi-agent workflows. Each agent gets its own swim lane with duration bars. Auto-captures agent boundaries and handoffs from OpenAI Agents SDK. Thread view for multi-turn conversations.

  • Automatic agent boundaries: OpenAI Agents SDK and Pydantic AI detected on init()
  • Manual @span() decorator: group any code into named spans for custom agent hierarchies (see the sketch below)
  • Thread view: multi-turn conversations grouped by thread with agent attribution

rewind inspect latest --spans
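A sketch of how the @span() decorator presumably reads in practice. The import path and the name argument are assumptions based on the list above, not confirmed API:

spans_example.py
# Assumption: span is exported from the rewind_agent package
# and accepts a span name as its argument.
from rewind_agent import span

@span("research-agent")
def research(query: str) -> str:
    # Everything called in here is grouped into one named span
    # (its own swim lane in the activity timeline).
    ...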
IDE integration

Every Claude Code session. Automatically captured.

One command installs hooks. Every prompt, tool call, and file edit is recorded. No code changes needed. Also works with Cursor and Windsurf via MCP.

Integration flow

  1. Claude Code / Cursor / Windsurf: your IDE session runs normally
  2. Hooks + MCP Server: automatic capture, zero instrumentation
     (session lifecycle, tool calls, user prompts, file edits, subagent trees, token usage)
  3. Rewind Dashboard: inspect, fork, replay, diff, evaluate

26 MCP Tools

Debug without leaving your editor

Ask your AI assistant to inspect a failed session, diff two timelines, or run an eval suite. All from within your IDE, no context switching.

Query recordings · View span trees · Browse threads · Diff timelines · Create baselines · Run evaluations
terminal
# One-command setup
$ rewind hooks install

# Start the dashboard
$ rewind web --port 4800

# Sessions appear automatically
# at http://127.0.0.1:4800
26 MCP tools · 0 config needed · 3 IDEs supported
Evaluation platform

Ship with confidence. Catch regressions in CI.

Create datasets, run your agent against them, score with 7 evaluator types, compare experiments side-by-side. Block merges when quality drops.

rewind eval
Datasets · Experiments · Compare

booking-tests v2
12 examples · 3 evaluators (Exact, Schema, Judge) · 2m ago · 95%

  Book a table for 2       0.95
  Cancel reservation       0.92
  Change date to Friday    0.78
  Refund my order          0.41

Score: 95.0% | Threshold: 90% | PASS

7 Evaluator Types

  • Exact match (deterministic)
  • Contains (deterministic)
  • Regex (deterministic)
  • JSON schema (deterministic)
  • Tool use match (deterministic)
  • Custom (user-defined)
  • LLM-as-judge (AI)
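The Custom row implies user-defined scoring. One plausible shape, shown as an assumption rather than Rewind's confirmed contract, is a callable that returns a 0-1 score per example:

custom_evaluator.py
# Hypothetical evaluator signature: the exact contract isn't shown on this page.
def cites_refund_policy(output: str, expected: str) -> float:
    # 1.0 if the agent grounded its answer in the refund policy, else 0.0.
    return 1.0 if "refund policy" in output.lower() else 0.0

# Passed alongside built-ins, as in test_agent.py below:
# evaluators=["exact_match", cites_refund_policy]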

Built for CI/CD

  • CI-ready with --fail-below thresholds
  • GitHub Action for regression testing
  • Side-by-side experiment comparison
  • Per-example regression detection
  • JSON output for dashboard ingestion
  • Versioned datasets with deduplication
test_agent.py
import rewind_agent

# my_agent and llm_judge are defined elsewhere in your test module.
result = rewind_agent.evaluate(
    dataset="booking-tests",
    target_fn=my_agent,                     # your agent's entry point
    evaluators=["exact_match", llm_judge],  # built-ins by name, judges as objects
    fail_below=0.9,                         # fail the run below 90% for CI
)

See it in action

A full debug cycle in under 60 seconds. Record, inspect, fork, diff.

No API keys

The demo runs with built-in sample data. No OpenAI key, no cloud account, nothing to configure.

Cached steps = free

Watch the replay counter. Steps before the fork point return instantly from cache. Zero tokens, zero cost.

Everything local

All data in a SQLite database on your machine. The entire tool is one binary. Nothing phones home.

Try it yourself, no API keys needed:

rewind demo && rewind inspect latest

Works with your stack

Already using Langfuse, LangSmith, or Datadog? You don't have to choose. Rewind works alongside them.

LLM Providers

  • OpenAI (GPT-4o, o1, o3): streaming + non-streaming
  • Anthropic (Claude 3.5/4): streaming + non-streaming
  • AWS Bedrock: non-streaming
  • Any OpenAI-compatible endpoint (Ollama, vLLM, LiteLLM): streaming + non-streaming

Agent Frameworks

  • OpenAI Agents SDK (native)
  • Pydantic AI (native)
  • LangGraph (wrapper)
  • CrewAI (wrapper)
  • Any framework via HTTP proxy: Autogen, smolagents, custom code, any language

Observability stack

Langfuse · LangSmith · Datadog · Grafana Tempo · Jaeger

Import traces in, export sessions out, or dual-ship to both.

Start debugging in 30 seconds

One install. One line of code. Zero config.

terminal
$ pip install rewind-agent

# Add one line to your agent:
import rewind_agent
rewind_agent.init()

# Run your agent, then inspect the session:
$ rewind inspect latest

# Or try the built-in demo (no API keys):
$ rewind demo && rewind inspect latest