Agent Replay
Step through AI agent execution traces from Langfuse — inspect observation trees, review inputs/outputs, and submit verdicts for continuous improvement.
Why Agent Replay?
Automated metrics tell you an agent succeeded or failed. Agent Replay lets you see exactly how it got there — every LLM call, tool invocation, and decision branch in a hierarchical trace. Use it to debug failures, validate reasoning, and build golden datasets for evaluation.
Trace Inspection
Browse full execution traces with nested observation trees. Expand any node to see its input, output, and metadata.
Step-by-Step Debugging
Walk through spans, generations, tool calls, and events in order. Smart content rendering detects chat messages, JSON, and structured output.
Performance Metrics
Latency, token usage, and model info at every node. Trace-level KPIs show total cost, step count, and duration at a glance.
Human Review
Submit verdicts (positive/neutral/negative), identify failure steps, record rationale, and push approved traces to Langfuse golden datasets.
Quick Start
Get from zero to reviewing a trace in under a minute:
Configure Langfuse
Set LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and AGENT_REPLAY_ENABLED=true as environment variables. For multiple agents, use LANGFUSE_{AGENT}_PUBLIC_KEY per agent.
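The per-agent fallback logic can be sketched as follows. This is a minimal illustration, not the actual resolver; the helper name `resolve_langfuse_keys` is assumed:

```python
import os

def resolve_langfuse_keys(agent: str) -> tuple[str, str]:
    """Hypothetical helper: per-agent variables like
    LANGFUSE_ALPHA_BOT_PUBLIC_KEY take precedence, falling back
    to the global LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY."""
    prefix = f"LANGFUSE_{agent.upper()}_"
    public = os.getenv(prefix + "PUBLIC_KEY") or os.getenv("LANGFUSE_PUBLIC_KEY", "")
    secret = os.getenv(prefix + "SECRET_KEY") or os.getenv("LANGFUSE_SECRET_KEY", "")
    return public, secret
```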
Search for a Trace
Select an agent from the tab bar, then enter a Trace ID (UUID) or use Field Search (e.g. Case Reference) to find traces by business identifiers.
Explore the Observation Tree
Click any node to inspect its input/output. Expand children to drill into nested spans and generations. Use Workflow I/O to see the trace-level request and response.
Submit a Review
Open the Review panel, select a verdict, optionally identify the failure step, and add rationale. Toggle Add to Golden Dataset to push the trace to Langfuse for future evaluations.
Page Anatomy
The Agent Replay page is divided into five functional zones, covered in the sections that follow.
Searching for Traces
Agent Replay supports two search modes, selected from the dropdown next to the search bar:
Trace ID Search
Paste a UUID or 32+ character hex string directly. The system auto-detects the format and fetches the trace immediately from Langfuse. This is the fastest way to investigate a specific execution.
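A detection heuristic along these lines would distinguish trace IDs from business identifiers (a sketch under the stated rules, not the exact implementation):

```python
import re

# UUID in 8-4-4-4-12 form, or a bare hex string of 32+ characters.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
HEX_RE = re.compile(r"^[0-9a-f]{32,}$", re.I)

def looks_like_trace_id(query: str) -> bool:
    """Return True when the query should be sent straight to Langfuse."""
    q = query.strip()
    return bool(UUID_RE.match(q) or HEX_RE.match(q))
```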
Field Search
When a PostgreSQL search database is configured, you can search by business identifiers like Case Reference, Ticket Number, or any custom column. The dropdown shows available fields for the selected agent.
| Search Mode | Input | Requires | How It Works |
|---|---|---|---|
| Trace ID | UUID or hex string | Langfuse credentials only | Direct API lookup — instant result |
| Field Search | Business identifier | Search DB configured | PostgreSQL lookup → trace ID → Langfuse fetch |
Field Search requires a search database configured via config/agent_replay_db.yaml. Without it, only Trace ID search is available. See the Configuration section below.
Search results appear as trace cards showing the agent name, step count, timestamp, tags, and a trace ID snippet. Click any card to load the full trace.
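The Field Search flow in the table above can be sketched with injected callables standing in for the search DB query and the Langfuse API call (both function names are illustrative, not the real internals):

```python
def field_search(value, lookup_trace_ids, fetch_trace, limit=20):
    """Field Search: business identifier -> trace IDs -> Langfuse traces.

    `lookup_trace_ids` stands in for the PostgreSQL lookup (e.g. selecting
    langfuse_trace_id rows matching the search column); `fetch_trace`
    stands in for the Langfuse API fetch.
    """
    trace_ids = lookup_trace_ids(value)[:limit]
    return [fetch_trace(tid) for tid in trace_ids]
```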
Observation Tree
The observation tree is the core navigation element. It displays the hierarchical structure of a trace — every span, LLM generation, tool call, and event as nested nodes.
Node Types
| Type | Badge | Description | Example |
|---|---|---|---|
| SPAN | SPAN | Logical grouping or timing boundary | workflow, retrieval-chain |
| GENERATION | GEN | LLM call with prompt and completion | classify-intent, generate-response |
| TOOL | TOOL | External tool or function call | search_knowledge_base, fetch_policy |
| EVENT | EVT | Status or progress marker | task_complete, error_caught |
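The badge mapping and nested structure from the table above can be illustrated with a small tree flattener (a sketch for intuition, not the actual renderer):

```python
BADGES = {"SPAN": "SPAN", "GENERATION": "GEN", "TOOL": "TOOL", "EVENT": "EVT"}

def render_tree(node, depth=0):
    """Flatten a nested observation node into indented display lines."""
    badge = BADGES.get(node["type"], node["type"])
    lines = [f"{'  ' * depth}[{badge}] {node['name']}"]
    for child in node.get("children", []):
        lines.extend(render_tree(child, depth + 1))
    return lines
```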
Tree Navigation
Each node shows its type badge, name, and inline metadata (latency, token count). Click a node to select it and view its details in the Node Detail Panel. Click the expand/collapse toggle to show or hide children.
Use the toolbar buttons Expand All and Collapse All to quickly open or close the entire tree. The special Workflow I/O node at the top of the tree shows the trace-level input and output.
Node Detail Panel
When you select a node in the observation tree, the Node Detail Panel shows its full content in a split-pane layout:
- Input pane (blue header, 35% width) — Shows the prompt or input data. For GENERATION nodes, the PromptViewer auto-detects chat message arrays and renders them as styled bubbles with role labels (system, user, assistant, tool).
- Output pane (green header, 65% width) — Shows the completion or result. The OutputViewer detects structured objects and renders them as color-coded section cards with copy buttons.
For example, a structured decision output might render as section cards:
- approved — within standard limits
- Coverage of $500K is within the standard commercial property limit for this risk class.
- 1. SOP-4.2.1 — Standard limits table
- 2. KB-0891 — Risk classification guide
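A chat-detection heuristic in the spirit of the PromptViewer's might look like this (an assumed approximation, not the viewer's actual logic):

```python
ROLES = {"system", "user", "assistant", "tool"}

def is_chat_messages(value) -> bool:
    """True when value is a non-empty list of message dicts that all
    carry a known role and some content -- render as chat bubbles."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(m, dict) and m.get("role") in ROLES and "content" in m
            for m in value
        )
    )
```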
Metadata Drawer
Below the split pane, a collapsible drawer reveals additional details:
- Node info grid — Type, model name, latency, token counts, start/end timestamps
- Metadata table — Key-value pairs from the observation metadata (sensitive keys like api_key or secret are automatically redacted)
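The redaction step can be sketched as a substring match over metadata keys; the exact sensitive-key list here (adding password and token to the documented api_key and secret) is an assumption:

```python
# Assumed sensitive-key substrings; the docs only name api_key and secret.
SENSITIVE = ("api_key", "secret", "password", "token")

def redact(metadata: dict) -> dict:
    """Mask values whose key contains a sensitive substring."""
    return {
        k: "***" if any(s in k.lower() for s in SENSITIVE) else v
        for k, v in metadata.items()
    }
```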
Review & Verdicts
The Review Panel lets you record structured feedback for any trace. Reviews are persisted as Langfuse scores and can optionally create golden dataset items.
Review Fields
| Field | Required | Langfuse Score Name | Description |
|---|---|---|---|
| Verdict | Yes | review_verdict | Positive (+1), Neutral (0), or Negative (-1) |
| Failure Step | No | review_failure_step | Observation node where the agent went wrong |
| Tooling Needs | No | review_tooling_needs | What tools/capabilities the agent lacked |
| Rationale | No | review_rationale | Explanation of the verdict decision |
| Expected Output | No | review_expected_output | Ground truth the agent should have produced |
| Golden Dataset | No | — | Push trace as a Langfuse dataset item for evaluation |
When Golden Dataset is enabled, the default dataset name is {agent}-golden-{YYYY-MM}. You can type a custom name or select from existing datasets in the dropdown.
Configuration
Agent Replay requires Langfuse credentials and an optional search database. All configuration follows the standard AXIS pattern: YAML → environment variables → hardcoded defaults.
Langfuse Credentials
Set credentials globally or per agent:
```bash
# Global (all agents share one Langfuse project)
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com  # optional

# Per-agent (each agent has its own project)
LANGFUSE_ALPHA_BOT_PUBLIC_KEY=pk-lf-...
LANGFUSE_ALPHA_BOT_SECRET_KEY=sk-lf-...
LANGFUSE_REVIEWER_PUBLIC_KEY=pk-lf-...
LANGFUSE_REVIEWER_SECRET_KEY=sk-lf-...
```
Enable the feature with:
```bash
AGENT_REPLAY_ENABLED=true
```
Search Database (Optional)
To enable field search, create config/agent_replay_db.yaml from the example template:
```yaml
# config/agent_replay_db.yaml
enabled: true
host: localhost
port: 5432
database: agent_traces
username: axis_reader
password: ${REPLAY_DB_PASSWORD}
schema: public
table: trace_index
search_column: case_reference
search_column_label: Case Reference
trace_id_column: langfuse_trace_id

# Per-agent overrides (optional)
agents:
  alpha_bot:
    table: alpha_bot_traces
    search_column: ticket_number
  beta_bot:
    table: beta_bot_traces
```
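The ${REPLAY_DB_PASSWORD} placeholder above implies environment-variable interpolation of YAML values; a common way to implement it looks like this (a sketch, not AXIS's actual loader):

```python
import os
import re

PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    """Replace ${VAR} placeholders with environment values (empty if unset)."""
    return PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), ""), value)
```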
Tuning Parameters
| Parameter | Default | Description |
|---|---|---|
| default_limit | 20 | Maximum traces returned per search |
| default_days_back | 7 | Default time window for trace listing |
| max_chars | 50,000 | Content truncation limit per node |
| search_metadata_key | caseReference | Langfuse metadata field for fallback search |
| query_timeout | 10s | Search DB query timeout |
| pool_max_size | 5 | Connection pool maximum for search DB |
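The max_chars parameter caps node content before display; one plausible truncation shape (the marker text is an assumption):

```python
def truncate_content(text: str, max_chars: int = 50_000) -> str:
    """Cut node content at max_chars, appending a marker when truncated."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "… [truncated]"
```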
Keyboard Shortcuts
| Key | Action | Context |
|---|---|---|
| ↑ / ↓ | Navigate between visible tree nodes | Observation Tree |
| ← | Collapse current node (or move to parent) | Observation Tree |
| → | Expand current node (or move to first child) | Observation Tree |
| Enter | Select focused node and load its detail | Observation Tree |
| Esc | Close Review Panel | Review Panel |