Agent Replay

Step through AI agent execution traces from Langfuse — inspect observation trees, review inputs/outputs, and submit verdicts for continuous improvement.

Why Agent Replay?

Automated metrics tell you whether an agent succeeded or failed. Agent Replay lets you see exactly how it got there — every LLM call, tool invocation, and decision branch in a hierarchical trace. Use it to debug failures, validate reasoning, and build golden datasets for evaluation.

🔍 Trace Inspection

Browse full execution traces with nested observation trees. Expand any node to see its input, output, and metadata.

🧩 Step-by-Step Debugging

Walk through spans, generations, tool calls, and events in order. Smart content rendering detects chat messages, JSON, and structured output.

Performance Metrics

Latency, token usage, and model info at every node. Trace-level KPIs show total cost, step count, and duration at a glance.

Human Review

Submit verdicts (positive/neutral/negative), identify failure steps, record rationale, and push approved traces to Langfuse golden datasets.

Quick Start

Get from zero to reviewing a trace in under a minute:

1. Configure Langfuse

Set LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and AGENT_REPLAY_ENABLED=true as environment variables. For multiple agents, use LANGFUSE_{AGENT}_PUBLIC_KEY per agent.

2. Search for a Trace

Select an agent from the tab bar, then enter a Trace ID (UUID) or use Field Search (e.g. Case Reference) to find traces by business identifiers.

3. Explore the Observation Tree

Click any node to inspect its input/output. Expand children to drill into nested spans and generations. Use Workflow I/O to see the trace-level request and response.

4. Submit a Review

Open the Review panel, select a verdict, optionally identify the failure step, and add rationale. Toggle Add to Golden Dataset to push the trace to Langfuse for future evaluations.

💡 Tip
If you already have a trace ID from Langfuse or your application logs, paste it directly into the search bar for instant lookup — no field search configuration needed.

Page Anatomy

The Agent Replay page is divided into five functional zones:

[Figure: Agent Replay page layout with all five zones — agent identity bar, trace search, observation tree, node detail panel, and review panel]
1. Agent Identity Bar — Agent tabs with avatars. When a trace is loaded, shows KPI chips: node count, latency, token usage, and trace ID prefix.
2. Trace Search — Search by Trace ID (UUID paste) or Field Search (e.g., Case Reference) via the dropdown. Results appear as trace cards below.
3. Observation Tree — Hierarchical view of all trace observations. Color-coded by type (SPAN, GENERATION, TOOL, EVENT). Click to select, expand/collapse children.
4. Node Detail Panel — Split pane showing input (blue header) and output (green header) for the selected node. Smart rendering for chat messages, JSON, and structured data.
5. Review Panel — Verdict selector, failure step dropdown, rationale/expected output fields, and golden dataset push toggle.

Searching for Traces

Agent Replay supports two search modes, selected from the dropdown next to the search bar:

Trace ID — Paste a UUID or 32+ character hex string directly. The system auto-detects the format and fetches the trace immediately from Langfuse. This is the fastest way to investigate a specific execution.
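The auto-detection can be pictured as a simple format check. This is a hypothetical sketch, not the actual implementation: a canonical UUID or a bare hex string of 32 or more characters is treated as a trace ID, and anything else falls through to field search.

```python
import re

# Canonical UUID: 8-4-4-4-12 hex groups; bare trace IDs: 32+ hex chars.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
HEX_RE = re.compile(r"^[0-9a-f]{32,}$", re.I)

def looks_like_trace_id(query: str) -> bool:
    """Return True when the query should be looked up directly in Langfuse."""
    q = query.strip()
    return bool(UUID_RE.match(q) or HEX_RE.match(q))
```

A business identifier such as a case reference fails both patterns, so it would be routed to field search instead.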

Field Search — When a PostgreSQL search database is configured, you can search by business identifiers like Case Reference, Ticket Number, or any custom column. The dropdown shows available fields for the selected agent.

Search Mode Input Requires How It Works
Trace ID UUID or hex string Langfuse credentials only Direct API lookup — instant result
Field Search Business identifier Search DB configured PostgreSQL lookup → trace ID → Langfuse fetch
ℹ️ Info
Field search requires a PostgreSQL database connection configured in config/agent_replay_db.yaml. Without it, only Trace ID search is available. See the Configuration section below.

Search results appear as trace cards showing the agent name, step count, timestamp, tags, and a trace ID snippet. Click any card to load the full trace.
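The field-search pipeline (PostgreSQL lookup, then trace ID, then Langfuse fetch) can be sketched as a query builder over the names from config/agent_replay_db.yaml. This is an illustrative sketch, not the shipped code; the value is passed as a bind parameter, and the `created_at` ordering column is an assumption.

```python
def build_field_search_sql(table: str, search_column: str,
                           trace_id_column: str, limit: int = 20) -> str:
    """Build a parameterized lookup from a business identifier to trace IDs.

    Identifiers (table/column names) must come from validated config;
    the search value itself is bound via the %s placeholder.
    """
    return (
        f"SELECT {trace_id_column} FROM {table} "
        f"WHERE {search_column} = %s "
        f"ORDER BY created_at DESC LIMIT {int(limit)}"  # created_at is assumed
    )
```

Each trace ID returned by this query is then fetched from Langfuse exactly as in Trace ID mode.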

Observation Tree

The observation tree is the core navigation element. It displays the hierarchical structure of a trace — every span, LLM generation, tool call, and event as nested nodes.

Node Types

Type Badge Description Example
SPAN SPAN Logical grouping or timing boundary workflow, retrieval-chain
GENERATION GEN LLM call with prompt and completion classify-intent, generate-response
TOOL TOOL External tool or function call search_knowledge_base, fetch_policy
EVENT EVT Status or progress marker task_complete, error_caught
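The type-to-badge mapping in the table above amounts to a small lookup. A hypothetical sketch, assuming unknown observation types fall back to a generic badge:

```python
# Badge labels as listed in the node types table.
BADGES = {
    "SPAN": "SPAN",
    "GENERATION": "GEN",
    "TOOL": "TOOL",
    "EVENT": "EVT",
}

def badge_for(observation_type: str) -> str:
    """Map a Langfuse observation type to its tree badge."""
    return BADGES.get(observation_type.upper(), "NODE")  # fallback is assumed
```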

Tree Navigation

Each node shows its type badge, name, and inline metadata (latency, token count). Click a node to select it and view its details in the Node Detail Panel. Click the expand/collapse toggle to show or hide children.

Use the toolbar buttons Expand All and Collapse All to quickly open or close the entire tree. The special Workflow I/O node at the top of the tree shows the trace-level input and output.
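The tree operations above can be sketched with a minimal node model. This is an assumption-level sketch (the real node type surely carries more fields): each node keeps an `expanded` flag, Expand All / Collapse All toggle it recursively, and only children of expanded nodes are visible.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    type: str
    children: list = field(default_factory=list)
    expanded: bool = False

def set_expanded(node: Node, value: bool) -> None:
    """Expand All / Collapse All: toggle a node and all descendants."""
    node.expanded = value
    for child in node.children:
        set_expanded(child, value)

def visible_nodes(node: Node):
    """Yield nodes reachable without opening a collapsed ancestor."""
    yield node
    if node.expanded:
        for child in node.children:
            yield from visible_nodes(child)
```

Collapsing the root hides everything below it; expanding all makes every span, generation, tool, and event visible at once.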

[Figure: Observation tree (expanded) with nested spans, generations, tools, and events, Expand All / Collapse All controls, and the Workflow I/O root node]

Node Detail Panel

When you select a node in the observation tree, the Node Detail Panel shows its full content in a split-pane layout:

  • Input pane (blue header, 35% width) — Shows the prompt or input data. For GENERATION nodes, the PromptViewer auto-detects chat message arrays and renders them as styled bubbles with role labels (system, user, assistant, tool).
  • Output pane (green header, 65% width) — Shows the completion or result. The OutputViewer detects structured objects and renders them as color-coded section cards with copy buttons.
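The chat-message detection can be pictured as a shape check on the input value. A minimal sketch, assuming an input renders as chat bubbles only when it is a non-empty list of dicts that each carry a known role and some content:

```python
# Roles rendered with labels in the PromptViewer, per the text above.
CHAT_ROLES = {"system", "user", "assistant", "tool"}

def is_chat_messages(value) -> bool:
    """Heuristic: does this input look like an OpenAI-style message array?"""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(m, dict)
            and m.get("role") in CHAT_ROLES
            and "content" in m
            for m in value
        )
    )
```

Anything that fails the check would fall back to JSON or plain-text rendering.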
[Figure: Node Detail Panel — chat-style input (system and user messages) alongside a structured output with decision, rationale, and citations sections]

Metadata Drawer

Below the split pane, a collapsible drawer reveals additional details:

  • Node info grid — Type, model name, latency, token counts, start/end timestamps
  • Metadata table — Key-value pairs from the observation metadata (sensitive keys like api_key or secret are automatically redacted)
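The redaction described above can be sketched as a substring check over metadata keys. The exact sensitive-key list here is an assumption extrapolated from the two examples given:

```python
# api_key and secret come from the text above; token/password are assumed.
SENSITIVE_PARTS = ("api_key", "secret", "token", "password")

def redact_metadata(metadata: dict) -> dict:
    """Replace values of sensitive-looking keys before display."""
    return {
        key: "***REDACTED***"
        if any(part in key.lower() for part in SENSITIVE_PARTS)
        else value
        for key, value in metadata.items()
    }
```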
📝 Note
Large content is truncated at 50,000 characters by default. Click "Show full" to fetch the complete content on demand (soft-capped at 500K characters).
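The truncation behavior in the note amounts to a cut-and-flag helper. A minimal sketch, assuming the flag is what drives the "Show full" control (the on-demand fetch itself is subject to the 500K soft cap):

```python
MAX_CHARS = 50_000  # default per-node truncation limit from the note above

def truncate_content(text: str, limit: int = MAX_CHARS):
    """Return (possibly truncated text, was_truncated flag)."""
    if len(text) <= limit:
        return text, False
    return text[:limit], True
```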

Review & Verdicts

The Review Panel lets you record structured feedback for any trace. Reviews are persisted as Langfuse scores and can optionally create golden dataset items.

[Figure: Review Panel — verdict selector (positive/neutral/negative), failure step dropdown, tooling needs, rationale and expected output fields, golden dataset toggle, and Save Review button]

Review Fields

Field Required Langfuse Score Name Description
Verdict Yes review_verdict Positive (+1), Neutral (0), or Negative (-1)
Failure Step No review_failure_step Observation node where the agent went wrong
Tooling Needs No review_tooling_needs What tools/capabilities the agent lacked
Rationale No review_rationale Explanation of the verdict decision
Expected Output No review_expected_output Ground truth the agent should have produced
Golden Dataset No — Push trace as a Langfuse dataset item for evaluation
💡 Tip
When you enable Add to Golden Dataset, the default dataset name is {agent}-golden-{YYYY-MM}. You can type a custom name or select from existing datasets in the dropdown.

Configuration

Agent Replay requires Langfuse credentials and an optional search database. All configuration follows the standard AXIS pattern: YAML → environment variables → hardcoded defaults.

Langfuse Credentials

Set credentials globally or per agent:

# Global (all agents share one Langfuse project)
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com   # optional

# Per-agent (each agent has its own project)
LANGFUSE_ALPHA_BOT_PUBLIC_KEY=pk-lf-...
LANGFUSE_ALPHA_BOT_SECRET_KEY=sk-lf-...
LANGFUSE_REVIEWER_PUBLIC_KEY=pk-lf-...
LANGFUSE_REVIEWER_SECRET_KEY=sk-lf-...

Enable the feature with:

AGENT_REPLAY_ENABLED=true
⚠️ Warning
Langfuse API keys are sensitive. Never commit them to version control. Use environment variables or a secrets manager in production.

Search Database (Optional)

To enable field search, create config/agent_replay_db.yaml from the example template:

# config/agent_replay_db.yaml
enabled: true
host: localhost
port: 5432
database: agent_traces
username: axis_reader
password: ${REPLAY_DB_PASSWORD}
schema: public
table: trace_index
search_column: case_reference
search_column_label: Case Reference
trace_id_column: langfuse_trace_id

# Per-agent overrides (optional)
agents:
  alpha_bot:
    table: alpha_bot_traces
    search_column: ticket_number
  beta_bot:
    table: beta_bot_traces
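The per-agent overrides above can be pictured as a shallow merge: agent-specific keys win, and everything else falls through to the top-level settings. A minimal sketch of that resolution, assuming a shallow (non-recursive) merge:

```python
def effective_config(base: dict, agent: str) -> dict:
    """Merge an agent's override block over the top-level settings."""
    overrides = (base.get("agents") or {}).get(agent, {})
    merged = {k: v for k, v in base.items() if k != "agents"}
    merged.update(overrides)
    return merged
```

With the YAML above, alpha_bot would resolve to table alpha_bot_traces and search_column ticket_number, while beta_bot overrides only the table and inherits case_reference.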

Tuning Parameters

Parameter Default Description
default_limit 20 Maximum traces returned per search
default_days_back 7 Default time window for trace listing
max_chars 50,000 Content truncation limit per node
search_metadata_key caseReference Langfuse metadata field for fallback search
query_timeout 10s Search DB query timeout
pool_max_size 5 Connection pool maximum for search DB

Keyboard Shortcuts

Key Action Context
↑ / ↓ Navigate between visible tree nodes Observation Tree
← Collapse current node (or move to parent) Observation Tree
→ Expand current node (or move to first child) Observation Tree
Enter Select focused node and load its detail Observation Tree
Esc Close Review Panel Review Panel

Next Steps

AXIS Documentation