Agent Replay

Step through AI agent execution traces from Langfuse — inspect observation trees, review inputs/outputs, and submit verdicts for continuous improvement.

Why Agent Replay?

Automated metrics tell you whether an agent succeeded or failed. Agent Replay lets you see exactly how it got there — every LLM call, tool invocation, and decision branch in a hierarchical trace. Use it to debug failures, validate reasoning, and build golden datasets for evaluation.

🔍 Trace Inspection

Browse full execution traces with nested observation trees. Expand any node to see its input, output, and metadata.

🧩 Step-by-Step Debugging

Walk through spans, generations, tool calls, and events in order. Smart content rendering detects chat messages, JSON, and structured output.

Performance Metrics

Latency, token usage, and model info at every node. Trace-level KPIs show total cost, step count, and duration at a glance.

Human Review

Submit verdicts (positive/neutral/negative), identify failure steps, record rationale, and push approved traces to Langfuse golden datasets.

Quick Start

Get from zero to reviewing a trace in under a minute:

1. Configure Langfuse

Set LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and AGENT_REPLAY_ENABLED=true as environment variables. For multiple agents, use LANGFUSE_{AGENT}_PUBLIC_KEY per agent.

2. Search for a Trace

Select an agent from the tab bar, then enter a Trace ID (UUID) or use Field Search (e.g. Case Reference) to find traces by business identifiers.

3. Explore the Observation Tree

Click any node to inspect its input/output. Expand children to drill into nested spans and generations. Use Workflow I/O to see the trace-level request and response.

4. Submit a Review

Open the Review panel, select a verdict, optionally identify the failure step, and add rationale. Toggle Add to Golden Dataset to push the trace to Langfuse for future evaluations.

💡 Tip
If you already have a trace ID from Langfuse or your application logs, paste it directly into the search bar for instant lookup — no field search configuration needed.

Page Anatomy

The Agent Replay page is divided into five functional zones:

[Figure: Agent Replay page layout with all five zones — agent identity bar, trace search, observation tree, node detail panel, and review panel]
1. Agent Identity Bar — Agent tabs with avatars. When a trace is loaded, shows KPI chips: node count, latency, token usage, and trace ID prefix.
2. Trace Search — Search by Trace ID (UUID paste) or Field Search (e.g., Case Reference) via the dropdown. Results appear as trace cards below.
3. Observation Tree — Hierarchical view of all trace observations. Color-coded by type (SPAN, GENERATION, TOOL, EVENT). Click to select, expand/collapse children.
4. Node Detail Panel — Split pane showing input (blue header) and output (green header) for the selected node. Smart rendering for chat messages, JSON, and structured data.
5. Review Panel — Verdict selector, failure step dropdown, rationale/expected output fields, and golden dataset push toggle.

Searching for Traces

Agent Replay supports two search modes, selected from the dropdown next to the search bar:

Trace ID — Paste a UUID or 32+ character hex string directly. The system auto-detects the format and fetches the trace immediately from Langfuse. This is the fastest way to investigate a specific execution.
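The auto-detection can be pictured as a simple format check. This is a hypothetical sketch, not the actual implementation: a canonical UUID or a bare hex string of 32 or more characters is treated as a trace ID, and anything else falls through to field search.

```python
import re

# Canonical UUID: 8-4-4-4-12 hex groups; bare trace IDs: 32+ hex chars.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
HEX_RE = re.compile(r"^[0-9a-f]{32,}$", re.I)

def looks_like_trace_id(query: str) -> bool:
    """Return True when the query should be looked up directly in Langfuse."""
    q = query.strip()
    return bool(UUID_RE.match(q) or HEX_RE.match(q))
```

A business identifier such as a case reference fails both patterns, so it would be routed to field search instead.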

Field Search — When a PostgreSQL search database is configured, you can search by business identifiers like Case Reference, Ticket Number, or any custom column. The dropdown shows available fields for the selected agent.

Search Mode Input Requires How It Works
Trace ID UUID or hex string Langfuse credentials only Direct API lookup — instant result
Field Search Business identifier Search DB configured PostgreSQL lookup → trace ID → Langfuse fetch
ℹ️ Info
Field search requires a PostgreSQL database connection configured in config/agent_replay_db.yaml. Without it, only Trace ID search is available. See the Configuration section below.

Search results appear as trace cards showing the agent name, step count, timestamp, tags, and a trace ID snippet. Click any card to load the full trace.
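The field-search pipeline (PostgreSQL lookup, then trace ID, then Langfuse fetch) can be sketched as a query builder over the names from config/agent_replay_db.yaml. This is an illustrative sketch, not the shipped code; the value is passed as a bind parameter, and the `created_at` ordering column is an assumption.

```python
def build_field_search_sql(table: str, search_column: str,
                           trace_id_column: str, limit: int = 20) -> str:
    """Build a parameterized lookup from a business identifier to trace IDs.

    Identifiers (table/column names) must come from validated config;
    the search value itself is bound via the %s placeholder.
    """
    return (
        f"SELECT {trace_id_column} FROM {table} "
        f"WHERE {search_column} = %s "
        f"ORDER BY created_at DESC LIMIT {int(limit)}"  # created_at is assumed
    )
```

Each trace ID returned by this query is then fetched from Langfuse exactly as in Trace ID mode.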

Observation Tree

The observation tree is the core navigation element. It displays the hierarchical structure of a trace — every span, LLM generation, tool call, and event as nested nodes.

Node Types

Type Badge Description Example
SPAN SPAN Logical grouping or timing boundary workflow, retrieval-chain
GENERATION GEN LLM call with prompt and completion classify-intent, generate-response
TOOL TOOL External tool or function call search_knowledge_base, fetch_policy
EVENT EVT Status or progress marker task_complete, error_caught
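The type-to-badge mapping in the table above amounts to a small lookup. A hypothetical sketch, assuming unknown observation types fall back to a generic badge:

```python
# Badge labels as listed in the node types table.
BADGES = {
    "SPAN": "SPAN",
    "GENERATION": "GEN",
    "TOOL": "TOOL",
    "EVENT": "EVT",
}

def badge_for(observation_type: str) -> str:
    """Map a Langfuse observation type to its tree badge."""
    return BADGES.get(observation_type.upper(), "NODE")  # fallback is assumed
```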

Tree Navigation

Each node shows its type badge, name, and inline metadata (latency, token count). Click a node to select it and view its details in the Node Detail Panel. Click the expand/collapse toggle to show or hide children.

Use the toolbar buttons Expand All and Collapse All to quickly open or close the entire tree. The special Workflow I/O node at the top of the tree shows the trace-level input and output.
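The tree operations above can be sketched with a minimal node model. This is an assumption-level sketch (the real node type surely carries more fields): each node keeps an `expanded` flag, Expand All / Collapse All toggle it recursively, and only children of expanded nodes are visible.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    type: str
    children: list = field(default_factory=list)
    expanded: bool = False

def set_expanded(node: Node, value: bool) -> None:
    """Expand All / Collapse All: toggle a node and all descendants."""
    node.expanded = value
    for child in node.children:
        set_expanded(child, value)

def visible_nodes(node: Node):
    """Yield nodes reachable without opening a collapsed ancestor."""
    yield node
    if node.expanded:
        for child in node.children:
            yield from visible_nodes(child)
```

Collapsing the root hides everything below it; expanding all makes every span, generation, tool, and event visible at once.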

[Figure: Observation tree (expanded) with nested spans, generations, tools, and events, Expand All / Collapse All controls, and the Workflow I/O root node]

Node Detail Panel

When you select a node in the observation tree, the Node Detail Panel shows its full content in a split-pane layout:

  • Input pane (blue header, 35% width) — Shows the prompt or input data. For GENERATION nodes, the PromptViewer auto-detects chat message arrays and renders them as styled bubbles with role labels (system, user, assistant, tool).
  • Output pane (green header, 65% width) — Shows the completion or result. The OutputViewer detects structured objects and renders them as color-coded section cards with copy buttons.
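The chat-message detection can be pictured as a shape check on the input value. A minimal sketch, assuming an input renders as chat bubbles only when it is a non-empty list of dicts that each carry a known role and some content:

```python
# Roles rendered with labels in the PromptViewer, per the text above.
CHAT_ROLES = {"system", "user", "assistant", "tool"}

def is_chat_messages(value) -> bool:
    """Heuristic: does this input look like an OpenAI-style message array?"""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(m, dict)
            and m.get("role") in CHAT_ROLES
            and "content" in m
            for m in value
        )
    )
```

Anything that fails the check would fall back to JSON or plain-text rendering.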
[Figure: Node Detail Panel — chat-style input (system and user messages) alongside a structured output with decision, rationale, and citations sections]

Metadata Drawer

Below the split pane, a collapsible drawer reveals additional details:

  • Node info grid — Type, model name, latency, token counts, start/end timestamps
  • Metadata table — Key-value pairs from the observation metadata (sensitive keys like api_key or secret are automatically redacted)
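The redaction described above can be sketched as a substring check over metadata keys. The exact sensitive-key list here is an assumption extrapolated from the two examples given:

```python
# api_key and secret come from the text above; token/password are assumed.
SENSITIVE_PARTS = ("api_key", "secret", "token", "password")

def redact_metadata(metadata: dict) -> dict:
    """Replace values of sensitive-looking keys before display."""
    return {
        key: "***REDACTED***"
        if any(part in key.lower() for part in SENSITIVE_PARTS)
        else value
        for key, value in metadata.items()
    }
```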
📝 Note
Large content is truncated at 50,000 characters by default. Click "Show full" to fetch the complete content on demand (soft-capped at 500K characters).
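The truncation behavior in the note amounts to a cut-and-flag helper. A minimal sketch, assuming the flag is what drives the "Show full" control (the on-demand fetch itself is subject to the 500K soft cap):

```python
MAX_CHARS = 50_000  # default per-node truncation limit from the note above

def truncate_content(text: str, limit: int = MAX_CHARS):
    """Return (possibly truncated text, was_truncated flag)."""
    if len(text) <= limit:
        return text, False
    return text[:limit], True
```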

Review & Verdicts

The Review Panel lets you record structured feedback for any trace. Reviews are persisted as Langfuse scores and can optionally create golden dataset items.

[Figure: Review Panel — verdict selector (positive/neutral/negative), failure step dropdown, tooling needs, rationale and expected output fields, golden dataset toggle, and Save Review button]

Review Fields

Field Required Langfuse Score Name Description
Verdict Yes review_verdict Positive (+1), Neutral (0), or Negative (-1)
Failure Step No review_failure_step Observation node where the agent went wrong
Tooling Needs No review_tooling_needs What tools/capabilities the agent lacked
Rationale No review_rationale Explanation of the verdict decision
Expected Output No review_expected_output Ground truth the agent should have produced
Golden Dataset No — Push trace as a Langfuse dataset item for evaluation
💡 Tip
When you enable Add to Golden Dataset, the default dataset name is {agent}-golden-{YYYY-MM}. You can type a custom name or select from existing datasets in the dropdown.

Configuration

Agent Replay requires Langfuse credentials and an optional search database. All configuration follows the standard AXIS pattern: YAML → environment variables → hardcoded defaults.

Langfuse Credentials

Set credentials globally or per agent:

# Global (all agents share one Langfuse project)
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com   # optional

# Per-agent (each agent has its own project)
LANGFUSE_ALPHA_BOT_PUBLIC_KEY=pk-lf-...
LANGFUSE_ALPHA_BOT_SECRET_KEY=sk-lf-...
LANGFUSE_REVIEWER_PUBLIC_KEY=pk-lf-...
LANGFUSE_REVIEWER_SECRET_KEY=sk-lf-...

Enable the feature with:

AGENT_REPLAY_ENABLED=true
⚠️ Warning
Langfuse API keys are sensitive. Never commit them to version control. Use environment variables or a secrets manager in production.

Search Database (Optional)

To enable field search, create config/agent_replay_db.yaml from the example template:

# config/agent_replay_db.yaml
enabled: true
host: localhost
port: 5432
database: agent_traces
username: axis_reader
password: ${REPLAY_DB_PASSWORD}
schema: public
table: trace_index
search_column: case_reference
search_column_label: Case Reference
trace_id_column: langfuse_trace_id

# Per-agent overrides (optional)
agents:
  alpha_bot:
    table: alpha_bot_traces
    search_column: ticket_number
  beta_bot:
    table: beta_bot_traces
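The per-agent overrides above can be pictured as a shallow merge: agent-specific keys win, and everything else falls through to the top-level settings. A minimal sketch of that resolution, assuming a shallow (non-recursive) merge:

```python
def effective_config(base: dict, agent: str) -> dict:
    """Merge an agent's override block over the top-level settings."""
    overrides = (base.get("agents") or {}).get(agent, {})
    merged = {k: v for k, v in base.items() if k != "agents"}
    merged.update(overrides)
    return merged
```

With the YAML above, alpha_bot would resolve to table alpha_bot_traces and search_column ticket_number, while beta_bot overrides only the table and inherits case_reference.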

Tuning Parameters

Parameter Default Description
default_limit 20 Maximum traces returned per search
default_days_back 7 Default time window for trace listing
max_chars 50,000 Content truncation limit per node
search_metadata_key caseReference Langfuse metadata field for fallback search
query_timeout 10s Search DB query timeout
pool_max_size 5 Connection pool maximum for search DB

Keyboard Shortcuts

Key Action Context
↑ / ↓ Navigate between visible tree nodes Observation Tree
← Collapse current node (or move to parent) Observation Tree
→ Expand current node (or move to first child) Observation Tree
Enter Select focused node and load its detail Observation Tree
Esc Close Review Panel Review Panel

Next Steps

AXIS Documentation