Simulation

Generate synthetic conversations with persona-based testing — stress-test your AI agents at scale before shipping to production.

Why Use Simulation?

Real user data is limited, slow to collect, and often lacks edge-case coverage. The Simulation page lets you generate diverse, controlled test conversations by defining personas and scenarios, then running them against your agents in batch.

👥 Persona-Based Testing

Define user personas with traits, expertise levels, and communication styles to simulate realistic diversity.

💬 Synthetic Conversations

Generate multi-turn conversations grounded in persona profiles and scenario templates.

🤖 Agent Integration

Connect directly to your AI agents and run automated simulation tests end-to-end.

Batch Evaluation

Run hundreds of simulations in parallel, automatically evaluate quality, and export results.

Quick Start

Follow these four steps to run your first simulation:

1. Define Personas — Create one or more user personas with attributes like name, background, expertise level, and communication style. Personas drive the tone and content of generated conversations.
2. Configure Scenarios — Set up test scenarios with topic templates, difficulty levels, and expected behaviors. Each scenario defines the conversation context your personas will interact with.
3. Run Simulation — Launch the simulation batch. Monitor progress in real time as conversations are generated across all persona-scenario combinations.
4. Review & Export — Inspect generated conversations, review quality metrics, and export the results as CSV for use in the Evaluate page or external analysis tools.

💡 Tip
Start with 2–3 personas and a single scenario to validate your setup. You can scale up to dozens of personas and scenarios once you're comfortable with the output quality.

Page Anatomy

The Simulation page is organized into four workflow sections accessible via a vertical stepper or tab bar:

Figure: The Simulation page showing the workflow stepper, persona cards with attribute badges, and the add-persona button.

1. Page Header — Title, icon, and subtitle. Consistent with all AXIS pages.
2. Workflow Stepper — Four steps: Personas, Scenarios, Run, and Results. The active step is highlighted with the primary color.
3. Persona Cards — Each persona is displayed as a card with an avatar, name, role, and attribute badges (expertise, style, verbosity).

Persona Configuration

Personas define who is interacting with your AI agent. Each persona is a set of attributes that shape the generated conversation's tone, vocabulary, and complexity.

Persona Attributes

| Attribute | Description | Example Values |
|---|---|---|
| Name | Display name for identification | Jane Doe, Alex Smith |
| Age | Simulated age bracket | 25, 42, 68 |
| Background | Professional or personal context | Senior Engineer, New Customer |
| Expertise Level | Familiarity with the product domain | Beginner, Intermediate, Expert |
| Communication Style | How the persona phrases questions | Technical, Casual, Formal, Verbose, Concise |
ℹ️ Info
Personas are saved locally and reusable across simulation runs. Click any persona card to edit its attributes, or use the + Add Persona button to create a new one.
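Conceptually, each persona is a structured record that gets rendered into instructions for the model simulating the user. The sketch below is illustrative Python only — the `Persona` class and its `system_prompt` rendering are assumptions for explanation, not the actual AXIS schema:

```python
from dataclasses import dataclass

@dataclass
class Persona:
    """Hypothetical persona record; field names mirror the attribute table."""
    name: str          # display name for identification
    age: int           # simulated age bracket
    background: str    # professional or personal context
    expertise: str     # "Beginner" | "Intermediate" | "Expert"
    style: str         # e.g. "Technical", "Casual", "Concise"

    def system_prompt(self) -> str:
        """Render the persona as instructions for the user-simulator model."""
        return (
            f"You are {self.name}, a {self.age}-year-old {self.background}. "
            f"Your expertise level is {self.expertise} and you communicate "
            f"in a {self.style.lower()} style."
        )

jane = Persona("Jane Doe", 42, "Senior Engineer", "Expert", "Concise")
print(jane.system_prompt())
```

Framed this way, "editing a persona card" is just editing the fields of this record; the rendered prompt changes accordingly on the next run.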

Effective Persona Design

  • Cover the spectrum — Include beginners, intermediates, and experts to test different response styles
  • Vary communication styles — A terse, technical user exercises different code paths than a verbose, casual one
  • Add adversarial personas — Create a persona that asks ambiguous or off-topic questions to test guardrails
  • Match your user base — Model personas after real customer segments for relevant test coverage

Scenario Setup

Scenarios define what each persona will ask about. They provide the topic, context, difficulty, and expected agent behaviors.

Scenario Template Fields

| Field | Description | Example |
|---|---|---|
| Topic | The subject area of the conversation | Return policy, Account setup |
| Difficulty | Complexity level for the scenario | Easy, Medium, Hard |
| Context / Prompt | Additional instructions or constraints | "User is frustrated after 3 failed attempts" |
| Expected Behaviors | What the agent should do or avoid | Offer escalation, Avoid jargon |
| Max Turns | Conversation length limit | 5, 10, 20 |
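A scenario can be thought of in the same way, as a record holding the fields above. The `Scenario` class below is a hypothetical illustration of that mental model, not AXIS's internal schema:

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    """Illustrative scenario record; field names follow the table above."""
    topic: str
    difficulty: str                                   # "Easy" | "Medium" | "Hard"
    context: str                                      # extra instructions or constraints
    expected_behaviors: list[str] = field(default_factory=list)
    max_turns: int = 10                               # conversation length limit

billing = Scenario(
    topic="Billing Dispute",
    difficulty="Hard",
    context="User is frustrated after 3 failed attempts",
    expected_behaviors=["Acknowledge frustration", "Escalate to human"],
    max_turns=10,
)
print(billing.topic, billing.max_turns)
```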
Figure: Scenario configuration cards showing topic, difficulty badge, description, expected behaviors, and max turn count. Example scenarios:

  • Return Policy Inquiry (Easy) — User asks about returning an electronics item purchased online within the last 30 days. Max turns: 5 · Expected: Provide clear policy, offer receipt lookup
  • Billing Dispute (Hard) — Frustrated user disputes a charge after 3 failed support attempts. Requires empathy and escalation. Max turns: 10 · Expected: Acknowledge frustration, escalate to human
  • Account Setup (Medium) — New user needs help creating an account and connecting a payment method. Max turns: 8 · Expected: Step-by-step guidance, verify completion
💡 Tip
Combine each persona with every scenario to maximize coverage. For example, 3 personas across 4 scenarios yields 12 simulated conversations, each with a unique user/topic combination.
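The persona/scenario pairing described in the tip is a plain Cartesian product. A quick sketch (all persona and scenario names here are made up for illustration):

```python
from itertools import product

personas = ["Jane Doe", "Alex Smith", "Pat Lee"]
scenarios = ["Return Policy", "Billing Dispute",
             "Account Setup", "Password Reset"]

# Every persona is paired with every scenario: 3 personas x 4 scenarios
# yields 12 simulated conversations in the batch.
batch = list(product(personas, scenarios))
print(len(batch))    # 12
print(batch[0])      # ('Jane Doe', 'Return Policy')
```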

Running Simulations

Once personas and scenarios are configured, launch the simulation. The Run step shows real-time progress as conversations are generated.

Progress Monitoring

Figure: Real-time simulation progress showing overall completion bar, batch KPIs, and per-conversation status table. In the example shown, 8 of 12 conversations are complete, 2 are in progress, and 2 are queued, with an estimated ~2 minutes remaining:

| Persona | Scenario | Turns | Status |
|---|---|---|---|
| Jane Doe | Return Policy | 5 | Complete |
| Jane Doe | Billing Dispute | 8 | Complete |
| Alex Smith | Return Policy | 4 | Complete |
| Alex Smith | Account Setup | 6 | Running |
| Pat Lee | Billing Dispute | | Queued |

Key behaviors during a run:

  • Parallel execution — Multiple conversations run concurrently for faster batch completion
  • Live status — Each row updates in real time as conversations progress through turns
  • Error handling — Failed conversations are marked in red and can be retried individually
  • Cancel support — Stop the batch at any time; completed conversations are preserved
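The parallel-execution behavior above can be approximated with a thread pool, where each future corresponds to one row in the status table. The `run_conversation` stub below is a hypothetical stand-in for a real agent call, not an AXIS API:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def run_conversation(persona: str, scenario: str) -> dict:
    """Stand-in for one simulated multi-turn conversation with the agent."""
    time.sleep(0.01)  # placeholder for model/agent round-trip latency
    return {"persona": persona, "scenario": scenario, "status": "Complete"}

pairs = [("Jane Doe", "Return Policy"), ("Jane Doe", "Billing Dispute"),
         ("Alex Smith", "Return Policy"), ("Alex Smith", "Account Setup")]

results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(run_conversation, p, s): (p, s) for p, s in pairs}
    for fut in as_completed(futures):   # rows flip to Complete as futures finish
        results.append(fut.result())

print(len(results), "complete")
```

This also shows why cancellation can preserve finished work: each conversation completes (or fails) independently of the others.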
⚠️ Warning
Large batches (50+ conversations) may take several minutes depending on your agent's response latency. Monitor the estimated time remaining and adjust batch size if needed.

Results Review

After the simulation completes, the Results step displays generated conversations alongside quality metrics.

Overview Metrics

A KPI strip at the top summarizes the simulation run:

Figure: Results overview with quality KPIs and a conversation table showing per-conversation quality scores and detected issues. In the example shown, the KPI strip reports 12 conversations, 0.84 average quality, 6.2 average turns, and 3 issues found:

| Persona | Scenario | Turns | Quality | Issues |
|---|---|---|---|---|
| Jane Doe | Return Policy | 5 | 0.92 | |
| Jane Doe | Billing Dispute | 8 | 0.71 | Missed escalation |
| Alex Smith | Return Policy | 4 | 0.89 | |
| Alex Smith | Account Setup | 6 | 0.87 | |
| Pat Lee | Billing Dispute | 10 | 0.52 | Tone mismatch |

Conversation Detail View

Click View on any row to open the full conversation transcript. The detail view shows:

  • Message timeline — Alternating user (persona) and agent messages with timestamps
  • Per-turn scores — Quality and relevance scores for each agent response
  • Issue annotations — Flagged turns are highlighted with inline explanations
  • Persona context — A sidebar reminds you of the persona's attributes and the scenario template
ℹ️ Info
Low-scoring conversations are the most valuable output of simulation. Focus your review on red and yellow results to identify agent weaknesses before real users encounter them.
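One way to act on this advice outside the UI is to sort exported results so the lowest-quality conversations surface first. The field names and the 0.75 threshold below are illustrative assumptions, not part of the documented export format:

```python
# Triage example results: review anything under a quality threshold, worst first.
results = [
    {"persona": "Jane Doe",   "scenario": "Billing Dispute", "quality": 0.71},
    {"persona": "Pat Lee",    "scenario": "Billing Dispute", "quality": 0.52},
    {"persona": "Jane Doe",   "scenario": "Return Policy",   "quality": 0.92},
]

THRESHOLD = 0.75  # assumed cutoff; tune to your own quality bar
needs_review = sorted(
    (r for r in results if r["quality"] < THRESHOLD),
    key=lambda r: r["quality"],   # lowest score first
)
for r in needs_review:
    print(f'{r["persona"]} / {r["scenario"]}: {r["quality"]:.2f}')
```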

Export & Integration

Simulation results can be exported and fed into other AXIS pages for deeper analysis.

Export Options

| Format | Contents | Use Case |
|---|---|---|
| CSV | Flattened rows with persona, scenario, turns, scores, and full transcript | Upload to the Evaluate page for scoring with additional metrics |
| JSON | Structured conversation objects with metadata | Programmatic analysis, CI/CD pipeline integration |
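For programmatic analysis, the JSON export can be consumed with nothing but the standard library. The record shape below is an assumed example for illustration, not the documented AXIS schema:

```python
import json

# Hypothetical shape for one exported conversation object.
raw = '''[
  {"persona": "Jane Doe", "scenario": "Return Policy", "turns": 5,
   "quality": 0.92, "messages": [{"role": "user", "content": "Hi"}]}
]'''

conversations = json.loads(raw)
avg_quality = sum(c["quality"] for c in conversations) / len(conversations)
print(f"{len(conversations)} conversations, avg quality {avg_quality:.2f}")
```

The same loop works as a gate in a CI pipeline: fail the build if `avg_quality` drops below an agreed floor.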

Integration with Other Pages

  • Evaluate — Export simulation CSV and upload it as an evaluation dataset. Run LLM-as-Judge or automated metrics on the generated conversations.
  • Monitoring — Use simulation results as baseline data to compare against production conversation quality.
  • Calibration — Feed simulation outputs into the Calibration Studio to test judge agreement on synthetic data before applying to production.
💡 Tip
A powerful workflow: run simulations before each release, export to Evaluate, compare scores against the previous release's simulation, and use the delta to decide whether to ship.
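The release-to-release comparison in the tip reduces to a per-scenario score delta. A sketch with made-up numbers and an arbitrary 0.05 regression threshold:

```python
# Average quality per scenario from two simulation runs (illustrative values).
previous = {"Return Policy": 0.90, "Billing Dispute": 0.62}
current  = {"Return Policy": 0.91, "Billing Dispute": 0.71}

# Positive delta = improvement; flag any scenario that regressed noticeably.
deltas = {k: round(current[k] - previous[k], 2) for k in previous}
regressions = [k for k, d in deltas.items() if d < -0.05]

print(deltas)
print("ship" if not regressions else f"investigate: {regressions}")
```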

Next Steps

AXIS Documentation · Built with MkDocs Material