Simulation
Generate synthetic conversations with persona-based testing — stress-test your AI agents at scale before shipping to production.
Why Use Simulation?
Real user data is limited, slow to collect, and often lacks edge-case coverage. The Simulation page lets you generate diverse, controlled test conversations by defining personas and scenarios, then running them against your agents in batch.
Persona-Based Testing
Define user personas with traits, expertise levels, and communication styles to simulate realistic diversity.
Synthetic Conversations
Generate multi-turn conversations grounded in persona profiles and scenario templates.
Agent Integration
Connect directly to your AI agents and run automated simulation tests end-to-end.
Batch Evaluation
Run hundreds of simulations in parallel, automatically evaluate quality, and export results.
Quick Start
Follow these four steps to run your first simulation:
Define Personas
Create one or more user personas with attributes like name, background, expertise level, and communication style. Personas drive the tone and content of generated conversations.
Configure Scenarios
Set up test scenarios with topic templates, difficulty levels, and expected behaviors. Each scenario defines the conversation context your personas will interact with.
Run Simulation
Launch the simulation batch. Monitor progress in real time as conversations are generated across all persona-scenario combinations.
Review & Export
Inspect generated conversations, review quality metrics, and export the results as CSV for use in the Evaluate page or external analysis tools.
Page Anatomy
The Simulation page is organized into four workflow sections, accessible via a vertical stepper or tab bar: Persona Configuration, Scenario Setup, Running Simulations, and Results Review.
Persona Configuration
Personas define who is interacting with your AI agent. Each persona is a set of attributes that shape the generated conversation's tone, vocabulary, and complexity.
Persona Attributes
| Attribute | Description | Example Values |
|---|---|---|
| Name | Display name for identification | Jane Doe, Alex Smith |
| Age | Simulated age bracket | 25, 42, 68 |
| Background | Professional or personal context | Senior Engineer, New Customer |
| Expertise Level | Familiarity with the product domain | Beginner, Intermediate, Expert |
| Communication Style | How the persona phrases questions | Technical, Casual, Formal, Verbose, Concise |
Effective Persona Design
- Cover the spectrum — Include beginners, intermediates, and experts to test different response styles
- Vary communication styles — A terse, technical user exercises different code paths than a verbose, casual one
- Add adversarial personas — Create a persona that asks ambiguous or off-topic questions to test guardrails
- Match your user base — Model personas after real customer segments for relevant test coverage
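The persona attributes above can be captured in a simple data structure. A minimal sketch in Python, using a plain dataclass rather than any specific AXIS SDK type (the `Persona` class and its fields are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class Persona:
    """A simulated user profile that shapes generated conversations.
    Field names mirror the attribute table above; this is an illustrative
    structure, not an official AXIS type."""
    name: str
    age: int
    background: str
    expertise_level: str        # "Beginner", "Intermediate", or "Expert"
    communication_style: str    # e.g. "Technical", "Casual", "Verbose"

# Cover the spectrum: a beginner, an expert, and a verbose intermediate.
personas = [
    Persona("Jane Doe", 25, "New Customer", "Beginner", "Casual"),
    Persona("Alex Smith", 42, "Senior Engineer", "Expert", "Technical"),
    Persona("Pat Lee", 68, "Long-time Customer", "Intermediate", "Verbose"),
]
```

Keeping personas as plain data makes it easy to version them alongside your test suite and reuse the same set across simulation runs.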
Scenario Setup
Scenarios define what each persona will ask about. They provide the topic, context, difficulty, and expected agent behaviors.
Scenario Template Fields
| Field | Description | Example |
|---|---|---|
| Topic | The subject area of the conversation | Return policy, Account setup |
| Difficulty | Complexity level for the scenario | Easy, Medium, Hard |
| Context / Prompt | Additional instructions or constraints | "User is frustrated after 3 failed attempts" |
| Expected Behaviors | What the agent should do or avoid | Offer escalation, Avoid jargon |
| Max Turns | Conversation length limit | 5, 10, 20 |
Example scenarios:
- User asks about returning an electronics item purchased online within the last 30 days.
- Frustrated user disputes a charge after 3 failed support attempts. Requires empathy and escalation.
- New user needs help creating an account and connecting a payment method.
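A scenario template maps naturally onto the fields in the table above. A sketch, again assuming a plain dataclass rather than an official AXIS type, populated with two of the example scenarios:

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    """Illustrative scenario template; fields mirror the table above."""
    topic: str
    difficulty: str                      # "Easy", "Medium", or "Hard"
    context: str                         # extra instructions or constraints
    expected_behaviors: list = field(default_factory=list)
    max_turns: int = 10

scenarios = [
    Scenario(
        topic="Return Policy",
        difficulty="Easy",
        context="User asks about returning an electronics item purchased "
                "online within the last 30 days.",
        expected_behaviors=["Explain return window", "Avoid jargon"],
        max_turns=5,
    ),
    Scenario(
        topic="Billing Dispute",
        difficulty="Hard",
        context="Frustrated user disputes a charge after 3 failed support attempts.",
        expected_behaviors=["Show empathy", "Offer escalation"],
        max_turns=10,
    ),
]
```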
Running Simulations
Once personas and scenarios are configured, launch the simulation. The Run step shows real-time progress as conversations are generated.
Progress Monitoring
| Persona | Scenario | Turns | Status |
|---|---|---|---|
| Jane Doe | Return Policy | 5 | Complete |
| Jane Doe | Billing Dispute | 8 | Complete |
| Alex Smith | Return Policy | 4 | Complete |
| Alex Smith | Account Setup | 6 | Running |
| Pat Lee | Billing Dispute | — | Queued |
Key behaviors during a run:
- Parallel execution — Multiple conversations run concurrently for faster batch completion
- Live status — Each row updates in real time as conversations progress through turns
- Error handling — Failed conversations are marked in red and can be retried individually
- Cancel support — Stop the batch at any time; completed conversations are preserved
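The run behaviors above (parallel execution, per-row status, failures marked for retry) can be sketched with Python's standard `concurrent.futures`. The agent call here is a stand-in lambda, not a real AXIS API:

```python
import concurrent.futures

def simulate(persona, scenario, agent_fn):
    """Generate one conversation; returns a status dict for the progress table."""
    turns = []
    try:
        for i in range(scenario["max_turns"]):
            user_msg = f"[{persona['name']}] question {i + 1} about {scenario['topic']}"
            turns.append({"user": user_msg, "agent": agent_fn(user_msg)})
        return {"status": "Complete", "turns": turns}
    except Exception as exc:
        # Failed conversations keep their partial transcript and can be retried.
        return {"status": "Failed", "error": str(exc), "turns": turns}

def run_batch(personas, scenarios, agent_fn, workers=4):
    """Run every persona-scenario combination concurrently."""
    jobs = [(p, s) for p in personas for s in scenarios]
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(simulate, p, s, agent_fn) for p, s in jobs]
        return [f.result() for f in concurrent.futures.as_completed(futures)]

# Example with a stub agent that always answers "ok":
results = run_batch(
    [{"name": "Jane"}, {"name": "Alex"}],
    [{"topic": "Returns", "max_turns": 3}],
    agent_fn=lambda msg: "ok",
)
```

Cancelling mid-batch corresponds to shutting the pool down without waiting on remaining futures; already-completed results are preserved, mirroring the cancel behavior described above.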
Results Review
After the simulation completes, the Results step displays generated conversations alongside quality metrics.
Overview Metrics
A KPI strip at the top summarizes the run, and a results table below it lists each conversation with its quality score and any flagged issues:
| Persona | Scenario | Turns | Quality | Issues | Details |
|---|---|---|---|---|---|
| Jane Doe | Return Policy | 5 | 0.92 | — | View → |
| Jane Doe | Billing Dispute | 8 | 0.71 | Missed escalation | View → |
| Alex Smith | Return Policy | 4 | 0.89 | — | View → |
| Alex Smith | Account Setup | 6 | 0.87 | — | View → |
| Pat Lee | Billing Dispute | 10 | 0.52 | Tone mismatch | View → |
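Summary figures such as average quality and flagged-conversation counts can be derived directly from the per-conversation rows. A sketch using the scores from the table above:

```python
# Rows taken from the results table above.
rows = [
    {"persona": "Jane Doe",   "scenario": "Return Policy",   "quality": 0.92, "issue": None},
    {"persona": "Jane Doe",   "scenario": "Billing Dispute", "quality": 0.71, "issue": "Missed escalation"},
    {"persona": "Alex Smith", "scenario": "Return Policy",   "quality": 0.89, "issue": None},
    {"persona": "Alex Smith", "scenario": "Account Setup",   "quality": 0.87, "issue": None},
    {"persona": "Pat Lee",    "scenario": "Billing Dispute", "quality": 0.52, "issue": "Tone mismatch"},
]

avg_quality = sum(r["quality"] for r in rows) / len(rows)
flagged = [r for r in rows if r["issue"]]
print(f"avg quality: {avg_quality:.2f}, flagged: {len(flagged)}/{len(rows)}")
```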
Conversation Detail View
Click View on any row to open the full conversation transcript. The detail view shows:
- Message timeline — Alternating user (persona) and agent messages with timestamps
- Per-turn scores — Quality and relevance scores for each agent response
- Issue annotations — Flagged turns are highlighted with inline explanations
- Persona context — A sidebar reminds you of the persona's attributes and the scenario template
Export & Integration
Simulation results can be exported and fed into other AXIS pages for deeper analysis.
Export Options
| Format | Contents | Use Case |
|---|---|---|
| CSV | Flattened rows with persona, scenario, turns, scores, and full transcript | Upload to the Evaluate page for scoring with additional metrics |
| JSON | Structured conversation objects with metadata | Programmatic analysis, CI/CD pipeline integration |
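Exported files can be consumed programmatically with only the standard library. A sketch that filters a CSV export for low-quality conversations; the column names here are an assumption about the export schema, not a documented contract:

```python
import csv
import io

def low_quality_rows(csv_text, threshold=0.8):
    """Return rows from an exported simulation CSV whose quality score
    falls below the threshold (candidates for closer review)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["quality"]) < threshold]

# Illustrative export snippet (column names are assumed):
export = """persona,scenario,turns,quality
Jane Doe,Return Policy,5,0.92
Pat Lee,Billing Dispute,10,0.52"""

review_queue = low_quality_rows(export)
```

The same filtering logic applies to the JSON export; a CI/CD pipeline could fail the build when `review_queue` is non-empty.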
Integration with Other Pages
- Evaluate — Export simulation CSV and upload it as an evaluation dataset. Run LLM-as-Judge or automated metrics on the generated conversations.
- Monitoring — Use simulation results as baseline data to compare against production conversation quality.
- Calibration — Feed simulation outputs into the Calibration Studio to test judge agreement on synthetic data before applying to production.