Annotation Studio
Human-in-the-loop annotation for quality assurance — score, tag, and critique LLM outputs with keyboard-driven workflows and CSV export.
Why Use Annotation Studio?
Automated metrics are powerful, but they are not the whole story. The Annotation Studio gives your team a structured interface to manually review and label evaluation data — building ground truth datasets, validating automated scores, and catching failure modes that metrics alone miss.
Structured Scoring
Binary accept/reject, 1–5 scale, or custom range scoring modes with keyboard shortcuts for rapid annotation.
Flexible Tagging
Apply preset or custom tags to categorize outputs — hallucination, off-topic, excellent, needs-context, and more.
Keyboard-First
Navigate, score, flag, and undo entirely from the keyboard. Annotators stay in flow without reaching for the mouse.
CSV Export
Export annotated data with judgment, critique, user_tags, and annotation_flagged columns appended to your original data.
Quick Start
Start annotating in under a minute:
Navigate to Annotation Studio
Click Annotation Studio in the left sidebar, or go directly to /annotation-studio.
Upload Evaluation Data
Drag a CSV file into the upload zone. The studio auto-detects data format and extracts any existing annotations (columns named judgment, critique, user_tags).
Configure (Optional)
Click Configure to choose which columns to display, set the score mode (binary, 1–5, or custom range), and manage tags.
Annotate & Export
Review each record, assign scores and tags, then click Export to download the annotated CSV with all your judgments.
Note: If your CSV already contains annotation columns (judgment, critique, user_tags), they are imported automatically so you can resume an interrupted annotation session.
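Resume detection boils down to checking the uploaded CSV's header for the annotation columns named on this page. A minimal sketch in Python (the function name is illustrative, not part of AXIS):

```python
import csv
import io

# Annotation columns the studio looks for, per the docs above.
ANNOTATION_COLUMNS = {"judgment", "critique", "user_tags", "annotation_flagged"}

def detect_existing_annotations(csv_text: str) -> set[str]:
    """Return the annotation columns already present in a CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return ANNOTATION_COLUMNS.intersection(header)

# A previously exported file can be resumed:
cols = detect_existing_annotations(
    "dataset_id,query,actual_output,judgment,critique\nx,q,a,accept,ok\n"
)
# cols == {"judgment", "critique"}
```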
Page Anatomy
Here is how the Annotation Studio is organized once data is loaded, with every major section labeled:
Data Upload
When you first open Annotation Studio with no data loaded, the page shows a centered upload zone. This is the same drag-and-drop CSV uploader used across AXIS.
No Data Loaded
Upload evaluation data to start annotating. The annotation workflow is independent of the evaluation workflow.
Drop your CSV file here, or click to browse
Supports evaluation CSV files with query and output columns
Supported data formats:
- Standard evaluation CSVs — must contain at least query and actual_output columns
- Previously annotated CSVs — if columns like judgment, critique, user_tags, or annotation_flagged exist, they are imported automatically
- Records are deduplicated by the ID column (auto-detected or configured)
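The deduplication step described above can be approximated as "keep the first row seen for each ID." A sketch under that assumption (the studio's actual merge logic may differ):

```python
def dedupe_by_id(rows: list[dict], id_column: str) -> list[dict]:
    """Keep the first occurrence of each ID value, preserving row order."""
    seen: set[str] = set()
    out: list[dict] = []
    for row in rows:
        key = row.get(id_column, "")
        if key in seen:
            continue  # duplicate ID: drop the later row
        seen.add(key)
        out.append(row)
    return out
```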
Annotation Interface
The annotation card is the main workspace. It displays each record's content fields vertically, followed by the scoring and tagging controls.
Content Display
By default, two columns are shown:
- User Query (query) — displayed in a neutral gray panel
- AI Response (actual_output) — highlighted with a left accent border (primary color) and a "To Evaluate" badge
You can show additional columns (like expected_output, context, or custom fields) by configuring them in the Configure modal. Content supports Markdown rendering via the built-in ContentRenderer.
Accent Gradient Bar
A thin gradient bar (sage green to gold) appears at the top of the annotation card. This is a visual cue inherited from the AXIS brand palette. It separates the navigation header from the content.
ID auto-detection looks first for a column named id, then for any column ending in _id or containing uuid, and finally falls back to dataset_id. Configure this in the settings modal if auto-detection picks the wrong column.
Scoring & Labels
Below the content panels, three annotation controls appear: the score selector, tag selector, and critique field.
Score Modes
AXIS supports three scoring modes, selectable via the Configure modal:
| Mode | UI | Keyboard | Export Value |
|---|---|---|---|
| Binary (default) | Two large buttons: Accept (✓) and Reject (✗) | A = Accept, R = Reject | accept or reject |
| 1–5 Scale | Five numbered buttons in a row | 1 through 5 | Integer (1, 2, 3, 4, or 5) |
| Custom Range | Numbered buttons from min to max | Number keys within range | Integer in configured range |
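The export values in the table above imply a simple validation rule per mode. A hedged sketch (function name and error handling are illustrative):

```python
def normalize_score(mode: str, value, min_score: int = 1, max_score: int = 5):
    """Validate a raw score against the active score mode."""
    if mode == "binary":
        # Binary mode exports the literal strings accept / reject.
        if value not in ("accept", "reject"):
            raise ValueError(f"binary mode expects accept/reject, got {value!r}")
        return value
    # Scale and custom-range modes export an integer within [min, max].
    score = int(value)
    if not (min_score <= score <= max_score):
        raise ValueError(f"score {score} outside {min_score}-{max_score}")
    return score
```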
Tags
Tags let you categorize outputs with descriptive labels. AXIS ships with sensible defaults and supports fully custom tag sets.
- Positive tags (Excellent, Cool, Correct, Positive) highlight in green when selected
- Negative/neutral tags (Hallucination, Off-topic, Incomplete, etc.) highlight in red when selected
- Unselected tags appear as neutral gray pills
- Multiple tags can be selected per record
Tag presets are available in the Configure modal:
| Preset | Tags Included |
|---|---|
| quality | Excellent, Good, Acceptable, Poor, Terrible |
| accuracy | Correct, Partially Correct, Incorrect, Hallucination |
| safety | Safe, Risky, Harmful, Needs Review |
| custom | Define your own tags via the Manage Tags popover |
Critique / Notes
A free-text field below the tags for detailed feedback. This text is exported in the critique column of your CSV.
Navigation & Progress
AXIS is optimized for rapid sequential annotation. All navigation is available from both the UI and the keyboard.
Keyboard Shortcuts
The full shortcut reference is always visible in the sidebar. These work globally unless focus is in a text input.
| Key | Action | Notes |
|---|---|---|
| ← or J | Previous record | Wraps to first record at start |
| → or K | Next record | Stops at last record |
| A | Accept (binary mode) | Sets score to accept |
| R | Reject (binary mode) | Sets score to reject |
| 1–5 | Set score (scale mode) | Range depends on config |
| S | Flag / Skip record | Toggles orange flag indicator |
| Ctrl+Z | Undo last annotation | Up to 20 actions in history |
| Enter | Go to next record | Convenience alias for → |
Progress Sidebar
The right sidebar provides at-a-glance progress tracking:
- Completion percentage — large numeric display with animated progress bar
- Stats grid — Done (green), Pending (gray), Flagged (orange), and Total counts
- Filter tabs — switch the record list between All, Pending, Done, or Flagged views
- Record list — scrollable list with colored status dots. Click any record to jump to it. The current record is highlighted with a left border accent and pulsing dot.
The record list shows up to 100 items at a time. If your dataset has more than 100 records, a "+N more records" indicator appears at the bottom.
Flagging Records
Flagging is useful for marking records that need a second look, are ambiguous, or should be discussed with the team. Flagged records:
- Show an orange flag icon in the navigation bar
- Display an orange dot in the record list
- Can be filtered to view only flagged items
- Export with annotation_flagged = true in the CSV
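Downstream, flagged records can be pulled out of an export for team review. A sketch using Python's csv module, assuming the annotation_flagged column described above:

```python
import csv
import io

def flagged_rows(csv_text: str) -> list[dict]:
    """Return only records exported with annotation_flagged = true."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Unflagged records export an empty value, so compare against "true".
    return [r for r in reader if (r.get("annotation_flagged") or "").lower() == "true"]
```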
Configure Modal
Click the Configure button in the page header to open the settings modal. All configuration persists in localStorage.
Configuration sections:
- ID Column — Select which column uniquely identifies records. Auto-detects id, *_id, and *uuid* columns. Shows a warning if duplicate IDs are found.
- Display Columns — Checkbox grid to select which data columns appear in the annotation card. Defaults to query and actual_output.
- Score Mode — Radio buttons for Binary, 1–5 Scale, or Custom Range. Custom range shows min/max number inputs.
- Tags — Shows current tag set with a "Manage Tags" popover for adding custom tags, applying presets, or resetting to defaults.
Export & Review
Click the Export button in the header to download a CSV with all your annotations merged into the original data.
Exported Columns
The export includes all original columns plus four annotation columns:
| Column | Type | Description |
|---|---|---|
| judgment | string / number | The score value: accept, reject, or a numeric score (1–5, etc.) |
| critique | string | Free-text feedback written by the annotator |
| user_tags | JSON array string | Selected tags as a JSON array, e.g. ["Correct","Incomplete"] |
| annotation_flagged | boolean | true if the record was flagged for review |
Example export:

```
dataset_id,query,actual_output,judgment,critique,user_tags,annotation_flagged
01KFX-a1b2,"What is RAG?","RAG retrieves...",accept,"Good explanation","[""Correct""]",
01KFX-b2c3,"Explain fine-tuning","Fine-tuning is...",reject,"Misses key details","[""Incomplete""]",true
```
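Since user_tags is a JSON array serialized into a CSV cell, consumers need to decode it after parsing. A sketch for reading an export back into Python (the function name is illustrative):

```python
import csv
import io
import json

def load_annotations(csv_text: str) -> list[dict]:
    """Parse an exported CSV, decoding the user_tags JSON-array column."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get("user_tags") or "[]"  # unannotated rows export empty
        row["user_tags"] = json.loads(raw)
        rows.append(row)
    return rows
```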
Undo System
Every annotation action (scoring, tagging, flagging, critique changes) is tracked in an undo stack. Press Ctrl+Z (or Cmd+Z on macOS) to revert the last action. The stack holds up to 20 actions.
A toast notification appears at the bottom-right of the screen confirming the undo. The undo button in the navigation header is only visible when there is an action to undo.
Data Persistence
Annotation state is managed carefully to balance persistence with performance:
| What | Where | Why |
|---|---|---|
| Annotations (scores, tags, critique) | Zustand store with localStorage persist | Survives page refreshes. Re-associated with data on reload. |
| Configuration (score mode, display columns, tags) | UI store with localStorage persist | Settings carry across sessions. |
| Raw evaluation data | Zustand store (memory only) | Too large for localStorage. Must re-upload after browser restart. |
| Undo history | Zustand store (memory only) | Reset on page reload. Not persisted. |