Annotation Studio
Human-in-the-loop annotation for quality assurance — score, tag, and critique LLM outputs with keyboard-driven workflows and CSV export.
Why Use Annotation Studio?
Automated metrics are powerful, but they are not the whole story. The Annotation Studio gives your team a structured interface to manually review and label evaluation data — building ground truth datasets, validating automated scores, and catching failure modes that metrics alone miss.
Structured Scoring
Binary accept/reject, 1–5 scale, or custom range scoring modes with keyboard shortcuts for rapid annotation.
Flexible Tagging
Apply preset or custom tags to categorize outputs — hallucination, off-topic, excellent, needs-context, and more.
Keyboard-First
Navigate, score, flag, and undo entirely from the keyboard. Annotators stay in flow without reaching for the mouse.
CSV Export
Export annotated data with judgment, critique, user_tags, and annotation_flagged columns appended to your original data.
Quick Start
Start annotating in under a minute:
Navigate to Annotation Studio
Click Annotation Studio in the left sidebar, or go directly to /annotation-studio.
Upload Evaluation Data
Drag a CSV file into the upload zone. The studio auto-detects data format and extracts any existing annotations (columns named judgment, critique, user_tags).
Configure (Optional)
Click Configure to choose which columns to display, set the score mode (binary, 1–5, or custom range), and manage tags.
Annotate & Export
Review each record, assign scores and tags, then click Export to download the annotated CSV with all your judgments.
Note: If your CSV already contains annotation columns (judgment, critique, user_tags), they are imported automatically so you can resume an interrupted annotation session.
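Resume detection boils down to checking the uploaded CSV's header for the annotation columns named on this page. A minimal sketch in Python (the function name is illustrative, not part of AXIS):

```python
import csv
import io

# Annotation columns the studio looks for, per the docs above.
ANNOTATION_COLUMNS = {"judgment", "critique", "user_tags", "annotation_flagged"}

def detect_existing_annotations(csv_text: str) -> set[str]:
    """Return the annotation columns already present in a CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return ANNOTATION_COLUMNS.intersection(header)

# A previously exported file can be resumed:
cols = detect_existing_annotations(
    "dataset_id,query,actual_output,judgment,critique\nx,q,a,accept,ok\n"
)
# cols == {"judgment", "critique"}
```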
Page Anatomy
Here is how the Annotation Studio is organized once data is loaded, with every major section labeled:
Data Upload
When you first open Annotation Studio with no data loaded, the page shows a centered upload zone. This is the same drag-and-drop CSV uploader used across AXIS.
No Data Loaded
Upload evaluation data to start annotating. The annotation workflow is independent of the evaluation workflow.
Drop your CSV file here, or click to browse
Supports evaluation CSV files with query and output columns
Supported data formats:
- Standard evaluation CSVs — must contain at least query and actual_output columns
- Previously annotated CSVs — if columns like judgment, critique, user_tags, or annotation_flagged exist, they are imported automatically
- Records are deduplicated by the ID column (auto-detected or configured)
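The deduplication step described above can be approximated as "keep the first row seen for each ID." A sketch under that assumption (the studio's actual merge logic may differ):

```python
def dedupe_by_id(rows: list[dict], id_column: str) -> list[dict]:
    """Keep the first occurrence of each ID value, preserving row order."""
    seen: set[str] = set()
    out: list[dict] = []
    for row in rows:
        key = row.get(id_column, "")
        if key in seen:
            continue  # duplicate ID: drop the later row
        seen.add(key)
        out.append(row)
    return out
```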
Annotation Interface
The annotation card is the main workspace. It displays each record's content fields vertically, followed by the scoring and tagging controls.
Content Display
By default, two columns are shown:
- User Query (query) — displayed in a neutral gray panel
- AI Response (actual_output) — highlighted with a left accent border (primary color) and a "To Evaluate" badge
You can show additional columns (like expected_output, context, or custom fields) by configuring them in the Configure modal. Content supports Markdown rendering via the built-in ContentRenderer.
Accent Gradient Bar
A thin gradient bar (sage green to gold) appears at the top of the annotation card. This is a visual cue inherited from the AXIS brand palette. It separates the navigation header from the content.
ID auto-detection looks first for a column named id, then for any column ending in _id or containing uuid, and finally falls back to dataset_id. Configure this in the settings modal if auto-detection picks the wrong column.
Scoring & Labels
Below the content panels, three annotation controls appear: the score selector, tag selector, and critique field.
Score Modes
AXIS supports three scoring modes, selectable via the Configure modal:
| Mode | UI | Keyboard | Export Value |
|---|---|---|---|
| Binary (default) | Two large buttons: Accept (✓) and Reject (✗) | A = Accept, R = Reject | accept or reject |
| 1–5 Scale | Five numbered buttons in a row | 1 through 5 | Integer (1, 2, 3, 4, or 5) |
| Custom Range | Numbered buttons from min to max | Number keys within range | Integer in configured range |
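The export values in the table above imply a simple validation rule per mode. A hedged sketch (function name and error handling are illustrative):

```python
def normalize_score(mode: str, value, min_score: int = 1, max_score: int = 5):
    """Validate a raw score against the active score mode."""
    if mode == "binary":
        # Binary mode exports the literal strings accept / reject.
        if value not in ("accept", "reject"):
            raise ValueError(f"binary mode expects accept/reject, got {value!r}")
        return value
    # Scale and custom-range modes export an integer within [min, max].
    score = int(value)
    if not (min_score <= score <= max_score):
        raise ValueError(f"score {score} outside {min_score}-{max_score}")
    return score
```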
Tags
Tags let you categorize outputs with descriptive labels. AXIS ships with sensible defaults and supports fully custom tag sets.
- Positive tags (Excellent, Cool, Correct, Positive) highlight in green when selected
- Negative/neutral tags (Hallucination, Off-topic, Incomplete, etc.) highlight in red when selected
- Unselected tags appear as neutral gray pills
- Multiple tags can be selected per record
Tag presets are available in the Configure modal:
| Preset | Tags Included |
|---|---|
| quality | Excellent, Good, Acceptable, Poor, Terrible |
| accuracy | Correct, Partially Correct, Incorrect, Hallucination |
| safety | Safe, Risky, Harmful, Needs Review |
| custom | Define your own tags via the Manage Tags popover |
Critique / Notes
A free-text field below the tags for detailed feedback. This text is exported in the critique column of your CSV.
Navigation & Progress
AXIS is optimized for rapid sequential annotation. All navigation is available from both the UI and the keyboard.
Keyboard Shortcuts
The full shortcut reference is always visible in the sidebar. These work globally unless focus is in a text input.
| Key | Action | Notes |
|---|---|---|
| ← or J | Previous record | Wraps to first record at start |
| → or K | Next record | Stops at last record |
| A | Accept (binary mode) | Sets score to accept |
| R | Reject (binary mode) | Sets score to reject |
| 1–5 | Set score (scale mode) | Range depends on config |
| S | Flag / Skip record | Toggles orange flag indicator |
| Ctrl+Z | Undo last annotation | Up to 20 actions in history |
| Enter | Go to next record | Convenience alias for → |
Progress Sidebar
The right sidebar provides at-a-glance progress tracking:
- Completion percentage — large numeric display with animated progress bar
- Stats grid — Done (green), Pending (gray), Flagged (orange), and Total counts
- Filter tabs — switch the record list between All, Pending, Done, or Flagged views
- Record list — scrollable list with colored status dots. Click any record to jump to it. The current record is highlighted with a left border accent and pulsing dot.
The record list shows up to 100 items at a time. If your dataset has more than 100 records, a "+N more records" indicator appears at the bottom.
Flagging Records
Flagging is useful for marking records that need a second look, are ambiguous, or should be discussed with the team. Flagged records:
- Show an orange flag icon in the navigation bar
- Display an orange dot in the record list
- Can be filtered to view only flagged items
- Export with annotation_flagged = true in the CSV
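Downstream, flagged records can be pulled out of an export for team review. A sketch using Python's csv module, assuming the annotation_flagged column described above:

```python
import csv
import io

def flagged_rows(csv_text: str) -> list[dict]:
    """Return only records exported with annotation_flagged = true."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Unflagged records export an empty value, so compare against "true".
    return [r for r in reader if (r.get("annotation_flagged") or "").lower() == "true"]
```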
Configure Modal
Click the Configure button in the page header to open the settings modal. All configuration persists in localStorage.
Configuration sections:
- ID Column — Select which column uniquely identifies records. Auto-detects id, *_id, and *uuid* columns. Shows a warning if duplicate IDs are found.
- Display Columns — Checkbox grid to select which data columns appear in the annotation card. Defaults to query and actual_output.
- Score Mode — Radio buttons for Binary, 1–5 Scale, or Custom Range. Custom range shows min/max number inputs.
- Tags — Shows current tag set with a "Manage Tags" popover for adding custom tags, applying presets, or resetting to defaults.
Export & Review
Click the Export button in the header to download a CSV with all your annotations merged into the original data.
Exported Columns
The export includes all original columns plus four annotation columns:
| Column | Type | Description |
|---|---|---|
| judgment | string / number | The score value: accept, reject, or a numeric score (1–5, etc.) |
| critique | string | Free-text feedback written by the annotator |
| user_tags | JSON array string | Selected tags as a JSON array, e.g. ["Correct","Incomplete"] |
| annotation_flagged | boolean | true if the record was flagged for review |
Example export:

```
dataset_id,query,actual_output,judgment,critique,user_tags,annotation_flagged
01KFX-a1b2,"What is RAG?","RAG retrieves...",accept,"Good explanation","[""Correct""]",
01KFX-b2c3,"Explain fine-tuning","Fine-tuning is...",reject,"Misses key details","[""Incomplete""]",true
```
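Since user_tags is a JSON array serialized into a CSV cell, consumers need to decode it after parsing. A sketch for reading an export back into Python (the function name is illustrative):

```python
import csv
import io
import json

def load_annotations(csv_text: str) -> list[dict]:
    """Parse an exported CSV, decoding the user_tags JSON-array column."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get("user_tags") or "[]"  # unannotated rows export empty
        row["user_tags"] = json.loads(raw)
        rows.append(row)
    return rows
```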
Undo System
Every annotation action (scoring, tagging, flagging, critique changes) is tracked in an undo stack. Press Ctrl+Z (or Cmd+Z on macOS) to revert the last action. The stack holds up to 20 actions.
A toast notification appears at the bottom-right of the screen confirming the undo. The undo button in the navigation header is only visible when there is an action to undo.
Data Persistence
Annotation state is managed carefully to balance persistence with performance:
| What | Where | Why |
|---|---|---|
| Annotations (scores, tags, critique) | Zustand store with localStorage persist | Survives page refreshes. Re-associated with data on reload. |
| Configuration (score mode, display columns, tags) | UI store with localStorage persist | Settings carry across sessions. |
| Raw evaluation data | Zustand store (memory only) | Too large for localStorage. Must re-upload after browser restart. |
| Undo history | Zustand store (memory only) | Reset on page reload. Not persisted. |