Annotation Studio

Human-in-the-loop annotation for quality assurance — score, tag, and critique LLM outputs with keyboard-driven workflows and CSV export.

Why Use Annotation Studio?

Automated metrics are powerful, but they are not the whole story. The Annotation Studio gives your team a structured interface to manually review and label evaluation data — building ground truth datasets, validating automated scores, and catching failure modes that metrics alone miss.

Structured Scoring

Binary accept/reject, 1–5 scale, or custom range scoring modes with keyboard shortcuts for rapid annotation.

🏷 Flexible Tagging

Apply preset or custom tags to categorize outputs — hallucination, off-topic, excellent, needs-context, and more.

Keyboard-First

Navigate, score, flag, and undo entirely from the keyboard. Annotators stay in flow without reaching for the mouse.

📤 CSV Export

Export annotated data with judgment, critique, user_tags, and annotation_flagged columns appended to your original data.

Quick Start

Start annotating in under a minute:

1. Navigate to Annotation Studio — Click Annotation Studio in the left sidebar, or go directly to /annotation-studio.

2. Upload Evaluation Data — Drag a CSV file into the upload zone. The studio auto-detects the data format and extracts any existing annotations (columns named judgment, critique, user_tags).

3. Configure (Optional) — Click Configure to choose which columns to display, set the score mode (binary, 1–5, or custom range), and manage tags.

4. Annotate & Export — Review each record, assign scores and tags, then click Export to download the annotated CSV with all your judgments.

💡 Tip
If your CSV already contains annotation columns (judgment, critique, user_tags), they are imported automatically so you can resume an interrupted annotation session.

Page Anatomy

Here is how the Annotation Studio is organized once data is loaded, with every major section labeled:

[Screenshot: The Annotation Studio showing the full page anatomy — header with actions, annotation card with content sections and scoring controls, progress sidebar, and footer navigation. Numbered callouts 1–9 mark each major section.]
1. Page Header & Actions — Title with MessageSquare icon. Configure opens the settings modal; Export downloads the annotated CSV.
2. Record Navigation Bar — Previous/Next arrows, current record position (e.g., "Record 3 of 247"), record ID chip, Flag button, and Undo button.
3. User Query — The input prompt, displayed in a neutral gray box. The column name is configurable via the Configure modal.
4. AI Response (To Evaluate) — The LLM output to review, highlighted with a left accent border and a "To Evaluate" badge. Supports Markdown rendering.
5. Score Selector — Binary (Accept/Reject) by default. Switches to a 1–5 scale or custom range via Configure. Keyboard shortcuts are shown inline.
6. Tag Selector — Toggle tags to categorize the output. Positive tags highlight green, negative tags highlight red. Tags are customizable.
7. Critique / Notes — Free-text field for detailed feedback. Exported in the critique column and helps reviewers understand annotator reasoning.
8. Progress Sidebar — Live completion percentage, stats grid (Done/Pending/Flagged/Total), filterable record list, and keyboard shortcut reference.
9. Footer Navigation — Previous/Next buttons with the primary "Next" action styled prominently. Keyboard shortcut hints are displayed between them.

Data Upload

When you first open Annotation Studio with no data loaded, the page shows a centered upload zone. This is the same drag-and-drop CSV uploader used across AXIS.

[Screenshot: Empty state with the file upload zone. Drag a CSV file here or click to browse. Supports evaluation CSV files with query and output columns.]

Supported data formats:

  • Standard evaluation CSVs — must contain at least query and actual_output columns
  • Previously annotated CSVs — if columns like judgment, critique, user_tags, or annotation_flagged exist, they are imported automatically
  • Records are deduplicated by ID column (auto-detected or configured)
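As an illustration of the checks above, here is a minimal Python sketch (a hypothetical helper, not part of AXIS) that verifies a CSV carries the required columns and detects any existing annotation columns before upload:

```python
import csv
import io

REQUIRED = {"query", "actual_output"}
ANNOTATION_COLS = {"judgment", "critique", "user_tags", "annotation_flagged"}

def inspect_csv(text: str) -> dict:
    """Report whether a CSV is usable and which annotation columns it carries."""
    reader = csv.DictReader(io.StringIO(text))
    cols = set(reader.fieldnames or [])
    return {
        "valid": REQUIRED.issubset(cols),                   # has query + actual_output?
        "has_annotations": sorted(ANNOTATION_COLS & cols),  # resumable session?
    }

sample = "dataset_id,query,actual_output,judgment\nr1,What is RAG?,RAG retrieves...,accept\n"
print(inspect_csv(sample))
```

A file that passes `valid` but has no annotation columns starts a fresh session; one with annotation columns resumes a previous one.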
ℹ️ Info
The annotation store is independent from the main data store. You can have different datasets loaded in Evaluate and Annotation Studio at the same time.

Annotation Interface

The annotation card is the main workspace. It displays each record's content fields vertically, followed by the scoring and tagging controls.

Content Display

By default, two columns are shown:

  • User Query (query) — displayed in a neutral gray panel
  • AI Response (actual_output) — highlighted with a left accent border (primary color) and a "To Evaluate" badge

You can show additional columns (like expected_output, context, or custom fields) by configuring them in the Configure modal. Content supports Markdown rendering via the built-in ContentRenderer.

Accent Gradient Bar

A thin gradient bar (sage green to gold) appears at the top of the annotation card. This is a visual cue inherited from the AXIS brand palette. It separates the navigation header from the content.

📝 Note
Each record is identified by an auto-detected ID column. AXIS looks for columns named id, then any column ending in _id or containing uuid, and falls back to dataset_id. Configure this in the settings modal if auto-detection picks the wrong column.

Scoring & Labels

Below the content panels, three annotation controls appear: the score selector, tag selector, and critique field.

Score Modes

AXIS supports three scoring modes, selectable via the Configure modal:

| Mode | UI | Keyboard | Export Value |
| --- | --- | --- | --- |
| Binary (default) | Two large buttons: Accept (✓) and Reject (✗) | A = Accept, R = Reject | accept or reject |
| 1–5 Scale | Five numbered buttons in a row | 1 through 5 | Integer (1, 2, 3, 4, or 5) |
| Custom Range | Numbered buttons from min to max | Number keys within range | Integer in configured range |
[Screenshot: The 1–5 scale score selector. The selected score (4) is highlighted with the primary color. Press number keys 1–5 to score instantly.]

Tags

Tags let you categorize outputs with descriptive labels. AXIS ships with sensible defaults and supports fully custom tag sets.

  • Positive tags (Excellent, Cool, Correct, Positive) highlight in green when selected
  • Negative/neutral tags (Hallucination, Off-topic, Incomplete, etc.) highlight in red when selected
  • Unselected tags appear as neutral gray pills
  • Multiple tags can be selected per record

Tag presets are available in the Configure modal:

| Preset | Tags Included |
| --- | --- |
| quality | Excellent, Good, Acceptable, Poor, Terrible |
| accuracy | Correct, Partially Correct, Incorrect, Hallucination |
| safety | Safe, Risky, Harmful, Needs Review |
| custom | Define your own tags via the Manage Tags popover |

Critique / Notes

A free-text field below the tags for detailed feedback. This text is exported in the critique column of your CSV.

💡 Tip
Critique text is especially valuable for building alignment datasets. Use it to explain why a response was scored the way it was — this reasoning can be used to train reward models or improve evaluation rubrics.

AXIS is optimized for rapid sequential annotation. All navigation is available from both the UI and the keyboard.

Keyboard Shortcuts

The full shortcut reference is always visible in the sidebar. These work globally unless focus is in a text input.

| Key | Action | Notes |
| --- | --- | --- |
| ← / J | Previous record | Wraps to first record at start |
| → / K | Next record | Stops at last record |
| A | Accept (binary mode) | Sets score to accept |
| R | Reject (binary mode) | Sets score to reject |
| 1–5 | Set score (scale mode) | Range depends on config |
| S | Flag / Skip record | Toggles orange flag indicator |
| Ctrl+Z | Undo last annotation | Up to 20 actions in history |
| Enter | Go to next record | Convenience alias for → |
⚠️ Warning
Keyboard shortcuts are disabled while focus is in the Critique text field. Click outside the textarea or press Escape to re-enable them. This prevents accidental scoring while typing notes.

Progress Sidebar

The right sidebar provides at-a-glance progress tracking:

  • Completion percentage — large numeric display with animated progress bar
  • Stats grid — Done (green), Pending (gray), Flagged (orange), and Total counts
  • Filter tabs — switch the record list between All, Pending, Done, or Flagged views
  • Record list — scrollable list with colored status dots. Click any record to jump to it. The current record is highlighted with a left border accent and pulsing dot.

The record list shows up to 100 items at a time. If your dataset has more than 100 records, a "+N more records" indicator appears at the bottom.
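The sidebar counters can all be derived from per-record annotation state; here is a small conceptual sketch (the field names in `annotations` are assumptions for illustration, not the store's actual shape):

```python
def progress_stats(annotations: dict, total: int) -> dict:
    """Compute the Done/Pending/Flagged/Total counters shown in the sidebar.
    `annotations` maps record ID -> {"judgment": ..., "flagged": bool}."""
    done = sum(1 for a in annotations.values() if a.get("judgment") is not None)
    flagged = sum(1 for a in annotations.values() if a.get("flagged"))
    return {
        "done": done,
        "pending": total - done,           # unannotated records
        "flagged": flagged,
        "total": total,
        "percent": round(100 * done / total) if total else 0,
    }

print(progress_stats({"r1": {"judgment": "accept"}, "r2": {"flagged": True}}, total=4))
```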

Flagging Records

Flagging is useful for marking records that need a second look, are ambiguous, or should be discussed with the team. Flagged records:

  • Show an orange flag icon in the navigation bar
  • Display an orange dot in the record list
  • Can be filtered to view only flagged items
  • Export with annotation_flagged = true in the CSV
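Downstream, flagged rows are easy to pull out of an export for team review; a minimal sketch using Python's standard csv module:

```python
import csv
import io

def flagged_rows(csv_text: str) -> list[dict]:
    """Return only the rows exported with annotation_flagged = true."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [r for r in reader if (r.get("annotation_flagged") or "").lower() == "true"]

export = (
    "dataset_id,judgment,annotation_flagged\n"
    "01KFX-a1b2,accept,\n"      # unflagged rows leave the column empty
    "01KFX-b2c3,reject,true\n"
)
print([r["dataset_id"] for r in flagged_rows(export)])
```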

Configure Modal

Click the Configure button in the page header to open the settings modal. All configuration persists in localStorage.

[Screenshot: The Configure modal with four sections — ID Column, Display Columns, Score Mode, and Tags. All settings persist in localStorage.]

Configuration sections:

  1. ID Column — Select which column uniquely identifies records. Auto-detects id, *_id, and *uuid* columns. Shows a warning if duplicate IDs are found.
  2. Display Columns — Checkbox grid to select which data columns appear in the annotation card. Defaults to query and actual_output.
  3. Score Mode — Radio buttons for Binary, 1–5 Scale, or Custom Range. Custom range shows min/max number inputs.
  4. Tags — Shows current tag set with a "Manage Tags" popover for adding custom tags, applying presets, or resetting to defaults.

Export & Review

Click the Export button in the header to download a CSV with all your annotations merged into the original data.

Exported Columns

The export includes all original columns plus four annotation columns:

| Column | Type | Description |
| --- | --- | --- |
| judgment | string / number | The score value: accept, reject, or a numeric score (1–5, etc.) |
| critique | string | Free-text feedback written by the annotator |
| user_tags | JSON array string | Selected tags as a JSON array, e.g. ["Correct","Incomplete"] |
| annotation_flagged | boolean | true if the record was flagged for review |
```csv
dataset_id,query,actual_output,judgment,critique,user_tags,annotation_flagged
01KFX-a1b2,"What is RAG?","RAG retrieves...",accept,"Good explanation","[""Correct""]",
01KFX-b2c3,"Explain fine-tuning","Fine-tuning is...",reject,"Misses key details","[""Incomplete""]",true
```
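When consuming an export downstream, the user_tags column needs JSON decoding and annotation_flagged arrives as a string; a minimal Python sketch using only the standard library:

```python
import csv
import io
import json

def load_annotations(csv_text: str) -> list[dict]:
    """Parse an Annotation Studio export, decoding the JSON-array user_tags
    column and normalizing annotation_flagged to a real boolean."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["user_tags"] = json.loads(row["user_tags"]) if row.get("user_tags") else []
        row["annotation_flagged"] = (row.get("annotation_flagged") or "").lower() == "true"
        rows.append(row)
    return rows

export = (
    "dataset_id,query,actual_output,judgment,critique,user_tags,annotation_flagged\n"
    '01KFX-a1b2,"What is RAG?","RAG retrieves...",accept,"Good explanation","[""Correct""]",\n'
    '01KFX-b2c3,"Explain fine-tuning","Fine-tuning is...",reject,"Misses key details","[""Incomplete""]",true\n'
)
rows = load_annotations(export)
```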
💡 Tip
The exported CSV can be re-imported into Annotation Studio to continue annotating. Existing annotations are preserved and merged with any new work.

Undo System

Every annotation action (scoring, tagging, flagging, critique changes) is tracked in an undo stack. Press Ctrl+Z (or Cmd+Z on macOS) to revert the last action. The stack holds up to 20 actions.

A toast notification appears at the bottom-right of the screen confirming the undo. The undo button in the navigation header is only visible when there is an action to undo.

Data Persistence

Annotation state is managed carefully to balance persistence with performance:

| What | Where | Why |
| --- | --- | --- |
| Annotations (scores, tags, critique) | Zustand store with localStorage persist | Survives page refreshes. Re-associated with data on reload. |
| Configuration (score mode, display columns, tags) | UI store with localStorage persist | Settings carry across sessions. |
| Raw evaluation data | Zustand store (memory only) | Too large for localStorage. Must re-upload after browser restart. |
| Undo history | Zustand store (memory only) | Reset on page reload. Not persisted. |
⚠️ Warning
Your annotation judgments and tags are persisted, but the raw data (the actual CSV records) is not saved to localStorage. If you close and reopen AXIS, you will need to re-upload the CSV file. Your annotations will automatically re-associate with the data by matching record IDs.
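Re-association is conceptually a merge keyed on the record ID; a sketch of the idea (not the actual store code):

```python
def reassociate(rows: list[dict], saved: dict, id_col: str = "dataset_id") -> list[dict]:
    """Overlay persisted annotation fields onto freshly re-uploaded rows,
    matching on the configured ID column."""
    return [{**row, **saved.get(row.get(id_col), {})} for row in rows]

rows = [{"dataset_id": "01KFX-a1b2", "query": "What is RAG?"}]
saved = {"01KFX-a1b2": {"judgment": "accept", "critique": "Good explanation"}}
merged = reassociate(rows, saved)
```

Rows whose ID has no saved annotation pass through unchanged, which is why a re-upload silently resumes where you left off.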
