Dataset API Reference¶
Core data structures for building and managing evaluation datasets.
Dataset
Container for evaluation items. Supports JSON/CSV/DataFrame I/O, filtering, merging, and synthetic generation.
DatasetItem
Individual test case with query, expected/actual output, context, metadata, and conversation history.
Dataset¶
axion.dataset.Dataset
dataclass
¶
Dataset(name: Optional[str] = None, description: str = '', version: str = '1.0', created_at: str = (lambda: current_datetime())(), metadata: Optional[str] = None, items: List[DatasetItem] = list(), _default_catch_all: str = ADDITIONAL_INPUT, _item_map: Dict[str, DatasetItem] = dict(), _synthetic_data: Optional[List[Dict[str, Any]]] = None)
Bases: RichSerializer
Represents a structured dataset for evaluation purposes, supporting both single and multi-turn items.
This class manages a collection of DatasetItem objects and provides functionality for loading, saving, filtering, and transforming datasets.
Attributes:
-
name(Optional[str]) –Name of the dataset
-
description(str) –Description of the dataset's purpose or contents
-
version(str) –Version identifier
-
created_at(str) –ISO format timestamp of creation
-
metadata(Optional[str]) –Additional metadata (stored as JSON)
-
items(List[DatasetItem]) –List of DatasetItem objects
create
classmethod
¶
create(name: Optional[str] = None, items: Optional[List[Union[Dict[str, Any], str]]] = None, ignore_extra_keys: bool = False, **kwargs) -> Dataset
Creates a new dataset with initial items.
Parameters:
-
name(Optional[str], default:None) –Optional dataset name
-
items(Optional[List[Union[Dict[str, Any], str]]], default:None) –Optional list of items (dicts or strings)
-
ignore_extra_keys(bool, default:False) –If True, only use keys that match DatasetItem fields, ignoring any extra keys in dictionaries. Defaults to False.
-
**kwargs–Additional parameters passed to the Dataset constructor.
add_item ¶
add_item(item: Union[DatasetItem, Dict[str, Any]], ignore_extra_keys: bool = False) -> DatasetItem
Add an item to the dataset, handling both single-turn and multi-turn items.
Parameters:
-
item(Union[DatasetItem, Dict[str, Any]]) –Either a DatasetItem instance or a dictionary containing item data
-
ignore_extra_keys(bool, default:False) –If True, only use keys that match DatasetItem fields, ignoring any extra keys in the dictionary. Defaults to False.
Returns:
-
DatasetItem–The DatasetItem instance that was added to the dataset
add_items ¶
add_items(items: List[Union[DatasetItem, Dict[str, Any]]], ignore_extra_keys: bool = False) -> List[DatasetItem]
Add multiple items to the dataset.
Parameters:
-
items(List[Union[DatasetItem, Dict[str, Any]]]) –List of DatasetItem instances or dictionaries
-
ignore_extra_keys(bool, default:False) –If True, only use keys that match DatasetItem fields, ignoring any extra keys in dictionaries. Defaults to False.
Returns:
-
List[DatasetItem]–List of added DatasetItem instances
get_item_by_id ¶
get_item_by_id(item_id: str) -> Optional[DatasetItem]
Retrieve an item by its ID.
Parameters:
-
item_id(str) –ID of the item to find
Returns:
-
Optional[DatasetItem]–DatasetItem if found, None otherwise
filter ¶
filter(condition: Callable[[DatasetItem], bool], dataset_name: Optional[str] = None) -> Dataset
Filters the dataset based on a condition and returns a new Dataset.
read_json
classmethod
¶
read_json(file_path: Union[str, Path], name: Optional[str] = None, ignore_extra_keys: bool = False) -> Dataset
Creates a dataset from a JSON file, correctly parsing multi-turn conversations.
Parameters:
-
file_path(Union[str, Path]) –Path to the JSON file
-
name(Optional[str], default:None) –Optional dataset name
-
ignore_extra_keys(bool, default:False) –If True, only use keys that match DatasetItem fields, ignoring any extra keys in dictionaries. Defaults to False.
read_csv
classmethod
¶
read_csv(file_path: Union[str, Path], name: Optional[str] = None, column_mapping: Optional[Dict[str, str]] = None, ignore_extra_keys: bool = False, **kwargs) -> Dataset
Creates a dataset from a CSV file.
Parameters:
-
file_path(Union[str, Path]) –Path to the CSV file
-
name(Optional[str], default:None) –Optional dataset name
-
column_mapping(Optional[Dict[str, str]], default:None) –Optional mapping to rename columns
-
ignore_extra_keys(bool, default:False) –If True, only use keys that match DatasetItem fields, ignoring any extra keys in dictionaries. Defaults to False.
-
**kwargs–Additional parameters passed to read_dataframe.
read_dataframe
classmethod
¶
read_dataframe(dataframe: DataFrame, name: Optional[str] = None, ignore_extra_keys: bool = False, **kwargs) -> Dataset
Creates a Dataset from a pandas DataFrame, safely deserializing JSON and Python literals. All fields must be included in DataFrame rows to correctly map to DatasetItem.
Parameters:
-
dataframe(DataFrame) –Input DataFrame to read from.
-
name(Optional[str], default:None) –Optional dataset name.
-
ignore_extra_keys(bool, default:False) –If True, only use keys that match DatasetItem fields, ignoring any extra keys in dictionaries. Defaults to False.
-
**kwargs–Additional parameters passed to the Dataset constructor.
Returns:
-
Dataset(Dataset) –A populated Dataset instance.
to_json ¶
Save dataset to JSON file.
Parameters:
-
file_path(str) –Path where to save the JSON file
to_csv ¶
Save dataset to CSV file.
Parameters:
-
file_path(str) –Path where to save the CSV file.
-
remove_aliased(bool, default:True) –If True remove aliased model field keys
to_dataframe ¶
to_dataframe(flatten_nested_json: bool = False, sep: str = '.', remove_aliased: bool = True) -> DataFrame
Converts the dataset to a pandas DataFrame, serializing complex fields to JSON strings.
Parameters:
-
flatten_nested_json(bool, default:False) –If True, nested objects will be flattened into separate columns. If False (default), they will be stored as JSON strings.
-
sep(str, default:'.') –Separator for flattening.
-
remove_aliased(bool, default:True) –If True remove aliased model field keys.
Returns: A pandas DataFrame representing the dataset.
load_dataframe ¶
Load dataset items from a DataFrame.
Parameters:
-
dataframe(DataFrame) –DataFrame containing dataset items.
get_summary ¶
Return summary statistics about the dataset.
get_summary_table ¶
Return summary statistics about the dataset in rich table format.
Parameters:
-
title(str, default:'Dataset Summary') –Title for the log output.
-
**kwargs–Additional arguments passed to the logging method.
execute_dataset_items_from_api ¶
execute_dataset_items_from_api(api_name: str, config: Union[str, Dict[str, Any], Path], max_concurrent: int = 5, show_progress: bool = True, retry_config: Optional[Union[Any, Dict[str, Any]]] = None, require_success: bool = False, additional_config: Optional[Dict[str, Any]] = None, **kwargs) -> None
Synchronously executes API calls using the specified API runner and attaches responses to the dataset items. Useful for batch-processing queries via a registered API.
Internally runs async code but exposes a sync interface to the user.
Parameters:
-
api_name(str) –The name of the registered API to use for execution.
-
config(str | dict | Path) –Config for authenticating with the API.
-
max_concurrent(int, default:5) –Max number of concurrent API requests. Defaults to 5.
-
show_progress(bool, default:True) –Whether to show progress bars using tqdm.
-
retry_config(RetryConfig | Dict, default:None) –Configuration for retrying logic.
-
require_success(bool, default:False) –(bool, optional): If True, remove items from dataset when response.status != 'success'.
-
additional_config(dict, default:None) –Extra configuration options for the runner.
-
**kwargs–Extra arguments passed to the executor's
execute_batchmethod.
merge_response_into_dataset_items
staticmethod
¶
merge_response_into_dataset_items(items: List[DatasetItem], responses: List[RichBaseModel], require_success: bool = False) -> List[DatasetItem]
Updates DatasetItem instances with fields from corresponding APIResponseData.
Parameters:
-
items(List[DatasetItem]) –List of DatasetItem objects to update.
-
responses(List[RichBaseModel]) –List of APIResponseData objects with new runtime values.
-
require_success(bool, default:False) –If True, only keep items when response.status == 'success'.
Returns:
-
List[DatasetItem]–List of DatasetItem objects that were successfully processed (if require_success=True)
-
List[DatasetItem]–or all items (if require_success=False).
synthetic_generate_from_directory ¶
synthetic_generate_from_directory(directory_path: str, llm, params: GenerationParams, embed_model: None, max_concurrent: int = 3, show_progress: bool = True, **kwargs)
Generates synthetic QA data from a directory of documents.
This method uses the DocumentQAGenerator to process documents in the given
directory, producing synthetic question-answer pairs using the provided language model (LLM)
and generation parameters. The results are transformed into a format compatible with the
dataset interface (query, expected_output) and added to the dataset.
Parameters:
-
directory_path(str) –Path to the directory containing documents to process.
-
llm–A language model instance that implements method for generation.
-
params(GenerationParams) –A configuration object that defines generation settings such as number of QA pairs, difficulty, chunking behavior, etc.
-
embed_model(None) –An embedding model used for semantic parsing.
-
max_concurrent(int, default:3) –The maximum number of documents to process concurrently. Defaults to 3.
-
show_progress(bool, default:True) –Whether to show progress bars using tqdm
DatasetItem¶
axion.dataset.DatasetItem ¶
Bases: RichDatasetBaseModel
Represents a single evaluation data point, supporting both single-turn and multi-turn conversations.
This model is designed to store all relevant information required for evaluating LLM performance, including the input query, expected and actual outputs, retrieved context, evaluation criteria, and additional metadata. It supports both automated evaluation (binary judgments, critiques) and richer evaluation with tool usage tracking.
Attributes:
-
id(str) –Unique identifier for the item (auto-generated if not provided).
-
query(Optional[str]) –The input query or prompt for single-turn evaluation. Aliased as
queryfor backward compatibility. -
conversation(Optional[MultiTurnConversation]) –Multi-turn conversation structure containing a sequence of messages. Aliased to
conversation. -
expected_output(Optional[str]) –The reference/expected output for single-turn evaluation. Aliased to
expected_output. -
actual_output(Optional[str]) –The system's generated response for the given query.
-
retrieved_content(Optional[List[str]]) –A list of retrieved documents or contextual snippets used in generating the response.
-
latency(Optional[float]) –Response time in seconds for generating the
actual_output. -
judgment(Optional[Union[str, int]]) –A short, binary or categorical evaluation decision (e.g., 1/0, pass/fail, approve/decline).
-
critique(Optional[str]) –A detailed explanation or rationale supporting the
judgment. -
conversation_extraction_strategy(Literal['first', 'last']) –Defines whether to extract
queryandactual_outputfrom the first or last messages in a multi-turn conversation. Defaults to 'last'. -
acceptance_criteria(Optional[List[str]]) –User-defined definitions of what qualifies as an acceptable/correct response.
-
additional_input(Dict[str, Any]) –Arbitrary key-value pairs providing extra inputs for the evaluation context.
-
metadata(Optional[str]) –Additional metadata as a JSON string for storing structured information.
-
trace(Optional[str]) –Execution trace information, stored as a JSON string.
-
trace_id(Optional[str]) –Optional[str]: Trace ID for the original observation from tracing provider.
observation_id: Optional[str]: Observation ID for the original observation from tracing provider. This is the ID of the specific observation that was evaluated. additional_output (Dict[str, Any]): Extra outputs generated by the system, useful for debugging or extended evaluation. tools_called (Optional[List[ToolCall]]): A list of tools the system actually invoked during response generation. expected_tools (Optional[List[ToolCall]]): A list of tools that should have been invoked according to the evaluation criteria. user_tags (List[str]): A list of custom tags to apply to all tool calls in the conversation.
id
class-attribute
instance-attribute
¶
query
property
writable
¶
Provides a unified way to access the user's query based on the extraction strategy.
If the strategy is 'last' (default), it returns the last user message. If the strategy is 'first', it returns the first user message.
expected_output
property
writable
¶
Provides a unified way to access the expected output.
If the item is a multi-turn conversation, this returns the
reference_text if set. If it's a single-turn item, it
returns the stored expected output.
Returns:
-
Optional[str]–The expected output as a string, or None if not applicable.
conversation
property
¶
Provides direct access to the multi-turn conversation object.
retrieved_content
class-attribute
instance-attribute
¶
judgment
class-attribute
instance-attribute
¶
judgment: Optional[Union[str, int]] = Field(default=None, description='A short, binary decision on the output (e.g., 1/0, pass/fail, approve/decline).')
critique
class-attribute
instance-attribute
¶
critique: Optional[str] = Field(default=None, description='A detailed explanation or feedback supporting the judgment.')
acceptance_criteria
class-attribute
instance-attribute
¶
additional_input
class-attribute
instance-attribute
¶
additional_output
class-attribute
instance-attribute
¶
metadata
class-attribute
instance-attribute
¶
actual_ranking
class-attribute
instance-attribute
¶
actual_ranking: Optional[List[Dict[str, Any]]] = Field(default=None, description='Ordered list of retrieved items, e.g., [{"id": "doc1", "score": 0.9}, {"id": "doc2", "score": 0.8}]')
expected_ranking
class-attribute
instance-attribute
¶
expected_ranking: Optional[List[Dict[str, Any]]] = Field(default=None, description='Ground truth reference. For IR, e.g., [{"id": "doc1", "relevance": 1.0}, {"id": "doc_abc", "relevance": 0.5}]')
tools_called
class-attribute
instance-attribute
¶
tools_called: Optional[List[ToolCall]] = Field(default=None, description='Tools that were actually called by the system')
expected_tools
class-attribute
instance-attribute
¶
expected_tools: Optional[List[ToolCall]] = Field(default=None, description='Tools that should have been called')
user_tags
class-attribute
instance-attribute
¶
user_tags: List[str] = Field(default_factory=list, description='A list of custom tags to apply to all tool calls in the conversation.')
conversation_extraction_strategy
class-attribute
instance-attribute
¶
conversation_extraction_strategy: Literal['first', 'last'] = Field(default='last', description="Determines whether to extract 'query' and 'actual_output' from the 'first' or 'last' messages in a conversation.")
conversation_stats
property
¶
A dictionary of statistics about the conversation.
agent_trajectory
property
¶
An ordered list of tool names called, representing the agent's execution path.
has_errors
property
¶
Returns True if any tool message in the conversation is marked as an error.
to_transcript ¶
Converts the conversation messages into a human-readable string transcript.
If the item is not a multi-turn conversation, it returns an empty string.
Returns:
-
str–A formatted string representing the entire conversation.
extract_by_tag ¶
Extracts tool interactions from the conversation that match a specific tag.
Parameters:
-
tag(str) –The tag to filter by (e.g., 'RAG', 'GUARDRAIL').
Returns:
-
List[tuple[ToolCall, Optional[ToolMessage]]]–A list of tuples, where each tuple contains the tagged ToolCall
-
List[tuple[ToolCall, Optional[ToolMessage]]]–and its corresponding ToolMessage (or None if not found).
get ¶
Get an attribute value by key, similar to dict.get(). This method correctly handles properties like 'query'.
Parameters:
-
key(str) –The attribute name to retrieve.
-
default(Any, default:None) –Value to return if attribute doesn't exist.
Returns:
-
Any–The attribute value or default if not found.
keys ¶
Return all public attribute names, including properties and aliases.
Returns:
-
List[str]–A sorted list of public-facing field and property names.
values ¶
Return all public attribute values, corresponding to the .keys() method.
Returns:
-
List[Any]–A list of values for the public-facing fields and properties.
items ¶
Return all (key, value) pairs for public attributes.
Returns:
-
List[tuple]–A list of (key, value) tuples for public-facing fields and properties.
subset ¶
subset(fields: List[str], keep_id: bool = True, copy_annotations: bool = False) -> DatasetItem
Create a new DatasetItem with only the specified fields, all others set to None/empty.
Parameters:
-
fields(List[str]) –List of field names to keep (e.g., ['query', 'expected_output'])
-
keep_id(bool, default:True) –Whether to preserve the original ID (default: True)
-
copy_annotations(bool, default:False) –Whether to copy annotations (judgment and critique) to the new item (default: False)
Returns:
-
DatasetItem–New DatasetItem instance with only specified fields populated
update ¶
update(other: Union[DatasetItem, Dict[str, Any]], overwrite: bool = True) -> DatasetItem
Update this DatasetItem with values from another DatasetItem or dictionary. This method correctly handles aliases and special merge logic for lists/dicts.
Parameters:
-
other(Union[DatasetItem, Dict[str, Any]]) –Another DatasetItem instance or a dictionary to update from.
-
overwrite(bool, default:True) –If True, overwrite existing values. If False, only fill empty fields.
Returns:
-
DatasetItem–The updated DatasetItem instance (self).
update_runtime ¶
update_runtime(**kwargs) -> DatasetItem
Update runtime-related fields such as actual_output or retrieved_content.
Parameters:
-
**kwargs–Runtime fields to update.
Returns:
-
DatasetItem–Updated DatasetItem (self).
merge_metadata ¶
merge_metadata(metadata: Union[str, Dict[str, Any]]) -> DatasetItem
Merge new metadata into the existing metadata field.
Parameters:
-
metadata(Union[str, Dict[str, Any]]) –A dictionary or JSON string to merge.
Returns:
-
DatasetItem–Updated DatasetItem (self).
from_dict
classmethod
¶
from_dict(data: Dict[str, Any]) -> DatasetItem
Create a DatasetItem from a dictionary.
Parameters:
-
data(Dict[str, Any]) –Dictionary containing item data
Returns:
-
DatasetItem–New DatasetItem instance