
Issue Extractor Reference

API reference for the issue extraction and signal analysis module.

from axion.reporting import (
    IssueExtractor,
    ExtractedIssue,
    IssueExtractionResult,
    IssueGroup,
    LLMSummaryInput,
    MetricSignalAdapter,
    SignalAdapterRegistry,
)

IssueExtractor

Core engine for extracting low-score signals from evaluation results into structured, actionable issues.


SignalAdapterRegistry

Registry for metric signal adapters. Maps metric keys to extraction rules for pass/fail signals.


MetricSignalAdapter

Adapter defining how to extract issues from a specific metric's signals — headline signals, failure values, and context.


Data Classes

Structured types for extracted issues, groups, summaries, and LLM input — the output of the extraction pipeline.


IssueExtractor

axion.reporting.issue_extractor.IssueExtractor

IssueExtractor(score_threshold: float = 0.0, include_nan: bool = False, include_context_fields: Optional[List[str]] = None, metric_names: Optional[List[str]] = None, max_issues: Optional[int] = None, sample_rate: Optional[float] = None)

Extracts low-score signals from evaluation results for LLM-based issue summarization.

This class reads existing signal data from MetricScore objects and extracts issues (low-score signals) in a normalized format suitable for analysis.

Initialize the IssueExtractor.

Parameters:

  • score_threshold (float, default: 0.0 ) –

    Signals with scores at or below this threshold are considered issues. Default 0.0 means only explicit failures.

  • include_nan (bool, default: False ) –

    Whether to include signals with NaN scores as issues.

  • include_context_fields (Optional[List[str]], default: None ) –

    Fields to include from test case context. Defaults to ['query', 'actual_output', 'expected_output'].

  • metric_names (Optional[List[str]], default: None ) –

    Optional list of metric names to filter. If None, all metrics are processed.

  • max_issues (Optional[int], default: None ) –

    Hard limit on number of issues to return.

  • sample_rate (Optional[float], default: None ) –

    Deterministic sampling rate (0.0-1.0) by test_case_id.
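
Example (sketch; all values are illustrative, not recommended defaults)
from axion.reporting import IssueExtractor

extractor = IssueExtractor(
    score_threshold=0.5,    # signals scoring at or below 0.5 count as issues
    include_nan=True,       # also flag NaN-scored signals
    metric_names=['faithfulness', 'answer_relevancy'],    # only process these metrics
    max_issues=200,         # hard cap on returned issues
    sample_rate=0.5,        # deterministic 50% sample by test_case_id
)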

extract_from_evaluation

extract_from_evaluation(result: EvaluationResult) -> IssueExtractionResult

Extract all issues from an EvaluationResult.

Parameters:

  • result (EvaluationResult) –

    The EvaluationResult to analyze

Returns:

  • IssueExtractionResult

    IssueExtractionResult containing all extracted issues, grouped by metric and by signal type.
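
Example (sketch; eval_result is assumed to be an EvaluationResult from a completed run)
extraction = extractor.extract_from_evaluation(eval_result)

print(f'{extraction.issues_found} issues across {extraction.total_test_cases} test cases')
for metric, metric_issues in extraction.issues_by_metric.items():
    print(f'  {metric}: {len(metric_issues)} issue(s)')
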
extract_from_test_result

extract_from_test_result(test_result: TestResult, result_index: int) -> List[ExtractedIssue]

Extract issues from a single TestResult.

Parameters:

  • test_result (TestResult) –

    The TestResult to analyze

  • result_index (int) –

    Index in the results list

Returns:

  • List[ExtractedIssue]

    List of ExtractedIssue objects found in this TestResult.

extract_from_metric_score

extract_from_metric_score(metric_score: MetricScore, test_case_id: str, test_case: Any, result_index: int, score_index: int) -> List[ExtractedIssue]

Extract issues from a single MetricScore.

Parameters:

  • metric_score (MetricScore) –

    The MetricScore to analyze

  • test_case_id (str) –

    ID of the test case

  • test_case (Any) –

    The test case object for context

  • result_index (int) –

    Index in the results list

  • score_index (int) –

    Index in the score_results list

Returns:

  • List[ExtractedIssue]

    List of ExtractedIssue objects found in this MetricScore.
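
Example (sketch; assumes eval_result.results holds the TestResult list, as the source_path format results[i].score_results[j] suggests)
first = eval_result.results[0]
issues = extractor.extract_from_test_result(first, result_index=0)
for issue in issues:
    print(issue.source_path, issue.signal_name, issue.value)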

to_llm_input

to_llm_input(result: IssueExtractionResult, max_issues: int = 50) -> LLMSummaryInput

Convert extraction result to structured LLM input.

Parameters:

  • result (IssueExtractionResult) –

    The IssueExtractionResult to convert

  • max_issues (int, default: 50 ) –

    Maximum number of detailed issues to include

Returns:

  • LLMSummaryInput

    Structured summary input with counts and detailed issue dicts for the LLM.
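
Example (sketch; extraction is the IssueExtractionResult from extract_from_evaluation)
llm_input = extractor.to_llm_input(extraction, max_issues=25)

print(llm_input.issues_by_metric)      # summary counts by metric
print(len(llm_input.detailed_issues))  # at most 25 detailed issue dicts
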
to_prompt_text

to_prompt_text(result: IssueExtractionResult, max_issues: int = 50) -> str

Generate a text prompt for LLM-based issue summarization.

Parameters:

  • result (IssueExtractionResult) –

    The IssueExtractionResult to convert

  • max_issues (int, default: 50 ) –

    Maximum number of detailed issues to include

Returns:

  • str

    Formatted prompt text.
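
Example (sketch; extraction is an IssueExtractionResult)
prompt = extractor.to_prompt_text(extraction, max_issues=25)
print(prompt[:500])   # preview the rendered prompt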

to_grouped_prompt_text

to_grouped_prompt_text(result: IssueExtractionResult, llm: Optional[LLMRunnable] = None, max_groups: int = 20, max_examples_per_group: int = 2) -> str

Generate a grouped prompt with optional LLM summarization.

Groups similar issues together and shows representative examples, reducing context size while preserving signal quality.

Parameters:

  • result (IssueExtractionResult) –

    The IssueExtractionResult to convert

  • llm (Optional[LLMRunnable], default: None ) –

    Optional LLM for generating group summaries

  • max_groups (int, default: 20 ) –

    Maximum number of issue groups to include

  • max_examples_per_group (int, default: 2 ) –

    Representative examples per group

Returns:

  • str

    Formatted prompt text with grouped issues.
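
Example (sketch; llm is an LLMRunnable as in the summarize example below)
# Without an LLM: similar issues are grouped, but no per-group summaries are generated
prompt = extractor.to_grouped_prompt_text(extraction, max_groups=10)

# With an LLM: each group also gets a generated summary
prompt = extractor.to_grouped_prompt_text(extraction, llm=llm, max_examples_per_group=3)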

to_grouped_prompt_text_async async

to_grouped_prompt_text_async(result: IssueExtractionResult, llm: Optional[LLMRunnable] = None, max_groups: int = 20, max_examples_per_group: int = 2) -> str

Generate a grouped prompt with optional LLM summarization (async version).

Groups similar issues together and shows representative examples, reducing context size while preserving signal quality.

Parameters:

  • result (IssueExtractionResult) –

    The IssueExtractionResult to convert

  • llm (Optional[LLMRunnable], default: None ) –

    Optional LLM for generating group summaries

  • max_groups (int, default: 20 ) –

    Maximum number of issue groups to include

  • max_examples_per_group (int, default: 2 ) –

    Representative examples per group

Returns:

  • str

    Formatted prompt text with grouped issues.
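
Example (sketch; bare await follows the style of the summarize example below)
prompt = await extractor.to_grouped_prompt_text_async(extraction, llm=llm, max_groups=10)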

summarize async

summarize(result: IssueExtractionResult, llm: LLMRunnable, prompt_template: Optional[str] = None, max_issues: int = 100) -> IssueSummary

Generate a complete LLM-powered summary of evaluation issues.

This method sends the issues to an LLM and returns a structured summary including failure categories, root causes, and recommendations.

Parameters:

  • result (IssueExtractionResult) –

    The IssueExtractionResult to summarize

  • llm (LLMRunnable) –

    The LLM to use for generating the summary (must have an acomplete method)

  • prompt_template (Optional[str], default: None ) –

    Custom prompt template. If None, uses DEFAULT_SUMMARY_PROMPT. The template should include {overview} and {issue_data} placeholders.

  • max_issues (int, default: 100 ) –

    Maximum number of issues to include in the prompt (default 100)

Returns:

  • IssueSummary

    IssueSummary containing the LLM's analysis.

Example
from axion.reporting import IssueExtractor
from axion.llm_registry import LLMRegistry

extractor = IssueExtractor()
issues = extractor.extract_from_evaluation(eval_result)

reg = LLMRegistry('anthropic')
llm = reg.get_llm('claude-sonnet-4-20250514')

summary = await extractor.summarize(issues, llm=llm)
print(summary.text)
Example with custom prompt
custom_prompt = '''
Analyze these {overview} issues:
{issue_data}

Provide a brief summary focused on:
1. Top 3 failure patterns
2. Quick wins to fix
'''
summary = await extractor.summarize(issues, llm=llm, prompt_template=custom_prompt)

summarize_sync

summarize_sync(result: IssueExtractionResult, llm: LLMRunnable, prompt_template: Optional[str] = None, max_issues: int = 100) -> IssueSummary

Synchronous version of summarize().

Generates a complete LLM-powered summary of evaluation issues.

Parameters:

  • result (IssueExtractionResult) –

    The IssueExtractionResult to summarize

  • llm (LLMRunnable) –

    The LLM to use for generating the summary

  • prompt_template (Optional[str], default: None ) –

    Custom prompt template. If None, uses DEFAULT_SUMMARY_PROMPT.

  • max_issues (int, default: 100 ) –

    Maximum number of issues to include in the prompt

Returns:

  • IssueSummary

    IssueSummary containing the LLM's analysis.

Example
summary = extractor.summarize_sync(issues, llm=llm)
print(summary.text)

SignalAdapterRegistry

axion.reporting.issue_extractor.SignalAdapterRegistry

Registry for MetricSignalAdapter instances.

Provides a centralized way to register and retrieve adapters for different metrics. Users can register custom adapters for their own metrics using the decorator or direct registration methods.

Example using decorator
@SignalAdapterRegistry.register('my_custom_metric')
def my_adapter():
    return MetricSignalAdapter(
        metric_key='my_custom_metric',
        headline_signals=['passed'],
        issue_values={'passed': [False]},
        context_signals=['reason'],
    )
Example using direct registration
SignalAdapterRegistry.register_adapter(
    'my_custom_metric',
    MetricSignalAdapter(
        metric_key='my_custom_metric',
        headline_signals=['passed'],
        issue_values={'passed': [False]},
        context_signals=['reason'],
    )
)

register classmethod

register(metric_key: str)

Decorator to register a signal adapter for a metric.

The decorated function should return a MetricSignalAdapter instance.

Parameters:

  • metric_key (str) –

    The metric identifier (e.g., 'faithfulness', 'my_custom_metric')

Returns:

  • Decorator function

Example
@SignalAdapterRegistry.register('custom_metric')
def custom_adapter():
    return MetricSignalAdapter(
        metric_key='custom_metric',
        headline_signals=['is_valid'],
        issue_values={'is_valid': [False]},
        context_signals=['reason'],
    )

register_adapter classmethod

register_adapter(metric_key: str, adapter: MetricSignalAdapter) -> None

Directly register a MetricSignalAdapter for a metric.

Parameters:

  • metric_key (str) –

    The metric identifier

  • adapter (MetricSignalAdapter) –

    The MetricSignalAdapter instance

Example
SignalAdapterRegistry.register_adapter(
    'my_metric',
    MetricSignalAdapter(
        metric_key='my_metric',
        headline_signals=['score'],
        issue_values={'score': [0]},
        context_signals=['explanation'],
    )
)

get classmethod

get(metric_name: str) -> Optional[MetricSignalAdapter]

Get the adapter for a metric by name.

Parameters:

  • metric_name (str) –

    The metric name (case-insensitive, spaces/hyphens normalized)

Returns:

  • Optional[MetricSignalAdapter]

    The registered MetricSignalAdapter, or None if no adapter matches the name.

list_adapters classmethod

list_adapters() -> List[str]

List all registered adapter keys.

Returns:

  • List[str]

    List of registered metric keys.
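
Example (sketch; output comments are illustrative)
from axion.reporting import SignalAdapterRegistry

print(SignalAdapterRegistry.list_adapters())   # e.g. ['faithfulness', 'answer_criteria', ...]

adapter = SignalAdapterRegistry.get('faithfulness')
if adapter is not None:
    print(adapter.headline_signals)            # ['faithfulness_verdict']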


Data Classes

ExtractedIssue

axion.reporting.issue_extractor.ExtractedIssue dataclass

ExtractedIssue(test_case_id: str, metric_name: str, signal_group: str, signal_name: str, value: Any, score: float, description: Optional[str] = None, reasoning: Optional[str] = None, item_context: Dict[str, Any] = dict(), source_path: str = '', raw_signal: Dict[str, Any] = dict())

Represents a single low-score signal extracted from metric evaluation results.

Attributes:

  • test_case_id (str) –

    Unique identifier for the test case

  • metric_name (str) –

    Name of the metric that produced this signal

  • signal_group (str) –

    Group name for the signal (e.g., "claim_0", "aspect_Coverage")

  • signal_name (str) –

    Name of the signal (e.g., "is_covered", "faithfulness_verdict")

  • value (Any) –

    Original value (False, "CONTRADICTORY", etc.)

  • score (float) –

    Numeric score (0.0 for failures)

  • description (Optional[str]) –

    Optional description of the signal

  • reasoning (Optional[str]) –

    LLM reasoning from sibling signal if available

  • item_context (Dict[str, Any]) –

    Context from the test case (query, actual_output, etc.)

  • source_path (str) –

    Path for debugging (e.g., "results[42].score_results[0].signals.claim_0")

  • raw_signal (Dict[str, Any]) –

    Original signal dict for debugging

IssueExtractionResult

axion.reporting.issue_extractor.IssueExtractionResult dataclass

IssueExtractionResult(run_id: str, evaluation_name: Optional[str], total_test_cases: int, total_signals_analyzed: int, issues_found: int, issues_by_metric: Dict[str, List[ExtractedIssue]], issues_by_type: Dict[str, List[ExtractedIssue]], all_issues: List[ExtractedIssue])

Aggregated result of issue extraction from an evaluation run.

Attributes:

  • run_id (str) –

    Unique identifier for the evaluation run

  • evaluation_name (Optional[str]) –

    Optional name of the evaluation

  • total_test_cases (int) –

    Total number of test cases analyzed

  • total_signals_analyzed (int) –

    Total number of signals analyzed

  • issues_found (int) –

    Total number of issues found

  • issues_by_metric (Dict[str, List[ExtractedIssue]]) –

    Issues grouped by metric name

  • issues_by_type (Dict[str, List[ExtractedIssue]]) –

    Issues grouped by signal name (issue type)

  • all_issues (List[ExtractedIssue]) –

    Flat list of all extracted issues
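
Example (sketch; extraction is an IssueExtractionResult)
for issue in extraction.all_issues:
    print(issue.test_case_id, issue.metric_name, issue.signal_name, issue.value, issue.score)
    if issue.reasoning:
        print('  reasoning:', issue.reasoning)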

IssueGroup

axion.reporting.issue_extractor.IssueGroup dataclass

IssueGroup(metric_name: str, signal_name: str, total_count: int, unique_values: List[Any], representative_issues: List[ExtractedIssue], affected_test_cases: List[str], llm_summary: Optional[str] = None)

Represents a group of similar issues for summarization.

Attributes:

  • metric_name (str) –

    The metric that produced these issues

  • signal_name (str) –

    The signal name (e.g., "is_covered", "faithfulness_verdict")

  • total_count (int) –

    Total number of issues in this group

  • unique_values (List[Any]) –

    Set of unique failure values

  • representative_issues (List[ExtractedIssue]) –

    Sample issues with full context

  • affected_test_cases (List[str]) –

    List of affected test case IDs

  • llm_summary (Optional[str]) –

    Optional LLM-generated summary of the pattern

IssueSummary

axion.reporting.issue_extractor.IssueSummary dataclass

IssueSummary(text: str, prompt_used: str, issues_analyzed: int, evaluation_name: Optional[str] = None)

LLM-generated summary of evaluation issues.

Attributes:

  • text (str) –

    The full LLM-generated analysis and summary

  • prompt_used (str) –

    The prompt that was sent to the LLM

  • issues_analyzed (int) –

    Number of issues included in the analysis

  • evaluation_name (Optional[str]) –

    Name of the evaluation that was analyzed

LLMSummaryInput

axion.reporting.issue_extractor.LLMSummaryInput dataclass

LLMSummaryInput(evaluation_name: Optional[str], total_test_cases: int, issues_found: int, issues_by_metric: Dict[str, int], issues_by_type: Dict[str, int], detailed_issues: List[Dict[str, Any]])

Structured input for LLM-based issue summarization.

Attributes:

  • evaluation_name (Optional[str]) –

    Name of the evaluation

  • total_test_cases (int) –

    Total test cases analyzed

  • issues_found (int) –

    Total issues found

  • issues_by_metric (Dict[str, int]) –

    Summary counts by metric

  • issues_by_type (Dict[str, int]) –

    Summary counts by issue type

  • detailed_issues (List[Dict[str, Any]]) –

    List of detailed issue dicts for the prompt

MetricSignalAdapter

axion.reporting.issue_extractor.MetricSignalAdapter dataclass

MetricSignalAdapter(metric_key: str, headline_signals: List[str], issue_values: Dict[str, List[Any]], context_signals: List[str])

Adapter defining how to extract issues from a specific metric's signals.

Attributes:

  • metric_key (str) –

    Metric identifier (e.g., "faithfulness")

  • headline_signals (List[str]) –

    Signals that indicate pass/fail

  • issue_values (Dict[str, List[Any]]) –

    Mapping of signal names to failure values

  • context_signals (List[str]) –

    Sibling signals to include for context


Built-in Adapters

The following adapters are pre-registered:

| Adapter Key | Headline Signals | Issue Values |
| --- | --- | --- |
| faithfulness | faithfulness_verdict | CONTRADICTORY, NO_EVIDENCE |
| answer_criteria | is_covered, concept_coverage | False |
| answer_relevancy | is_relevant, verdict | False, no |
| answer_completeness | is_covered, is_addressed | False |
| factual_accuracy | is_correct, accuracy_score | False, 0 |
| answer_conciseness | conciseness_score | (score-based) |
| contextual_relevancy | is_relevant | False |
| contextual_recall | is_attributable, is_supported | False |
| contextual_precision | is_useful, map_score | False |
| contextual_utilization | is_utilized | False |
| contextual_sufficiency | is_sufficient | False |
| contextual_ranking | is_correctly_ranked | False |
| citation_relevancy | relevance_verdict | False |
| pii_leakage | pii_verdict | yes |
| tone_style_consistency | is_consistent | False |
| persona_tone_adherence | persona_match | False |
| conversation_efficiency | efficiency_score | (score-based) |
| conversation_flow | final_score | (score-based) |
| goal_completion | is_completed, goal_achieved | False |
| citation_presence | presence_check_passed | False |
| latency | latency_score | (threshold-based) |
| tool_correctness | all_tools_correct | False |
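
Example (sketch; lookup normalization per SignalAdapterRegistry.get, output comments are illustrative)
# Lookup is case-insensitive and normalizes spaces/hyphens
adapter = SignalAdapterRegistry.get('Answer Relevancy')
print(adapter.metric_key)         # 'answer_relevancy'
print(adapter.headline_signals)   # ['is_relevant', 'verdict']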