Issue Extractor Reference¶
API reference for the issue extraction and signal analysis module.
- IssueExtractor – Core engine for extracting low-score signals from evaluation results into structured, actionable issues.
- SignalAdapterRegistry – Registry for metric signal adapters; maps metric keys to extraction rules for pass/fail signals.
- MetricSignalAdapter – Adapter defining how to extract issues from a specific metric's signals: headline signals, failure values, and context.
- Data Classes – Structured types for extracted issues, groups, summaries, and LLM input; the output of the extraction pipeline.
IssueExtractor¶
axion.reporting.issue_extractor.IssueExtractor ¶
IssueExtractor(score_threshold: float = 0.0, include_nan: bool = False, include_context_fields: Optional[List[str]] = None, metric_names: Optional[List[str]] = None, max_issues: Optional[int] = None, sample_rate: Optional[float] = None)
Extracts low-score signals from evaluation results for LLM-based issue summarization.
This class reads existing signal data from MetricScore objects and extracts issues (low-score signals) in a normalized format suitable for analysis.
Initialize the IssueExtractor.
Parameters:

- score_threshold (float, default: 0.0) – Signals with scores at or below this threshold are considered issues. The default of 0.0 means only explicit failures are extracted.
- include_nan (bool, default: False) – Whether to include signals with NaN scores as issues.
- include_context_fields (Optional[List[str]], default: None) – Fields to include from test case context. Defaults to ['query', 'actual_output', 'expected_output'].
- metric_names (Optional[List[str]], default: None) – Optional list of metric names to filter on. If None, all metrics are processed.
- max_issues (Optional[int], default: None) – Hard limit on the number of issues to return.
- sample_rate (Optional[float], default: None) – Deterministic sampling rate (0.0–1.0), keyed by test_case_id.
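For example, a minimal sketch that flags NaN scores, deterministically samples half of the test cases, and caps the output:

```python
from axion.reporting import IssueExtractor

# Flag explicit failures plus NaN scores, sample half the test
# cases deterministically, and cap the result at 200 issues.
extractor = IssueExtractor(
    include_nan=True,
    sample_rate=0.5,
    max_issues=200,
)
```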
extract_from_evaluation ¶
extract_from_evaluation(result: EvaluationResult) -> IssueExtractionResult
Extract all issues from an EvaluationResult.
Parameters:

- result (EvaluationResult) – The EvaluationResult to analyze.

Returns:

- IssueExtractionResult – IssueExtractionResult with all extracted issues.
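A typical call, reusing the extractor built above and assuming eval_result is an EvaluationResult from a completed run:

```python
issues = extractor.extract_from_evaluation(eval_result)
print(f'{issues.issues_found} issues across {issues.total_test_cases} test cases')
```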
extract_from_test_result ¶
extract_from_test_result(test_result: TestResult, result_index: int) -> List[ExtractedIssue]
Extract issues from a single TestResult.
Parameters:

- test_result (TestResult) – The TestResult to analyze.
- result_index (int) – Index in the results list.

Returns:

- List[ExtractedIssue] – List of ExtractedIssue objects found in this TestResult.
extract_from_metric_score ¶
extract_from_metric_score(metric_score: MetricScore, test_case_id: str, test_case: Any, result_index: int, score_index: int) -> List[ExtractedIssue]
Extract issues from a single MetricScore.
Parameters:

- metric_score (MetricScore) – The MetricScore to analyze.
- test_case_id (str) – ID of the test case.
- test_case (Any) – The test case object, used for context.
- result_index (int) – Index in the results list.
- score_index (int) – Index in the score_results list.

Returns:

- List[ExtractedIssue] – List of ExtractedIssue objects found in this MetricScore.
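These lower-level methods are what extract_from_evaluation iterates over internally; you can call them directly to inspect a single result. A sketch, assuming eval_result.results holds the run's TestResult objects:

```python
# Walk each TestResult and print any failing signals it contains.
for i, test_result in enumerate(eval_result.results):
    for issue in extractor.extract_from_test_result(test_result, result_index=i):
        print(issue.source_path, issue.signal_name, issue.value)
```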
to_llm_input ¶
to_llm_input(result: IssueExtractionResult, max_issues: int = 50) -> LLMSummaryInput
Convert extraction result to structured LLM input.
Parameters:

- result (IssueExtractionResult) – The IssueExtractionResult to convert.
- max_issues (int, default: 50) – Maximum number of detailed issues to include.

Returns:

- LLMSummaryInput – LLMSummaryInput suitable for LLM processing.
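For instance, a sketch reusing the issues result extracted earlier:

```python
llm_input = extractor.to_llm_input(issues, max_issues=50)
print(llm_input.issues_by_metric)  # per-metric issue counts
```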
to_prompt_text ¶
to_prompt_text(result: IssueExtractionResult, max_issues: int = 50) -> str
Generate a text prompt for LLM-based issue summarization.
Parameters:

- result (IssueExtractionResult) – The IssueExtractionResult to convert.
- max_issues (int, default: 50) – Maximum number of detailed issues to include.

Returns:

- str – Formatted prompt text.
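This is useful when you want to drive the LLM call yourself; a sketch:

```python
prompt = extractor.to_prompt_text(issues, max_issues=50)
# Send `prompt` to any LLM client of your choice.
```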
to_grouped_prompt_text ¶
to_grouped_prompt_text(result: IssueExtractionResult, llm: Optional[LLMRunnable] = None, max_groups: int = 20, max_examples_per_group: int = 2) -> str
Generate a grouped prompt with optional LLM summarization.
Groups similar issues together and shows representative examples, reducing context size while preserving signal quality.
Parameters:

- result (IssueExtractionResult) – The IssueExtractionResult to convert.
- llm (Optional[LLMRunnable], default: None) – Optional LLM for generating group summaries.
- max_groups (int, default: 20) – Maximum number of issue groups to include.
- max_examples_per_group (int, default: 2) – Representative examples per group.

Returns:

- str – Formatted prompt text with grouped issues.
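A sketch without LLM summarization (grouping alone already shrinks the context):

```python
grouped_prompt = extractor.to_grouped_prompt_text(issues, max_groups=10)
```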
to_grouped_prompt_text_async async ¶
to_grouped_prompt_text_async(result: IssueExtractionResult, llm: Optional[LLMRunnable] = None, max_groups: int = 20, max_examples_per_group: int = 2) -> str
Generate a grouped prompt with optional LLM summarization (async version).
Groups similar issues together and shows representative examples, reducing context size while preserving signal quality.
Parameters:

- result (IssueExtractionResult) – The IssueExtractionResult to convert.
- llm (Optional[LLMRunnable], default: None) – Optional LLM for generating group summaries.
- max_groups (int, default: 20) – Maximum number of issue groups to include.
- max_examples_per_group (int, default: 2) – Representative examples per group.

Returns:

- str – Formatted prompt text with grouped issues.
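The async variant takes the same arguments; a sketch, run inside an event loop:

```python
grouped_prompt = await extractor.to_grouped_prompt_text_async(issues, llm=llm)
```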
summarize async ¶
summarize(result: IssueExtractionResult, llm: LLMRunnable, prompt_template: Optional[str] = None, max_issues: int = 100) -> IssueSummary
Generate a complete LLM-powered summary of evaluation issues.
This method sends the issues to an LLM and returns a structured summary including failure categories, root causes, and recommendations.
Parameters:

- result (IssueExtractionResult) – The IssueExtractionResult to summarize.
- llm (LLMRunnable) – The LLM to use for generating the summary (must have a complete method).
- prompt_template (Optional[str], default: None) – Custom prompt template. If None, uses DEFAULT_SUMMARY_PROMPT. The template should include {overview} and {issue_data} placeholders.
- max_issues (int, default: 100) – Maximum number of issues to include in the prompt (default 100).

Returns:

- IssueSummary – IssueSummary containing the LLM's analysis.
Example

```python
from axion.reporting import IssueExtractor
from axion.llm_registry import LLMRegistry

extractor = IssueExtractor()
issues = extractor.extract_from_evaluation(eval_result)

reg = LLMRegistry('anthropic')
llm = reg.get_llm('claude-sonnet-4-20250514')
summary = await extractor.summarize(issues, llm=llm)
print(summary.text)
```
summarize_sync ¶
summarize_sync(result: IssueExtractionResult, llm: LLMRunnable, prompt_template: Optional[str] = None, max_issues: int = 100) -> IssueSummary
Synchronous version of summarize().
Generates a complete LLM-powered summary of evaluation issues.
Parameters:

- result (IssueExtractionResult) – The IssueExtractionResult to summarize.
- llm (LLMRunnable) – The LLM to use for generating the summary.
- prompt_template (Optional[str], default: None) – Custom prompt template. If None, uses DEFAULT_SUMMARY_PROMPT.
- max_issues (int, default: 100) – Maximum number of issues to include in the prompt.

Returns:

- IssueSummary – IssueSummary containing the LLM's analysis.
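A sketch for synchronous contexts, reusing issues and llm from the example above:

```python
summary = extractor.summarize_sync(issues, llm=llm)
print(summary.text)
```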
SignalAdapterRegistry¶
axion.reporting.issue_extractor.SignalAdapterRegistry ¶
Registry for MetricSignalAdapter instances.
Provides a centralized way to register and retrieve adapters for different metrics. Users can register custom adapters for their own metrics using the decorator or direct registration methods.
Example using the decorator:
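A minimal sketch, assuming a hypothetical custom metric keyed 'my_custom_metric' whose headline signal is a boolean is_valid:

```python
from axion.reporting.issue_extractor import (
    MetricSignalAdapter,
    SignalAdapterRegistry,
)

@SignalAdapterRegistry.register('my_custom_metric')
def _build_my_adapter():
    # The decorated function must return a MetricSignalAdapter instance.
    return MetricSignalAdapter(
        metric_key='my_custom_metric',        # hypothetical metric
        headline_signals=['is_valid'],        # pass/fail signal
        issue_values={'is_valid': [False]},   # values treated as failures
        context_signals=['reasoning'],        # sibling signals for context
    )
```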
Example using direct registration:
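The same registration without the decorator, again assuming the hypothetical my_custom_metric:

```python
SignalAdapterRegistry.register_adapter(
    'my_custom_metric',
    MetricSignalAdapter(
        metric_key='my_custom_metric',
        headline_signals=['is_valid'],
        issue_values={'is_valid': [False]},
        context_signals=['reasoning'],
    ),
)
```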
register classmethod ¶

register(metric_key: str)

Decorator to register a signal adapter for a metric.

The decorated function should return a MetricSignalAdapter instance.

Parameters:

- metric_key (str) – The metric identifier (e.g., 'faithfulness', 'my_custom_metric').

Returns:

- Decorator function that registers the returned adapter under metric_key.
register_adapter classmethod ¶

register_adapter(metric_key: str, adapter: MetricSignalAdapter) -> None

Directly register a MetricSignalAdapter for a metric.

Parameters:

- metric_key (str) – The metric identifier.
- adapter (MetricSignalAdapter) – The MetricSignalAdapter instance to register.
get classmethod ¶

get(metric_name: str) -> Optional[MetricSignalAdapter]

Get the adapter for a metric by name.

Parameters:

- metric_name (str) – The metric name (case-insensitive; spaces and hyphens are normalized).

Returns:

- Optional[MetricSignalAdapter] – The MetricSignalAdapter if found, None otherwise.
list_adapters classmethod ¶

list_adapters() -> List[str]

List all registered adapter keys.

Returns:

- List[str] – List of registered metric keys.
Data Classes¶
ExtractedIssue¶
axion.reporting.issue_extractor.ExtractedIssue dataclass ¶
ExtractedIssue(test_case_id: str, metric_name: str, signal_group: str, signal_name: str, value: Any, score: float, description: Optional[str] = None, reasoning: Optional[str] = None, item_context: Dict[str, Any] = dict(), source_path: str = '', raw_signal: Dict[str, Any] = dict())
Represents a single low-score signal extracted from metric evaluation results.
Attributes:

- test_case_id (str) – Unique identifier for the test case.
- metric_name (str) – Name of the metric that produced this signal.
- signal_group (str) – Group name for the signal (e.g., "claim_0", "aspect_Coverage").
- signal_name (str) – Name of the signal (e.g., "is_covered", "faithfulness_verdict").
- value (Any) – Original value (False, "CONTRADICTORY", etc.).
- score (float) – Numeric score (0.0 for failures).
- description (Optional[str]) – Optional description of the signal.
- reasoning (Optional[str]) – LLM reasoning from a sibling signal, if available.
- item_context (Dict[str, Any]) – Context from the test case (query, actual_output, etc.).
- source_path (str) – Path for debugging (e.g., "results[42].score_results[0].signals.claim_0").
- raw_signal (Dict[str, Any]) – Original signal dict for debugging.
IssueExtractionResult¶
axion.reporting.issue_extractor.IssueExtractionResult dataclass ¶
IssueExtractionResult(run_id: str, evaluation_name: Optional[str], total_test_cases: int, total_signals_analyzed: int, issues_found: int, issues_by_metric: Dict[str, List[ExtractedIssue]], issues_by_type: Dict[str, List[ExtractedIssue]], all_issues: List[ExtractedIssue])
Aggregated result of issue extraction from an evaluation run.
Attributes:

- run_id (str) – Unique identifier for the evaluation run.
- evaluation_name (Optional[str]) – Optional name of the evaluation.
- total_test_cases (int) – Total number of test cases analyzed.
- total_signals_analyzed (int) – Total number of signals analyzed.
- issues_found (int) – Total number of issues found.
- issues_by_metric (Dict[str, List[ExtractedIssue]]) – Issues grouped by metric name.
- issues_by_type (Dict[str, List[ExtractedIssue]]) – Issues grouped by signal name (issue type).
- all_issues (List[ExtractedIssue]) – Flat list of all extracted issues.
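The grouped views make it easy to slice failures by metric; a sketch, reusing the issues result returned by extract_from_evaluation:

```python
for metric_name, metric_issues in issues.issues_by_metric.items():
    print(f'{metric_name}: {len(metric_issues)} issue(s)')
    for issue in metric_issues[:3]:  # peek at a few examples
        print('  ', issue.test_case_id, issue.signal_name, issue.value)
```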
IssueGroup¶
axion.reporting.issue_extractor.IssueGroup dataclass ¶
IssueGroup(metric_name: str, signal_name: str, total_count: int, unique_values: List[Any], representative_issues: List[ExtractedIssue], affected_test_cases: List[str], llm_summary: Optional[str] = None)
Represents a group of similar issues for summarization.
Attributes:

- metric_name (str) – The metric that produced these issues.
- signal_name (str) – The signal name (e.g., "is_covered", "faithfulness_verdict").
- total_count (int) – Total number of issues in this group.
- unique_values (List[Any]) – Distinct failure values observed in this group.
- representative_issues (List[ExtractedIssue]) – Sample issues with full context.
- affected_test_cases (List[str]) – List of affected test case IDs.
- llm_summary (Optional[str]) – Optional LLM-generated summary of the pattern.
IssueSummary¶
axion.reporting.issue_extractor.IssueSummary dataclass ¶
IssueSummary(text: str, prompt_used: str, issues_analyzed: int, evaluation_name: Optional[str] = None)
LLM-generated summary of evaluation issues.
Attributes:

- text (str) – The full LLM-generated analysis and summary.
- prompt_used (str) – The prompt that was sent to the LLM.
- issues_analyzed (int) – Number of issues included in the analysis.
- evaluation_name (Optional[str]) – Name of the evaluation that was analyzed.
LLMSummaryInput¶
axion.reporting.issue_extractor.LLMSummaryInput dataclass ¶
LLMSummaryInput(evaluation_name: Optional[str], total_test_cases: int, issues_found: int, issues_by_metric: Dict[str, int], issues_by_type: Dict[str, int], detailed_issues: List[Dict[str, Any]])
Structured input for LLM-based issue summarization.
Attributes:

- evaluation_name (Optional[str]) – Name of the evaluation.
- total_test_cases (int) – Total test cases analyzed.
- issues_found (int) – Total issues found.
- issues_by_metric (Dict[str, int]) – Summary counts by metric.
- issues_by_type (Dict[str, int]) – Summary counts by issue type.
- detailed_issues (List[Dict[str, Any]]) – List of detailed issue dicts for the prompt.
MetricSignalAdapter¶
axion.reporting.issue_extractor.MetricSignalAdapter dataclass ¶
MetricSignalAdapter(metric_key: str, headline_signals: List[str], issue_values: Dict[str, List[Any]], context_signals: List[str])
Adapter defining how to extract issues from a specific metric's signals.
Attributes:

- metric_key (str) – Metric identifier (e.g., "faithfulness").
- headline_signals (List[str]) – Signals that indicate pass/fail.
- issue_values (Dict[str, List[Any]]) – Mapping of signal names to failure values.
- context_signals (List[str]) – Sibling signals to include for context.
Built-in Adapters¶
The following adapters are pre-registered:
| Adapter Key | Headline Signals | Issue Values |
|---|---|---|
| faithfulness | faithfulness_verdict | CONTRADICTORY, NO_EVIDENCE |
| answer_criteria | is_covered, concept_coverage | False |
| answer_relevancy | is_relevant, verdict | False, no |
| answer_completeness | is_covered, is_addressed | False |
| factual_accuracy | is_correct, accuracy_score | False, 0 |
| answer_conciseness | conciseness_score | (score-based) |
| contextual_relevancy | is_relevant | False |
| contextual_recall | is_attributable, is_supported | False |
| contextual_precision | is_useful, map_score | False |
| contextual_utilization | is_utilized | False |
| contextual_sufficiency | is_sufficient | False |
| contextual_ranking | is_correctly_ranked | False |
| citation_relevancy | relevance_verdict | False |
| pii_leakage | pii_verdict | yes |
| tone_style_consistency | is_consistent | False |
| persona_tone_adherence | persona_match | False |
| conversation_efficiency | efficiency_score | (score-based) |
| conversation_flow | final_score | (score-based) |
| goal_completion | is_completed, goal_achieved | False |
| citation_presence | presence_check_passed | False |
| latency | latency_score | (threshold-based) |
| tool_correctness | all_tools_correct | False |
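To inspect one of these at runtime, a sketch (the commented values reflect the table above):

```python
from axion.reporting.issue_extractor import SignalAdapterRegistry

adapter = SignalAdapterRegistry.get('faithfulness')
if adapter is not None:
    print(adapter.headline_signals)  # ['faithfulness_verdict']
    print(adapter.issue_values)      # failure values for faithfulness_verdict
```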