
Contextual Sufficiency

Evaluate if retrieved context contains enough information to answer the query
LLM-Powered · Knowledge · Single Turn · Retrieval

At a Glance

🎯
Score Range
0.0 or 1.0
Binary sufficiency verdict
⚡
Default Threshold
0.5
Pass/fail cutoff
📋
Required Inputs
query, retrieved_content
No answer required

What It Measures

Contextual Sufficiency evaluates whether the retrieved context contains enough information to fully answer the user's query. Unlike other metrics that measure partial coverage, this is a binary judgment: either the context is sufficient or it isn't.

Score Interpretation
1.0 Context is sufficient to answer the query
0.0 Context is insufficient; key information is missing
✅ Use When
  • Diagnosing retrieval quality
  • Testing retrieval before generation
  • Identifying information gaps
  • Deciding when to expand search
❌ Don't Use When
  • Need granular coverage scores
  • Evaluating answer quality
  • Comparing retrieval strategies
  • Need partial credit

RAG Evaluation Suite

Contextual Sufficiency asks: "Is there enough context to answer this question?"

Related retrieval metrics:

  • Contextual Recall: graded coverage of the information needed to answer
  • Contextual Relevancy: how relevant the retrieved chunks are to the query
  • Faithfulness: whether the generated answer is grounded in the retrieved context
How It Works

The metric uses an LLM to make a binary judgment about context sufficiency.

Step-by-Step Process

flowchart TD
    subgraph INPUT["📥 Inputs"]
        A[Query]
        B[Retrieved Context]
    end

    subgraph JUDGE["⚖️ Sufficiency Judgment"]
        C[RAGAnalyzer Engine]
        D["Can this context answer the query?"]
        E["Binary Verdict"]
    end

    subgraph OUTPUT["📊 Result"]
        F["1.0 = Sufficient"]
        G["0.0 = Insufficient"]
        H["Reasoning Provided"]
    end

    A & B --> C
    C --> D
    D --> E
    E --> F & G
    F & G --> H

    style INPUT stroke:#1E3A5F,stroke-width:2px
    style JUDGE stroke:#f59e0b,stroke-width:2px
    style OUTPUT stroke:#10b981,stroke-width:2px
    style E fill:#1E3A5F,stroke:#0F2440,stroke-width:3px,color:#fff

A single binary verdict for the entire context.

✅ SUFFICIENT
1.0

Context contains all necessary information to answer the query completely.

❌ INSUFFICIENT
0.0

Context is missing critical information needed to answer the query.

Diagnostic Purpose

This metric helps diagnose retrieval issues independently of generation. If sufficiency is 0.0 but faithfulness is high, the answer is grounded in what was retrieved; the retriever, not the generator, needs improvement.
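The sufficiency × faithfulness cross-check above can be sketched as a tiny triage function. This is a hedged illustration only: the 0.8 faithfulness cutoff and the label names are assumptions, not part of the axion API.

```python
# Hypothetical triage helper: cross-reference sufficiency and faithfulness
# scores to localize a RAG failure. Thresholds and labels are illustrative.

def diagnose(sufficiency: float, faithfulness: float) -> str:
    """Map a (sufficiency, faithfulness) pair to a likely failure mode."""
    if sufficiency == 1.0 and faithfulness >= 0.8:
        return "healthy"                 # good context, grounded answer
    if sufficiency == 0.0 and faithfulness >= 0.8:
        return "retrieval gap"           # generator is faithful to thin context
    if sufficiency == 1.0 and faithfulness < 0.8:
        return "generation problem"      # context was enough, answer drifted
    return "retrieval + generation"      # both stages need attention

print(diagnose(0.0, 0.9))  # retrieval gap
```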


Configuration

Parameter Type Default Description
mode EvaluationMode GRANULAR Evaluation detail level

Binary by Design

Unlike other metrics that provide granular scores, Sufficiency is intentionally binary. For partial coverage scores, use Contextual Recall or Contextual Relevancy.
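A minimal sketch of passing the `mode` parameter from the configuration table. The `EvaluationMode` import path is an assumption; check your axion version for the correct location.

```python
# Assumption: EvaluationMode's import path may differ in your axion version.
from axion.metrics import ContextualSufficiency, EvaluationMode

# GRANULAR is the documented default; passed explicitly here for clarity.
metric = ContextualSufficiency(mode=EvaluationMode.GRANULAR)
```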


Code Examples

# Example 1: sufficient context
from axion.metrics import ContextualSufficiency
from axion.dataset import DatasetItem

metric = ContextualSufficiency()

item = DatasetItem(
    query="What is the boiling point of water?",
    retrieved_content=[
        "Water boils at 100 degrees Celsius at sea level.",
        "This is equivalent to 212 degrees Fahrenheit.",
    ],
)

result = await metric.execute(item)
print(result.pretty())
# Score: 1.0 (context is sufficient)

# Example 2: insufficient context
from axion.metrics import ContextualSufficiency
from axion.dataset import DatasetItem

metric = ContextualSufficiency()
item = DatasetItem(
    query="What is the boiling point of water at high altitude?",
    retrieved_content=[
        "Water boils at 100 degrees Celsius at sea level.",
    ],
)

result = await metric.execute(item)
# Score: 0.0 (missing altitude information)
print(result.signals.reasoning)
# "Context only mentions sea level; no information about altitude effects."

# Example 3: batch evaluation over a dataset
from axion.metrics import ContextualSufficiency
from axion.runners import MetricRunner

metric = ContextualSufficiency()
runner = MetricRunner(metrics=[metric])
results = await runner.run(dataset)  # `dataset`: a collection of DatasetItem objects prepared elsewhere

sufficient_count = sum(1 for r in results if r.score == 1.0)
print(f"Sufficient: {sufficient_count}/{len(results)}")

for item_result in results:
    if item_result.score == 0.0:
        print(f"⚠️ Insufficient for: {item_result.signals.query[:50]}...")
        print(f"   Reason: {item_result.signals.reasoning}")

Metric Diagnostics

Every evaluation is fully interpretable. Access detailed diagnostic results via result.signals to understand exactly why a score was given—no black boxes.

result = await metric.execute(item)
print(result.pretty())      # Human-readable summary
result.signals              # Full diagnostic breakdown
📊 ContextualSufficiencyResult Structure
ContextualSufficiencyResult(
{
    "sufficiency_score": 1.0,
    "is_sufficient": true,
    "reasoning": "The context fully addresses the query by providing the boiling point of water (100°C) and its Fahrenheit equivalent (212°F).",
    "query": "What is the boiling point of water?",
    "context": "Water boils at 100 degrees Celsius at sea level. This is equivalent to 212 degrees Fahrenheit."
}
)

Signal Fields

Field Type Description
sufficiency_score float Binary score (1.0 or 0.0)
is_sufficient bool Whether context is sufficient
reasoning str Explanation for the verdict
query str The user query (preview)
context str The retrieved context (preview)

Example Scenarios

✅ Scenario 1: Sufficient Context (Score: 1.0)

Complete Information

Query:

"Who invented the telephone and when?"

Retrieved Context:

"Alexander Graham Bell invented the telephone in 1876. He was granted the patent on March 7th of that year."

Analysis:

  • ✅ Inventor identified: Alexander Graham Bell
  • ✅ Year provided: 1876
  • ✅ Additional detail: Patent date

Verdict: Sufficient

Reasoning: "The context directly answers both parts of the query—who (Alexander Graham Bell) and when (1876)."

Final Score: 1.0

❌ Scenario 2: Insufficient - Missing Key Info (Score: 0.0)

Critical Information Missing

Query:

"What are the side effects of aspirin?"

Retrieved Context:

"Aspirin is a common pain reliever. It belongs to a class of drugs called NSAIDs. It can be purchased over the counter."

Analysis:

  • ✅ Drug identification: Correct
  • ✅ Drug class: NSAIDs
  • ❌ Side effects: Not mentioned

Verdict: Insufficient

Reasoning: "The context describes what aspirin is but does not mention any side effects, which is the core of the query."

Final Score: 0.0

❌ Scenario 3: Insufficient - Partial Answer (Score: 0.0)

Incomplete Coverage

Query:

"Compare the populations of Tokyo and New York City."

Retrieved Context:

"Tokyo is the capital of Japan with a metropolitan population of over 37 million people, making it the world's most populous metropolitan area."

Analysis:

  • ✅ Tokyo population: Provided
  • ❌ NYC population: Missing
  • ❌ Comparison: Cannot be made

Verdict: Insufficient

Reasoning: "Context only provides Tokyo's population. NYC population is missing, making a comparison impossible."

Final Score: 0.0


Why It Matters

🔍 Retrieval Diagnosis

Quickly identify if poor answers stem from insufficient retrieval, not generation quality.

🔄 Adaptive Search

Use as a signal to expand search or trigger alternative retrieval strategies.

⚡ Pre-Generation Check

Evaluate context before generating—don't waste tokens on insufficient information.
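The adaptive-search and pre-generation ideas above can be combined into a single gate: retry retrieval with a broader strategy when the verdict is 0.0, and only spend generation tokens once the context passes. This is a minimal sketch with hypothetical stand-ins; the strategy names and the toy judge are illustrative, not axion API.

```python
# Hypothetical pre-generation gate built on a binary sufficiency verdict.

def sufficiency_gate(query, strategies, judge):
    """Try retrieval strategies in order until the judge deems the
    context sufficient; return (context, strategy_name) or (None, None)."""
    for name, retrieve in strategies:
        context = retrieve(query)
        if judge(query, context) == 1.0:    # binary verdict, as in the metric
            return context, name
    return None, None                       # escalate: no strategy sufficed

# Toy stand-ins for retrievers and the LLM judge.
docs = {
    "narrow": ["Water boils at 100 C at sea level."],
    "broad": ["Water boils at 100 C at sea level.",
              "At high altitude, lower pressure reduces the boiling point."],
}
judge = lambda q, ctx: 1.0 if "altitude" in " ".join(ctx) else 0.0

context, used = sufficiency_gate(
    "boiling point at high altitude",
    [("narrow", lambda q: docs["narrow"]), ("broad", lambda q: docs["broad"])],
    judge,
)
print(used)  # broad
```

In a real pipeline the lambda judge would be replaced by a ContextualSufficiency call, and the gate's failure branch would trigger query rewriting or a wider index search instead of giving up.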


Quick Reference

TL;DR

Contextual Sufficiency = Is there enough context to fully answer the query?

  • Use it when: Diagnosing retrieval gaps or deciding to expand search
  • Score interpretation: 1.0 = sufficient, 0.0 = insufficient (binary)
  • Key insight: Identifies "missing information" problems in retrieval