
Citation Presence

Verify responses include properly formatted citations
Heuristic Knowledge Multi-Turn

At a Glance

🎯
Score Range
0.0 or 1.0
Binary pass/fail
⚡
Default Threshold
0.5
Pass/fail cutoff
📋
Required Inputs
actual_output
Optional: conversation

What It Measures

Citation Presence evaluates whether AI responses include properly formatted citations: URLs, DOIs, or academic references. It supports both single-turn responses and multi-turn conversations.

Score Interpretation
1.0 Citations present (in at least one message)
0.0 No citations found
✅ Use When
  • Requiring sourced responses
  • Building research assistants
  • Enforcing citation policies
  • Validating knowledge retrieval
โŒ Don't Use When
  • Citations aren't required
  • Checking citation accuracy (use Faithfulness)
  • Creative/generative tasks
  • Simple Q&A without sources

Citation Presence vs Faithfulness

Citation Presence checks: "Are citations included?" Faithfulness checks: "Is the content accurate to the source?"

Use Citation Presence for format compliance; use Faithfulness for content verification.


How It Works

The metric extracts citations using regex patterns, then evaluates them according to the configured mode.

Step-by-Step Process

flowchart TD
    subgraph INPUT["📥 Input"]
        A[Response Text]
        B[Mode Setting]
    end

    subgraph EXTRACT["๐Ÿ” Step 1: Extract Citations"]
        C[Run citation patterns]
        D1["HTTP/HTTPS URLs"]
        D2["DOI references"]
        D3["Academic citations"]
    end

    subgraph EVALUATE["โš–๏ธ Step 2: Mode-Based Evaluation"]
        E{Mode?}
        F["any_citation: Any URL/DOI found?"]
        G["resource_section: Section with citations?"]
    end

    subgraph OUTPUT["📊 Result"]
        H["1.0 = Pass"]
        I["0.0 = Fail"]
    end

    A & B --> C
    C --> D1 & D2 & D3
    D1 & D2 & D3 --> E
    E -->|any_citation| F
    E -->|resource_section| G
    F & G -->|Yes| H
    F & G -->|No| I

    style INPUT stroke:#f59e0b,stroke-width:2px
    style EXTRACT stroke:#3b82f6,stroke-width:2px
    style EVALUATE stroke:#8b5cf6,stroke-width:2px
    style OUTPUT stroke:#10b981,stroke-width:2px
Format | Pattern | Example
HTTP/HTTPS URLs | https?://... | https://docs.python.org/3/
WWW URLs | www.domain.com | www.wikipedia.org
DOI References | doi:10.xxxx/... | doi:10.1000/xyz123
Academic | (Author, Year) | (Smith et al., 2023)
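The extraction step can be approximated with a few regular expressions. The patterns below are illustrative stand-ins, not the library's actual regexes:

```python
import re

# Illustrative approximations of the citation patterns (not axion's exact regexes)
CITATION_PATTERNS = [
    re.compile(r"https?://\S+"),                             # HTTP/HTTPS URLs
    re.compile(r"\bwww\.[\w.-]+\.\w+\S*"),                   # bare www URLs
    re.compile(r"\bdoi:10\.\d{4,9}/\S+", re.IGNORECASE),     # DOI references
    re.compile(r"\([A-Z][A-Za-z-]+(?: et al\.)?, \d{4}\)"),  # (Author, Year)
]

def extract_citations(text: str) -> list[str]:
    """Return every substring that matches one of the citation patterns."""
    found = []
    for pattern in CITATION_PATTERNS:
        found.extend(pattern.findall(text))
    return found
```

Any non-empty result from a scan like this is enough to satisfy the default mode.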

any_citation (default)
Pass if any citation appears anywhere in the response.

resource_section
Pass only if citations appear in a dedicated Resources/References section.
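A minimal sketch of the two modes, assuming a small fixed list of section headings; the real library's heading list and its `custom_resource_phrases` handling may differ:

```python
import re

# Assumed heading phrases; the actual set is configurable via custom_resource_phrases
RESOURCE_HEADINGS = ["resources", "references", "sources", "for more information"]

def has_any_citation(text: str) -> bool:
    # any_citation mode: pass if a URL or DOI appears anywhere
    return bool(re.search(r"https?://\S+|\bdoi:10\.\d{4,9}/\S+", text, re.IGNORECASE))

def has_resource_section(text: str) -> bool:
    # resource_section mode: pass only if citations appear after a recognized heading
    lowered = text.lower()
    for heading in RESOURCE_HEADINGS:
        idx = lowered.find(heading)
        if idx != -1 and has_any_citation(text[idx:]):
            return True
    return False

def score(text: str, mode: str = "any_citation") -> float:
    passed = has_resource_section(text) if mode == "resource_section" else has_any_citation(text)
    return 1.0 if passed else 0.0
```

The same response can therefore pass in one mode and fail in the other, as Scenario 4 below illustrates.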


Configuration

Parameter | Type | Default | Description
mode | str | any_citation | Evaluation mode: any_citation or resource_section
strict | bool | False | If True, validates that URLs are live
use_semantic_search | bool | False | Use embeddings for fallback detection
embed_model | EmbeddingRunnable | None | Embedding model (required if semantic search is enabled)
resource_similarity_threshold | float | 0.8 | Threshold for semantic matching
custom_resource_phrases | List[str] | None | Custom phrases that identify resource sections
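One way the semantic fallback might work, sketched with an injectable embedding function standing in for `embed_model` (the actual `EmbeddingRunnable` interface is not documented here):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def looks_like_resource_heading(line, embed, phrases, threshold=0.8):
    """Treat `line` as a resource-section heading if its embedding is close
    enough to any known resource phrase. `embed` maps text -> vector."""
    line_vec = embed(line)
    return any(cosine(line_vec, embed(p)) >= threshold for p in phrases)
```

This is why `embed_model` is required when `use_semantic_search=True`: headings like "Further Reading" can match "references" by meaning even though no keyword matches literally.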

Strict Mode

When strict=True, the metric validates that URLs are live by making HEAD requests. This ensures citations point to actual resources but adds latency.
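A sketch of what strict-mode liveness checking could look like. The `head` callable is injected so the logic is testable offline; the metric's real request handling (timeouts, retries, redirects) is not documented here:

```python
import urllib.request
import urllib.error

def default_head(url: str, timeout: float = 5.0) -> int:
    # Issue a HEAD request and return the HTTP status code
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

def all_urls_live(urls, head=default_head):
    """Return True only if every URL answers a HEAD request with a 2xx/3xx status."""
    for url in urls:
        try:
            status = head(url)
        except (urllib.error.URLError, OSError):
            return False
        if not (200 <= status < 400):
            return False
    return True
```

Each cited URL costs one network round trip, which is the latency trade-off mentioned above.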


Code Examples

from axion.metrics import CitationPresence
from axion.dataset import DatasetItem

metric = CitationPresence()

item = DatasetItem(
    actual_output="Python is a programming language. Learn more at https://python.org",
)

result = await metric.execute(item)
print(result.score)  # 1.0 - URL citation found

from axion.metrics import CitationPresence
from axion.dataset import DatasetItem

metric = CitationPresence()

item = DatasetItem(
    actual_output="Python is a great programming language for beginners.",
)

result = await metric.execute(item)
print(result.score)  # 0.0 - no citations
print(result.explanation)
# "Mode: any_citation. FAILURE: No assistant message satisfied the citation requirement."

from axion.metrics import CitationPresence
from axion.dataset import DatasetItem

# Require citations in a dedicated section
metric = CitationPresence(mode='resource_section')

item = DatasetItem(
    actual_output="""
    Python is versatile and beginner-friendly.

    For More Information:
    - https://docs.python.org/3/
    - https://realpython.com/
    """,
)

result = await metric.execute(item)
print(result.score)  # 1.0 - resource section with citations

from axion.metrics import CitationPresence
from axion.dataset import DatasetItem
from axion._core.schema import Conversation, HumanMessage, AIMessage

metric = CitationPresence()

item = DatasetItem(
    actual_output="",  # Will check conversation instead
    conversation=Conversation(messages=[
        HumanMessage(content="What is Python?"),
        AIMessage(content="Python is a programming language."),
        HumanMessage(content="Where can I learn more?"),
        AIMessage(content="Check out https://python.org and https://realpython.com"),
    ]),
)

result = await metric.execute(item)
print(result.score)  # 1.0 - citation in second AI message
print(result.signals.messages_with_citations)  # [3] (index of 2nd AI message)

Metric Diagnostics

Every evaluation is fully interpretable. Access detailed diagnostic results via result.signals.

result = await metric.execute(item)
print(result.pretty())      # Human-readable summary
result.signals              # Full diagnostic breakdown
📊 CitationPresenceResult Structure
CitationPresenceResult(
{
    "passes_presence_check": True,
    "total_assistant_messages": 2,
    "messages_with_citations": [3]  # 0-indexed message positions
}
)

Signal Fields

Field | Type | Description
passes_presence_check | bool | Whether the citation requirement was met
total_assistant_messages | int | Number of AI messages evaluated
messages_with_citations | List[int] | Indices of messages containing valid citations

Example Scenarios

✅ Scenario 1: URL Citation (Score: 1.0)

HTTP URL Found

Output:

"Machine learning is a subset of AI. See https://scikit-learn.org for tutorials."

Citations Detected: https://scikit-learn.org

Final Score: 1.0

✅ Scenario 2: Academic Citation (Score: 1.0)

Author-Year Format

Output:

"Attention mechanisms transformed NLP (Vaswani et al., 2017)."

Citations Detected: (Vaswani et al., 2017)

Final Score: 1.0

โŒ Scenario 3: No Citations (Score: 0.0)

Missing Citations

Output:

"Deep learning uses neural networks with multiple layers to process data."

Citations Detected: None

Final Score: 0.0

โš ๏ธ Scenario 4: Resource Section Required

Wrong Mode

Mode: resource_section

Output:

"Python documentation is at https://python.org which explains everything."

Analysis: A URL is present, but it does not appear in a dedicated resource section.

Final Score: 0.0

Switch to any_citation mode or add a Resources section.


Why It Matters

📚 Source Attribution

Ensure AI outputs provide proper attribution to sources.

🎓 Research Quality

Enforce citation standards for academic or research applications.

✅ Policy Compliance

Verify responses meet organizational citation requirements.


Quick Reference

TL;DR

Citation Presence = Does the response include citations?

  • Use it when: Requiring sourced responses or research assistants
  • Score interpretation: 1.0 = citations found, 0.0 = none
  • Key config: mode determines where citations must appear