Latency

Measure and evaluate execution time performance
Heuristic · Single Turn · Performance

At a Glance

🎯
Score Range
0.0 ──────── ∞
Seconds (or normalized 0-1)

⚡
Default Threshold
5.0s
Target latency

📋
Required Inputs
latency
Execution time in seconds
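
Because `latency` is supplied as a plain number of seconds, the caller has to measure it. The following is a minimal sketch of one way to do that with the standard library's `time.perf_counter`; `call_model` and `timed_call` are hypothetical names used only for illustration and are not part of axion.

```python
import time


def call_model(query: str) -> str:
    """Hypothetical stand-in for the system being evaluated."""
    time.sleep(0.05)  # simulate some work
    return "Paris"


def timed_call(query: str) -> tuple[str, float]:
    """Return the response and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    output = call_model(query)
    elapsed = time.perf_counter() - start
    return output, elapsed


output, latency = timed_call("What is the capital of France?")
print(f"{output!r} took {latency:.3f}s")
```

The measured `latency` value can then be passed along with the output when building the item to evaluate.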

What It Measures

The Latency metric scores execution time. It returns the raw latency in seconds or, when normalization is enabled, maps it to a 0-1 scale using one of four decay functions.

Mode	Score Interpretation
Raw	Actual latency in seconds (lower is better)
Normalized	0.0-1.0 where 1.0 = instant, 0.0 = very slow

✅ Use When

Monitoring response times
SLA compliance checking
Performance regression testing
Comparing model latencies

❌ Don't Use When

Quality metrics are more important
Latency isn't being tracked
Network conditions are highly variable
Cold start effects dominate

Inverse Scoring

Unlike most metrics, where a higher score is better, a lower raw latency is better. The metric sets inverse_scoring_metric = True so that aggregation treats lower values as the better result.
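
To illustrate why the scoring direction matters, here is a small sketch of picking the best result from several runs. The `best_score` helper is hypothetical and is not how axion aggregates internally; only the flag name `inverse_scoring_metric` comes from the text above.

```python
def best_score(scores: list[float], inverse_scoring_metric: bool) -> float:
    """Pick the best score, honoring the metric's scoring direction."""
    return min(scores) if inverse_scoring_metric else max(scores)


raw_latencies = [1.8, 0.9, 2.4]       # seconds, lower is better
accuracy_scores = [0.71, 0.88, 0.64]  # higher is better

print(best_score(raw_latencies, inverse_scoring_metric=True))     # 0.9
print(best_score(accuracy_scores, inverse_scoring_metric=False))  # 0.88
```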

How It Works

Computation · Normalization Methods

The metric reads the latency value and optionally normalizes it.

Step-by-Step Process

flowchart TD
    subgraph INPUT["📥 Input"]
        A[Latency Value]
        B[Threshold Setting]
    end

    subgraph PROCESS["🔍 Processing"]
        C{Normalize?}
        D[Return raw latency]
        E[Apply normalization function]
    end

    subgraph OUTPUT["📊 Result"]
        F["Raw: seconds"]
        G["Normalized: 0.0-1.0"]
    end

    A & B --> C
    C -->|No| D
    C -->|Yes| E
    D --> F
    E --> G

    style INPUT stroke:#f59e0b,stroke-width:2px
    style PROCESS stroke:#3b82f6,stroke-width:2px
    style OUTPUT stroke:#10b981,stroke-width:2px

Four normalization methods convert raw latency to a 0-1 score:

Method	Formula	Characteristics
exponential	`exp(-latency/threshold)`	Smooth decay, never reaches 0
sigmoid	`1/(1 + exp((latency-threshold)/scale))`	S-curve centered at threshold
reciprocal	`threshold/(threshold + latency)`	Hyperbolic decay
linear	`max(0, 1 - latency/threshold)`	Linear drop to 0

📈 Exponential
Smooth decay. At threshold: ~0.37

📉 Sigmoid
S-curve. At threshold: 0.5

📊 Reciprocal
Hyperbolic. At threshold: 0.5

📐 Linear
Simple. At threshold: 0.0
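
The branch in the flowchart and the four formulas in the table can be sketched together in plain Python. This is an illustration of the math, not axion's implementation: the `latency_score` function is hypothetical, and the sigmoid `scale` default of 1.0 is an assumption, since the page does not document that value.

```python
import math


def latency_score(
    latency: float,
    threshold: float = 5.0,
    normalize: bool = False,
    method: str = "exponential",
    scale: float = 1.0,  # assumption: sigmoid steepness, not documented on this page
) -> float:
    """Return raw latency, or a 0-1 score where higher means faster."""
    if not normalize:
        return latency
    if method == "exponential":
        return math.exp(-latency / threshold)
    if method == "sigmoid":
        # Cap the exponent so math.exp cannot overflow on extreme latencies
        exponent = min((latency - threshold) / scale, 700.0)
        return 1.0 / (1.0 + math.exp(exponent))
    if method == "reciprocal":
        return threshold / (threshold + latency)
    if method == "linear":
        return max(0.0, 1.0 - latency / threshold)
    raise ValueError(f"unknown normalization method: {method!r}")


# Scores for a latency exactly at the 5.0s threshold
for name in ("exponential", "sigmoid", "reciprocal", "linear"):
    print(f"{name:12s} {latency_score(5.0, normalize=True, method=name):.2f}")
```

At the threshold this prints 0.37, 0.50, 0.50 and 0.00, matching the values listed for each method above.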

Configuration

Parameters

Parameter	Type	Default	Description
`threshold`	`float`	`5.0`	Target latency in seconds
`normalize`	`bool`	`False`	Whether to normalize to 0-1 range
`normalization_method`	`str`	`exponential`	Method: exponential, sigmoid, reciprocal, linear

Choosing a Normalization Method

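exponential: Good default; smooth decay that never reaches 0
sigmoid: S-curve transition centered on the threshold (0.5 at threshold)
reciprocal: Gradual hyperbolic decay; never reaches 0
linear: Simplest; reaches 0 at the threshold and stays at 0 beyond it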

Code Examples

Basic Usage (Raw) · Normalized Scoring · With Runner

from axion.metrics import Latency
from axion.dataset import DatasetItem

metric = Latency(threshold=2.0)

item = DatasetItem(
    query="What is the capital of France?",
    actual_output="Paris",
    latency=1.5,  # 1.5 seconds
)

# `await` needs an async context (e.g. a notebook cell or an async function)
result = await metric.execute(item)
print(result.score)  # 1.5 (raw latency)
print(result.explanation)
# "Raw latency: 1.500s, below threshold (2.0s)."

from axion.metrics import Latency
from axion.dataset import DatasetItem

# Exponential normalization
metric = Latency(
    threshold=2.0,
    normalize=True,
    normalization_method='exponential'
)

item = DatasetItem(latency=1.0)  # 1 second
result = await metric.execute(item)
print(f"{result.score:.3f}")  # ~0.607 (exp(-1/2))

# Linear normalization
metric_linear = Latency(
    threshold=2.0,
    normalize=True,
    normalization_method='linear'
)
result_linear = await metric_linear.execute(item)
print(f"{result_linear.score:.3f}")  # 0.5 (1 - 1/2)

from axion.metrics import Latency
from axion.runners import MetricRunner

metric = Latency(threshold=3.0, normalize=True)
runner = MetricRunner(metrics=[metric])
# `dataset` is assumed to be an existing dataset whose items have `latency` set
results = await runner.run(dataset)

for item_result in results:
    print(f"Latency score: {item_result.score:.2f}")
    print(f"  {item_result.explanation}")

Example Scenarios

✅ Scenario 1: Excellent Performance

Below Half Threshold

Threshold: 5.0s

Latency: 2.0s (40% of threshold)

Raw Score: 2.0

Normalized (exponential): ~0.67

Explanation: "Latency: 2.000s. Normalized score: 0.670 (threshold: 5.0s, method: exponential). Performance: excellent."

⚠️ Scenario 2: At Threshold

Exactly at Target

Threshold: 5.0s

Latency: 5.0s

Raw Score: 5.0

Normalized Scores:

Method	Score
exponential	0.37
sigmoid	0.50
reciprocal	0.50
linear	0.00

❌ Scenario 3: Poor Performance

Above Threshold

Threshold: 2.0s

Latency: 8.0s (4x threshold)

Raw Score: 8.0

Normalized (exponential): ~0.02

Explanation: "Latency: 8.000s. Normalized score: 0.018 (threshold: 2.0s, method: exponential). Performance: poor."
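
The exponential scores quoted in Scenarios 1 and 3 follow directly from `exp(-latency/threshold)`. A quick check of the arithmetic, using only the standard library (the variable names are illustrative):

```python
import math

# Scenario 1: 2.0s against a 5.0s threshold
scenario_1 = math.exp(-2.0 / 5.0)
print(f"{scenario_1:.3f}")  # 0.670

# Scenario 3: 8.0s against a 2.0s threshold
scenario_3 = math.exp(-8.0 / 2.0)
print(f"{scenario_3:.3f}")  # 0.018
```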

Why It Matters

⏱️ SLA Compliance

Track response times against service level agreements.

📈 Performance Monitoring

Detect regressions and optimize slow endpoints.

⚖️ Quality vs Speed

Balance model quality against response time requirements.

Quick Reference

TL;DR

Latency = How fast was the response?

Use it when: Monitoring performance or SLA compliance
Score interpretation: Raw (seconds, lower = faster) or normalized (0-1, higher = faster)
Key config: threshold sets the target latency, normalize enables 0-1 scoring, normalization_method selects the decay function

API Reference

axion.metrics.Latency

Related Concepts

MetricRunner · Evaluation Strategies