Latency

Measure and evaluate execution time performance
Heuristic · Single Turn · Performance

At a Glance

🎯 Score Range: 0.0 to ∞ seconds (or normalized 0-1)
⚡ Default Threshold: 5.0s (target latency)
📋 Required Inputs: latency (execution time in seconds)

What It Measures

The Latency metric scores execution time. It returns either the raw latency in seconds or a 0-1 score produced by one of four normalization functions (exponential, sigmoid, reciprocal, linear).

| Mode | Score Interpretation |
| --- | --- |
| Raw | Actual latency in seconds (lower is better) |
| Normalized | 0.0-1.0, where 1.0 = instant and 0.0 = very slow |

✅ Use When
  • Monitoring response times
  • SLA compliance checking
  • Performance regression testing
  • Comparing model latencies
โŒ Don't Use When
  • Quality metrics are more important
  • Latency isn't being tracked
  • Network conditions are highly variable
  • Cold start effects dominate

Inverse Scoring

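Unlike most metrics, where a higher score is better, a lower raw latency is better. The metric therefore sets inverse_scoring_metric = True so that aggregation treats lower values as better. Note that in normalized mode the score itself runs the other way: values closer to 1.0 mean faster responses.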
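
To illustrate why the direction flag matters, here is a minimal sketch of direction-aware selection. The helper `pick_best` and the sample latencies are hypothetical and are not part of Axion; only the idea that lower is better for an inverse metric comes from the text above.

```python
def pick_best(scores: dict[str, float], inverse_scoring: bool) -> str:
    """Return the key with the best score, respecting the metric's direction."""
    # For an inverse metric (such as raw latency) the smallest value wins;
    # for a conventional metric the largest value wins.
    chooser = min if inverse_scoring else max
    return chooser(scores, key=scores.get)


# Hypothetical raw latencies in seconds for two models.
raw_latency = {"model_a": 1.8, "model_b": 0.9}
print(pick_best(raw_latency, inverse_scoring=True))
```

With these sample values the faster model, model_b, is selected. Ignoring the flag and taking the maximum would select the slower one.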


How It Works

The metric reads the latency value and optionally normalizes it.

Step-by-Step Process

flowchart TD
    subgraph INPUT["📥 Input"]
        A[Latency Value]
        B[Threshold Setting]
    end

    subgraph PROCESS["🔍 Processing"]
        C{Normalize?}
        D[Return raw latency]
        E[Apply normalization function]
    end

    subgraph OUTPUT["📊 Result"]
        F["Raw: seconds"]
        G["Normalized: 0.0-1.0"]
    end

    A & B --> C
    C -->|No| D
    C -->|Yes| E
    D --> F
    E --> G

    style INPUT stroke:#f59e0b,stroke-width:2px
    style PROCESS stroke:#3b82f6,stroke-width:2px
    style OUTPUT stroke:#10b981,stroke-width:2px

Four normalization methods convert raw latency to a 0-1 score:

| Method | Formula | Characteristics | Score at threshold |
| --- | --- | --- | --- |
| exponential | exp(-latency/threshold) | Smooth decay, never reaches 0 | ~0.37 |
| sigmoid | 1/(1 + exp((latency-threshold)/scale)) | S-curve centered at threshold | 0.5 |
| reciprocal | threshold/(threshold + latency) | Hyperbolic decay | 0.5 |
| linear | max(0, 1 - latency/threshold) | Linear drop to 0 | 0.0 |
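
The four formulas can be sketched directly in plain Python. This is an illustration of the math only, not Axion's implementation: the function name `normalize_latency` is hypothetical, and the default `scale=1.0` for the sigmoid is an assumption, since the scale value is not documented here.

```python
import math


def normalize_latency(latency: float, threshold: float = 5.0,
                      method: str = "exponential", scale: float = 1.0) -> float:
    """Map a latency in seconds to a 0-1 score using the documented formulas.

    `scale` only affects the sigmoid method; its default of 1.0 is an
    assumption made for this sketch.
    """
    if method == "exponential":
        return math.exp(-latency / threshold)
    if method == "sigmoid":
        return 1.0 / (1.0 + math.exp((latency - threshold) / scale))
    if method == "reciprocal":
        return threshold / (threshold + latency)
    if method == "linear":
        return max(0.0, 1.0 - latency / threshold)
    raise ValueError(f"unknown normalization method: {method!r}")


# Scores when the latency equals the threshold (5.0s).
for name in ("exponential", "sigmoid", "reciprocal", "linear"):
    print(f"{name:12s} {normalize_latency(5.0, 5.0, name):.2f}")
```

At the threshold this prints 0.37, 0.50, 0.50 and 0.00 for the four methods. One caveat of this naive sigmoid: a latency far above the threshold can overflow `math.exp`, so a production version would need a guard.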


Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| threshold | float | 5.0 | Target latency in seconds |
| normalize | bool | False | Whether to normalize to the 0-1 range |
| normalization_method | str | exponential | One of: exponential, sigmoid, reciprocal, linear |

Choosing a Normalization Method

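  • exponential: Good default; smooth decay that never reaches 0
  • sigmoid: S-shaped transition centered on the threshold, scoring 0.5 exactly at the threshold
  • reciprocal: Gradual hyperbolic decay; never reaches 0
  • linear: Simplest; reaches 0 at the threshold and stays at 0 for anything slower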
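
One way to compare the methods is to invert each formula and ask how much latency a given score allows. The sketch below does that algebraically. The helper `latency_for_score` is hypothetical, and the sigmoid `scale=1.0` default is an assumption of this example rather than a documented value.

```python
import math


def latency_for_score(score: float, threshold: float,
                      method: str, scale: float = 1.0) -> float:
    """Invert the normalization formulas: latency (s) that yields `score`.

    Valid for 0 < score < 1. `scale` applies only to sigmoid and its
    default of 1.0 is an assumption of this sketch.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    if method == "exponential":
        return -threshold * math.log(score)
    if method == "sigmoid":
        return threshold + scale * math.log((1.0 - score) / score)
    if method == "reciprocal":
        return threshold * (1.0 - score) / score
    if method == "linear":
        return threshold * (1.0 - score)
    raise ValueError(f"unknown normalization method: {method!r}")


# Latency at which each method reports a score of 0.8 (threshold 5.0s).
for name in ("exponential", "sigmoid", "reciprocal", "linear"):
    print(f"{name:12s} {latency_for_score(0.8, 5.0, name):.2f}s")
```

With a 5.0s threshold, a score of 0.8 corresponds to about 1.12s (exponential), 3.61s (sigmoid), 1.25s (reciprocal) and 1.00s (linear). Under the assumed scale, the sigmoid stays high until latency approaches the threshold, while the other three penalize latency much earlier. A negative result from the sigmoid branch means that score is unreachable for the chosen threshold and scale.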

Code Examples

# Example 1: raw latency (default mode)
# These snippets use top-level `await`; run them in a notebook or inside an async function.
from axion.metrics import Latency
from axion.dataset import DatasetItem

metric = Latency(threshold=2.0)

item = DatasetItem(
    query="What is the capital of France?",
    actual_output="Paris",
    latency=1.5,  # 1.5 seconds
)

result = await metric.execute(item)
print(result.score)  # 1.5 (raw latency)
print(result.explanation)
# "Raw latency: 1.500s, below threshold (2.0s)."


# Example 2: normalized scores
from axion.metrics import Latency
from axion.dataset import DatasetItem

# Exponential normalization
metric = Latency(
    threshold=2.0,
    normalize=True,
    normalization_method='exponential'
)

item = DatasetItem(latency=1.0)  # 1 second
result = await metric.execute(item)
print(f"{result.score:.3f}")  # ~0.607 (exp(-1/2))

# Linear normalization
metric_linear = Latency(
    threshold=2.0,
    normalize=True,
    normalization_method='linear'
)
result_linear = await metric_linear.execute(item)
print(f"{result_linear.score:.3f}")  # 0.500 (1 - 1/2)


# Example 3: scoring a whole dataset with a runner
from axion.metrics import Latency
from axion.runners import MetricRunner

# `dataset` is assumed to be an existing dataset whose items carry a latency value.
metric = Latency(threshold=3.0, normalize=True)
runner = MetricRunner(metrics=[metric])
results = await runner.run(dataset)

for item_result in results:
    print(f"Latency score: {item_result.score:.2f}")
    print(f"  {item_result.explanation}")
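
The examples above assume the latency value is already known. As a minimal sketch of where that number might come from, the snippet below times a call with the standard library's `time.perf_counter`. The helper `timed_call` and the stand-in `fake_model` are hypothetical and not part of Axion.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_model(query: str) -> str:
    # Hypothetical stand-in for a real model or API call.
    time.sleep(0.05)
    return "Paris"


answer, latency = timed_call(fake_model, "What is the capital of France?")
print(f"{answer} in {latency:.3f}s")
```

The measured `latency` could then be passed as the `latency` field when building a `DatasetItem`, as in the first example.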

Example Scenarios

✅ Scenario 1: Excellent Performance

Below Half Threshold

Threshold: 5.0s

Latency: 2.0s (40% of threshold)

Raw Score: 2.0

Normalized (exponential): ~0.67

Explanation: "Latency: 2.000s. Normalized score: 0.670 (threshold: 5.0s, method: exponential). Performance: excellent."

⚠️ Scenario 2: At Threshold

Exactly at Target

Threshold: 5.0s

Latency: 5.0s

Raw Score: 5.0

Normalized Scores:

| Method | Score |
| --- | --- |
| exponential | 0.37 |
| sigmoid | 0.50 |
| reciprocal | 0.50 |
| linear | 0.00 |

❌ Scenario 3: Poor Performance

Above Threshold

Threshold: 2.0s

Latency: 8.0s (4x threshold)

Raw Score: 8.0

Normalized (exponential): ~0.02

Explanation: "Latency: 8.000s. Normalized score: 0.018 (threshold: 2.0s, method: exponential). Performance: poor."


Why It Matters

⏱️ SLA Compliance

Track response times against service level agreements.

📈 Performance Monitoring

Detect regressions and optimize slow endpoints.

⚖️ Quality vs Speed

Balance model quality against response time requirements.
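
For the SLA compliance use case, a simple summary is the share of requests that met the target latency. The sketch below is a hypothetical helper, not an Axion API, and it assumes that a latency exactly equal to the threshold counts as compliant.

```python
def sla_compliance(latencies: list[float], threshold: float) -> float:
    """Fraction of requests whose latency is at or below the threshold."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    # Assumption: a latency equal to the threshold still meets the SLA.
    within = sum(1 for value in latencies if value <= threshold)
    return within / len(latencies)


# Hypothetical latencies in seconds from one evaluation run.
observed = [0.8, 1.2, 4.9, 5.0, 7.5]
rate = sla_compliance(observed, threshold=5.0)
print(f"{rate:.0%} of requests met the 5.0s target")
```

For the sample data, four of the five requests are within the 5.0s target, so the reported rate is 80%.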


Quick Reference

TL;DR

Latency = How fast was the response?

  • Use it when: Monitoring performance or SLA compliance
  • Score interpretation: Raw (seconds) or normalized (0-1, higher = faster)
  • Key config: threshold sets target, normalize enables 0-1 scoring