Latency¶
Heuristic Single Turn Performance
At a Glance¶

| Property | Value |
|---|---|
| Score Range | Seconds (or normalized 0.0-1.0) |
| Default Threshold | 5.0s target latency |
| Required Inputs | `latency` — execution time in seconds |
What It Measures
The Latency metric evaluates execution time performance. It can return raw latency values or normalize them to a 0-1 scale using various decay functions.
| Mode | Score Interpretation |
|---|---|
| Raw | Actual latency in seconds (lower is better) |
| Normalized | 0.0-1.0 where 1.0 = instant, 0.0 = very slow |
Use it when:

- Monitoring response times
- SLA compliance checking
- Performance regression testing
- Comparing model latencies

Avoid it when:

- Quality metrics are more important
- Latency isn't being tracked
- Network conditions are highly variable
- Cold start effects dominate
Inverse Scoring

Unlike most metrics, where higher scores are better, lower latency is better here. The metric sets `inverse_scoring_metric = True` so that aggregation handles this correctly.
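The effect of this flag can be sketched with a hypothetical aggregator (`best_score` below is illustrative, not part of axion):

```python
# Hypothetical sketch: how an aggregator might honor an
# inverse-scoring metric, where LOWER raw scores are better.
def best_score(scores, inverse_scoring_metric=True):
    """Pick the best score from a batch of raw metric scores."""
    return min(scores) if inverse_scoring_metric else max(scores)

latencies = [1.2, 0.8, 3.5]    # raw latencies in seconds
print(best_score(latencies))   # 0.8 -- the fastest run wins
```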
How It Works
The metric reads the latency value and optionally normalizes it.
Step-by-Step Process¶
```mermaid
flowchart TD
    subgraph INPUT["Input"]
        A[Latency Value]
        B[Threshold Setting]
    end
    subgraph PROCESS["Processing"]
        C{Normalize?}
        D[Return raw latency]
        E[Apply normalization function]
    end
    subgraph OUTPUT["Result"]
        F["Raw: seconds"]
        G["Normalized: 0.0-1.0"]
    end
    A & B --> C
    C -->|No| D
    C -->|Yes| E
    D --> F
    E --> G
    style INPUT stroke:#f59e0b,stroke-width:2px
    style PROCESS stroke:#3b82f6,stroke-width:2px
    style OUTPUT stroke:#10b981,stroke-width:2px
```
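The branch above can be sketched in a few lines (assumed logic, not the library source; only the default exponential method is shown):

```python
import math

# Sketch of the flow: return raw seconds, or apply a decay
# function when normalize=True (exponential shown here).
def latency_score(latency: float, threshold: float = 5.0,
                  normalize: bool = False) -> float:
    if not normalize:
        return latency                      # raw: seconds, lower is better
    return math.exp(-latency / threshold)   # exponential decay to 0-1

print(latency_score(2.0))                            # 2.0 (raw)
print(round(latency_score(2.0, normalize=True), 2))  # 0.67
```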
Four normalization methods convert raw latency to a 0-1 score:

| Method | Formula | Score at threshold | Characteristics |
|---|---|---|---|
| exponential | `exp(-latency/threshold)` | ~0.37 | Smooth decay, never reaches 0 |
| sigmoid | `1/(1 + exp((latency-threshold)/scale))` | 0.50 | S-curve centered at threshold |
| reciprocal | `threshold/(threshold + latency)` | 0.50 | Hyperbolic decay, never reaches 0 |
| linear | `max(0, 1 - latency/threshold)` | 0.00 | Linear drop to 0 at threshold |
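A minimal sketch of the four formulas, assuming a sigmoid `scale` of `threshold/4` (the table does not pin down `scale`; it is an illustrative choice here):

```python
import math

# The four decay functions from the table, implemented directly.
def normalize_latency(latency, threshold=5.0, method="exponential"):
    if method == "exponential":
        return math.exp(-latency / threshold)
    if method == "sigmoid":
        scale = threshold / 4  # assumption: not specified in the table
        return 1 / (1 + math.exp((latency - threshold) / scale))
    if method == "reciprocal":
        return threshold / (threshold + latency)
    if method == "linear":
        return max(0.0, 1 - latency / threshold)
    raise ValueError(f"unknown method: {method}")

# At the threshold itself (latency == threshold == 5.0):
for m in ("exponential", "sigmoid", "reciprocal", "linear"):
    print(m, round(normalize_latency(5.0, 5.0, m), 2))
# exponential 0.37, sigmoid 0.5, reciprocal 0.5, linear 0.0
```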
Configuration¶

| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | `float` | `5.0` | Target latency in seconds |
| `normalize` | `bool` | `False` | Whether to normalize to 0-1 range |
| `normalization_method` | `str` | `'exponential'` | One of `exponential`, `sigmoid`, `reciprocal`, `linear` |
Choosing a Normalization Method

- `exponential`: Good default, smooth decay
- `sigmoid`: Hard cutoff around the threshold
- `reciprocal`: Balanced decay, never hits 0
- `linear`: Simple, goes to 0 at the threshold
Code Examples¶
```python
from axion.metrics import Latency
from axion.dataset import DatasetItem

metric = Latency(threshold=2.0)

item = DatasetItem(
    query="What is the capital of France?",
    actual_output="Paris",
    latency=1.5,  # 1.5 seconds
)

result = await metric.execute(item)
print(result.score)  # 1.5 (raw latency)
print(result.explanation)
# "Raw latency: 1.500s, below threshold (2.0s)."
```
```python
from axion.metrics import Latency
from axion.dataset import DatasetItem

# Exponential normalization
metric = Latency(
    threshold=2.0,
    normalize=True,
    normalization_method='exponential'
)

item = DatasetItem(latency=1.0)  # 1 second
result = await metric.execute(item)
print(f"{result.score:.3f}")  # ~0.607 (exp(-1/2))

# Linear normalization
metric_linear = Latency(
    threshold=2.0,
    normalize=True,
    normalization_method='linear'
)

result_linear = await metric_linear.execute(item)
print(f"{result_linear.score:.3f}")  # 0.5 (1 - 1/2)
```
```python
from axion.metrics import Latency
from axion.runners import MetricRunner

metric = Latency(threshold=3.0, normalize=True)
runner = MetricRunner(metrics=[metric])
results = await runner.run(dataset)

for item_result in results:
    print(f"Latency score: {item_result.score:.2f}")
    print(f"  {item_result.explanation}")
```
Example Scenarios¶
✅ Scenario 1: Excellent Performance
Below Half Threshold
Threshold: 5.0s
Latency: 2.0s (40% of threshold)
Raw Score: 2.0
Normalized (exponential): ~0.67
Explanation: "Latency: 2.000s. Normalized score: 0.670 (threshold: 5.0s, method: exponential). Performance: excellent."
⚠️ Scenario 2: At Threshold
Exactly at Target
Threshold: 5.0s
Latency: 5.0s
Raw Score: 5.0
Normalized Scores:
| Method | Score |
|---|---|
| exponential | 0.37 |
| sigmoid | 0.50 |
| reciprocal | 0.50 |
| linear | 0.00 |
❌ Scenario 3: Poor Performance
Above Threshold
Threshold: 2.0s
Latency: 8.0s (4x threshold)
Raw Score: 8.0
Normalized (exponential): ~0.02
Explanation: "Latency: 8.000s. Normalized score: 0.018 (threshold: 2.0s, method: exponential). Performance: poor."
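The normalized scores quoted in these scenarios can be reproduced directly from the exponential formula `exp(-latency/threshold)`:

```python
import math

# Scenario 1: latency 2.0s, threshold 5.0s
print(f"{math.exp(-2.0 / 5.0):.3f}")  # 0.670

# Scenario 3: latency 8.0s, threshold 2.0s
print(f"{math.exp(-8.0 / 2.0):.3f}")  # 0.018
```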
Why It Matters¶

- **SLA Compliance**: Track response times against service level agreements.
- **Performance Testing**: Detect regressions and optimize slow endpoints.
- **Quality Tradeoffs**: Balance model quality against response time requirements.
Quick Reference¶
TL;DR
Latency = How fast was the response?
- Use it when: Monitoring performance or SLA compliance
- Score interpretation: Raw (seconds) or normalized (0-1, higher = faster)
- Key config: `threshold` sets the target, `normalize` enables 0-1 scoring
- API Reference
- Related Concepts: MetricRunner · Evaluation Strategies