Levenshtein Ratio
Heuristic · Single Turn · Fast
At a Glance
Score Range
0.0 to 1.0 (character-level similarity)
Default Threshold
0.2 (pass/fail cutoff)
Required Inputs
actual_output, expected_output (text comparison)
What It Measures
Levenshtein Ratio calculates the character-level similarity between two strings using Python's SequenceMatcher. The score reflects how much of the two strings lines up in matching runs of characters, so it drops as more edits (insertions, deletions, substitutions) are needed to transform one string into the other.
| Score | Interpretation |
|---|---|
| 1.0 | Identical strings |
| 0.8+ | Very similar, minor typos |
| 0.5-0.8 | Moderate similarity |
| < 0.5 | Significant differences |
Use when:
- Checking for typos or small variations
- Fuzzy string matching is needed
- Comparing names or identifiers
- Detecting near-matches

Avoid when:
- Semantic similarity matters
- Word-level comparison is preferred (use BLEU)
- Comparing long texts with different structures
- An exact match is required
See Also: Sentence BLEU
Levenshtein Ratio measures character-level edit distance. Sentence BLEU measures word-level n-gram precision.
Use Levenshtein for typo detection; use BLEU for paraphrase comparison.
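The difference between character-level and word-level comparison can be sketched with the standard library alone. This is an illustration, not BLEU: the word-level score below simply runs `difflib.SequenceMatcher` over token lists as a stand-in, and the example strings are made up for this sketch.

```python
from difflib import SequenceMatcher

expected = "receive the package"
actual = "recieve the package"  # one transposed letter pair

# Character-level: compare the strings character by character.
char_score = SequenceMatcher(None, actual, expected).ratio()

# Word-level stand-in (not BLEU): compare token lists, so a single
# misspelled word counts as a complete mismatch for that token.
word_score = SequenceMatcher(None, actual.split(), expected.split()).ratio()

print(f"character-level: {char_score:.2f}")
print(f"word-level:      {word_score:.2f}")
```

A typo barely moves the character-level score but costs a whole token at the word level, which is why character-level comparison suits typo detection.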
How It Works
Uses Python's SequenceMatcher to calculate the ratio of matching characters.
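A minimal sketch of that ratio, assuming the metric relies on `difflib.SequenceMatcher.ratio()` as described above. `ratio()` returns 2·M/T, where M is the number of characters covered by matching blocks and T is the combined length of both strings. The helper name `matching_ratio` is hypothetical and not part of axion.

```python
from difflib import SequenceMatcher


def matching_ratio(a: str, b: str) -> float:
    """Recompute SequenceMatcher.ratio() by hand as 2 * M / T."""
    matcher = SequenceMatcher(None, a, b)
    # M: total size of all matching blocks found by the matcher.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # T: combined length of both strings.
    total = len(a) + len(b)
    # Two empty strings are treated as identical, as ratio() does.
    return 2.0 * matched / total if total else 1.0


a, b = "accomodation", "accommodation"
print(matching_ratio(a, b))                 # 24 / 25 = 0.96
print(SequenceMatcher(None, a, b).ratio())  # same value from the library
```

Note that this is a matching-block similarity rather than a classical edit-distance count, so it can differ from scores produced by dedicated Levenshtein libraries.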
Step-by-Step Process
flowchart TD
subgraph INPUT["Inputs"]
A[Actual Output]
B[Expected Output]
end
subgraph PROCESS["Processing"]
C[Optional: Convert to lowercase]
D[Find matching subsequences]
E[Calculate similarity ratio]
end
subgraph OUTPUT["Result"]
F["Score: 0.0 to 1.0"]
end
A & B --> C
C --> D
D --> E
E --> F
style INPUT stroke:#f59e0b,stroke-width:2px
style PROCESS stroke:#3b82f6,stroke-width:2px
style OUTPUT stroke:#10b981,stroke-width:2px
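The three processing steps in the flowchart can be sketched as a single function. This is an illustration under stated assumptions, not the axion implementation: the function name `levenshtein_ratio_sketch` and the argument order passed to `SequenceMatcher` are hypothetical.

```python
from difflib import SequenceMatcher


def levenshtein_ratio_sketch(
    actual: str, expected: str, case_sensitive: bool = False
) -> float:
    """Illustrative pipeline: optional lowercasing, block matching, ratio."""
    # Step 1: optionally convert both inputs to lowercase.
    if not case_sensitive:
        actual, expected = actual.lower(), expected.lower()
    # Step 2: find matching subsequences between the two strings.
    matcher = SequenceMatcher(None, actual, expected)
    # Step 3: calculate the similarity ratio in [0.0, 1.0].
    return matcher.ratio()


print(levenshtein_ratio_sketch("HELLO", "hello"))                       # 1.0
print(levenshtein_ratio_sketch("HELLO", "hello", case_sensitive=True))  # 0.0
```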
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| case_sensitive | bool | False | Whether comparison is case-sensitive |
Case Sensitivity
By default, comparison is case-insensitive (both strings converted to lowercase). Set case_sensitive=True for strict character matching.
Code Examples
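# NOTE: the DatasetItem import path is an assumption; adjust it to your axion version.
from axion.dataset import DatasetItem
from axion.metrics import LevenshteinRatio

# `await` needs an async context (for example a notebook cell or an async function).

# Case insensitive (default)
metric = LevenshteinRatio(case_sensitive=False)
item = DatasetItem(
    actual_output="HELLO",
    expected_output="hello",
)
result = await metric.execute(item)
print(result.score)  # 1.0 - case ignored

# Case sensitive: reuse the same item with strict matching
metric_strict = LevenshteinRatio(case_sensitive=True)
result_strict = await metric_strict.execute(item)
print(result_strict.score)  # 0.0 - no characters match once case matters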
Example Scenarios
Scenario 1: High Similarity (Score: ~0.96)
Minor Typo
Expected: "accommodation"
Actual: "accomodation" (missing 'm')
Result: ~0.96 (12 matching characters, 25 characters in total: 2 × 12 / 25)
A single missing character still results in high similarity.
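Scenario 2: Multiple Differences (Score: ~0.90)
Multiple Differences
Expected: "Hello World"
Actual: "Helo Wrld" (missing letters)
Result: ~0.90 (9 matching characters, 20 characters in total: 2 × 9 / 20)
Each missing character lowers the score, but it stays high because every remaining character still matches in order.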
Scenario 3: Low Similarity (Score: 0.2)
Very Different Strings
Expected: "The quick brown fox"
Actual: "A lazy dog sleeps"
Result: ~0.2
Completely different content results in low similarity.
Scenario 4: Case Handling
Case Insensitive Match
Expected: "OpenAI"
Actual: "openai"
Result (default): 1.0
Result (case_sensitive=True): 0.5 (only "pen" matches: 2 × 3 / 12)
Case sensitivity significantly affects scoring.
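The scenario strings can be scored directly with the standard library. This is a sketch that assumes the metric lowercases both inputs by default and then calls `difflib.SequenceMatcher.ratio()` with no other normalization; the `score` helper and the argument order are hypothetical, and the metric's own output may differ.

```python
from difflib import SequenceMatcher

# (expected, actual, case_sensitive) triples taken from the scenarios above.
scenarios = [
    ("accommodation", "accomodation", False),
    ("Hello World", "Helo Wrld", False),
    ("The quick brown fox", "A lazy dog sleeps", False),
    ("OpenAI", "openai", False),
    ("OpenAI", "openai", True),
]


def score(expected: str, actual: str, case_sensitive: bool) -> float:
    """Score one pair the way the scenarios describe (assumed behavior)."""
    if not case_sensitive:
        expected, actual = expected.lower(), actual.lower()
    return SequenceMatcher(None, actual, expected).ratio()


for expected, actual, case_sensitive in scenarios:
    value = score(expected, actual, case_sensitive)
    print(f"{expected!r} vs {actual!r} (case_sensitive={case_sensitive}): {value:.2f}")
```

`SequenceMatcher.ratio()` can return different values when the two arguments are swapped, so for very dissimilar strings the printed score depends on which string is passed first.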
Why It Matters
No LLM calls needed. Instant, reproducible results.
Perfect for detecting spelling errors and near-matches.
Unlike binary metrics, provides nuanced similarity scores.
Quick Reference
TL;DR
Levenshtein Ratio = How similar are the strings at the character level?
- Use it when: Checking for typos or fuzzy matching
- Score interpretation: Higher = more similar characters
- Key config: case_sensitive controls case handling
- API Reference
- Related Metrics: Sentence BLEU · Exact String Match · Contains Match