Tool Metrics¶
Evaluate AI agent tool calling correctness and effectiveness
1 Metric Agent
Tool metrics evaluate the correctness and effectiveness of tool usage in AI agent workflows. These metrics assess whether agents correctly invoke the right tools with appropriate parameters.
Available Metrics¶
Quick Reference¶
| Metric | Score Range | Threshold | Key Question |
|---|---|---|---|
| Tool Correctness | 0.0 – 1.0 | 0.5 | Were the right tools called correctly? |
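The threshold converts the continuous score into a pass/fail verdict. As a minimal sketch (assuming an inclusive threshold; axion's exact pass semantics may differ):

```python
def passes(score: float, threshold: float = 0.5) -> bool:
    # A result passes when its score meets or exceeds the metric threshold.
    return score >= threshold

print(passes(1.0))  # True
print(passes(0.4))  # False
```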
Usage Example¶
```python
from axion.metrics import ToolCorrectness
from axion.runners import MetricRunner
from axion.dataset import DatasetItem
from axion._core.schema import ToolCall

# Create an evaluation item pairing the tools the agent actually
# called with the tools it was expected to call
item = DatasetItem(
    tools_called=[
        ToolCall(name="search", args={"query": "weather in Paris"}),
        ToolCall(name="format", args={"style": "brief"}),
    ],
    expected_tools=[
        ToolCall(name="search", args={"query": "weather in Paris"}),
        ToolCall(name="format", args={"style": "brief"}),
    ],
)

# Initialize the metric with parameter checking enabled
metric = ToolCorrectness(
    check_parameters=True,
    parameter_matching_strategy='exact',
)

# Run the evaluation (inside an async context)
runner = MetricRunner(metrics=[metric])
results = await runner.run([item])

print(f"Tool Correctness: {results[0].score:.2f}")
# Output: Tool Correctness: 1.00
```
Evaluation Modes¶
Tool Correctness supports multiple evaluation strategies:
- **Name Only (default)**: Only verify that the correct tools were called; parameters are ignored.
- **With Parameters**: Validate both tool names and their arguments.
- **Strict Order**: Tools must be called in the exact expected sequence.
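To make the three modes concrete, here is an illustrative, library-independent sketch of how such a matching score could be computed (axion's internal implementation may differ; `score_tool_calls` and its flags are hypothetical names, not axion APIs). Each call is modeled as a `(name, args)` pair:

```python
def score_tool_calls(called, expected, check_parameters=False, strict_order=False):
    """Return the fraction of expected tool calls matched by the actual calls."""
    if not expected:
        return 1.0

    def key(call):
        # Name-only mode compares just the tool name; parameter mode
        # also compares the (order-insensitive) argument dict.
        name, args = call
        return (name, tuple(sorted(args.items()))) if check_parameters else name

    if strict_order:
        # Each expected call must appear at the same position in the sequence.
        matched = sum(1 for got, want in zip(called, expected) if key(got) == key(want))
    else:
        # Order-insensitive multiset matching: consume each actual call at most once.
        remaining = [key(c) for c in called]
        matched = 0
        for want in expected:
            k = key(want)
            if k in remaining:
                remaining.remove(k)
                matched += 1
    return matched / len(expected)

called = [("search", {"query": "weather in Paris"}), ("format", {"style": "brief"})]
expected = [("format", {"style": "brief"}), ("search", {"query": "weather in Paris"})]

print(score_tool_calls(called, expected))                         # name only: 1.0
print(score_tool_calls(called, expected, check_parameters=True))  # with params: 1.0
print(score_tool_calls(called, expected, strict_order=True))      # strict order: 0.0
```

Note how the same pair of call lists scores 1.0 under order-insensitive matching but 0.0 under strict order, since the two calls are swapped relative to the expected sequence.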
Why Tool Metrics?¶
- **Agent Evaluation**: Verify AI agents select the right tools for tasks.
- **Function Calling**: Test LLM function-calling capabilities.
- **Workflow Validation**: Ensure multi-step workflows execute correctly.
- **Regression Testing**: Catch breaking changes in agent behavior.