Tool Usage Learning
Category: Agent Behavior
Problem
AI agents with access to multiple tools often make suboptimal choices: calling an expensive API when a local cache suffices, or using the wrong search tool for a given query type. Without memory of past tool outcomes, agents cannot learn from experience and so repeat the same mistakes across sessions.
Architecture
This pattern stores tool invocation outcomes in Dakera with structured metadata (tool name, task type, success/failure, latency). Before selecting a tool, the agent recalls past outcomes for similar tasks and uses that history to inform its choice. Successful patterns accumulate higher importance scores over time.
Flow
- Agent receives a task requiring tool selection
- Recall past tool outcomes for similar task descriptions
- Rank available tools by historical success rate
- Execute the chosen tool and store the outcome
- Boost importance of successful patterns, decay failures
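The steps above can be sketched as a single control loop. The helper below is illustrative only (the name `run_with_learning` and its parameters are assumptions, not part of the pattern's API): it takes the selection and recording steps as callables, so the flow is visible independent of any Dakera server.

```python
import time

def run_with_learning(task: str, available_tools: list, tool_fns: dict,
                      select_fn, record_fn):
    """Select a tool from history, execute it, and record the outcome."""
    tool_name = select_fn(task, available_tools)      # rank by past success
    start = time.monotonic()
    try:
        result = tool_fns[tool_name](task)            # execute the chosen tool
        success = True
    except Exception:
        result, success = None, False                 # failures are signal too
    latency_ms = int((time.monotonic() - start) * 1000)
    record_fn(task, tool_name, success, latency_ms)   # store for next time
    return tool_name, result
```

In practice, `select_fn` and `record_fn` would be thin wrappers around `select_best_tool` and `record_tool_outcome` with the agent ID bound in.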
Implementation
```python
from dakera import Dakera
from datetime import datetime, timezone

client = Dakera(base_url="http://localhost:3300", api_key="dk-...")


def record_tool_outcome(agent_id: str, task: str, tool_name: str,
                        success: bool, latency_ms: int):
    """Store a tool usage outcome for future learning."""
    outcome = "success" if success else "failure"
    client.memory.store(
        content=f"Task: {task} | Tool: {tool_name} | Result: {outcome}",
        namespace=f"agent-{agent_id}-tools",
        metadata={
            "tool": tool_name,
            "task_type": task,
            "outcome": outcome,
            "latency_ms": latency_ms,
            "importance": 0.9 if success else 0.4,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )


def select_best_tool(agent_id: str, task: str, available_tools: list) -> str:
    """Recall past outcomes to select the best tool for a task."""
    results = client.memory.recall(
        query=f"Task: {task}",
        namespace=f"agent-{agent_id}-tools",
        top_k=20,
    )
    # Tally success rates per tool
    tool_scores = {tool: {"success": 0, "total": 0} for tool in available_tools}
    for mem in results["results"]:
        meta = mem.get("metadata", {})
        tool = meta.get("tool")
        if tool in tool_scores:
            tool_scores[tool]["total"] += 1
            if meta.get("outcome") == "success":
                tool_scores[tool]["success"] += 1
    # Rank by success rate (sorting is stable, so tools with no
    # history keep their original order)
    ranked = sorted(
        available_tools,
        key=lambda t: tool_scores[t]["success"] / max(tool_scores[t]["total"], 1),
        reverse=True,
    )
    return ranked[0]


# Usage: agent learns that web_search works better than wiki_lookup
# for current events
record_tool_outcome("support-bot", "current weather query", "web_search", True, 320)
record_tool_outcome("support-bot", "current weather query", "wiki_lookup", False, 1200)

# Next time, agent picks the proven tool
best = select_best_tool("support-bot", "current weather query",
                        ["web_search", "wiki_lookup"])
# Returns: "web_search"
```
When to Use This Pattern
- Multi-tool agents that need to optimize tool selection over time
- Systems where different tools have varying costs, latencies, or accuracy
- Autonomous agents that should improve without manual prompt engineering
- Pipelines where tool failures are expensive and should be avoided
Key Considerations
- Store both successes and failures — failures with low importance still provide signal
- Include latency metadata to prefer faster tools when accuracy is equivalent
- Decay old outcomes gradually so the agent adapts when tool capabilities change
- Use a separate namespace per agent to avoid cross-contamination of learned patterns
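The decay and latency considerations can be folded into the ranking step without extra Dakera calls. The sketch below is a hypothetical scoring helper (the name `score_tools` and the half-life constant are illustrative, not part of the pattern's API): it weights each recalled outcome by its age so stale evidence fades, and breaks ties between equally accurate tools by mean latency.

```python
from datetime import datetime, timezone

def score_tools(memories: list, available_tools: list,
                half_life_days: float = 30.0) -> list:
    """Rank tools by recency-weighted success rate; break ties on mean latency."""
    now = datetime.now(timezone.utc)
    stats = {t: {"weighted_success": 0.0, "weight": 0.0, "latencies": []}
             for t in available_tools}
    for mem in memories:
        meta = mem.get("metadata", {})
        tool = meta.get("tool")
        if tool not in stats:
            continue
        ts = datetime.fromisoformat(meta["timestamp"])
        if ts.tzinfo is None:                      # tolerate naive timestamps
            ts = ts.replace(tzinfo=timezone.utc)
        age_days = (now - ts).days
        weight = 0.5 ** (age_days / half_life_days)  # exponential decay
        stats[tool]["weight"] += weight
        if meta.get("outcome") == "success":
            stats[tool]["weighted_success"] += weight
        stats[tool]["latencies"].append(meta.get("latency_ms", 0))

    def sort_key(t):
        s = stats[t]
        rate = s["weighted_success"] / s["weight"] if s["weight"] else 0.0
        mean_latency = (sum(s["latencies"]) / len(s["latencies"])
                        if s["latencies"] else float("inf"))
        return (-rate, mean_latency)  # higher rate first, then faster tools

    return sorted(available_tools, key=sort_key)
```

This keeps the server-side schema unchanged: the same `tool`, `outcome`, `latency_ms`, and `timestamp` metadata stored by `record_tool_outcome` feeds the scorer, and old outcomes lose influence automatically instead of requiring explicit deletion.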