Tool Usage Learning
Category: Agent Behavior
Problem
AI agents with access to multiple tools often make suboptimal choices: calling an expensive API when a local cache suffices, or using the wrong search tool for a given query type. Without memory of past tool outcomes, agents cannot learn from experience and so repeat the same mistakes across sessions.
Architecture
This pattern stores tool invocation outcomes in Dakera with structured metadata (tool name, task type, success/failure, latency). Before selecting a tool, the agent recalls past outcomes for similar tasks and uses that history to inform its choice. Successful patterns accumulate higher importance scores over time.
Flow
- Agent receives a task requiring tool selection
- Recall past tool outcomes for similar task descriptions
- Rank available tools by historical success rate
- Execute the chosen tool and store the outcome
- Boost importance of successful patterns, decay failures
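The steps above can be sketched as a single control loop. The helper below is illustrative only (the name `run_with_learning` and its parameters are assumptions, not part of the pattern's API): it takes the selection and recording steps as callables, so the flow is visible independent of any Dakera server.

```python
import time

def run_with_learning(task: str, available_tools: list, tool_fns: dict,
                      select_fn, record_fn):
    """Select a tool from history, execute it, and record the outcome."""
    tool_name = select_fn(task, available_tools)      # rank by past success
    start = time.monotonic()
    try:
        result = tool_fns[tool_name](task)            # execute the chosen tool
        success = True
    except Exception:
        result, success = None, False                 # failures are signal too
    latency_ms = int((time.monotonic() - start) * 1000)
    record_fn(task, tool_name, success, latency_ms)   # store for next time
    return tool_name, result
```

In practice, `select_fn` and `record_fn` would be thin wrappers around `select_best_tool` and `record_tool_outcome` with the agent ID bound in.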
Implementation
```python
from dakera import Dakera
from datetime import datetime, timezone

client = Dakera(base_url="http://localhost:3300", api_key="dk-...")


def record_tool_outcome(agent_id: str, task: str, tool_name: str,
                        success: bool, latency_ms: int):
    """Store a tool usage outcome for future learning."""
    outcome = "success" if success else "failure"
    client.memory.store(
        content=f"Task: {task} | Tool: {tool_name} | Result: {outcome}",
        namespace=f"agent-{agent_id}-tools",
        metadata={
            "tool": tool_name,
            "task_type": task,
            "outcome": outcome,
            "latency_ms": latency_ms,
            "importance": 0.9 if success else 0.4,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )


def select_best_tool(agent_id: str, task: str, available_tools: list) -> str:
    """Recall past outcomes to select the best tool for a task."""
    results = client.memory.recall(
        query=f"Task: {task}",
        namespace=f"agent-{agent_id}-tools",
        top_k=20,
    )
    # Tally success rates per tool
    tool_scores = {tool: {"success": 0, "total": 0} for tool in available_tools}
    for mem in results["results"]:
        meta = mem.get("metadata", {})
        tool = meta.get("tool")
        if tool in tool_scores:
            tool_scores[tool]["total"] += 1
            if meta.get("outcome") == "success":
                tool_scores[tool]["success"] += 1
    # Rank by success rate (sorting is stable, so tools with no
    # history keep their original order)
    ranked = sorted(
        available_tools,
        key=lambda t: tool_scores[t]["success"] / max(tool_scores[t]["total"], 1),
        reverse=True,
    )
    return ranked[0]


# Usage: agent learns that web_search works better than wiki_lookup
# for current events
record_tool_outcome("support-bot", "current weather query", "web_search", True, 320)
record_tool_outcome("support-bot", "current weather query", "wiki_lookup", False, 1200)

# Next time, agent picks the proven tool
best = select_best_tool("support-bot", "current weather query",
                        ["web_search", "wiki_lookup"])
# Returns: "web_search"
```
When to Use This Pattern
- Multi-tool agents that need to optimize tool selection over time
- Systems where different tools have varying costs, latencies, or accuracy
- Autonomous agents that should improve without manual prompt engineering
- Pipelines where tool failures are expensive and should be avoided
Key Considerations
- Store both successes and failures — failures with low importance still provide signal
- Include latency metadata to prefer faster tools when accuracy is equivalent
- Decay old outcomes gradually so the agent adapts when tool capabilities change
- Use a separate namespace per agent to avoid cross-contamination of learned patterns
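The decay and latency considerations can be folded into the ranking step without extra Dakera calls. The sketch below is a hypothetical scoring helper (the name `score_tools` and the half-life constant are illustrative, not part of the pattern's API): it weights each recalled outcome by its age so stale evidence fades, and breaks ties between equally accurate tools by mean latency.

```python
from datetime import datetime, timezone

def score_tools(memories: list, available_tools: list,
                half_life_days: float = 30.0) -> list:
    """Rank tools by recency-weighted success rate; break ties on mean latency."""
    now = datetime.now(timezone.utc)
    stats = {t: {"weighted_success": 0.0, "weight": 0.0, "latencies": []}
             for t in available_tools}
    for mem in memories:
        meta = mem.get("metadata", {})
        tool = meta.get("tool")
        if tool not in stats:
            continue
        ts = datetime.fromisoformat(meta["timestamp"])
        if ts.tzinfo is None:                      # tolerate naive timestamps
            ts = ts.replace(tzinfo=timezone.utc)
        age_days = (now - ts).days
        weight = 0.5 ** (age_days / half_life_days)  # exponential decay
        stats[tool]["weight"] += weight
        if meta.get("outcome") == "success":
            stats[tool]["weighted_success"] += weight
        stats[tool]["latencies"].append(meta.get("latency_ms", 0))

    def sort_key(t):
        s = stats[t]
        rate = s["weighted_success"] / s["weight"] if s["weight"] else 0.0
        mean_latency = (sum(s["latencies"]) / len(s["latencies"])
                        if s["latencies"] else float("inf"))
        return (-rate, mean_latency)  # higher rate first, then faster tools

    return sorted(available_tools, key=sort_key)
```

This keeps the server-side schema unchanged: the same `tool`, `outcome`, `latency_ms`, and `timestamp` metadata stored by `record_tool_outcome` feeds the scorer, and old outcomes lose influence automatically instead of requiring explicit deletion.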