Memory Compression

Category: Optimization

Problem

Over time, an agent accumulates thousands of individual memories — each conversation turn, each observation, each learned fact. This unbounded growth increases storage costs, slows retrieval, and pollutes recall results with redundant or low-value entries. The agent needs a way to consolidate old memories without discarding critical information.

Architecture

This pattern implements a periodic consolidation job that recalls batches of old, low-importance memories, summarizes them into compressed representations using an LLM, and stores the summaries with high importance. The original granular memories are then decayed or removed. Think of it as defragmenting a hard drive for agent memory.

Flow

1. Recall a batch of old memories from the user's namespace.
2. Filter to entries older than the cutoff that were stored with low importance.
3. Summarize the batch into a compressed representation with an LLM.
4. Store the summary back with high importance.
5. Decay or remove the original granular memories.

Implementation

from dakera import Dakera
from datetime import datetime, timedelta

client = Dakera(base_url="http://localhost:3300", api_key="dk-...")

def compress_old_memories(user_id: str, days_old: int = 30, batch_size: int = 20):
    """Consolidate old memories into compressed summaries."""
    namespace = f"user-{user_id}"
    cutoff = (datetime.utcnow() - timedelta(days=days_old)).isoformat()

    # Step 1: Recall a batch of candidate memories (age and importance are filtered below)
    old_memories = client.memory.recall(
        query="*",
        namespace=namespace,
        top_k=batch_size
    )

    # Keep only memories with a timestamp older than the cutoff that were
    # stored with low importance (entries missing a timestamp are skipped)
    to_compress = []
    for m in old_memories["results"]:
        meta = m.get("metadata", {})
        timestamp = meta.get("timestamp")
        if timestamp and timestamp < cutoff and meta.get("importance", 1.0) < 0.6:
            to_compress.append(m)

    if len(to_compress) < 5:
        return {"compressed": 0, "message": "Not enough old memories to compress"}

    # Step 2: Concatenate contents for summarization
    contents = [m["content"] for m in to_compress]
    combined_text = "\n".join(contents)

    # Step 3: Generate summary (use your preferred LLM)
    summary = summarize_with_llm(combined_text)  # Your LLM summarization function

    # Step 4: Store the compressed summary with high importance
    client.memory.store(
        content=summary,
        namespace=namespace,
        metadata={
            "type": "compressed_summary",
            "original_count": len(to_compress),
            "date_range": f"before {cutoff}",
            "importance": 0.85,
            "compressed_at": datetime.utcnow().isoformat()
        }
    )

    # Step 5: Decay original memories (set very low importance)
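    # Note: this sketch assumes that re-storing the same content in the same
    # namespace supersedes the earlier entry; if your Dakera deployment exposes
    # an explicit update or delete operation, prefer it here to avoid keeping
    # duplicate low-importance copies.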
    for mem in to_compress:
        client.memory.store(
            content=mem["content"],
            namespace=namespace,
            metadata={
                **mem.get("metadata", {}),
                "importance": 0.05,
                "compressed_into": summary[:50]
            }
        )

    return {"compressed": len(to_compress), "summary_length": len(summary)}

def summarize_with_llm(text: str) -> str:
    """Summarize a batch of memories into key facts (implement with your LLM)."""
    # Replace with your LLM call
    # Prompt: "Summarize these user interactions into key facts and preferences:"
    return f"Summary of {len(text.splitlines())} interactions: [LLM output here]"

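# One possible implementation of summarize_with_llm, sketched here with the
# OpenAI Python SDK; the function name, model choice, and prompt wording are
# illustrative assumptions, and any LLM client can be substituted.
def summarize_with_openai(text: str) -> str:
    """Illustrative LLM summarization using the OpenAI chat completions API."""
    from openai import OpenAI  # requires `pip install openai` and OPENAI_API_KEY

    llm = OpenAI()
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Summarize these user interactions into key facts and preferences."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
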
# Usage: run as a periodic job (daily or weekly)
result = compress_old_memories("alice", days_old=30, batch_size=50)
# Compresses up to 50 old, low-importance memories into a single summary paragraph
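
To run consolidation on a schedule without extra infrastructure, a minimal driver can be a long-lived loop, sketched below; the user list, interval, and function name are placeholder assumptions, and in production the same call would more commonly be triggered from cron or an existing task queue.

import time

def run_compression_loop(user_ids, interval_hours: int = 24):
    """Naive periodic driver: compress each user's old memories once per interval."""
    while True:
        for uid in user_ids:
            result = compress_old_memories(uid, days_old=30, batch_size=50)
            print(f"compressed for {uid}: {result}")
        time.sleep(interval_hours * 3600)

# run_compression_loop(["alice", "bob"])  # placeholder user ids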

When to Use This Pattern

Key Considerations