Memory Compression
Category: Optimization
Problem
Over time, an agent accumulates thousands of individual memories — each conversation turn, each observation, each learned fact. This unbounded growth increases storage costs, slows retrieval, and pollutes recall results with redundant or low-value entries. The agent needs a way to consolidate old memories without discarding critical information.
Architecture
This pattern implements a periodic consolidation job that recalls batches of old, low-importance memories, summarizes them into compressed representations using an LLM, and stores the summaries with high importance. The original granular memories are then decayed or removed. Think of it as defragmenting a hard drive for agent memory.
Flow
- Identify old memories below an importance threshold
- Group related memories by topic or time window (a grouping sketch follows this list)
- Summarize each group into a single compressed memory
- Store the summary with elevated importance
- Decay or forget the original granular entries
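The implementation below keeps things simple and treats one recalled batch as a single group. If you want the grouping step, here is a minimal sketch that buckets recalled memories by time window. It assumes each memory carries the ISO-8601 timestamp shown in the implementation's metadata; group_by_time_window and window_days are illustrative names, not part of any Dakera API.

from collections import defaultdict
from datetime import datetime

def group_by_time_window(memories: list, window_days: int = 7) -> list:
    """Bucket memories into consecutive windows of window_days each, oldest first."""
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for mem in memories:
        ts = mem.get("metadata", {}).get("timestamp")
        if not ts:
            continue  # skip memories stored without a timestamp
        age_days = (datetime.fromisoformat(ts) - epoch).days
        buckets[age_days // window_days].append(mem)
    return [buckets[key] for key in sorted(buckets)]

# Each group can then be summarized and stored separately:
# groups = group_by_time_window(to_compress, window_days=7)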
Implementation
from dakera import Dakera
from datetime import datetime, timedelta
client = Dakera(base_url="http://localhost:3300", api_key="dk-...")
def compress_old_memories(user_id: str, days_old: int = 30, batch_size: int = 20):
    """Consolidate old memories into compressed summaries."""
    namespace = f"user-{user_id}"
    cutoff = (datetime.utcnow() - timedelta(days=days_old)).isoformat()
    # Step 1: Recall old, low-importance memories
    old_memories = client.memory.recall(
        query="*",
        namespace=namespace,
        top_k=batch_size
    )
    # Filter to memories older than cutoff
    to_compress = [
        m for m in old_memories["results"]
        if m.get("metadata", {}).get("timestamp", "") < cutoff
        and m.get("metadata", {}).get("importance", 1.0) < 0.6
    ]
    if len(to_compress) < 5:
        return {"compressed": 0, "message": "Not enough old memories to compress"}
    # Step 2: Concatenate contents for summarization
    contents = [m["content"] for m in to_compress]
    combined_text = "\n".join(contents)
    # Step 3: Generate summary (use your preferred LLM)
    summary = summarize_with_llm(combined_text)  # Your LLM summarization function
    # Step 4: Store the compressed summary with high importance
    client.memory.store(
        content=summary,
        namespace=namespace,
        metadata={
            "type": "compressed_summary",
            "original_count": len(to_compress),
            "date_range": f"before {cutoff}",
            "importance": 0.85,
            "compressed_at": datetime.utcnow().isoformat()
        }
    )
    # Step 5: Decay original memories (set very low importance)
    for mem in to_compress:
        client.memory.store(
            content=mem["content"],
            namespace=namespace,
            metadata={
                **mem.get("metadata", {}),
                "importance": 0.05,
                "compressed_into": summary[:50]
            }
        )
    return {"compressed": len(to_compress), "summary_length": len(summary)}
def summarize_with_llm(text: str) -> str:
    """Summarize a batch of memories into key facts (implement with your LLM)."""
    # Replace with your LLM call
    # Prompt: "Summarize these user interactions into key facts and preferences:"
    return f"Summary of {len(text.splitlines())} interactions: [LLM output here]"
# Usage: run as a periodic job (daily or weekly)
result = compress_old_memories("alice", days_old=30, batch_size=50)
# Recalls up to 50 memories and compresses the old, low-importance ones into a single summary
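The stub above leaves the summarization call to you. As one possible fill-in, here is a sketch of summarize_with_llm using the OpenAI Python SDK; the model name and prompt wording are placeholders you would tune, and nothing in this sketch is part of Dakera.

from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_with_llm(text: str) -> str:
    """Compress a batch of memories into durable facts and preferences."""
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you prefer
        messages=[
            {"role": "system", "content": "Summarize these user interactions into key facts and preferences. Keep only durable information."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()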
When to Use This Pattern
- Long-running agents with high-volume memory accumulation
- Applications with storage cost constraints
- Systems where recall latency increases with memory count
- Any deployment that needs to maintain quality recall over months or years
Key Considerations
- Never compress high-importance memories — only target low-value, old entries
- Keep the original memories for a short grace period after compression in case the summary needs verification
- Run compression as a background job, not in the hot path of user requests (a minimal scheduling sketch follows this list)
- Track compression metadata so you can audit what was consolidated and when
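To keep consolidation off the hot path, here is a minimal background-job sketch using only the standard library. The user list, interval, and print-based logging are illustrative assumptions; in production you would more likely use cron or your existing task scheduler.

import threading
import time

def compression_worker(user_ids, interval_hours: int = 24):
    """Periodically compress old memories for each user, off the request path."""
    while True:
        for uid in user_ids:
            try:
                result = compress_old_memories(uid, days_old=30, batch_size=50)
                print(f"[compression] {uid}: {result}")
            except Exception as exc:
                # A failed run should never take the agent down; log and move on
                print(f"[compression] {uid} failed: {exc}")
        time.sleep(interval_hours * 3600)

# Daemon thread so the worker never blocks shutdown or user-facing requests
threading.Thread(target=compression_worker, args=(["alice"],), daemon=True).start()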