Memory Compression
Category: Optimization
Problem
Over time, an agent accumulates thousands of individual memories — each conversation turn, each observation, each learned fact. This unbounded growth increases storage costs, slows retrieval, and pollutes recall results with redundant or low-value entries. The agent needs a way to consolidate old memories without discarding critical information.
Architecture
This pattern implements a periodic consolidation job that recalls batches of old, low-importance memories, summarizes them into compressed representations using an LLM, and stores the summaries with high importance. The original granular memories are then decayed or removed. Think of it as defragmenting a hard drive for agent memory.
Flow
- Identify old memories below an importance threshold
- Group related memories by topic or time window (a grouping sketch follows this list)
- Summarize each group into a single compressed memory
- Store the summary with elevated importance
- Decay or forget the original granular entries
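The implementation below keeps things simple and treats one recalled batch as a single group. If you want the grouping step, here is a minimal sketch that buckets recalled memories by time window. It assumes each memory carries the ISO-8601 timestamp shown in the implementation's metadata; group_by_time_window and window_days are illustrative names, not part of any Dakera API.

from collections import defaultdict
from datetime import datetime

def group_by_time_window(memories: list, window_days: int = 7) -> list:
    """Bucket memories into consecutive windows of window_days each, oldest first."""
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for mem in memories:
        ts = mem.get("metadata", {}).get("timestamp")
        if not ts:
            continue  # skip memories stored without a timestamp
        age_days = (datetime.fromisoformat(ts) - epoch).days
        buckets[age_days // window_days].append(mem)
    return [buckets[key] for key in sorted(buckets)]

# Each group can then be summarized and stored separately:
# groups = group_by_time_window(to_compress, window_days=7)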
Implementation
from dakera import Dakera
from datetime import datetime, timedelta
client = Dakera(base_url="http://localhost:3300", api_key="dk-...")
def compress_old_memories(user_id: str, days_old: int = 30, batch_size: int = 20):
    """Consolidate old memories into compressed summaries."""
    namespace = f"user-{user_id}"
    cutoff = (datetime.utcnow() - timedelta(days=days_old)).isoformat()
    # Step 1: Recall old, low-importance memories
    old_memories = client.memory.recall(
        query="*",
        namespace=namespace,
        top_k=batch_size
    )
    # Filter to memories older than cutoff
    to_compress = [
        m for m in old_memories["results"]
        if m.get("metadata", {}).get("timestamp", "") < cutoff
        and m.get("metadata", {}).get("importance", 1.0) < 0.6
    ]
    if len(to_compress) < 5:
        return {"compressed": 0, "message": "Not enough old memories to compress"}
    # Step 2: Concatenate contents for summarization
    contents = [m["content"] for m in to_compress]
    combined_text = "\n".join(contents)
    # Step 3: Generate summary (use your preferred LLM)
    summary = summarize_with_llm(combined_text)  # Your LLM summarization function
    # Step 4: Store the compressed summary with high importance
    client.memory.store(
        content=summary,
        namespace=namespace,
        metadata={
            "type": "compressed_summary",
            "original_count": len(to_compress),
            "date_range": f"before {cutoff}",
            "importance": 0.85,
            "compressed_at": datetime.utcnow().isoformat()
        }
    )
    # Step 5: Decay original memories (set very low importance)
    for mem in to_compress:
        client.memory.store(
            content=mem["content"],
            namespace=namespace,
            metadata={
                **mem.get("metadata", {}),
                "importance": 0.05,
                "compressed_into": summary[:50]
            }
        )
    return {"compressed": len(to_compress), "summary_length": len(summary)}
def summarize_with_llm(text: str) -> str:
    """Summarize a batch of memories into key facts (implement with your LLM)."""
    # Replace with your LLM call
    # Prompt: "Summarize these user interactions into key facts and preferences:"
    return f"Summary of {len(text.splitlines())} interactions: [LLM output here]"
# Usage: run as a periodic job (daily or weekly)
result = compress_old_memories("alice", days_old=30, batch_size=50)
# Recalls up to 50 memories and compresses the old, low-importance ones into a single summary
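The stub above leaves the summarization call to you. As one possible fill-in, here is a sketch of summarize_with_llm using the OpenAI Python SDK; the model name and prompt wording are placeholders you would tune, and nothing in this sketch is part of Dakera.

from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_with_llm(text: str) -> str:
    """Compress a batch of memories into durable facts and preferences."""
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you prefer
        messages=[
            {"role": "system", "content": "Summarize these user interactions into key facts and preferences. Keep only durable information."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()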
When to Use This Pattern
- Long-running agents with high-volume memory accumulation
- Applications with storage cost constraints
- Systems where recall latency increases with memory count
- Any deployment that needs to maintain quality recall over months or years
Key Considerations
- Never compress high-importance memories — only target low-value, old entries
- Keep the original memories for a short grace period after compression in case the summary needs verification
- Run compression as a background job, not in the hot path of user requests (a minimal scheduling sketch follows this list)
- Track compression metadata so you can audit what was consolidated and when
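To keep consolidation off the hot path, here is a minimal background-job sketch using only the standard library. The user list, interval, and print-based logging are illustrative assumptions; in production you would more likely use cron or your existing task scheduler.

import threading
import time

def compression_worker(user_ids, interval_hours: int = 24):
    """Periodically compress old memories for each user, off the request path."""
    while True:
        for uid in user_ids:
            try:
                result = compress_old_memories(uid, days_old=30, batch_size=50)
                print(f"[compression] {uid}: {result}")
            except Exception as exc:
                # A failed run should never take the agent down; log and move on
                print(f"[compression] {uid} failed: {exc}")
        time.sleep(interval_hours * 3600)

# Daemon thread so the worker never blocks shutdown or user-facing requests
threading.Thread(target=compression_worker, args=(["alice"],), daemon=True).start()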