AI agents are only as useful as the context they retain. Without persistent memory, every conversation starts from zero — agents forget preferences, lose track of multi-session projects, and repeat the same questions endlessly. In 2026, agent memory has matured from a research curiosity into critical infrastructure. Frameworks now compete on retrieval accuracy, deployment simplicity, latency, and data sovereignty.
This guide compares the top agent memory frameworks available today: Dakera, Mem0, Letta, Zep, Hindsight, and Cognee. We evaluate each on benchmark performance, retrieval architecture, deployment model, dependency footprint, encryption, and pricing — then provide clear recommendations for when each framework makes sense, including a step-by-step migration guide for teams moving from Mem0 to Dakera.
Why Agent Memory Matters in 2026
The shift from single-turn chatbots to autonomous multi-step agents has made memory non-negotiable. Consider what breaks without it:
- Coding agents forget project conventions between sessions, generating inconsistent code
- Customer support agents re-ask for order numbers and preferences every interaction
- Research agents lose track of what they've already explored, duplicating work
- Personal assistants can't learn user preferences over time
The memory layer sits between the LLM and the application — ingesting conversation turns, extracting salient facts, storing them durably, and retrieving the right context at query time. The quality of this pipeline directly determines whether an agent feels intelligent or broken.
Evaluation Criteria
We use six criteria to compare frameworks:
- Benchmark accuracy — LoCoMo (Long Conversation Memory) scores across single-hop, multi-hop, and temporal reasoning categories
- Retrieval architecture — vector-only vs. hybrid (vector + keyword + reranking), graph enrichment, temporal awareness
- Deployment model — self-hosted vs. cloud-only, binary vs. container, operational overhead
- Dependency footprint — external services required (embedding APIs, databases, LLMs)
- Security and encryption — at-rest encryption, tenant isolation, data residency
- Pricing — open-source vs. proprietary, per-query costs, cloud markup
LoCoMo is a benchmark designed specifically for evaluating long-conversation memory systems. It tests three categories: Category 1 (single-hop factual recall), Category 2 (multi-hop reasoning across memories), and Category 3 (temporal reasoning — understanding that facts change over time). Category 3 is the hardest: it requires knowing that "Alice moved to Berlin in March" supersedes "Alice lives in London" from an earlier conversation.
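Category 3 is easier to see with a minimal sketch of "latest fact wins" resolution. It assumes each memory has already been reduced to a hypothetical (subject, attribute, value, date) record. Extracting that structure from free text is the hard part real systems have to solve, and the dates below are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    # Hypothetical structured form of a memory; not a type from any framework
    subject: str
    attribute: str
    value: str
    observed: date


def current_facts(facts: list[Fact]) -> dict[tuple[str, str], Fact]:
    """Keep only the most recent fact for each (subject, attribute) pair."""
    latest: dict[tuple[str, str], Fact] = {}
    for fact in facts:
        key = (fact.subject, fact.attribute)
        # A newer observation supersedes an older one for the same key
        if key not in latest or fact.observed > latest[key].observed:
            latest[key] = fact
    return latest


facts = [
    Fact("Alice", "city", "London", date(2025, 11, 2)),
    Fact("Alice", "city", "Berlin", date(2026, 3, 14)),
    Fact("Alice", "employer", "Acme", date(2025, 11, 2)),
]

resolved = current_facts(facts)
print(resolved[("Alice", "city")].value)  # Berlin
```

Without the dates, both city facts are equally plausible answers to "Where does Alice live?". Category 3 questions probe exactly this ambiguity.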
Dakera
Overview
Dakera is a Rust-based memory engine distributed as a single 44 MB static binary. It runs entirely on-device with no external dependencies — no cloud embedding API, no separate database, no Python runtime. The HNSW vector index, BM25 full-text index, and cross-encoder reranker all execute locally within the same process.
Retrieval Architecture
Dakera uses a three-stage hybrid retrieval pipeline:
- Candidate generation — parallel HNSW vector search and BM25 keyword search produce initial candidate sets
- Fusion — candidates are merged using reciprocal rank fusion (RRF), eliminating duplicates while preserving signal from both retrieval paths
- Reranking — a cross-encoder model rescores the top candidates for semantic relevance, with temporal decay and importance weighting applied
This architecture handles the failure modes of vector-only retrieval. BM25 catches exact-match queries that embedding models fumble (names, IDs, specific dates), while the cross-encoder compensates for embedding space limitations on nuanced semantic queries.
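Of the three stages, fusion is the easiest to show in isolation. The sketch below implements standard reciprocal rank fusion. Each document scores the sum of 1 / (k + rank) over the ranked lists it appears in. It uses the commonly cited k = 60; the constant Dakera uses is not stated here, and the memory IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. A document found by several retrievers
    collapses into a single entry with a combined score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["m7", "m2", "m9"]   # e.g. from HNSW, best first
keyword_hits = ["m4", "m7", "m1"]  # e.g. from BM25, best first

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print([doc_id for doc_id, _ in fused])  # ['m7', 'm4', 'm2', 'm1', 'm9']
```

m7 ranks first because both retrievers returned it, which is how RRF preserves signal from both paths. Documents found by only one path are ordered by their rank within that path. RRF uses only ranks, so the raw BM25 and cosine scores, which sit on incompatible scales, never have to be normalized against each other.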
On-Device Inference
Embeddings are generated locally using quantized ONNX models bundled with the binary. No data leaves the machine for inference — there are no API calls to OpenAI or any external embedding service. This eliminates network latency from the critical path and removes a recurring cost center.
Security
All memory data is encrypted at rest with AES-256-GCM. Namespace-level isolation enforces tenant boundaries at the storage layer. The binary runs without network access requirements — it can operate in air-gapped environments.
Integration
Dakera exposes 14 core tools (86+ available via profiles) over the Model Context Protocol (MCP), plus gRPC and REST APIs. Native SDKs exist for Python, JavaScript/TypeScript, Rust, and Go. The MCP interface means any MCP-compatible agent can use Dakera as its memory backend without custom integration code:
```json
{
  "mcpServers": {
    "dakera": {
      "command": "dakera",
      "args": ["mcp", "--namespace", "my-agent"]
    }
  }
}
```
When Dakera Excels
- Production deployments where benchmark accuracy matters
- Privacy-sensitive workloads (healthcare, legal, finance) that cannot send data to third-party APIs
- Self-hosted infrastructure where you need a single binary, not a Docker Compose stack
- High-throughput scenarios requiring low-latency retrieval without garbage collection pauses
- Multi-agent systems needing cross-agent knowledge sharing with tenant isolation
Mem0
Overview
Mem0 is a Python-based memory framework that has gained significant traction in the prototyping and startup community. It offers both a managed cloud platform and self-hosted deployment, with a clean API that makes integration straightforward. Mem0 focuses on simplicity — get memory working in your agent with minimal code.
Retrieval Architecture
Mem0 uses vector-only retrieval powered by external embedding models (typically OpenAI's text-embedding-3-small or text-embedding-3-large). Memories are stored in a vector database such as Qdrant, Pinecone, or ChromaDB, depending on configuration. Search ranks memories by cosine similarity to the query embedding.
The vector-only approach works well for semantic similarity queries but has known weaknesses: exact-match failures (searching for a specific name or date), keyword-dependent queries, and temporal reasoning (no mechanism to prefer recent facts over stale ones without additional application logic).
Dependency Footprint
A Mem0 deployment requires: Python runtime, an embedding API (OpenAI or similar), a vector database (Qdrant/Pinecone/Chroma), and optionally an LLM for memory extraction. The cloud version abstracts these dependencies; self-hosted requires managing them yourself.
Strengths
- Developer experience — clean Python API, excellent documentation, fast time-to-prototype
- Cloud option — managed platform eliminates infrastructure concerns for early-stage projects
- Ecosystem — integrations with LangChain, LlamaIndex, CrewAI, and other popular agent frameworks
- Community — active open-source community with frequent releases
Limitations
- Vector-only retrieval misses keyword and temporal queries
- Dependent on external embedding APIs (latency + cost + data leaves your infrastructure)
- Self-hosted requires managing multiple services (vector DB + embedding API + application layer)
- No built-in encryption at rest in the open-source version
When Mem0 Excels
- Rapid prototyping where time-to-first-memory matters most
- Teams already using OpenAI embeddings who want to minimize new infrastructure
- Cloud-native deployments where managed services are preferred
- Simple use cases where semantic similarity is sufficient (preferences, general facts)
Letta (formerly MemGPT)
Overview
Letta takes a fundamentally different approach to agent memory. Instead of a traditional retrieval pipeline, Letta puts an LLM in the loop of memory management itself. The LLM decides what to remember, how to organize memories, and what to retrieve — treating memory as an LLM reasoning problem rather than an information retrieval problem.
This "LLM-as-memory-manager" paradigm is inspired by the MemGPT paper, which proposed using the LLM's own capabilities to manage a tiered memory system (core memory + archival memory + recall memory).
Architecture
Letta maintains three memory tiers:
- Core memory — always in the LLM's context window (persona, user preferences, key facts)
- Archival memory — long-term storage searched on demand via the LLM's tool calls
- Recall memory — recent conversation history with automatic summarization
The LLM itself issues memory operations (search, insert, update, delete) as tool calls during conversation. This means the quality of memory management depends heavily on the underlying LLM's capabilities.
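The control flow is easier to follow in code. This is a toy sketch and not Letta's API. A hypothetical `TieredMemory` class holds a core tier and an archival tier. The tool calls an LLM would emit are hard-coded as dicts so that the example runs without a model.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    # Hypothetical two-tier store loosely modeled on the MemGPT design; not Letta's API
    core: dict[str, str] = field(default_factory=dict)  # rendered into every prompt
    archival: list[str] = field(default_factory=list)   # searched only when asked

    def set_core(self, key: str, value: str) -> str:
        self.core[key] = value
        return f"core[{key}] updated"

    def archive(self, content: str) -> str:
        self.archival.append(content)
        return "archived"

    def search_archive(self, query: str) -> list[str]:
        # Case-insensitive substring match stands in for real vector search
        return [m for m in self.archival if query.lower() in m.lower()]

    def render_core(self) -> str:
        """Text the agent would place in the context window on every turn."""
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.core.items()))


# Only these operations may be invoked by the model
ALLOWED_TOOLS = {"set_core", "archive", "search_archive"}


def dispatch(memory: TieredMemory, tool_call: dict):
    """Route one model-emitted tool call to the matching memory operation."""
    name = tool_call["name"]
    if name not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return getattr(memory, name)(**tool_call["arguments"])


memory = TieredMemory()
# Hard-coded here; in a real agent each call comes from a separate LLM response
calls = [
    {"name": "set_core", "arguments": {"key": "user_name", "value": "Sam"}},
    {"name": "archive", "arguments": {"content": "Sam prefers Rust for CLIs"}},
    {"name": "search_archive", "arguments": {"query": "rust"}},
]
results = [dispatch(memory, call) for call in calls]
print(results[-1])           # ['Sam prefers Rust for CLIs']
print(memory.render_core())  # user_name: Sam
```

Each entry in `calls` stands for a model round-trip, and whether the model chooses to emit `set_core` at all is a judgment call it makes. The latency, cost, and determinism limitations listed below come from this.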
Strengths
- Creative architecture — the LLM can reason about what's worth remembering, perform implicit deduplication, and summarize proactively
- Flexible memory organization — no fixed schema; the LLM organizes memories however makes sense for the use case
- Conversation continuity — excellent at maintaining narrative coherence across sessions
- Active development — well-funded team with a clear vision for autonomous agent infrastructure
Limitations
- Latency — every memory operation requires an LLM call, adding 500ms-2s per operation
- Cost — memory management consumes LLM tokens, which can be significant at scale
- LLM dependency — memory quality is bounded by the underlying model's capabilities
- Determinism — identical inputs may produce different memory states depending on LLM sampling
- Scale concerns — LLM-in-the-loop doesn't scale to thousands of concurrent agents as efficiently as traditional retrieval
When Letta Excels
- Conversational agents where narrative coherence matters more than raw retrieval speed
- Research and experimentation with novel memory architectures
- Use cases where memory organization is complex and benefits from LLM reasoning
- Small-scale deployments where per-query LLM cost is acceptable
Zep
Overview
Zep combines vector search with knowledge graph enrichment, automatically extracting entities and relationships from conversations and building a graph structure alongside the vector index. Originally open-source, Zep has transitioned to a cloud-first model — the managed Zep Cloud is the primary product, while the self-hosted community edition has been deprecated.
Architecture
Zep's retrieval pipeline enriches memories with structured entity data:
- Ingestion — conversations are processed for embedding generation and entity extraction simultaneously
- Graph construction — extracted entities (people, places, organizations, events) are linked into a knowledge graph
- Hybrid retrieval — queries search both the vector index and traverse the entity graph for related facts
The graph layer adds value for entity-centric queries ("What do I know about Alice?") that might scatter across many individual memory entries in a vector-only system.
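Here is a toy illustration of why the graph layer helps entity-centric queries. Once each memory is linked to the entities it mentions, "What do I know about Alice?" becomes an index lookup, optionally expanded through related entities. It no longer depends on a similarity search ranking every relevant entry highly. The memories and entity links are hard-coded here. In Zep they come from LLM-based extraction, and none of this is Zep's API.

```python
from collections import defaultdict

# Hypothetical extraction output: memory ID -> (text, entities it mentions)
memories = {
    "m1": ("Alice joined the data platform team", {"Alice", "data platform team"}),
    "m2": ("The data platform team owns the billing pipeline", {"data platform team", "billing pipeline"}),
    "m3": ("Alice is allergic to peanuts", {"Alice"}),
    "m4": ("Bob prefers morning standups", {"Bob"}),
}

# Entity -> memory IDs that mention it (the edges between entities and facts)
entity_index: dict[str, set[str]] = defaultdict(set)
for mem_id, (_, entities) in memories.items():
    for entity in entities:
        entity_index[entity].add(mem_id)


def about(entity: str, hops: int = 1) -> list[str]:
    """Collect memories mentioning `entity`, then expand through co-mentioned entities."""
    frontier, seen_entities, found = {entity}, {entity}, set()
    for _ in range(hops + 1):
        next_frontier = set()
        for ent in frontier:
            for mem_id in entity_index.get(ent, ()):
                found.add(mem_id)
                next_frontier |= memories[mem_id][1] - seen_entities
        seen_entities |= next_frontier
        frontier = next_frontier
    return [memories[m][0] for m in sorted(found)]


print(about("Alice", hops=0))  # memories that mention Alice directly (m1, m3)
print(about("Alice", hops=1))  # adds m2, reached via "data platform team"
```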
Strengths
- Graph-enriched retrieval — entity extraction and relationship mapping improve recall for "tell me everything about X" queries
- Automatic summarization — conversations are summarized progressively, reducing storage and improving retrieval relevance
- Enterprise features — user management, audit logs, and compliance controls in the cloud version
- Structured data extraction — entities, relationships, and facts are extracted into queryable structures
Limitations
- Cloud lock-in — the OSS edition is deprecated; production use requires Zep Cloud
- No self-hosted path — organizations requiring data sovereignty have limited options
- External LLM dependency — entity extraction and summarization require LLM API calls
- Pricing opacity — cloud costs scale with usage in ways that are hard to predict upfront
When Zep Excels
- Enterprise teams who want managed infrastructure with graph capabilities out of the box
- Use cases heavily focused on entity relationships (CRM agents, people-centric assistants)
- Organizations that prefer cloud services over self-hosted infrastructure
- Teams needing structured entity extraction alongside unstructured memory
Hindsight
Overview
Hindsight is a newer entrant in the agent memory space, emerging from academic research into practical tooling. It focuses on reflective memory — the idea that agents should periodically review and reorganize their memories, identifying patterns and synthesizing insights that weren't apparent during initial storage.
Architecture
Hindsight introduces a "reflection" pass where stored memories are periodically re-examined by an LLM to generate higher-order insights. This is inspired by the Generative Agents paper's reflection mechanism, applied to persistent memory rather than in-context simulation.
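A reflection pass can be sketched as a periodic job that groups stored memories and asks a model for a higher-order summary of each recurring group. Everything here is hypothetical. `summarize` is a stand-in for an LLM call that simply describes the group so the sketch runs offline, and grouping by tag is one simple choice rather than Hindsight's documented method.

```python
from collections import defaultdict
from typing import Callable


def reflect(
    memories: list[dict],
    summarize: Callable[[str, list[str]], str],
    min_group_size: int = 3,
) -> list[dict]:
    """Produce meta-memories for every tag that appears in enough memories.

    `memories` items look like {"text": ..., "tags": [...]} (hypothetical schema).
    Each insight keeps links to its source texts for traceability.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for mem in memories:
        for tag in mem["tags"]:
            groups[tag].append(mem["text"])

    insights = []
    for tag, texts in sorted(groups.items()):
        if len(texts) >= min_group_size:  # only reflect on recurring themes
            insights.append({"kind": "insight", "text": summarize(tag, texts), "sources": texts})
    return insights


def fake_summarize(tag: str, texts: list[str]) -> str:
    # Placeholder for an LLM prompt such as "What pattern do these entries share?"
    return f"Recurring theme '{tag}' across {len(texts)} entries"


journal = [
    {"text": "Skipped the gym, felt sluggish", "tags": ["exercise", "energy"]},
    {"text": "Morning run, focused all day", "tags": ["exercise", "energy"]},
    {"text": "Slept badly, low energy", "tags": ["sleep", "energy"]},
    {"text": "Tried a new pasta recipe", "tags": ["cooking"]},
]
for insight in reflect(journal, fake_summarize):
    print(insight["text"])  # Recurring theme 'energy' across 3 entries
```

Every `summarize` call would be an LLM request over stored memories, which is where ongoing reflection cost comes from.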
Strengths
- Research-informed design — built on solid cognitive science and AI research foundations
- Insight generation — produces meta-memories that capture patterns across individual facts
- Novel approach — addresses a gap other frameworks ignore (memory consolidation and synthesis)
Limitations
- Early stage — fewer production deployments and less battle-tested than alternatives
- Limited documentation — community and docs are still maturing
- Performance unknown — no published LoCoMo or equivalent benchmark scores
- LLM cost for reflections — periodic re-processing of memory stores adds ongoing compute cost
When Hindsight Excels
- Research projects exploring novel memory architectures
- Use cases where pattern discovery across memories adds value (journaling agents, learning assistants)
- Teams comfortable with early-stage tooling who want to contribute upstream
Cognee
Overview
Cognee is an open-source memory framework built around knowledge graph construction. Where most frameworks store memories as text embeddings, Cognee processes input through a pipeline that extracts structured knowledge — entities, relationships, and ontologies — and stores them in a graph database. The result is a memory system optimized for relational queries: "what does this person work on?", "which systems depend on service X?", "what changed in this codebase over the past month?"
Cognee's design philosophy is that raw text storage wastes the structured information embedded in conversations and documents. By extracting that structure at ingestion time, retrieval becomes a graph traversal problem rather than a fuzzy similarity search.
Architecture
Cognee's ingestion pipeline processes input through multiple stages:
- Chunking — input text is split into semantically coherent segments
- Entity extraction — named entities, concepts, and relationships are extracted by an LLM
- Graph construction — extracted entities and relationships are stored in a graph database (supports Neo4j, NetworkX, and Kuzu)
- Vector indexing — text chunks are embedded and stored for semantic search alongside the graph
- Hybrid retrieval — queries traverse the graph and search the vector index, merging results
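To see why graph storage suits a relational question like "which systems depend on service X?", here is a minimal sketch over hard-coded (subject, relation, object) triples. It uses plain Python rather than Cognee's pipeline or any graph backend. The triples stand in for what the entity-extraction step would produce.

```python
from collections import deque

# Hypothetical extracted triples: (subject, relation, object)
triples = [
    ("checkout-api", "depends_on", "payments-service"),
    ("payments-service", "depends_on", "ledger-db"),
    ("reporting-job", "depends_on", "ledger-db"),
    ("checkout-api", "owned_by", "team-web"),
]


def dependents_of(target: str, relation: str = "depends_on") -> list[str]:
    """Return every node that reaches `target` through one or more `relation` edges."""
    # Reverse adjacency: object -> subjects that point at it
    reverse: dict[str, list[str]] = {}
    for subj, rel, obj in triples:
        if rel == relation:
            reverse.setdefault(obj, []).append(subj)

    seen: set[str] = set()
    queue = deque([target])
    while queue:  # breadth-first walk covers any number of hops
        node = queue.popleft()
        for parent in reverse.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


print(dependents_of("ledger-db"))
# ['checkout-api', 'payments-service', 'reporting-job']
```

No single sentence says that checkout-api depends on ledger-db, so a similarity search over individual chunks has nothing to match. The graph walk finds the relationship in two hops.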
Strengths
- Knowledge graph-first — purpose-built for extracting and querying structured knowledge from unstructured text, not bolted on as an afterthought
- Ontology support — supports custom ontologies for domain-specific entity types and relationships
- Open-source — MIT licensed, fully self-hostable with no cloud dependency
- Multiple graph backends — choose from Neo4j (production), Kuzu (embedded), or NetworkX (development)
- Document ingestion — designed for processing documents, PDFs, and knowledge bases, not just conversation turns
Limitations
- LLM dependency for extraction — entity extraction quality depends on the LLM used; requires an external LLM API or local model for the extraction step
- Ingestion latency — the multi-stage pipeline (extract → graph → embed) is slower than direct embedding-and-store approaches
- Graph backend overhead — running Neo4j adds significant operational overhead compared to Dakera's single-binary approach
- No temporal decay — Cognee does not have built-in memory decay; old graph nodes persist at full strength indefinitely
- Limited MCP support — MCP integration is newer and less mature than Dakera's native MCP toolset
- No published LoCoMo benchmarks — retrieval accuracy is not independently validated on the standard benchmark suite
When Cognee Excels
- Agents that reason over large document corpora where entity relationships are central
- Knowledge management systems where the graph structure is a first-class output
- Research environments where custom ontologies and graph schemas are needed
- Use cases where Neo4j is already in the infrastructure stack
Cognee vs. Dakera on Knowledge Graphs
Both Cognee and Dakera include knowledge graph capabilities, but with different design priorities. Cognee treats the graph as the primary storage format and the entry point for all queries. Dakera's knowledge graph is an enrichment layer on top of hybrid retrieval — memories are stored as text and queried via HNSW + BM25 + reranking, with the knowledge graph available for entity-centric queries when needed. For pure knowledge graph workflows, Cognee is purpose-built. For general-purpose agent memory where graph is one of several retrieval modes, Dakera's hybrid approach is more versatile.
Head-to-Head Comparison
| Framework | LoCoMo Score | Retrieval | Deployment | Dependencies | Encryption | Pricing |
|---|---|---|---|---|---|---|
| Dakera | 88.2% | HNSW + BM25 + cross-encoder | Self-hosted (single binary) | None (fully self-contained) | AES-256-GCM at rest | Open-core, free tier |
| Mem0 | ~70%* | Vector-only (cosine similarity) | Cloud + self-hosted | OpenAI API + vector DB | Cloud-managed TLS | Free OSS / Cloud pay-per-use |
| Letta | ~65%* | LLM-in-the-loop | Self-hosted (Python) | LLM API (GPT-4/Claude) | Application-level | Open-source / Cloud |
| Zep | ~72%* | Vector + knowledge graph | Cloud-only (OSS deprecated) | LLM API for extraction | Cloud-managed | Cloud pay-per-use |
| Hindsight | Not published | Vector + reflective synthesis | Self-hosted (Python) | LLM API for reflections | Not specified | Open-source |
| Cognee | Not published | Knowledge graph + vector | Self-hosted (Python) | LLM API + graph DB (Neo4j) | Depends on graph backend | Open-source (MIT) |
* Estimated scores based on architecture analysis and community-reported results. Only Dakera publishes LoCoMo scores from a reproducible benchmark suite run against the full 1,540-question set; those scores are self-reported, with the methodology published.
Detailed Pros and Cons
The summary table above captures the high-level picture. This section goes deeper into the practical trade-offs for each framework, drawing on real deployment experience.
Dakera — Pros and Cons
| Pros | Cons |
|---|---|
| Highest published retrieval accuracy (88.2% LoCoMo) | Newer project — smaller community than Mem0 or Letta |
| Single 44 MB binary, zero external dependencies | Runs as a separate Rust server; Python apps integrate through the SDK or REST API rather than an in-process library |
| On-device ONNX embeddings — no data leaves your machine | Advanced configuration (HNSW tuning, decay) has a learning curve |
| AES-256-GCM encryption at rest, air-gap capable | GPU-accelerated embedding requires CUDA setup |
| MCP-native with 14 core tools (86+ via profiles) | Knowledge graph edge types are limited to 4 built-in types |
| 6 decay strategies, importance scoring, temporal awareness | No managed cloud option — you own the infrastructure |
| SDKs for Python, TypeScript, Go, Rust | Benchmark scores self-reported (though methodology is published) |
Mem0 — Pros and Cons
| Pros | Cons |
|---|---|
| Fastest time-to-first-memory for Python developers | Vector-only retrieval misses keyword and temporal queries |
| Clean, minimal API — mem0.add() and mem0.search() | Requires external embedding API (OpenAI, Cohere) — data leaves your infra |
| Managed cloud removes infrastructure burden | Self-hosted requires Qdrant + embedding API + Python runtime |
| Large community, many integration guides | No built-in encryption at rest in OSS version |
| Compatible with LangChain, CrewAI, LlamaIndex | Scaling costs are unpredictable — per-query embedding API fees |
| Active development and frequent releases | No knowledge graph, no temporal decay, no importance scoring |
Letta — Pros and Cons
| Pros | Cons |
|---|---|
| LLM-driven memory management — intelligent, context-aware | 500ms–2s latency per memory operation (LLM round-trip) |
| Excellent conversation narrative coherence | High per-operation cost — every store/recall burns LLM tokens |
| Flexible memory organization without fixed schema | Non-deterministic — same input can produce different memory states |
| Well-funded team with clear product vision | Does not scale economically to thousands of concurrent agents |
| Tiered memory model (core/archival/recall) mirrors cognitive science | LLM dependency means memory quality is bounded by model capability |
Zep — Pros and Cons
| Pros | Cons |
|---|---|
| Graph-enriched retrieval excellent for entity-centric queries | OSS self-hosted edition deprecated — cloud-only for production |
| Automatic conversation summarization | No data sovereignty — all data in Zep Cloud (AWS) |
| Enterprise features: audit logs, user management | Requires LLM API for entity extraction (data leaves your infrastructure) |
| Structured entity and relationship extraction | Pricing scales with usage — hard to predict costs upfront |
Cognee — Pros and Cons
| Pros | Cons |
|---|---|
| Purpose-built for knowledge graph extraction from documents | Ingestion pipeline is slow (multi-stage: chunk → extract → graph → embed) |
| Custom ontology support for domain-specific entity types | Requires LLM API for entity extraction step |
| MIT licensed, fully open-source | Neo4j adds significant operational overhead for production deployments |
| Multiple graph backends (Neo4j, Kuzu, NetworkX) | No temporal decay — graph nodes persist at full strength indefinitely |
| Strong for document-centric knowledge management | MCP integration is newer and less mature |
Architecture Deep Dive: Why Retrieval Method Matters
Vector-Only Limitations
Vector search excels at semantic similarity but fails predictably in several cases:
- Exact-match queries — "What is Alice's employee ID?" The answer (an identifier like "EMP-4892") carries almost no semantic signal for an embedding model to match on
- Temporal queries — "What did Bob say last Tuesday?" requires date awareness that embeddings don't capture
- Negation — "Which projects am I NOT involved in?" is semantically similar to "Which projects am I involved in?" in embedding space
- Keyword specificity — searching for a specific API name, error code, or technical term that embeddings smooth away
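The exact-match gap is easy to demonstrate from the keyword side. Below is a compact Okapi BM25 scorer that uses the common defaults k1 = 1.5 and b = 0.75 and a simple tokenizer that keeps hyphenated IDs intact. These are assumptions, not any framework's settings. A query for EMP-4892 scores only the memory that contains that exact token. The near-identical sentence about EMP-7715 scores zero.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep hyphenated IDs like "emp-4892" together as one token
    return re.findall(r"[a-z0-9-]+", text.lower())


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        # Document frequency: how many documents contain each term
        self.df = Counter(term for doc in self.docs for term in set(doc))

    def idf(self, term: str) -> float:
        # Lucene-style IDF; the +1 inside the log keeps it positive for common terms
        df = self.df.get(term, 0)
        return math.log((self.n - df + 0.5) / (df + 0.5) + 1)

    def score(self, query: str, index: int) -> float:
        doc = self.docs[index]
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total


memories = [
    "Alice's employee ID is EMP-4892",
    "Bob's employee ID is EMP-7715",
    "Alice prefers async code reviews",
]
bm25 = BM25(memories)
scores = [bm25.score("who has EMP-4892", i) for i in range(len(memories))]
best = max(range(len(memories)), key=scores.__getitem__)
print(memories[best])  # Alice's employee ID is EMP-4892
```

An embedding model, by contrast, may place the first two sentences very close together, since they differ only in the ID.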
Hybrid Retrieval Advantages
Adding BM25 keyword search alongside vector search covers the exact-match and keyword-specificity gaps. The cross-encoder reranking layer then resolves conflicts between the two signal sources, promoting results that are both semantically relevant and lexically precise.
This three-stage pipeline is why Dakera's published LoCoMo score exceeds the estimated scores for vector-only systems in the comparison table. Category 1 (single-hop) benefits from BM25 catching specific facts. Category 2 (multi-hop) benefits from broader candidate generation across both indices. Category 3 (temporal) benefits from the reranker's ability to weight recency signals.
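A common way to weight recency at the rerank stage is to multiply each candidate's relevance by an exponential decay factor that halves every fixed number of days. This sketch assumes a 30-day half-life and a per-memory importance multiplier. The comparison table lists six decay strategies for Dakera without describing them, so this illustrates the idea rather than Dakera's formula. The two candidates echo the Alice example from the Category 3 description.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Candidate:
    text: str
    relevance: float         # e.g. a cross-encoder score in [0, 1]
    created_at: datetime
    importance: float = 1.0  # hypothetical per-memory weight


def decay_factor(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the factor halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def rerank(candidates: list[Candidate], now: datetime, half_life_days: float = 30.0) -> list[Candidate]:
    """Order candidates by relevance x importance x recency decay, best first."""
    def final_score(c: Candidate) -> float:
        age_days = (now - c.created_at).total_seconds() / 86400
        return c.relevance * c.importance * decay_factor(age_days, half_life_days)

    return sorted(candidates, key=final_score, reverse=True)


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
candidates = [
    Candidate("Alice lives in London", 0.92, datetime(2025, 10, 1, tzinfo=timezone.utc)),
    Candidate("Alice moved to Berlin", 0.88, datetime(2026, 3, 10, tzinfo=timezone.utc)),
]
print(rerank(candidates, now)[0].text)  # Alice moved to Berlin
```

Pure multiplicative decay also lets an old but still-valid fact, such as an allergy, fade toward zero. This is one reason to combine decay with importance scoring, or to give the decay a floor, rather than rely on recency alone.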
Deployment and Operations Compared
Binary Simplicity vs. Service Orchestration
The operational difference between frameworks is dramatic:
```bash
# Dakera: one command
docker run -d --name dakera -p 3300:3300 -e DAKERA_INFERENCE_ENABLED=true ghcr.io/dakera-ai/dakera:latest

# Mem0 (self-hosted): Python + vector DB + embedding API
pip install mem0ai
# Also need: Qdrant running, OpenAI API key configured
docker run -d -p 6333:6333 qdrant/qdrant
export OPENAI_API_KEY="sk-..."
python -c "from mem0 import Memory; m = Memory()"

# Letta: Python + LLM API
pip install letta
export OPENAI_API_KEY="sk-..."
letta server --port 8283
```
For production deployments, the dependency count matters. Each external service is a potential failure point, a version to maintain, and a cost to monitor. Dakera's single-binary approach eliminates entire categories of operational incidents.
Resource Footprint
| Framework | RAM (100K memories) | Disk | CPU | Network |
|---|---|---|---|---|
| Dakera | ~400 MB | ~2 GB | Any (ARM/x64) | None required |
| Mem0 | ~1.5 GB (with Qdrant) | ~3 GB | x64 typical | Embedding API calls |
| Letta | ~800 MB | ~1 GB | x64 typical | LLM API calls per operation |
| Zep | Managed (cloud) | Managed (cloud) | Managed (cloud) | All operations via API |
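To see where RAM goes at the 100K-memory scale used in this table, the raw vector storage is straightforward arithmetic. The sketch assumes 384-dimensional float32 embeddings and an HNSW index with M = 16, which allows up to 2M = 32 neighbor links per node on the base layer, stored as 4-byte IDs. These values are assumptions, not figures published for any framework above. Substitute your own embedding dimension and index settings.

```python
def estimate_index_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4,
                         hnsw_m: int = 16) -> dict[str, int]:
    """Rough lower bound: raw vectors plus HNSW base-layer neighbor links."""
    vectors = n_vectors * dims * bytes_per_value
    # The base layer allows up to 2*M neighbors per node; assume 4-byte neighbor IDs
    links = n_vectors * 2 * hnsw_m * 4
    return {"vectors": vectors, "links": links, "total": vectors + links}


est = estimate_index_bytes(100_000, 384)
for part, size in est.items():
    print(f"{part}: {size / 1e6:.1f} MB")
# vectors: 153.6 MB
# links: 12.8 MB
# total: 166.4 MB
```

The rest of a real footprint goes to items this estimate ignores: upper HNSW layers, keyword indexes, stored text and metadata, weights for any locally run models, and runtime overhead.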
Security and Data Sovereignty
For many organizations, where memory data lives is as important as how well it's retrieved. Agent memories contain sensitive information — user preferences, business context, personal details, and proprietary knowledge.
| Framework | Data Residency | Encryption at Rest | Air-Gap Capable | Tenant Isolation |
|---|---|---|---|---|
| Dakera | Your infrastructure | AES-256-GCM | Yes | Namespace-level |
| Mem0 | Your infra or Mem0 Cloud | Cloud-managed only | No (needs embedding API) | API key level |
| Letta | Your infrastructure | Application-level | No (needs LLM API) | Agent-level |
| Zep | Zep Cloud (AWS regions) | Cloud-managed | No | Project-level |
Of the four frameworks in the table above, only Dakera can operate in a fully air-gapped environment, with no network required for any operation, including embedding generation. That makes it the only option in this comparison suited to classified environments, on-premises healthcare systems, and edge deployments without reliable internet.
When to Use Each Framework
Decision Guide
Common Migration Paths
Teams often start with one framework and migrate as requirements crystallize:
- Mem0 to Dakera — teams outgrow vector-only retrieval accuracy or want to eliminate the OpenAI embedding dependency. See the step-by-step migration guide below.
- Letta to Dakera — teams find LLM-in-the-loop latency unacceptable at scale and need deterministic, fast retrieval without per-query LLM cost.
- Zep to Dakera — organizations need self-hosting for data sovereignty or want to eliminate cloud vendor lock-in after Zep deprecated their OSS edition.
- Cognee to Dakera — teams using Cognee for document knowledge graphs find they need better conversation memory, temporal decay, and MCP integration.
Migration Guide: Mem0 to Dakera
Mem0 is the most common starting point for teams building agent memory. When you outgrow its vector-only retrieval or need to eliminate the OpenAI embedding dependency, migrating to Dakera is a five-phase process: export, set up Dakera, ingest, switch, and validate.
Why Teams Migrate
The most frequent reasons teams move from Mem0 to Dakera:
- Retrieval accuracy — vector-only retrieval misses exact-match queries (error codes, names, IDs), temporal queries (what changed recently), and keyword-specific searches. Dakera's hybrid BM25 + HNSW + cross-encoder pipeline handles all of these.
- Data sovereignty — with its default OpenAI embedder, Mem0 sends text to OpenAI's embedding API for every store and search operation. Dakera computes embeddings on-device via ONNX, so no text leaves your infrastructure.
- Cost predictability — Mem0 cloud costs scale with usage (per-query embedding fees). Dakera runs on a fixed-cost server with no per-operation external API costs.
- Privacy requirements — healthcare, legal, and financial use cases often prohibit sending data to third-party APIs. Dakera's on-device embeddings and AES-256-GCM encryption at rest satisfy these requirements.
Phase 1: Export Memories from Mem0
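Mem0's client provides a get_all() method that returns the stored memories for a given user. Loop over your users and export everything to a JSON file:

```python
import json
import os

from mem0 import MemoryClient

# Initialize the Mem0 platform client (API key read from the environment)
client = MemoryClient(api_key=os.environ["MEM0_API_KEY"])

# Replace with the Mem0 user IDs you want to migrate
your_user_ids = ["123", "456", "789"]

all_memories = []
for user_id in your_user_ids:
    memories = client.get_all(user_id=user_id)
    # Depending on the API version, get_all() may return a list or a dict
    # with a "results" key; normalize to a list
    if isinstance(memories, dict):
        memories = memories.get("results", [])
    for mem in memories:
        all_memories.append({
            "user_id": user_id,
            "content": mem["memory"],
            "created_at": mem.get("created_at"),
            # Store {} instead of null so later code can unpack it safely
            "metadata": mem.get("metadata") or {},
        })

with open("mem0-export.json", "w") as f:
    json.dump(all_memories, f, indent=2)

print(f"Exported {len(all_memories)} memories from Mem0")
```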
Phase 2: Start Dakera and Configure Namespaces
Install and start Dakera, then confirm it is healthy. The ingest script in Phase 3 maps each Mem0 user ID to its own Dakera namespace (agent_id=f"user-{user_id}"), which preserves per-user isolation:

```bash
# Install and start Dakera
docker run -d --name dakera -p 3300:3300 -e DAKERA_INFERENCE_ENABLED=true ghcr.io/dakera-ai/dakera:latest

# Verify the server is ready
curl http://localhost:3300/health
# {"status":"healthy","version":"0.11.55"}
```
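In scripts and CI, wait for the health endpoint before starting the ingest rather than running it immediately after `docker run`. The following is a minimal standard-library sketch. It assumes only the `/health` endpoint and the `"status": "healthy"` field shown in the curl output. The helper names, timeout, and poll interval are arbitrary choices for illustration.

```python
import json
import time
import urllib.request


def fetch_json(url: str) -> dict:
    """GET `url` and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)


def wait_for_healthy(
    url: str = "http://localhost:3300/health",
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    fetch=fetch_json,
) -> dict:
    """Poll `url` until it reports status "healthy" or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    last_error = None
    while time.monotonic() < deadline:
        try:
            body = fetch(url)
            if body.get("status") == "healthy":
                return body
        except (OSError, ValueError) as exc:
            # Connection refused, HTTP error, or a non-JSON body while starting up
            last_error = exc
        time.sleep(interval_s)
    raise TimeoutError(f"{url} not healthy after {timeout_s}s (last error: {last_error})")
```

Call `wait_for_healthy()` at the top of the Phase 3 script. The `fetch` parameter exists so the polling logic can be tested without a running server.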
Phase 3: Ingest into Dakera
Re-ingest the exported memories. Dakera will compute fresh ONNX embeddings on-device and build the HNSW + BM25 indexes automatically:
```python
import json

from dakera import DakeraClient

# Connect to the server started in Phase 2 (port 3300)
client = DakeraClient(base_url="http://localhost:3300")

with open("mem0-export.json") as f:
    memories = json.load(f)

print(f"Ingesting {len(memories)} memories into Dakera...")
ingested = 0
errors = 0

for mem in memories:
    user_id = mem["user_id"]
    content = mem["content"]

    # Skip empty memories (counted as errors so the totals add up)
    if not content or not content.strip():
        errors += 1
        continue

    try:
        client.store_memory(
            # Map Mem0 user_id to Dakera namespace
            agent_id=f"user-{user_id}",
            content=content,
            metadata={
                # Tolerate null metadata in the export
                **(mem.get("metadata") or {}),
                "migrated_from": "mem0",
                "original_user_id": user_id,
            },
            # Preserve original timestamp for correct temporal decay
            created_at=mem.get("created_at"),
        )
        ingested += 1
        if ingested % 100 == 0:
            print(f"  Progress: {ingested}/{len(memories)}")
    except Exception as e:
        print(f"  ERROR: {e}")
        errors += 1

print(f"Complete. Ingested: {ingested}, Errors: {errors}")
```
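The ingest loop above is not idempotent. If it stops halfway and you re-run it, already-migrated memories are stored a second time. One way to make it resumable is a small checkpoint file of content hashes. `Checkpoint` is a hypothetical helper, independent of the Dakera SDK.

```python
import hashlib
import json
from pathlib import Path


class Checkpoint:
    """Remember which (user_id, content) pairs were already ingested, across runs."""

    def __init__(self, path: str = "ingest-checkpoint.json"):
        self.path = Path(path)
        self.done: set[str] = set(json.loads(self.path.read_text())) if self.path.exists() else set()

    @staticmethod
    def key(user_id: str, content: str) -> str:
        # Store hashes rather than raw text, so the checkpoint holds no memory content
        return hashlib.sha256(f"{user_id}\x00{content}".encode("utf-8")).hexdigest()

    def seen(self, user_id: str, content: str) -> bool:
        return self.key(user_id, content) in self.done

    def mark(self, user_id: str, content: str) -> None:
        self.done.add(self.key(user_id, content))

    def save(self) -> None:
        self.path.write_text(json.dumps(sorted(self.done)))
```

Inside the loop, skip a record when `checkpoint.seen(user_id, content)` is true. Call `checkpoint.mark(user_id, content)` after a successful `store_memory`. Call `checkpoint.save()` periodically, for example next to the progress print, so a crash loses at most one batch of progress. This also skips exact duplicates that appear within the export itself.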
Phase 4: Switch Application Code
The Mem0 API (add(), search()) maps closely onto Dakera's API. Here's the before/after for common operations:

```python
# text, query, and user_id come from your application

# BEFORE: Mem0
from mem0 import MemoryClient

mem0_client = MemoryClient(api_key="mem0-key")

# Store
mem0_client.add(messages=[{"role": "user", "content": text}],
                user_id=user_id)

# Search
results = mem0_client.search(query=query, user_id=user_id, limit=5)
memories = [r["memory"] for r in results["results"]]

# AFTER: Dakera
from dakera import DakeraClient

dakera_client = DakeraClient(base_url="http://localhost:3300")

# Store: on-device embeddings, no external API call
dakera_client.store_memory(agent_id=f"user-{user_id}", content=text)

# Search: hybrid BM25 + HNSW + reranking, no embedding API call
results = dakera_client.search_memories(
    agent_id=f"user-{user_id}", query=query, top_k=5
)
memories = [r.content for r in results.memories]
```
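If many call sites use the Mem0 method names, a thin adapter can leave them unchanged during the transition. This is a hypothetical shim. It relies only on the two Dakera client methods used in the snippet (`store_memory` and `search_memories`, with results exposing `.memories` and `.content`). It returns plain lists rather than reproducing Mem0's response shapes.

```python
class Mem0StyleAdapter:
    """Expose add()/search() on top of a Dakera client (hypothetical shim)."""

    def __init__(self, dakera_client, prefix: str = "user-"):
        self.client = dakera_client
        self.prefix = prefix  # same user_id -> namespace mapping as the ingest script

    def _agent_id(self, user_id: str) -> str:
        return f"{self.prefix}{user_id}"

    def add(self, messages: list[dict], user_id: str) -> int:
        """Store each non-empty user message verbatim; return how many were stored."""
        stored = 0
        for message in messages:
            content = message.get("content") or ""
            if message.get("role") == "user" and content.strip():
                self.client.store_memory(agent_id=self._agent_id(user_id), content=content)
                stored += 1
        return stored

    def search(self, query: str, user_id: str, limit: int = 5) -> list[str]:
        """Return the text of the top `limit` memories for this user."""
        results = self.client.search_memories(
            agent_id=self._agent_id(user_id), query=query, top_k=limit
        )
        return [r.content for r in results.memories]
```

Note one behavioral difference. Mem0's add() can run LLM-based fact extraction over the messages, while this shim stores user messages verbatim. Treat the shim as a stopgap while you migrate call sites to the native API.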
Phase 5: Validate and Cut Over
Run both systems in parallel for a validation period. Compare retrieval quality on your most important query types. If Dakera's results are equal to or better than Mem0's (they typically are on keyword-heavy queries), cut over by removing the Mem0 dependency:

```python
from dakera import DakeraClient

client = DakeraClient(base_url="http://localhost:3300")

# Spot-check a sample of migrated memories (agent IDs follow the "user-<id>" mapping)
test_queries = [
    ("user-123", "What is their preferred programming language?"),
    ("user-456", "What database do they use?"),
    ("user-789", "What deployment platform?"),
]

for agent_id, query in test_queries:
    results = client.search_memories(
        agent_id=agent_id, query=query, top_k=3
    )
    print(f"Q: {query}")
    for r in results.memories:
        print(f"  [{r.score:.2f}] {r.content[:80]}")
    print()

# Once satisfied, remove the Mem0 dependency:
#   pip uninstall mem0ai
#   and remove MEM0_API_KEY from your environment
```
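Reading printed results works for a handful of queries. For a larger validation set, score both systems the same way. The following is a minimal sketch. For each query, you record one substring that a correct answer should contain, then compute the hit rate in the top k results. The expected strings and result lists below are made-up placeholders for what your own Mem0 and Dakera searches return.

```python
def hit_rate_at_k(results_by_query: dict[str, list[str]], expected: dict[str, str], k: int = 3) -> float:
    """Fraction of queries whose top-k results contain the expected text (case-insensitive)."""
    if not expected:
        return 0.0
    hits = 0
    for query, needle in expected.items():
        top_k = results_by_query.get(query, [])[:k]
        if any(needle.lower() in text.lower() for text in top_k):
            hits += 1
    return hits / len(expected)


# Hypothetical ground truth: one substring each correct answer should contain
expected = {
    "What is their preferred programming language?": "rust",
    "What database do they use?": "postgres",
    "What deployment platform?": "kubernetes",
}
# Made-up search output for ONE system; build one such dict per system from its search calls
results = {
    "What is their preferred programming language?": ["Prefers Rust for backend work"],
    "What database do they use?": ["Works on the billing team"],
    "What deployment platform?": ["Deploys to Kubernetes on GKE"],
}
print(f"hit@3: {hit_rate_at_k(results, expected):.2f}")  # hit@3: 0.67
```

Running the same `expected` set against each system's results gives a like-for-like number for the cut-over decision.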
For the complete technical setup including production-grade TLS, monitoring, and backup, see the self-hosted AI memory guide. For teams using MCP-compatible tools (Claude Desktop, Claude Code, Cursor), the MCP setup guide covers connecting your tools to the newly migrated Dakera instance.
The State of Agent Memory in 2026
The field has consolidated around several clear approaches: traditional information retrieval (hybrid search), LLM-in-the-loop management, and graph-enriched memory. Each serves different trade-off preferences.
Key trends shaping the landscape:
- MCP as the standard interface — Model Context Protocol is becoming the de facto way agents communicate with memory systems. Frameworks that don't support MCP are increasingly friction-heavy to integrate.
- Self-hosting resurgence — after the initial rush to cloud-managed everything, organizations are pulling sensitive data back on-premises. Agent memories are particularly sensitive — they contain the distilled knowledge of every user interaction.
- Benchmark-driven development — LoCoMo and similar long-conversation benchmarks have given the field objective quality metrics. Teams can now make informed decisions based on measured accuracy rather than marketing claims.
- Temporal reasoning as differentiator — the hardest category in memory benchmarks (Category 3: temporal) separates production-ready systems from prototypes. Handling "facts change over time" requires architectural choices that can't be bolted on after the fact.
For teams building production agents today, the decision comes down to what you value most: raw accuracy and operational simplicity (Dakera), rapid prototyping speed (Mem0), creative LLM-driven memory (Letta), graph features with managed infrastructure (Zep), document-centric knowledge graphs (Cognee), or research exploration (Hindsight). There's no wrong choice for a prototype, but for production, benchmark scores and deployment economics should drive the decision.
Ready to evaluate Dakera for your agent memory needs? Install the binary in under 30 seconds and run the full LoCoMo benchmark yourself. The benchmark suite is included — no separate download required. See the quickstart guide to begin.