Best AI Agent Memory Frameworks in 2026: Compared and Ranked
AI agents have a memory problem. Without persistence, every conversation starts from scratch — your agent doesn't know that yesterday's user prefers Python over JavaScript, or that the task it completed last week needs a follow-up. Long context windows help, but they're expensive, slow, and still limited to a single session.
The solution is an external memory layer — a dedicated store that persists what agents learn and surfaces the most relevant information at query time. In 2026, a handful of frameworks compete for this role. They differ significantly in architecture, deployment model, retrieval quality, and operational complexity.
This guide covers the five most widely used options: Dakera, Mem0, Letta, Zep, and Hindsight. We'll compare them on the criteria that matter to production teams: benchmark scores, deployment model, retrieval architecture, and total cost of ownership.
This article is written by the Dakera team. We've tried to be accurate about competitors, but you should verify claims independently. Benchmark scores are from public sources cited inline.
Quick Summary: The Comparison Table
| Framework | LoCoMo Score | Self-Hosted | MCP Support | Rust Binary | Decay Engine | License |
|---|---|---|---|---|---|---|
| Dakera | 87.6% | ✓ Native | ✓ 83 tools | ✓ 44 MB | ✓ Half-life | Open core |
| Mem0 | 91.6%* | ⚠ Cloud + OSS | ⚠ Partial | ✗ Python | ✗ | Apache-2.0 |
| Letta | — | ⚠ Self-host possible | ✗ | ✗ Python | ✗ | Apache-2.0 |
| Zep | — | ⚠ Cloud + OSS | ✗ | ✗ Go | ⚠ Basic | Apache-2.0 (CE) |
| Hindsight | — | ✗ Cloud only | ✗ | ✗ | ✗ | Proprietary |
* Mem0's 91.6% is from their own benchmark run using a different prompt format and with LLM post-processing enabled. Dakera's 87.6% is evaluated without LLM post-processing. Direct comparison requires methodology alignment.
The Frameworks
1. Dakera — Self-Hosted, Rust-Native, MCP-First
Dakera is a single Rust binary that gives AI agents persistent memory via hybrid BM25+HNSW retrieval, a knowledge graph, session management, and configurable importance decay. It runs entirely on your infrastructure and ships a native MCP server with 83 tools for immediate integration with Claude, Cursor, and Windsurf.
Unique moat: Access-weighted importance decay with configurable half-life per namespace. Memories decay like human memory — frequently accessed facts stay sharp, stale context fades. No other framework in this comparison offers this out of the box.
Best for: Teams that need GDPR/HIPAA compliance, air-gapped deployment, or MCP integration with Claude Code. The single Rust binary eliminates the Python/Node runtime tax and keeps the attack surface small.
Limitations: Newer project with smaller community than Mem0. Cloud-hosted option not yet available for teams that prefer managed services.
```bash
# Self-host Dakera in under 5 minutes
docker run -d \
  --name dakera \
  -p 3300:3300 \
  -e DAKERA_ROOT_API_KEY=my-key \
  ghcr.io/dakera-ai/dakera:latest

curl http://localhost:3300/health
# {"status":"healthy","version":"0.11.54"}
```
2. Mem0 — The Established Leader, Cloud-First
Mem0 is the most widely adopted AI memory framework. Its open-source version (mem0) provides Python-based memory management with OpenAI embeddings and a simple store/retrieve API. The commercial hosted version adds team features, dashboards, and a higher benchmark score through LLM post-processing.
Mem0's published 91.6% LoCoMo score makes it the current benchmark leader, though this includes LLM-assisted answer extraction that Dakera excludes by design. For raw recall without post-processing, both systems are competitive.
Best for: Python-first teams that want a quick integration and don't have strict data residency requirements. Mem0's pip install and OpenAI-native workflow is the fastest path to a working prototype.
Limitations: Cloud-first architecture means your memory data goes through Mem0's servers. MCP support is only partial, so direct Claude/Cursor integration takes extra setup. The Python runtime adds significant overhead vs Rust. No decay engine — all memories are equally weighted indefinitely.
3. Letta (formerly MemGPT) — Agent OS with Built-In Memory
Letta is an agent framework with memory as a first-class primitive. Rather than a standalone memory store, Letta treats memory as part of the agent's architecture — the agent itself manages what to remember and when. This gives Letta unique in-context memory capabilities but at the cost of tighter coupling.
Best for: Teams building with LangChain or LlamaIndex who want memory that's tightly integrated with their agent's reasoning loop. Letta shines when the agent needs to actively decide what to remember.
Limitations: Not a standalone memory server — you're adopting the Letta agent framework. No MCP support. No meaningful benchmark score on LoCoMo. Python-only runtime.
4. Zep — Temporal Graph for Conversation Memory
Zep specializes in conversation memory and user-fact extraction. It builds a temporal knowledge graph from conversations, making it good at tracking how user preferences and facts change over time. The community edition is Go-based and self-hostable; the commercial edition adds a managed cloud.
Best for: Customer-facing agents that need to remember what users said across many conversations. Zep's conversation-centric model works well for chat applications.
Limitations: Architecture is heavily conversation-oriented — not designed for multi-agent or cross-agent memory sharing. No MCP server. Limited benchmarking transparency.
5. Hindsight — Managed Cloud Only
Hindsight is a commercial memory layer for enterprise AI agents. It offers a managed API with no self-hosting option. For teams with strict data governance requirements, the cloud-only model is a dealbreaker. No public benchmark results are available.
Best for: Enterprise teams with existing cloud vendor relationships who want a fully managed service and don't have data residency constraints.
Limitations: No self-hosting. No public benchmark scores. Proprietary and expensive at scale. No MCP support.
Decision Tree: Which Framework Should You Use?
Choose based on your constraints:
- Need self-hosting, compliance, or MCP integration? → Dakera
- Want the fastest Python prototype with managed hosting? → Mem0
- Want memory inside the agent's own reasoning loop? → Letta
- Building conversation-centric chatbots? → Zep
- Want a fully managed service with no data residency constraints? → Hindsight
Retrieval Architecture Deep Dive
The dimension most comparisons skip is how each framework retrieves memories at query time, and it is what most directly determines recall quality on real workloads.
Hybrid BM25 + HNSW Vector Search (Dakera)
Dakera runs both keyword (BM25) and semantic (HNSW vector) search in a single round-trip and fuses the results. This matters because neither approach alone is sufficient:
- Pure vector search misses exact keyword matches (names, error codes, version numbers)
- Pure BM25 misses semantic equivalents ("user is unhappy" vs "user expressed frustration")
This hybrid approach is a large part of why Dakera scores 73.9% on LoCoMo's Cat3 (temporal inference), the hardest category, while remaining fast enough for production use.
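This article doesn't detail Dakera's fusion step, so the sketch below uses reciprocal rank fusion (RRF), a common technique for merging a keyword ranking with a vector ranking. Treat it as an illustration of hybrid fusion in general, not Dakera's actual algorithm:

```python
# Reciprocal rank fusion (RRF): one common way to merge BM25 and vector
# rankings. A generic sketch -- not Dakera's actual fusion code.
def rrf(bm25_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked lists of memory IDs into one list, best first."""
    scores: dict[str, float] = {}
    for ranking in (bm25_ranked, vector_ranked):
        for rank, mem_id in enumerate(ranking):
            # Each list contributes 1/(k + rank + 1); a memory ranked highly
            # by either retriever floats to the top of the fused result.
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "error E0502" wins on BM25; its paraphrase wins on vector similarity.
print(rrf(["m3", "m1", "m7"], ["m1", "m9", "m3"]))  # ['m1', 'm3', 'm9', 'm7']
```

The constant k damps the influence of any single ranking; 60 is the value from the original RRF paper.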
Vector + LLM Extraction (Mem0)
Mem0 stores user facts via LLM extraction at write time. This compresses memory into facts ("user prefers dark mode") rather than raw episodic content. At recall time, it combines vector similarity with the extracted fact store. This works well for explicit preferences but loses nuance in complex multi-turn interactions.
Conversation Graph (Zep)
Zep builds a temporal knowledge graph from conversations, tracking when facts change over time. This is highly effective for chatbot scenarios where you need to know "what did the user say about X in their most recent session" but less flexible for multi-agent episodic memory.
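As a rough mental model (a generic illustration, not Zep's actual data model), a temporal fact store closes old facts instead of overwriting them:

```python
# Generic illustration of temporal fact tracking -- not Zep's actual schema.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    invalid_at: Optional[datetime] = None  # None means still believed true

# When a preference changes, the old fact is closed rather than deleted,
# so "what did the user prefer back in March?" stays answerable.
facts = [
    TemporalFact("user", "prefers_editor", "vim",
                 datetime(2025, 3, 1), invalid_at=datetime(2025, 9, 12)),
    TemporalFact("user", "prefers_editor", "cursor", datetime(2025, 9, 12)),
]
current = [f for f in facts if f.invalid_at is None]
```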
Importance Decay: The Differentiator Nobody Talks About
Every memory system faces the same fundamental problem: as memories accumulate, the signal-to-noise ratio degrades. A memory from six months ago about a user's project requirements is less relevant than what they said yesterday, but embeddings don't encode recency, so pure semantic search can't tell them apart.
Dakera's access-weighted half-life decay solves this:
- Each memory has a base importance score (0.0–1.0) set at write time
- Memories decay exponentially based on time since last access, with a configurable half-life per namespace
- Accessing a memory resets its decay clock — frequently used facts stay sharp
- Decay rate is per-namespace, so you can set different half-lives for user preferences (slow decay) vs working context (fast decay)
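Mechanically, this is an exponential half-life applied at read time. A minimal sketch of the scheme as described above, not Dakera's internal implementation:

```python
# Access-weighted half-life decay, as described above.
# A conceptual sketch, not Dakera's internal code.
import math
import time

def effective_importance(base: float, last_access: float, half_life_s: float) -> float:
    """Importance halves every `half_life_s` seconds since the last access."""
    age = time.time() - last_access
    return base * math.pow(0.5, age / half_life_s)

WEEK = 7 * 24 * 3600  # e.g. a 7-day half-life for working context

# Written with importance 0.8, untouched for one week: ~0.4.
print(effective_importance(0.8, time.time() - WEEK, half_life_s=WEEK))
# Untouched for four weeks: ~0.05. Reading the memory would reset
# last_access to now and restore it to full strength.
print(effective_importance(0.8, time.time() - 4 * WEEK, half_life_s=WEEK))
```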
No other framework in this comparison offers this architecture. Mem0, Letta, and Zep treat all stored memories as equally weighted regardless of age or access frequency.
MCP Integration: The 2026 Developer Interface
The Model Context Protocol has become the standard integration layer for AI tools in 2026. Claude Desktop, Claude Code, Cursor, and Windsurf all support MCP natively. An MCP server turns Dakera's REST API into a set of tools your AI assistant can call directly — no code changes required.
Among the frameworks compared here, only Dakera ships a native MCP server. Mem0's MCP support is partial; Letta, Zep, and Hindsight require custom code to integrate.
With Dakera's MCP server, your Claude Code session gets:
- `dakera_store` — save what Claude learns about your codebase
- `dakera_recall` — retrieve relevant context before responding
- `dakera_session_start` / `dakera_session_end` — group memories by work session
- `dakera_hybrid_search` — semantic + keyword search in one call
- 79 additional tools for graph, vector, namespace, and decay operations
Deployment and Operations
For production teams, operational simplicity matters as much as feature completeness. Here's how the frameworks compare on the dimensions that drive operational cost:
| Framework | Runtime | External Deps | Docker Image Size | Kubernetes |
|---|---|---|---|---|
| Dakera | Rust binary | MinIO (or S3) | ~100 MB | ✓ Helm chart |
| Mem0 (OSS) | Python 3.10+ | Qdrant + Redis + Postgres | ~1.2 GB | ⚠ Manual |
| Letta | Python 3.10+ | Postgres + vector DB | ~900 MB | ⚠ Manual |
| Zep (CE) | Go | Postgres + Neo4j | ~300 MB | ⚠ Manual |
Dakera's single-binary architecture with S3-compatible object storage means you're running one process with one external dependency. Mem0's self-hosted stack requires Qdrant, Redis, and Postgres — three separate services to operate, monitor, and scale.
Benchmark Methodology Note
LoCoMo is the de facto standard benchmark for agent memory evaluation. It tests 1,540 questions across four categories:
- Cat1 — Single-hop factual recall (Dakera: 86.9%)
- Cat2 — Multi-hop reasoning across memories (Dakera: 85.4%)
- Cat3 — Temporal inference ("what changed between X and Y?") (Dakera: 73.9%)
- Cat4 — Counting and aggregation (Dakera: 91.0%)
Mem0's published 91.6% overall uses LLM post-processing to clean answers before evaluation. Dakera's 87.6% is evaluated with exact string matching only. Benchmark comparisons between systems must account for methodology differences.
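To make the methodology gap concrete, here is a simplified contrast between the two scoring regimes (neither project's actual harness):

```python
# Simplified contrast of two scoring regimes -- not either project's
# actual evaluation harness.
def normalize(s: str) -> str:
    return " ".join(s.lower().split())

def exact_match(prediction: str, gold: str) -> bool:
    # "Exact string matching": the retrieved answer must literally match
    # the reference after trivial normalization.
    return normalize(prediction) == normalize(gold)

def post_processed_match(prediction: str, gold: str, llm) -> bool:
    # The post-processed variant first has an LLM extract or rewrite a
    # short answer from the raw prediction, then compares. The rewrite
    # step typically raises scores, which is why the two headline numbers
    # aren't directly comparable.
    cleaned = llm(f"Extract the short answer: {prediction}")
    return exact_match(cleaned, gold)
```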
Dakera's benchmark is fully reproducible. The benchmark harness and dataset (LoCoMo standard) are available on GitHub. See full methodology →
The Bottom Line
There is no universally best AI agent memory framework — the right choice depends on your constraints:
- For compliance, self-hosting, and MCP integration: Dakera is the only option that checks all three boxes
- For Python prototyping with managed hosting: Mem0 is faster to get started
- For LangChain/LlamaIndex ecosystem: Letta integrates natively
- For conversation-centric chatbot memory: Zep's temporal graph is well-suited
If you're building agents that will run in production — especially agents that handle sensitive user data, need to run offline, or use Claude/Cursor/Windsurf — Dakera's architecture is built for that from the ground up.
Dakera is free to self-host. Get started in under 5 minutes with Docker or Docker Compose. Read the quickstart guide →