
Best AI Agent Memory Frameworks in 2026: Compared and Ranked

AI agents have a memory problem. Without persistence, every conversation starts from scratch — your agent doesn't remember that the user it helped yesterday prefers Python over JavaScript, or that the task it completed last week needs a follow-up. Long context windows help, but they're expensive, slow, and still limited to a single session.

The solution is an external memory layer — a dedicated store that persists what agents learn and surfaces the most relevant information at query time. In 2026, a handful of frameworks compete for this role. They differ significantly in architecture, deployment model, retrieval quality, and operational complexity.

This guide covers the five most widely used options: Dakera, Mem0, Letta, Zep, and Hindsight. We'll compare them on the criteria that matter to production teams: benchmark scores, deployment model, retrieval architecture, and total cost of ownership.

Disclosure

This article is written by the Dakera team. We've tried to be accurate about competitors, but you should verify claims independently. Benchmark scores are from public sources cited inline.


Quick Summary: The Comparison Table

| Framework | LoCoMo Score | Self-Hosted | MCP Support | Rust Binary | Decay Engine | License |
|-----------|--------------|-------------|-------------|-------------|--------------|---------|
| Dakera | 87.6% | ✓ Native | ✓ 83 tools | ✓ 44 MB | ✓ Half-life | Open core |
| Mem0 | 91.6%* | ⚠ Cloud + OSS | ⚠ Partial | ✗ Python | ✗ None | Apache-2.0 |
| Letta | — | ⚠ Self-host possible | ✗ None | ✗ Python | ✗ None | Apache-2.0 |
| Zep | — | ⚠ Cloud + OSS | ✗ None | ✗ Go | ⚠ Basic | Apache-2.0 (CE) |
| Hindsight | — | ✗ Cloud only | ✗ None | — | — | Proprietary |

* Mem0's 91.6% is from their own benchmark run using a different prompt format and with LLM post-processing enabled. Dakera's 87.6% is evaluated without LLM post-processing. Direct comparison requires methodology alignment.


The Frameworks

1. Dakera — Self-Hosted, Rust-Native, MCP-First

Dakera is a single Rust binary that gives AI agents persistent memory via hybrid BM25+HNSW retrieval, a knowledge graph, session management, and configurable importance decay. It runs entirely on your infrastructure and ships a native MCP server with 83 tools for immediate integration with Claude, Cursor, and Windsurf.

- 87.6% — LoCoMo 1,540Q overall
- 73.9% — Cat3 temporal inference
- 44 MB — binary size
- 83 — MCP tools

Unique moat: Access-weighted importance decay with configurable half-life per namespace. Memories decay like human memory — frequently accessed facts stay sharp, stale context fades. No other framework in this comparison offers this out of the box.

Best for: Teams that need GDPR/HIPAA compliance, air-gapped deployment, or MCP integration with Claude Code. The Rust binary eliminates the Python/Node runtime tax and reduces attack surface to near-zero.

Limitations: Newer project with smaller community than Mem0. Cloud-hosted option not yet available for teams that prefer managed services.

# Self-host Dakera in under 5 minutes
docker run -d \
  --name dakera \
  -p 3300:3300 \
  -e DAKERA_ROOT_API_KEY=my-key \
  ghcr.io/dakera-ai/dakera:latest

curl http://localhost:3300/health
# {"status":"healthy","version":"0.11.54"}

2. Mem0 — The Established Leader, Cloud-First

Mem0 is the most widely adopted AI memory framework. Its open-source version (mem0) provides Python-based memory management with OpenAI embeddings and a simple store/retrieve API. The commercial hosted version adds team features, dashboards, and a higher benchmark score through LLM post-processing.

Mem0's published 91.6% LoCoMo score makes it the current benchmark leader, though this includes LLM-assisted answer extraction that Dakera excludes by design. For raw recall without post-processing, both systems are competitive.

Best for: Python-first teams that want a quick integration and don't have strict data residency requirements. Mem0's pip install and OpenAI-native workflow is the fastest path to a working prototype.

Limitations: Cloud-first architecture means your memory data goes through Mem0's servers. No MCP server for direct Claude/Cursor integration. Python runtime adds significant overhead vs Rust. No decay engine — all memories are equally weighted indefinitely.

3. Letta (formerly MemGPT) — Agent OS with Built-In Memory

Letta is an agent framework with memory as a first-class primitive. Rather than a standalone memory store, Letta treats memory as part of the agent's architecture — the agent itself manages what to remember and when. This gives Letta unique in-context memory capabilities but at the cost of tighter coupling.

Best for: Teams building with LangChain or LlamaIndex who want memory that's tightly integrated with their agent's reasoning loop. Letta shines when the agent needs to actively decide what to remember.

Limitations: Not a standalone memory server — you're adopting the Letta agent framework. No MCP support. No meaningful benchmark score on LoCoMo. Python-only runtime.

4. Zep — Temporal Graph for Conversation Memory

Zep specializes in conversation memory and user-fact extraction. It builds a temporal knowledge graph from conversations, making it good at tracking how user preferences and facts change over time. The community edition is Go-based and self-hostable; the commercial edition adds a managed cloud.

Best for: Customer-facing agents that need to remember what users said across many conversations. Zep's conversation-centric model works well for chat applications.

Limitations: Architecture is heavily conversation-oriented — not designed for multi-agent or cross-agent memory sharing. No MCP server. Limited benchmarking transparency.

5. Hindsight — Managed Cloud Only

Hindsight is a commercial memory layer for enterprise AI agents. It offers a managed API with no self-hosting option. For teams with strict data governance requirements, the cloud-only model is a dealbreaker. No public benchmark results are available.

Best for: Enterprise teams with existing cloud vendor relationships who want a fully managed service and don't have data residency constraints.

Limitations: No self-hosting. No public benchmark scores. Proprietary and expensive at scale. No MCP support.


Decision Tree: Which Framework Should You Use?

Choose based on your constraints

- If you need GDPR / HIPAA compliance, air-gapped deployment, or data that must not leave your servers → Dakera (only option with true self-hosting and zero cloud dependency)
- If you're using Claude, Cursor, or Windsurf and want persistent memory without code changes → Dakera (only option with a native MCP server and 83 MCP tools)
- If you want the fastest Python prototype and don't care about data residency → Mem0 (best ecosystem, fastest setup for Python)
- If you're building a multi-agent system where agents share memory → Dakera (namespace isolation and cross-agent recall built in)
- If memory importance should decay over time (stale facts shouldn't crowd out recent ones) → Dakera (only option with access-weighted half-life decay)
- If you're building a customer-facing chatbot and want conversation-centric memory → Zep (temporal graph optimized for conversations)
- If you're using LangChain or LlamaIndex and want tight agent-framework integration → Letta (native framework memory)

Retrieval Architecture Deep Dive

The dimension most comparisons skip, and the one that matters most, is how each framework retrieves memories at query time. This directly determines recall quality on real workloads.

Hybrid BM25 + HNSW Vector Search (Dakera)

Dakera runs both keyword (BM25) and semantic (HNSW vector) search in a single round-trip and fuses the results. This matters because neither approach alone is sufficient: BM25 catches exact tokens such as identifiers, error codes, and names that embeddings tend to blur together, while vector search catches paraphrases and semantically related memories that share no keywords with the query.

The hybrid approach explains why Dakera scores 73.9% on LoCoMo's Cat3 (temporal inference) — the hardest category — while remaining fast enough for production use.
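The article doesn't specify how Dakera fuses the two result lists, so here is a minimal sketch using reciprocal rank fusion (RRF), a common technique for combining BM25 and vector rankings. The memory IDs are made up for illustration.

```python
# Illustrative reciprocal rank fusion (RRF) of a keyword ranking and a
# vector ranking. This is a generic sketch, not Dakera's actual fusion
# logic, which is not documented in this article.

def rrf_fuse(bm25_ranked, vector_ranked, k=60):
    """Fuse two ranked lists of memory IDs into one ordering.

    Each memory scores sum(1 / (k + rank)) over the lists it appears
    in, so items ranked highly by either retriever float to the top,
    and items found by both retrievers get a boost.
    """
    scores = {}
    for ranked in (bm25_ranked, vector_ranked):
        for rank, mem_id in enumerate(ranked, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "m3" ranks highly in both lists, so it wins the fused ordering.
fused = rrf_fuse(["m3", "m1", "m7"], ["m2", "m3", "m1"])
```

The constant `k` damps the influence of any single top rank; 60 is the value from the original RRF literature, but it is tunable.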

Vector + LLM Extraction (Mem0)

Mem0 stores user facts via LLM extraction at write time. This compresses memory into facts ("user prefers dark mode") rather than raw episodic content. At recall time, it combines vector similarity with the extracted fact store. This works well for explicit preferences but loses nuance in complex multi-turn interactions.

Conversation Graph (Zep)

Zep builds a temporal knowledge graph from conversations, tracking when facts change over time. This is highly effective for chatbot scenarios where you need to know "what did the user say about X in their most recent session" but less flexible for multi-agent episodic memory.


Importance Decay: The Differentiator Nobody Talks About

Every memory system faces the same fundamental problem: as memories accumulate, the signal-to-noise ratio degrades. A memory from 6 months ago about a user's project requirements is less relevant than what they said yesterday — but if the two are semantically similar, pure vector search scores them alike, because embeddings encode meaning, not age or usage.

Dakera's access-weighted half-life decay solves this: each memory's retrieval weight halves over a configurable period, set per namespace, and every access refreshes it, so frequently used facts stay near full weight while untouched context fades out of the ranking.

No other framework in this comparison offers this architecture. Mem0, Letta, and Zep treat all stored memories as equally weighted regardless of age or access frequency.
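The article describes the behavior of half-life decay but not Dakera's exact formula. A minimal sketch, assuming the standard exponential form where the weight halves every half-life period and each access resets the clock:

```python
import math

# Sketch of access-weighted half-life decay. Assumes effective score =
# raw relevance * 2^(-age / half_life), with age measured from the
# last access. This is illustrative, not Dakera's actual formula.

def decayed_score(relevance, hours_since_access, half_life_hours):
    """Scale raw retrieval relevance by 2^(-age / half_life).

    A memory read often never ages far past its last access, so it
    keeps nearly full weight; an untouched memory loses half its
    weight every half-life period.
    """
    return relevance * math.pow(2.0, -hours_since_access / half_life_hours)

fresh = decayed_score(1.0, 0, 168)       # just accessed: full weight
week_old = decayed_score(1.0, 168, 168)  # one half-life later: 0.5
```

With a one-week half-life as in this sketch, a memory untouched for a month retains under 10% of its original weight, which is what keeps stale context from crowding out recent facts.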


MCP Integration: The 2026 Developer Interface

The Model Context Protocol has become the standard integration layer for AI tools in 2026. Claude Desktop, Claude Code, Cursor, and Windsurf all support MCP natively. An MCP server turns Dakera's REST API into a set of tools your AI assistant can call directly — no code changes required.

Only Dakera ships a native MCP server among the frameworks compared here. Mem0, Letta, Zep, and Hindsight require custom code to integrate.

With Dakera's MCP server, your Claude Code session gets memory store, recall, session, and knowledge-graph operations exposed as callable tools, with no glue code required.


Deployment and Operations

For production teams, operational simplicity matters as much as feature completeness. Here's how the frameworks compare on the dimensions that drive operational cost:

| Framework | Runtime | External Deps | Docker Image Size | Kubernetes |
|-----------|---------|---------------|-------------------|------------|
| Dakera | Rust binary | MinIO (or S3) | ~100 MB | ✓ Helm chart |
| Mem0 (OSS) | Python 3.10+ | Qdrant + Redis + Postgres | ~1.2 GB | ⚠ Manual |
| Letta | Python 3.10+ | Postgres + vector DB | ~900 MB | ⚠ Manual |
| Zep (CE) | Go | Postgres + Neo4j | ~300 MB | ⚠ Manual |

Dakera's single-binary architecture with S3-compatible object storage means you're running one process with one external dependency. Mem0's self-hosted stack requires Qdrant, Redis, and Postgres — three separate services to operate, monitor, and scale.


Benchmark Methodology Note

LoCoMo (Long-Context Memory) is the standard benchmark for agent memory evaluation. It tests 1,540 questions across four categories, including Cat3 (temporal inference), the hardest of the four.

Mem0's published 91.6% overall uses LLM post-processing to clean answers before evaluation. Dakera's 87.6% is evaluated with exact string matching only. Benchmark comparisons between systems must account for methodology differences.

Dakera's benchmark is fully reproducible. The benchmark harness and dataset (LoCoMo standard) are available on GitHub. See full methodology →


The Bottom Line

There is no universally best AI agent memory framework — the right choice depends on your constraints: self-hosting, compliance, MCP support, and decay point to Dakera; the fastest Python prototype is Mem0; tight agent-framework coupling points to Letta; conversation-centric chat memory points to Zep; and a fully managed service with no residency constraints points to Hindsight.

If you're building agents that will run in production — especially agents that handle sensitive user data, need to run offline, or use Claude/Cursor/Windsurf — Dakera's architecture is built for that from the ground up.

Try Dakera

Dakera is free to self-host. Get started in under 5 minutes with Docker or Docker Compose. Read the quickstart guide →