Dakera vs Letta (formerly MemGPT)
Letta (MemGPT) and Dakera approach agent memory from fundamentally different angles. Letta uses LLMs to actively manage memory — the model decides what to remember and forget. Dakera is a dedicated retrieval engine with deterministic memory operations. One uses AI to manage memory; the other is infrastructure that AI agents call into.
Feature Comparison
| Feature | Dakera | Letta (MemGPT) |
|---|---|---|
| Category | Memory Retrieval Engine | Agent Framework with Memory |
| Memory Management | Deterministic (algorithmic decay, scoring) | LLM-powered (model decides what to store/forget) |
| Language | Rust (single binary) | Python |
| Retrieval | Hybrid HNSW + BM25, RRF, cross-encoder | LLM-directed search over archival memory |
| Memory Tiers | Flat store with decay + importance scoring | Core memory (system prompt) + archival (vector) + recall (conversation) |
| Context Management | Not applicable (stores/retrieves memories) | Virtual context management (OS-like paging) |
| Agent Framework | No (memory infrastructure only) | Yes (full agent with tools, personas, memory) |
| LLM Dependency | None for core ops (embeddings are local ONNX) | Requires LLM for all memory operations |
| Knowledge Graph | Entity extraction (GLiNER), BFS traversal | Not built-in |
| Memory Decay | 6 configurable strategies | LLM-decided (non-deterministic) |
| MCP Tools | 83 tools | Not available (own tool system) |
| SDKs | Python, TypeScript, Go, Rust | Python |
| Cost per Query | ~0 (local inference only) | LLM API cost per memory operation |
| License | MIT SDKs, proprietary server | Apache 2.0 |
Architecture Differences
Dakera
A memory storage and retrieval engine. Your agent stores memories via API, and retrieves them via hybrid search with reranking. Dakera does not make decisions about what to remember — your agent does. Memory decay is algorithmic and deterministic: configurable strategies (time-based, access-count, importance scoring) manage memory lifecycle predictably. The engine runs entirely on local inference (ONNX) with no LLM calls.
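To make the deterministic model concrete, here is a toy sketch of decay-weighted scoring. The class, method names, and the word-overlap relevance stand-in are illustrative assumptions for this article, not Dakera's actual SDK or algorithms:

```python
class MemoryStore:
    """Toy sketch of deterministic memory scoring (not Dakera's actual code).

    A memory's retrieval score combines a relevance signal with time-based
    exponential decay and an importance weight, so the same query at the
    same timestamp always ranks memories identically.
    """

    def __init__(self, half_life_s: float = 86_400.0):
        self.half_life_s = half_life_s
        self.memories: list[dict] = []

    def store(self, text: str, importance: float, created_at: float) -> None:
        self.memories.append(
            {"text": text, "importance": importance, "created_at": created_at}
        )

    def _decay(self, age_s: float) -> float:
        # Time-based exponential decay: weight halves every half_life_s seconds.
        return 0.5 ** (age_s / self.half_life_s)

    def search(self, query: str, now: float, top_k: int = 3) -> list[str]:
        # Word-overlap relevance is a stand-in for the engine's hybrid
        # BM25 + vector similarity with cross-encoder reranking.
        q = set(query.lower().split())
        scored = []
        for m in self.memories:
            relevance = len(q & set(m["text"].lower().split()))
            score = relevance * m["importance"] * self._decay(now - m["created_at"])
            scored.append((score, m["text"]))
        scored.sort(key=lambda s: (-s[0], s[1]))  # deterministic tie-break
        return [text for score, text in scored[:top_k] if score > 0]


store = MemoryStore()
store.store("user prefers dark mode", importance=1.0, created_at=0.0)
store.store("user prefers light mode", importance=1.0, created_at=86_400.0)
# One half-life later, the older memory's score has halved:
print(store.search("prefers mode", now=86_400.0))
# -> ['user prefers light mode', 'user prefers dark mode']
```

The point of the sketch is the property, not the formula: given the same stored memories, query, and timestamp, the ranking never varies, which is what distinguishes this approach from LLM-managed memory.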
Letta (MemGPT)
An agent framework that treats memory like an operating system. Inspired by virtual memory in OS design, Letta uses an LLM to actively manage what goes into "core memory" (the system prompt), "archival memory" (long-term vector storage), and "recall memory" (recent conversation history). The LLM decides when to page information in and out of context. This creates a self-managing memory system, but every memory operation costs an LLM API call and is inherently non-deterministic.
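The tiering can be sketched as a plain data structure. This is an illustration of the MemGPT idea only; in Letta the LLM itself triggers the paging through tool calls, and none of the names below are Letta's actual API:

```python
class TieredMemory:
    """Illustrative sketch of the MemGPT-style hierarchy (not Letta's API).

    Core memory lives inside the context window (the system prompt);
    archival and recall memory live outside it. In Letta the LLM itself
    decides when to page; here paging is a plain method call.
    """

    def __init__(self, core_capacity: int = 2):
        self.core_capacity = core_capacity
        self.core: list[str] = []      # in-context facts (system prompt)
        self.archival: list[str] = []  # long-term store, searched on demand
        self.recall: list[str] = []    # recent conversation history

    def observe(self, message: str) -> None:
        self.recall.append(message)

    def page_in(self, fact: str) -> None:
        # Promote a fact into the context window; when capacity is
        # exceeded, evict the oldest core entry out to archival memory.
        self.core.append(fact)
        while len(self.core) > self.core_capacity:
            self.archival.append(self.core.pop(0))


mem = TieredMemory(core_capacity=2)
for fact in ["name: Ada", "likes Rust", "timezone: UTC"]:
    mem.page_in(fact)
print(mem.core)      # -> ['likes Rust', 'timezone: UTC']
print(mem.archival)  # -> ['name: Ada']
```

The eviction here is a fixed FIFO rule; replacing it with "ask the LLM which fact matters least" is what makes the real system adaptive, and also what makes it non-deterministic and per-call billable.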
Deployment Model
| Aspect | Dakera | Letta |
|---|---|---|
| Setup | Docker pull + run (single binary) | pip install + LLM API key |
| Runtime Dependencies | None (self-contained ONNX) | LLM API (OpenAI, Anthropic, etc.) |
| Latency | ~5-50ms per query (local inference) | ~500-2000ms per memory op (LLM round-trip) |
| Cost Model | Fixed (your infra only) | Variable (LLM tokens per operation) |
| Determinism | Deterministic (same query = same results) | Non-deterministic (LLM may vary) |
| Scale | Handles millions of memories per namespace | Limited by LLM context and API throughput |
Pricing Comparison
| Aspect | Dakera | Letta |
|---|---|---|
| Software | Free (self-hosted) | Free (Apache 2.0) |
| Per Memory Operation | ~$0 (local ONNX inference) | ~$0.001-0.01 (LLM API call per operation) |
| 1M Memory Ops/month | ~$10-30 (VPS cost only) | ~$1,000-10,000 (LLM API costs) |
| Cloud/Enterprise | Coming soon | Letta Cloud (managed platform) |
The cost difference is significant at scale. Every memory operation in Letta requires an LLM inference call, while Dakera's operations use only local ONNX models (embedding + reranking). For high-volume agent memory workloads, the gap amounts to orders of magnitude in monthly cost.
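A back-of-envelope check of the figures in the pricing table shows where the gap comes from (the per-operation prices are the table's rough estimates, not measured values):

```python
ops_per_month = 1_000_000

# Letta: each memory operation is an LLM API call (table estimate).
llm_cost_low, llm_cost_high = 0.001, 0.01
letta_low = ops_per_month * llm_cost_low
letta_high = ops_per_month * llm_cost_high

# Dakera: local ONNX inference only, so cost is the fixed VPS bill.
dakera_low, dakera_high = 10, 30

print(f"Letta:  ${letta_low:,.0f}-${letta_high:,.0f}/month")
print(f"Dakera: ${dakera_low}-${dakera_high}/month (fixed)")
print(f"Gap:    {letta_low / dakera_high:.0f}x-{letta_high / dakera_low:.0f}x")
```

At one million operations per month, even the most favorable pairing of estimates leaves a roughly 30x gap, and the least favorable a 1,000x gap.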
When to Choose
Choose Letta if:
- You want a complete agent framework (not just memory infrastructure)
- LLM-powered memory management (the model decides what matters) fits your design
- You like the "virtual memory / OS" metaphor for context management
- Your agent has low memory operation volume (cost stays manageable)
- You want an open-source (Apache 2.0) agent framework with memory built-in
- Non-deterministic memory behavior is acceptable for your use case
- You are building a Python-only stack
Choose Dakera if:
- You need deterministic, predictable memory behavior (same query = same results)
- Cost matters at scale — you cannot afford LLM calls for every memory operation
- Low latency is critical (5-50ms vs 500-2000ms per operation)
- You already have an agent framework and need memory infrastructure to plug into it
- You need hybrid retrieval with BM25 + vector + cross-encoder reranking
- Knowledge graphs, memory decay strategies, and session management are requirements
- You need SDKs beyond Python (TypeScript, Go, Rust)
- MCP integration for IDE-based workflows is important
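On the hybrid-retrieval point: the feature table names Reciprocal Rank Fusion (RRF) as the step that merges the BM25 and vector rankings before reranking. A minimal sketch of RRF, using the conventional k = 60 constant (not a documented Dakera setting):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists into one.

    Each document scores sum(1 / (k + rank)) across the lists it appears
    in, so items ranked well by either BM25 or vector search surface
    near the top of the fused list.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score, with a lexical tie-break for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc-a", "doc-b", "doc-c"]    # lexical ranking
vector_hits = ["doc-b", "doc-c", "doc-d"]  # semantic ranking
print(rrf([bm25_hits, vector_hits]))
# -> ['doc-b', 'doc-c', 'doc-a', 'doc-d']
```

Documents appearing in both rankings (doc-b, doc-c) outrank documents appearing in only one, which is the behavior that makes RRF a robust fusion step before a cross-encoder reranks the short list.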
Verdict
Dakera provides deterministic memory infrastructure: hybrid BM25 + HNSW vector search with cross-encoder reranking at 5-50ms latency, 6 memory decay strategies, knowledge graphs, and 83 MCP tools, all in a self-hosted 44 MB Rust binary that scores 87.6% on LoCoMo with zero LLM API costs for memory operations. Letta takes an innovative approach in which the LLM itself manages memory, enabling context-aware memory decisions that adapt dynamically to conversation flow. That is genuinely powerful for use cases that benefit from reasoning about what to remember. Choose Dakera when you need fast, deterministic, cost-effective memory retrieval as infrastructure for your agent stack. Choose Letta when you want the LLM to actively reason about memory management and can accommodate the additional API costs and latency.
Try Dakera Free
Deterministic memory retrieval at 5-50ms latency. No LLM API costs for memory operations.
Get Started