The MCP memory gap
Most AI agents forget everything when the conversation ends. The Model Context Protocol (MCP) gave agents a way to call external tools — but most MCP memory implementations are thin wrappers: store a string, retrieve a string, done. That works for a shopping list. It breaks down for agents that need to recall user preferences from three weeks ago, reason across dozens of past sessions, or surface a fact from session 12 while answering a question in session 47.
Dakera was built specifically for this problem. The MCP server is not a feature added on top — it is one of the primary interfaces into the same engine that scores 87.6% on the LoCoMo long-context memory benchmark.
83 MCP tools, not 3
Most MCP memory servers expose three operations: remember, recall, forget. Dakera exposes 83, organized into functional groups that cover the full lifecycle of agent memory.
The breadth matters because real agent workloads are not uniform. A coding assistant needs different memory patterns than a customer support agent. A research agent needs cross-session entity linking that a short-lived task agent does not. Dakera's 83 tools let agents express exactly the memory operations they need — not work around a lowest-common-denominator API.
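On the wire, every one of these tools is invoked the same way: an MCP client sends a JSON-RPC `tools/call` request naming the tool and its arguments. As an illustrative sketch (the argument names here are hypothetical; `dakera_store` and the 0.0–1.0 importance score are described later in this post), a store call might look like:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "dakera_store",
    "arguments": {
      "content": "User prefers Rust for new services",
      "importance": 0.8
    }
  }
}
```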
Connecting Claude Desktop in two minutes
Start a Dakera instance (binary or Docker) and add it to your Claude Desktop config:
```json
{
  "mcpServers": {
    "dakera": {
      "command": "dakera",
      "args": ["mcp"],
      "env": {
        "DAKERA_API_KEY": "your-key",
        "DAKERA_URL": "http://localhost:7700"
      }
    }
  }
}
```
Restart Claude Desktop. You now have 83 memory tools available in every conversation. Claude can store anything important, recall context from months ago, and maintain structured knowledge about people, projects, and decisions — automatically.
No embeddings API required. Dakera runs built-in embedding models (MiniLM, BGE, E5) via the Candle runtime. Semantic search works out of the box — no OpenAI API key, no external service, no additional cost.
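Mechanically, "semantic search out of the box" means texts are mapped to vectors and recall ranks by vector similarity rather than exact wording. A toy sketch of that idea, with made-up 4-dimensional vectors (real models like MiniLM emit 384 dimensions, and Dakera's actual index is HNSW, not a linear scan):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product normalized by vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical query and stored-memory embeddings (values invented for illustration).
query = [0.9, 0.1, 0.0, 0.2]
stored = {
    "prefers tabs over spaces": [0.8, 0.2, 0.1, 0.3],
    "ships on Fridays":         [0.1, 0.9, 0.4, 0.0],
}

# Recall = return the memory whose embedding is closest to the query's.
best = max(stored, key=lambda k: cosine(query, stored[k]))
```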
The same setup works for Cursor and Windsurf
Cursor and Windsurf both support MCP servers. Add Dakera as an MCP server in Cursor's ~/.cursor/mcp.json or Windsurf's MCP configuration, and your coding agent gets persistent memory across projects:
- Code patterns and architectural decisions from past sessions recalled automatically
- User preferences (formatting, naming conventions, framework choices) stored once, applied forever
- Session boundaries do not reset context — Dakera keeps it across restarts
- Multi-project memory with namespace isolation if you want each repo to have its own memory space
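Assuming the same binary invocation as the Claude Desktop config above, a Cursor `~/.cursor/mcp.json` entry would look like this (Windsurf's MCP configuration takes the same shape):

```json
{
  "mcpServers": {
    "dakera": {
      "command": "dakera",
      "args": ["mcp"],
      "env": {
        "DAKERA_API_KEY": "your-key",
        "DAKERA_URL": "http://localhost:7700"
      }
    }
  }
}
```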
What "persistent memory" actually means
When an agent calls dakera_store through MCP, Dakera does more than write a record to disk:
- Embedding — The text is embedded on-device using a built-in transformer model. No API call.
- Dual indexing — The vector goes into an HNSW index for semantic search; the text goes into a BM25 index for keyword search.
- Entity extraction — Named entities (people, projects, dates, organizations) are extracted and linked to the knowledge graph.
- Importance scoring — A configurable importance score (0.0–1.0) controls how long the memory survives and how highly it ranks in recall.
- Decay scheduling — Low-importance memories gradually fade; high-importance memories persist indefinitely. The half-life is configurable per namespace.
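One simple way to model the interaction between importance scoring and half-life decay is exponential decay weighted by the stored importance. This is an illustrative sketch, not Dakera's actual internals; the formula, parameter names, and pruning threshold are assumptions:

```python
import math

def retention(importance: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Effective recall weight: base importance, halved every half-life period."""
    return importance * math.pow(0.5, age_days / half_life_days)

# After two half-lives (60 days at a 30-day half-life), both weights are
# quartered, but the high-importance memory still ranks well above the
# low-importance one -- and above any hypothetical pruning threshold.
low = retention(importance=0.2, age_days=60)   # 0.2 * 0.25 = 0.05
high = retention(importance=0.9, age_days=60)  # 0.9 * 0.25 = 0.225
```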
When an agent later calls dakera_recall, hybrid search (HNSW + BM25, RRF fusion) surfaces the most relevant memories regardless of how they were phrased — not just exact matches.
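Reciprocal Rank Fusion (RRF), the fusion scheme named above, merges the two ranked result lists by rank position alone, so vector scores and BM25 scores never need to share a scale. A minimal sketch (memory IDs and the conventional k=60 constant are illustrative, not Dakera's internals):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each item accumulates 1 / (k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["mem_42", "mem_07", "mem_13"]  # HNSW vector hits, best first
keyword = ["mem_07", "mem_99", "mem_42"]   # BM25 keyword hits, best first
fused = rrf_fuse([semantic, keyword])
# mem_07 appears near the top of both lists, so it fuses highest.
```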
Self-hosted means your data stays on your infrastructure
Every memory stored through Dakera's MCP server stays on your infrastructure. There is no call home, no cloud sync, no telemetry of memory content. Pull the Docker image, set your API key, and every memory operation happens entirely within your stack.
This matters for developers and organizations that cannot send conversation content to third-party services — a constraint that makes fully-managed cloud memory vendors a non-starter for many production workloads.
Accuracy: 87.6% on LoCoMo, no LLM post-processing
The LoCoMo benchmark evaluates long-context memory recall across 50 conversation sessions and 1,540 questions — temporal reasoning, multi-hop facts, implicit references, and entity tracking. Dakera scores 87.6% on the full dataset using standard single-pass evaluation with no LLM post-processing step.
Many memory systems use a second LLM call (reranking, synthesis, or "justify" steps) to boost benchmark scores. This improves numbers but adds latency and cost to every recall operation. Dakera's score comes from the retrieval engine alone.
For reference: systems like Hindsight (89.61%) achieve their top score with Gemini-3 Pro as a cross-encoder — a cloud API call on every recall. Their self-hosted OSS-120B variant scores 85.67%, which is below Dakera's 87.6% in standard evaluation.
Full benchmark methodology — How we run the evaluation, what each question category tests, and how to reproduce our results against your own Dakera instance: Dakera on LoCoMo →
Getting started
Dakera is in early access. The MCP server is available to all early access members. To get access:
- Join the waitlist at dakera.ai/#waitlist
- Receive your API key (usually within a few days)
- Pull the binary or Docker image from the private registry
- Add the MCP server config to Claude Desktop, Cursor, or your agent framework
If you are building an agent framework and want to evaluate Dakera for production use, reach out on GitHub — we prioritize production workloads for early access.
More on how the memory engine works: Hybrid Retrieval and Importance Decay →