For the past year, every production AI agent team we've talked to has hit the same wall: their agents are stateless. Every session starts from zero. The agent that helped a user debug their deployment on Tuesday has no idea that conversation ever happened by Wednesday. Interactions accumulate, context evaporates.
The workarounds are painful. Some teams bolt on Postgres with pgvector. Others stand up Pinecone and manage another API key. A few try Redis. Most end up with a fragile four-service stack that costs more to operate than the agent it supports.
We built Dakera to fix this.
What Dakera is
Dakera is an agent memory server. You run one binary — or pull one Docker image — and your agents get:
- Persistent memory storage — semantically indexed, decay-weighted, cross-session
- Hybrid retrieval — HNSW vector search combined with BM25 full-text for relevant recall
- Knowledge graphs — entity-linked memory for multi-hop reasoning across sessions
- Built-in embeddings — MiniLM, BGE, and E5 models run on-device via the Candle runtime, no external API calls
- Session management — logical groupings that your agents can open, append to, and close
- 83 MCP tools — plug directly into Claude Desktop, Cursor, Windsurf, or any MCP-compatible environment
The whole thing ships as a single Rust binary. No Python runtime, no external database, no embedding API, no graph service. You point it at a data directory, give it an API key, and it starts serving in under a second.
# Pull and run in one step
docker run -d \
  -p 3300:3300 \
  -e DAKERA_API_KEY=your-key \
  -v ./data:/data \
  ghcr.io/dakera-ai/dakera:v0.11.54
# Store a memory
curl -X POST http://localhost:3300/v1/memories \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","content":"User prefers Python over TypeScript","importance":0.85}'

# Recall by meaning
curl -X POST http://localhost:3300/v1/memories/recall \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","query":"language preferences","top_k":5}'
Why it's built in Rust
Agent memory is on the critical path for every agent invocation. It runs synchronously before your model call: recall relevant context, inject it into the prompt, then generate. If memory retrieval adds 500ms, your users feel it every time.
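In code, that critical path looks roughly like this. A sketch only: the `recall` call matches the Python SDK shown later in this post, while `call_model` and `build_prompt` are placeholders standing in for your own LLM call and prompt assembly.

```python
# Sketch of the recall -> inject -> generate critical path.
# `call_model` is a placeholder for your LLM call, not part of Dakera.

def build_prompt(memories, user_message):
    """Prepend recalled memories as context ahead of the user's message."""
    context = "\n".join(f"- {m}" for m in memories)
    return f"Relevant context:\n{context}\n\nUser: {user_message}"

def handle_turn(client, agent_id, user_message, call_model):
    # 1. Recall runs synchronously, so its latency sits on the hot path
    results = client.memories.recall(
        agent_id=agent_id, query=user_message, top_k=5
    )
    # 2. Inject the recalled context into the prompt
    prompt = build_prompt([r.content for r in results], user_message)
    # 3. Generate
    return call_model(prompt)
```

Every millisecond spent in step 1 is added directly to user-perceived response time, which is why retrieval latency is the number that matters.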
We wrote Dakera in Rust because latency matters. The p99 query latency — including embedding inference, HNSW traversal, BM25 scoring, importance reranking, and response serialization — is under 10ms on commodity hardware. That's fast enough to be invisible.
Rust also means a single statically-linked binary. There's no Python interpreter to manage, no garbage collector to tune, no JVM warmup. The binary starts, the ONNX model loads, and you're serving in under a second cold.
The memory model
Dakera's memory model was designed around how agents actually operate in production, not around how vector databases work in theory.
Every memory has an importance score between 0 and 1. Importance decays over time using a configurable half-life — but it's restored on access. A memory that's recalled frequently stays important. A memory that's never recalled after three weeks quietly fades. The result is a system that stays sharp without manual curation: frequently-needed context surfaces reliably, and old noise stops polluting recall.
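Concretely, with a seven-day half-life, a simple exponential-decay curve (an assumption here; the post doesn't specify Dakera's exact formula) behaves like this:

```python
def decayed_importance(importance, hours_since_access, half_life_hours):
    """Exponential decay: importance halves every half_life_hours since
    the memory was last accessed. (Illustrative formula only; Dakera's
    exact decay curve may differ.)"""
    return importance * 0.5 ** (hours_since_access / half_life_hours)

# A memory stored at importance 0.8 with a 7-day (168h) half-life:
decayed_importance(0.8, 0, 168)    # just accessed -> 0.8
decayed_importance(0.8, 168, 168)  # one week untouched -> 0.4
decayed_importance(0.8, 504, 168)  # three weeks untouched -> 0.1
```

Because access resets the clock, the third line only happens to memories nothing ever recalls, which is exactly the noise you want to fade.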
Sessions give you logical groupings. An agent can open a session for a conversation, tag memories to that session, and recall in session-scoped or cross-session mode. For long-running agents operating across hundreds of sessions, cross-agent knowledge graphs let memories link to named entities — so "what did we discuss about Alice last month?" works across session boundaries.
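To make the entity-linking idea concrete, here is a toy illustration in plain Python of how memories tagged with entities can be found across session boundaries by following shared entities. This is a teaching sketch of the concept, not Dakera's graph engine.

```python
# Toy illustration of entity-linked, cross-session recall: a query
# about one entity also surfaces memories from other sessions that
# share an entity with the direct hits.

memories = [
    {"session": "s1", "content": "Alice leads the billing migration",
     "entities": {"Alice", "billing"}},
    {"session": "s2", "content": "Billing migration slipped to Q3",
     "entities": {"billing"}},
    {"session": "s3", "content": "Bob joined the infra team",
     "entities": {"Bob"}},
]

def recall_by_entity(entity):
    """Memories mentioning `entity`, expanded one hop to memories
    that share any entity with those direct hits."""
    direct = [m for m in memories if entity in m["entities"]]
    linked = set()
    for m in direct:
        linked |= m["entities"]
    return [m for m in memories if m["entities"] & linked]
```

Asking about Alice returns the s1 memory directly, plus the s2 billing update from a different session, because the two share the "billing" entity.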
MCP: memory for Claude, Cursor, and Windsurf
Dakera ships with a native MCP server that exposes 83 tools. Add one entry to your Claude Desktop config and your AI assistant gets persistent memory across every conversation — without writing any code:
// ~/.config/claude/claude_desktop_config.json
{
  "mcpServers": {
    "dakera": {
      "command": "dakera-mcp",
      "env": {
        "DAKERA_URL": "http://localhost:3300",
        "DAKERA_API_KEY": "your-key"
      }
    }
  }
}
Claude will automatically store important context, recall relevant memories before responding, and maintain a persistent knowledge graph about your projects, preferences, and work — across every session.
Native SDKs for Python, TypeScript, Go, and Rust
For developers building agents programmatically, we ship idiomatic SDKs for all four languages:
# Python — pip install dakera
from dakera import DakeraClient

client = DakeraClient(url="http://localhost:3300", api_key="your-key")

# Store memory with importance score
client.memories.store(
    agent_id="my-agent",
    content="Customer is on the Pro plan, billing contact is [email protected]",
    importance=0.9,
)

# Hybrid recall (vector + BM25)
results = client.memories.recall(
    agent_id="my-agent",
    query="billing and account details",
    top_k=5,
)

for mem in results:
    print(mem.content, mem.score)
Production-ready from day one
Dakera isn't a prototype. It's been running in production as the memory layer for our own autonomous agent systems since v0.4.0. The current v0.11.54 includes:
- WAL durability with crash-safe writes
- AES-256-GCM encryption at rest
- Multi-tenant namespacing with per-namespace API keys
- Rate limiting per agent and per key
- Prometheus metrics and OpenTelemetry tracing
- Horizontal clustering with Raft consensus
- Zero-downtime rolling upgrades
Early access program — We're admitting a small number of teams now to work directly with us during the early access period. Members get a direct line to the team, priority onboarding, input into the roadmap, and founder pricing locked in before public launch.
What's next
The roadmap is driven by what production agent teams actually need. We're actively working on higher LoCoMo benchmark scores across all recall categories, and deeper integrations with popular agent frameworks.
We'll be publishing technical deep-dives on this blog as we build — starting with how the hybrid retrieval engine works and why naive vector search isn't enough for production agents.
Get early access to Dakera
Deploy persistent memory for your agents in under a minute. Early access is limited — join the waitlist now.