
Introducing Dakera: Production Memory Infrastructure for AI Agents

Today we're opening early access to Dakera — a single Rust binary that gives your AI agents persistent, semantically-searchable memory with no external services required.


For the past year, every production AI agent team we've talked to has hit the same wall: their agents are stateless. Every session starts from zero. The agent that helped a user debug their deployment on Tuesday has no idea that conversation ever happened by Wednesday. Interactions accumulate, context evaporates.

The workarounds are painful. Some teams bolt on Postgres with pgvector. Others stand up Pinecone and manage another API key. A few try Redis. Most end up with a fragile four-service stack that costs more to operate than the agent it supports.

We built Dakera to fix this.

At a glance: <10ms p99 query latency · 83 built-in MCP tools · 1 binary, zero dependencies · v0.11 current release

What Dakera is

Dakera is an agent memory server. You run one binary — or pull one Docker image — and your agents get:

- Hybrid semantic recall combining vector search with BM25 keyword scoring
- Importance scoring with configurable time-based decay
- Session-scoped and cross-session memory
- Cross-agent knowledge graphs that link memories to named entities
- A native MCP server exposing 83 built-in tools

The whole thing ships as a single Rust binary. No Python runtime, no external database, no embedding API, no graph service. You point it at a data directory, give it an API key, and it starts serving in under a second.

# Pull and run in one step
docker run -d \
  -p 3300:3300 \
  -e DAKERA_API_KEY=your-key \
  -v ./data:/data \
  ghcr.io/dakera-ai/dakera:v0.11.54

# Store a memory
curl -X POST http://localhost:3300/v1/memories \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","content":"User prefers Python over TypeScript","importance":0.85}'

# Recall by meaning
curl -X POST http://localhost:3300/v1/memories/recall \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","query":"language preferences","top_k":5}'

Why it's built in Rust

Agent memory is on the critical path for every agent invocation. It runs synchronously before your model call: recall relevant context, inject it into the prompt, then generate. If memory retrieval adds 500ms, your users feel it every time.

We wrote Dakera in Rust because latency matters. The p99 query latency — including embedding inference, HNSW traversal, BM25 scoring, importance reranking, and response serialization — is under 10ms on commodity hardware. That's fast enough to be invisible.
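To make the reranking step concrete, here is a rough sketch of how dense, lexical, and importance signals can be combined into one score. The blend weight, the BM25 normalizer, and the boost formula are our own illustrative assumptions, not Dakera's actual tuning:

```python
def hybrid_score(vector_sim: float, bm25: float, importance: float,
                 alpha: float = 0.7, max_bm25: float = 10.0) -> float:
    """Blend dense and lexical relevance, then rerank by importance.

    alpha, max_bm25, and the boost curve are assumptions for
    illustration only; they are not Dakera's real parameters.
    """
    bm25_norm = min(bm25 / max_bm25, 1.0)        # squash BM25 into [0, 1]
    relevance = alpha * vector_sim + (1 - alpha) * bm25_norm
    return relevance * (0.5 + 0.5 * importance)  # boost, never zero out
```

Multiplying by a floor of 0.5 rather than by raw importance means a low-importance memory is demoted, not erased — relevance still dominates the ordering.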

Rust also means a single statically-linked binary. There's no Python interpreter to manage, no garbage collector to tune, no JVM warmup. The binary starts, the ONNX model loads, and you're serving in under a second cold.

The memory model

Dakera's memory model was designed around how agents actually operate in production, not around how vector databases work in theory.

Every memory has an importance score between 0 and 1. Importance decays over time using a configurable half-life — but it's restored on access. A memory that's recalled frequently stays important. A memory that's never recalled after three weeks quietly fades. The result is a system that stays sharp without manual curation: frequently-needed context surfaces reliably, and old noise stops polluting recall.
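The decay-with-restoration behavior can be sketched in a few lines. The exponential half-life form and the 30-day default below are assumptions for illustration; Dakera's half-life is configurable and its exact curve isn't documented here:

```python
import math
import time

def decayed_importance(importance: float, last_accessed: float,
                       half_life_days: float = 30.0) -> float:
    """Half-life decay: the effective score halves every half_life_days
    since the memory was last accessed. Illustrative model only."""
    elapsed_days = (time.time() - last_accessed) / 86400.0
    return importance * math.pow(0.5, elapsed_days / half_life_days)
```

"Restored on access" then falls out naturally: recalling a memory resets `last_accessed` to now, bringing its effective score back to the full stored importance.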

Sessions give you logical groupings. An agent can open a session for a conversation, tag memories to that session, and recall in session-scoped or cross-session mode. For long-running agents operating across hundreds of sessions, cross-agent knowledge graphs let memories link to named entities — so "what did we discuss about Alice last month?" works across session boundaries.
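A toy in-memory version of the entity-linking idea shows why that query can cross session boundaries. The data model here is our own illustration, not Dakera's internal storage:

```python
from collections import defaultdict

class EntityIndex:
    """Illustrative entity-to-memory index; not Dakera's actual model."""

    def __init__(self) -> None:
        # entity name -> list of (memory_id, session_id) pairs
        self._by_entity: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def link(self, memory_id: str, entity: str, session_id: str) -> None:
        """Attach a memory to a named entity, remembering its session."""
        self._by_entity[entity.lower()].append((memory_id, session_id))

    def recall_entity(self, entity: str) -> list[tuple[str, str]]:
        """Every memory mentioning the entity, across all sessions."""
        return list(self._by_entity[entity.lower()])
```

Because lookup is keyed by entity rather than by session, memories about "Alice" recorded in different conversations all surface from a single query.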

MCP: memory for Claude, Cursor, and Windsurf

Dakera ships with a native MCP server that exposes 83 tools. Add a short entry to your Claude Desktop config and your AI assistant gets persistent memory across every conversation, without writing any code:

// ~/.config/claude/claude_desktop_config.json
{
  "mcpServers": {
    "dakera": {
      "command": "dakera-mcp",
      "env": {
        "DAKERA_URL": "http://localhost:3300",
        "DAKERA_API_KEY": "your-key"
      }
    }
  }
}

Claude will automatically store important context, recall relevant memories before responding, and maintain a persistent knowledge graph about your projects, preferences, and work — across every session, forever.

Native SDKs for Python, TypeScript, Go, and Rust

For developers building agents programmatically, we ship idiomatic SDKs for all four major agent languages:

# Python — pip install dakera
from dakera import DakeraClient

client = DakeraClient(url="http://localhost:3300", api_key="your-key")

# Store memory with importance score
client.memories.store(
    agent_id="my-agent",
    content="Customer is on the Pro plan, billing contact is [email protected]",
    importance=0.9
)

# Hybrid recall (vector + BM25)
results = client.memories.recall(
    agent_id="my-agent",
    query="billing and account details",
    top_k=5
)
for mem in results:
    print(mem.content, mem.score)

Production-ready from day one

Dakera isn't a prototype. It's been running in production as the memory layer for our own autonomous agent systems since v0.4.0, and v0.11.54 is the current release.

Early access program — We're admitting a small number of teams now to work directly with us during the early access period. Members get priority onboarding, a direct line to the team, input into the roadmap, and founder pricing locked in before public launch.

What's next

The roadmap is driven by what production agent teams actually need. We're actively working on higher LoCoMo benchmark scores across all recall categories, and deeper integrations with popular agent frameworks.

We'll be publishing technical deep-dives on this blog as we build — starting with how the hybrid retrieval engine works and why naive vector search isn't enough for production agents.

Get early access to Dakera

Deploy persistent memory for your agents in under a minute. Early access is limited — join the waitlist now.