Architecture

Dakera compiles to a single binary with no runtime dependencies. It exposes a REST API (port 3300) and gRPC API (port 50051), both backed by the same engine. The system includes 4-tier caching, on-device inference for embeddings and reranking, background AutoPilot for memory lifecycle management, and a full distributed mode with gossip-based membership, leader election, sharding, and automatic rebalancing.

Retrieval pipeline at a glance: an ML classifier routes the query → parallel vector + BM25 full-text search → RRF fusion → cross-encoder rerank → results. The full 8-step pipeline is described below.

System overview

Dakera — System Architecture

API Layer: REST API (HTTP, :3300) · gRPC API (Protobuf, :50051) · SSE Events (EventBus) · Prometheus (/metrics)
Middleware: Auth · Rate Limiting · Security Headers · Audit · Validation · Tracing
Engine Layer: HNSW / IVF / SPFresh (ANN search) · BM25 (full-text) · Hybrid RRF + Rerank (temporal scoring) · Knowledge Graph (entity graph, BFS) · AutoPilot (dedup + consolidation)
Storage Layer: L1 in-memory LRU (hot cache) · L1.5 Redis (distributed cache) · L2 RocksDB (disk, compressed) · L3 S3 / MinIO (cold / archival)
Inference Layer: Embeddings (MiniLM / BGE / E5) · Reranker (bge-reranker-base) · Entity Extraction (GLiNER + rule-based)
ONNX Runtime: all models loaded at startup; on-device, zero external API calls.

Memory lifecycle

Every memory in Dakera follows a defined lifecycle from creation through potential archival or deletion. Understanding this flow is key to tuning retention, decay, and storage costs.

Memory lifecycle: store (importance set) → active (recalled, boosted) → decaying (half-life applies) → archived (L3 cold tier) → forgotten (explicit forget via API, or TTL expiry). Recall promotes a memory back to active.
Stage | Location | What happens
Store | L1 + L2 | Memory created with initial importance. Embedded, indexed, entities extracted.
Active | L1 | Frequently recalled. Each access boosts importance and resets the decay timer.
Decaying | L2 | No recent access. Importance decreases per decay strategy (exponential, linear, or step).
Archived | L3 (S3) | Below warm threshold. Moved to cold storage. Still retrievable but with higher latency.
Forgotten | — | Importance below minimum threshold, TTL expired, or explicitly deleted via API.
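
A minimal store sketch for lifecycle tuning, assuming a POST /v1/memories endpoint; the importance and ttl_seconds field names are illustrative, so check the API reference for the exact schema.

# Store a memory with an initial importance and a TTL (endpoint and field names are assumptions)
curl -X POST http://localhost:3300/v1/memories \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "Release blocked on auth bug", "importance": 0.8, "ttl_seconds": 604800}'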

4-tier caching

Dakera implements a four-level cache hierarchy that balances latency and durability. Each tier handles a different access pattern:

Tier | Backend | Latency | Purpose
L1 | In-memory LRU | <1ms | Hot cache for frequently accessed memories. Bounded by configurable max entries.
L1.5 | Redis | ~1ms | Distributed cache in multi-node deployments. Shared across the cluster.
L2 | RocksDB | ~5ms | Persistent disk storage with compression. Primary durable store.
L3 | S3 / MinIO | ~50ms | Cold/archival tier. Stores decayed or infrequently accessed memories.

Reads check L1 → L1.5 → L2 → L3, promoting on hit. Writes go to L1 + L2 synchronously, with async replication to L1.5 and L3.

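As an illustrative sketch only, tier sizing would typically be tuned through environment variables. The names below follow the DAKERA_* convention used elsewhere on this page but are hypothetical; confirm them against the configuration reference.

# Hypothetical cache tuning (variable names are illustrative, not confirmed)
DAKERA_CACHE_L1_MAX_ENTRIES=100000    # bound the in-memory LRU hot cache
DAKERA_REDIS_URL=redis://redis:6379   # point L1.5 at a shared Redis instance
DAKERA_S3_BUCKET=dakera-cold          # L3 cold/archival bucket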

Vector indexes

Dakera supports four vector index types, selectable per namespace:

Index | Strengths | Best for
HNSW | Sub-10ms at millions of vectors, tunable recall/speed | General-purpose (default)
IVF | Lower memory, good for high-dimensional data | Large datasets with limited RAM
SPFresh | Write-optimized, maintains recall under heavy inserts | High-throughput streaming workloads
Flat | Exact nearest-neighbor, no approximation | Small namespaces (<10K vectors), ground-truth evaluation

All indexes use SIMD-accelerated distance functions (cosine, L2, inner product). Index configuration is set per namespace via the REST API or CLI.
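
For example, selecting an index at namespace creation might look like the sketch below; the endpoint shape and the index field are assumptions modeled on the admin routes shown elsewhere on this page.

# Create a namespace backed by an IVF index (endpoint and field names are assumptions)
curl -X POST http://localhost:3300/admin/namespaces \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-ns", "index": "ivf"}'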

Knowledge graph

Dakera maintains a persistent entity graph alongside vector storage. Entities are extracted automatically (or via API) and linked to memories. The graph supports four edge types:

Edge type | Meaning
RelatedTo | Semantic similarity between memories
SharesEntity | Two memories mention the same named entity
Precedes | Temporal ordering — memory A happened before B
LinkedBy | Explicit user-created link via API

Use the knowledge graph for multi-hop reasoning, entity-centric retrieval, and cross-agent network visualization. Query it via the dk knowledge CLI or the KG API endpoints.

Entity nodes (User, Project, Tool, Pref, Skill) linked by the four edge types, enabling multi-hop reasoning.
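
A sketch of an entity-centric lookup: dk knowledge is the CLI entry point named above, but the subcommand and flags here are illustrative.

# Traverse the entity graph two hops out from an entity (subcommand and flags are illustrative)
dk knowledge query --entity "acme-project" --hops 2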

Event bus & SSE

Dakera includes a built-in event bus that streams real-time notifications over Server-Sent Events (SSE). Subscribe to memory lifecycle events — store, recall, forget, decay — filtered by namespace and agent. Useful for building dashboards, audit UIs, and agent coordination pipelines.

# Subscribe to events for a specific agent
curl -N -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  "http://localhost:3300/v1/events/stream?agent_id=my-agent"

AutoPilot

AutoPilot runs as a background task that automatically manages the memory lifecycle, deduplicating near-identical memories and consolidating related ones.

Configure via DAKERA_AUTOPILOT_DEDUP_INTERVAL_HOURS (default: 1 hour) and DAKERA_AUTOPILOT_DEDUP_THRESHOLD (default: 0.93). Trigger manually via the /admin/autopilot/trigger endpoint, as shown below.
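
The manual trigger, assuming the endpoint accepts a POST like the other admin routes on this page:

# Kick off a dedup/consolidation pass outside the scheduled interval
curl -X POST http://localhost:3300/admin/autopilot/trigger \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"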

Retrieval pipeline

Every recall request flows through an 8-step pipeline:

POST /v1/memories/recall
1. Query classification: ML router categorizes the query as factual, multi-hop, temporal, or comparison.
2. Embedding: ONNX model embeds the query on-device.
3. Vector search: ANN retrieval via the HNSW/IVF/SPFresh index.
4. BM25 search: full-text keyword match against the per-namespace index.
5. Reciprocal Rank Fusion: merges vector and BM25 results into a unified ranking.
6. Temporal scoring: applies decay weights and recency boost.
7. Cross-encoder reranking: bge-reranker-base scores top candidates for precision.
8. Filter & return: applies metadata filters, enforces top_k, returns results.
← MemoryRecallResponse { memories, total, scores }
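
A minimal recall request: the endpoint and response shape come from the pipeline above, while the query and top_k request fields are assumptions to verify against the API reference.

# Recall the top 5 memories for a query (request field names are assumptions)
curl -X POST http://localhost:3300/v1/memories/recall \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "what blocked the last release?", "top_k": 5}'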

gRPC API

Dakera exposes a gRPC API on port 50051 alongside the REST API. Both APIs access the same underlying engine — choose based on your use case:

Feature | REST API | gRPC API
Best for | Web clients, quick integration, debugging | High-throughput services, microservices, streaming
Latency | ~2-5ms overhead (HTTP/JSON) | ~0.5-1ms overhead (HTTP/2, protobuf)
Streaming | SSE (server-sent events) | Bidirectional streaming
Type safety | OpenAPI spec available | Strongly typed via proto definitions
Browser support | Native | Requires grpc-web proxy
# gRPC is on by default; set these to override
DAKERA_GRPC_ENABLED=true
DAKERA_GRPC_PORT=50051

# For mTLS on the gRPC port, configure at the reverse proxy layer

Backup & WAL

Dakera uses RocksDB's write-ahead log (WAL) for crash recovery. Every write is durably logged before acknowledgment, ensuring no data loss on unexpected shutdown. On restart, the WAL is replayed automatically to restore the last consistent state.

For point-in-time backups, use the admin API (see Deployment → Backup & Restore). Backups include all memories, indexes, namespaces, and configuration — everything needed for a full restore.

Import & export

Bulk data can be moved in and out of Dakera via the memory import/export endpoints. Both stream newline-delimited JSON (JSONL) for efficient handling of large datasets.

# Export all memories from a namespace
curl "http://localhost:3300/admin/memories/export?namespace=my-ns" \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -o memories.jsonl

# Import memories from JSONL file
curl -X POST http://localhost:3300/admin/memories/import \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @memories.jsonl

Compaction

RocksDB performs automatic compaction in the background to merge sorted runs, reclaim deleted space, and maintain read performance. Dakera also exposes a manual compaction endpoint for maintenance windows:

# Trigger manual compaction on a namespace
curl -X POST http://localhost:3300/admin/namespaces/my-ns/optimize \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"
# {"status":"completed","duration_ms":1234,"freed_bytes":52428800}
When to compact manually: After bulk deletions or large-scale AutoPilot deduplication runs. Automatic compaction handles normal workloads — manual compaction is for reclaiming space immediately.

Distributed architecture

In cluster mode, Dakera distributes data across nodes using consistent hashing. Each memory is assigned to a shard based on its namespace and ID, and shards are mapped to nodes on a virtual ring.

Component | Mechanism | Purpose
Membership | SWIM gossip protocol | Nodes discover and monitor each other's health. Failure detection via probe → suspect → dead lifecycle.
Leader election | Lease-based with fencing tokens | One leader coordinates shard assignments and rebalancing. Monotonic tokens prevent stale leaders from acting.
Sharding | Consistent hashing (virtual nodes) | Data distributed evenly. Adding/removing nodes only migrates ~1/N of the data.
Replication | Configurable replication factor | Shard replicas on multiple nodes for durability. Eventual consistency with gossip-driven convergence.
Rebalancing | Automatic on membership change | Leader detects node join/leave and redistributes shards. Zero-downtime migration.

See High Availability for cluster setup, failure modes, and operational procedures.

Encryption at rest

All memory content can be encrypted with AES-256-GCM authenticated encryption. Set DAKERA_ENCRYPTION_KEY to enable. Passphrases are derived via PBKDF2-HMAC-SHA256 with 100,000 iterations. Key rotation re-encrypts all memories atomically via /admin/encryption/rotate.
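
Setup is a single environment variable; rotation uses the endpoint named above. The sketch assumes rotation is a POST like the other admin routes, and how the new key is supplied is not shown here.

# Set the encryption passphrase (any high-entropy secret works; it is stretched via PBKDF2)
export DAKERA_ENCRYPTION_KEY="$(openssl rand -base64 32)"

# Rotate keys, re-encrypting all memories
curl -X POST http://localhost:3300/admin/encryption/rotate \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"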

Filter expressions

Metadata filters can be applied to any recall, vector query, or batch operation.

{
  "filter": {
    "$and": [
      { "importance": { "$gt": 0.7 } },
      { "tags": { "$in": ["decision", "blocker"] } }
    ]
  }
}

Supported operators: $eq, $ne, $gt, $lt, $gte, $lte, $in, $nin, $and, $or, $not, $exists, $regex, $contains, $icontains, $startsWith, $endsWith, $arrayContains, $arrayContainsAll, $arrayContainsAny.