Architecture

Dakera compiles to a single binary with no runtime dependencies. It exposes a REST API (port 3300) and gRPC API (port 50051), both backed by the same engine. The system includes 4-tier caching, on-device inference for embeddings and reranking, background AutoPilot for memory lifecycle management, and a full distributed mode with gossip-based membership, leader election, sharding, and automatic rebalancing.

Retrieval pipeline at a glance: an ML classifier routes the query → parallel vector + BM25 full-text search → RRF fusion → cross-encoder rerank → results. The full 8-step pipeline is described below.

System overview

Dakera — System Architecture

API Layer: REST API (HTTP, :3300) · gRPC API (Protobuf, :50051) · SSE Events (EventBus) · Prometheus (/metrics)
Middleware: Auth · Rate Limiting · Security Headers · Audit · Validation · Tracing
Engine Layer: HNSW / IVF / SPFresh (ANN search) · BM25 (full-text) · Hybrid RRF + Rerank (temporal scoring) · Knowledge Graph (entity graph, BFS) · AutoPilot (dedup + consolidation)
Storage Layer: L1 in-memory LRU (hot cache) · L1.5 Redis (distributed cache) · L2 RocksDB (disk, compressed) · L3 S3 / MinIO (cold / archival)
Inference Layer: Embeddings (MiniLM / BGE / E5) · Reranker (bge-reranker-base) · Entity Extraction (GLiNER + rule-based)
ONNX Runtime: all models loaded at startup; on-device, zero external API calls.

Memory lifecycle

Every memory in Dakera follows a defined lifecycle from creation through potential archival or deletion. Understanding this flow is key to tuning retention, decay, and storage costs.

Memory lifecycle: store (importance set) → active (recalled, boosted) → decaying (half-life applies) → archived (L3 cold tier) → forgotten (explicit forget via API, or TTL expiry). Recall promotes a memory back to active.
Stage | Location | What happens
Store | L1 + L2 | Memory created with initial importance. Embedded, indexed, entities extracted.
Active | L1 | Frequently recalled. Each access boosts importance and resets the decay timer.
Decaying | L2 | No recent access. Importance decreases per decay strategy (exponential, linear, or step).
Archived | L3 (S3) | Below warm threshold. Moved to cold storage. Still retrievable but with higher latency.
Forgotten | — | Importance below minimum threshold, TTL expired, or explicitly deleted via API.
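
A minimal store sketch for lifecycle tuning, assuming a POST /v1/memories endpoint; the importance and ttl_seconds field names are illustrative, so check the API reference for the exact schema.

# Store a memory with an initial importance and a TTL (endpoint and field names are assumptions)
curl -X POST http://localhost:3300/v1/memories \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "Release blocked on auth bug", "importance": 0.8, "ttl_seconds": 604800}'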

4-tier caching

Dakera implements a four-level cache hierarchy that balances latency and durability. Each tier handles a different access pattern:

Tier | Backend | Latency | Purpose
L1 | In-memory LRU | <1ms | Hot cache for frequently accessed memories. Bounded by configurable max entries.
L1.5 | Redis | ~1ms | Distributed cache in multi-node deployments. Shared across the cluster.
L2 | RocksDB | ~5ms | Persistent disk storage with compression. Primary durable store.
L3 | S3 / MinIO | ~50ms | Cold/archival tier. Stores decayed or infrequently accessed memories.

Reads check L1 → L1.5 → L2 → L3, promoting on hit. Writes go to L1 + L2 synchronously, with async replication to L1.5 and L3.

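As an illustrative sketch only, tier sizing would typically be tuned through environment variables. The names below follow the DAKERA_* convention used elsewhere on this page but are hypothetical; confirm them against the configuration reference.

# Hypothetical cache tuning (variable names are illustrative, not confirmed)
DAKERA_CACHE_L1_MAX_ENTRIES=100000    # bound the in-memory LRU hot cache
DAKERA_REDIS_URL=redis://redis:6379   # point L1.5 at a shared Redis instance
DAKERA_S3_BUCKET=dakera-cold          # L3 cold/archival bucket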

Vector indexes

Dakera supports four vector index types, selectable per namespace:

Index | Strengths | Best for
HNSW | Sub-10ms at millions of vectors, tunable recall/speed | General-purpose (default)
IVF | Lower memory, good for high-dimensional data | Large datasets with limited RAM
SPFresh | Write-optimized, maintains recall under heavy inserts | High-throughput streaming workloads
Flat | Exact nearest-neighbor, no approximation | Small namespaces (<10K vectors), ground-truth evaluation

All indexes use SIMD-accelerated distance functions (cosine, L2, inner product). Index configuration is set per namespace via the REST API or CLI.
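
For example, selecting an index at namespace creation might look like the sketch below; the endpoint shape and the index field are assumptions modeled on the admin routes shown elsewhere on this page.

# Create a namespace backed by an IVF index (endpoint and field names are assumptions)
curl -X POST http://localhost:3300/admin/namespaces \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-ns", "index": "ivf"}'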

Knowledge graph

Dakera maintains a persistent entity graph alongside vector storage. Entities are extracted automatically (or via API) and linked to memories. The graph supports four edge types:

Edge type | Meaning
RelatedTo | Semantic similarity between memories
SharesEntity | Two memories mention the same named entity
Precedes | Temporal ordering — memory A happened before B
LinkedBy | Explicit user-created link via API

Use the knowledge graph for multi-hop reasoning, entity-centric retrieval, and cross-agent network visualization. Query it via the dk knowledge CLI or the KG API endpoints.

Entity nodes (User, Project, Tool, Pref, Skill) linked by the four edge types, enabling multi-hop reasoning.
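
A sketch of an entity-centric lookup: dk knowledge is the CLI entry point named above, but the subcommand and flags here are illustrative.

# Traverse the entity graph two hops out from an entity (subcommand and flags are illustrative)
dk knowledge query --entity "acme-project" --hops 2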

Event bus & SSE

Dakera includes a built-in event bus that streams real-time notifications over Server-Sent Events (SSE). Subscribe to memory lifecycle events — store, recall, forget, decay — filtered by namespace and agent. Useful for building dashboards, audit UIs, and agent coordination pipelines.

# Subscribe to events for a specific agent
curl -N -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  "http://localhost:3300/v1/events/stream?agent_id=my-agent"

AutoPilot

AutoPilot runs as a background task that automatically manages the memory lifecycle, deduplicating near-identical memories and consolidating related ones.

Configure via DAKERA_AUTOPILOT_DEDUP_INTERVAL_HOURS (default: 1 hour) and DAKERA_AUTOPILOT_DEDUP_THRESHOLD (default: 0.93). Trigger manually via the /admin/autopilot/trigger endpoint, as shown below.
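
The manual trigger, assuming the endpoint accepts a POST like the other admin routes on this page:

# Kick off a dedup/consolidation pass outside the scheduled interval
curl -X POST http://localhost:3300/admin/autopilot/trigger \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"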

Retrieval pipeline

Every recall request flows through an 8-step pipeline:

POST /v1/memories/recall
1. Query classification: ML router categorizes the query as factual, multi-hop, temporal, or comparison.
2. Embedding: ONNX model embeds the query on-device.
3. Vector search: ANN retrieval via the HNSW/IVF/SPFresh index.
4. BM25 search: full-text keyword match against the per-namespace index.
5. Reciprocal Rank Fusion: merges vector and BM25 results into a unified ranking.
6. Temporal scoring: applies decay weights and recency boost.
7. Cross-encoder reranking: bge-reranker-base scores top candidates for precision.
8. Filter & return: applies metadata filters, enforces top_k, returns results.
← MemoryRecallResponse { memories, total, scores }
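
A minimal recall request: the endpoint and response shape come from the pipeline above, while the query and top_k request fields are assumptions to verify against the API reference.

# Recall the top 5 memories for a query (request field names are assumptions)
curl -X POST http://localhost:3300/v1/memories/recall \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "what blocked the last release?", "top_k": 5}'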

gRPC API

Dakera exposes a gRPC API on port 50051 alongside the REST API. Both APIs access the same underlying engine — choose based on your use case:

Feature | REST API | gRPC API
Best for | Web clients, quick integration, debugging | High-throughput services, microservices, streaming
Latency | ~2-5ms overhead (HTTP/JSON) | ~0.5-1ms overhead (HTTP/2, protobuf)
Streaming | SSE (server-sent events) | Bidirectional streaming
Type safety | OpenAPI spec available | Strongly typed via proto definitions
Browser support | Native | Requires grpc-web proxy
# gRPC is on by default; set these to override
DAKERA_GRPC_ENABLED=true
DAKERA_GRPC_PORT=50051

# For mTLS on the gRPC port, configure at the reverse proxy layer

Backup & WAL

Dakera uses RocksDB's write-ahead log (WAL) for crash recovery. Every write is durably logged before acknowledgment, ensuring no data loss on unexpected shutdown. On restart, the WAL is replayed automatically to restore the last consistent state.

For point-in-time backups, use the admin API (see Deployment → Backup & Restore). Backups include all memories, indexes, namespaces, and configuration — everything needed for a full restore.

Import & export

Bulk data can be moved in and out of Dakera via the memory import/export endpoints. Both stream newline-delimited JSON (JSONL) for efficient handling of large datasets.

# Export all memories from a namespace
curl "http://localhost:3300/admin/memories/export?namespace=my-ns" \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -o memories.jsonl

# Import memories from JSONL file
curl -X POST http://localhost:3300/admin/memories/import \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @memories.jsonl

Compaction

RocksDB performs automatic compaction in the background to merge sorted runs, reclaim deleted space, and maintain read performance. Dakera also exposes a manual compaction endpoint for maintenance windows:

# Trigger manual compaction on a namespace
curl -X POST http://localhost:3300/admin/namespaces/my-ns/optimize \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"
# {"status":"completed","duration_ms":1234,"freed_bytes":52428800}
When to compact manually: After bulk deletions or large-scale AutoPilot deduplication runs. Automatic compaction handles normal workloads — manual compaction is for reclaiming space immediately.

Distributed architecture

In cluster mode, Dakera distributes data across nodes using consistent hashing. Each memory is assigned to a shard based on its namespace and ID, and shards are mapped to nodes on a virtual ring.

Component | Mechanism | Purpose
Membership | SWIM gossip protocol | Nodes discover and monitor each other's health. Failure detection via probe → suspect → dead lifecycle.
Leader election | Lease-based with fencing tokens | One leader coordinates shard assignments and rebalancing. Monotonic tokens prevent stale leaders from acting.
Sharding | Consistent hashing (virtual nodes) | Data distributed evenly. Adding/removing nodes only migrates ~1/N of the data.
Replication | Configurable replication factor | Shard replicas on multiple nodes for durability. Eventual consistency with gossip-driven convergence.
Rebalancing | Automatic on membership change | Leader detects node join/leave and redistributes shards. Zero-downtime migration.

See High Availability for cluster setup, failure modes, and operational procedures.

Encryption at rest

All memory content can be encrypted with AES-256-GCM authenticated encryption. Set DAKERA_ENCRYPTION_KEY to enable. Passphrases are derived via PBKDF2-HMAC-SHA256 with 100,000 iterations. Key rotation re-encrypts all memories atomically via /admin/encryption/rotate.
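
Setup is a single environment variable; rotation uses the endpoint named above. The sketch assumes rotation is a POST like the other admin routes, and how the new key is supplied is not shown here.

# Set the encryption passphrase (any high-entropy secret works; it is stretched via PBKDF2)
export DAKERA_ENCRYPTION_KEY="$(openssl rand -base64 32)"

# Rotate keys, re-encrypting all memories
curl -X POST http://localhost:3300/admin/encryption/rotate \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"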

Filter expressions

Metadata filters can be applied to any recall, vector query, or batch operation.

{
  "filter": {
    "$and": [
      { "importance": { "$gt": 0.7 } },
      { "tags": { "$in": ["decision", "blocker"] } }
    ]
  }
}

Supported operators: $eq, $ne, $gt, $lt, $gte, $lte, $in, $nin, $and, $or, $not, $exists, $regex, $contains, $icontains, $startsWith, $endsWith, $arrayContains, $arrayContainsAll, $arrayContainsAny.