# Why Self-Hosted AI Memory Matters
Every AI agent that operates on user data faces a fundamental tension: memory improves agent quality, but memory also means storing sensitive information. When that memory lives on a third-party cloud service, you inherit their security posture, their data residency constraints, and their pricing model.
Self-hosted AI memory eliminates these concerns. You control the hardware, the encryption keys, the retention policies, and the network boundaries. For organizations building agents that handle medical records, financial data, legal documents, or proprietary business logic, self-hosting isn't a preference — it's a requirement.
This guide walks through everything you need to deploy a production-grade self-hosted memory system: from hardware requirements to encryption configuration, from hybrid retrieval tuning to operational monitoring.
## Architecture of a Self-Hosted Memory Server
A self-hosted AI memory server needs to handle several concerns simultaneously:
- Storage — persistent, encrypted storage for memories with metadata
- Retrieval — fast similarity search (vector) combined with keyword matching (BM25)
- Embeddings — converting text to vectors without sending data to external APIs
- Temporal reasoning — understanding when memories were created and how they decay
- Access control — namespaces, sessions, and company isolation
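These concerns interact: retrieval scores are typically dampened by memory age. As an illustration of the temporal-reasoning piece, here is a generic exponential half-life sketch in Python; the formula and the 30-day default are illustrative assumptions, not Dakera's documented scoring function:

```python
import math

def decayed_score(similarity: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Downweight a memory's relevance score as it ages.

    Exponential decay: after one half-life, the contribution halves.
    The 30-day half-life is an illustrative default, not a Dakera setting.
    """
    return similarity * math.exp(-math.log(2) * age_days / half_life_days)

# A fresh memory keeps its full score; a 30-day-old one scores half.
print(decayed_score(0.9, 0))              # 0.9
print(round(decayed_score(0.9, 30), 4))   # 0.45
```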
Dakera ships all of this as a single Rust binary. No Docker Compose stack with six services, no managed vector database subscription, no external embedding API calls. You download the binary, set an encryption key, and start it.
### On-Device Embeddings

The most common privacy leak in "self-hosted" memory systems is the embedding step. If you're sending text to OpenAI's embedding API to generate vectors, you've already sent your data off-premises. Dakera runs ONNX-based embedding models locally — MiniLM, BGE, and E5 variants are supported out of the box. The embedding computation happens on your CPU or GPU, and no text ever leaves the machine.

```toml
# Configure the embedding model in dakera.toml
[embeddings]
model = "bge-small-en-v1.5"  # or "minilm-l6-v2", "e5-small-v2"
device = "cpu"               # or "cuda" for GPU acceleration
batch_size = 64
```
## Deployment: From Zero to Production

### Hardware Requirements
| Scale | Memories | RAM | CPU | Disk |
|---|---|---|---|---|
| Development | < 10K | 512 MB | 1 core | 1 GB |
| Small team | 10K - 100K | 2 GB | 2 cores | 10 GB |
| Production | 100K - 1M | 8 GB | 4 cores | 50 GB |
| Enterprise | 1M+ | 32 GB | 8+ cores | 200 GB+ |
### Installation

Download the binary for your platform and make it executable:

```shell
# Linux x86_64
curl -fsSL https://get.dakera.ai/install.sh | sh

# Or download directly
wget https://releases.dakera.ai/latest/dakera-linux-amd64
chmod +x dakera-linux-amd64
mv dakera-linux-amd64 /usr/local/bin/dakera
```
### Initial Configuration

Create a configuration file that sets up encryption, storage, and the API server:

```toml
# /etc/dakera/dakera.toml

[server]
host = "0.0.0.0"
port = 3300

[storage]
path = "/var/lib/dakera/data"
encryption = "aes-256-gcm"

[retrieval]
mode = "hybrid"             # HNSW + BM25
hnsw_ef_construction = 200  # index build effort: higher = better recall, slower builds
hnsw_m = 16                 # graph connectivity: higher = better recall, more memory
bm25_k1 = 1.2               # term-frequency saturation
bm25_b = 0.75               # document-length normalization

[embeddings]
model = "bge-small-en-v1.5"
device = "cpu"
```
### Starting the Server

```shell
# Set the encryption key via environment variable
export DAKERA_ENCRYPTION_KEY="your-256-bit-key-here"

# Start the server
dakera serve --config /etc/dakera/dakera.toml

# Or run as a systemd service
sudo systemctl enable dakera
sudo systemctl start dakera
```
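The key itself should be 256 bits of cryptographically secure randomness. One way to generate it, sketched with Python's standard-library `secrets` module (the hex encoding is an assumption; check the Dakera docs for the exact key format the server expects):

```python
import secrets

# 32 random bytes = 256 bits; hex-encodes to 64 characters.
# Assumes Dakera accepts a hex-encoded key string.
key = secrets.token_hex(32)
print(len(key))  # 64
```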
## Encryption at Rest: AES-256-GCM
Every memory stored by Dakera is encrypted before it touches disk. The encryption uses AES-256-GCM, which provides both confidentiality and authentication — meaning tampered data is detected and rejected on read.
The encryption key never leaves your infrastructure. Dakera supports loading it from environment variables, file paths, or external secret managers via a plugin interface. Without the key, the data on disk is indistinguishable from random bytes.
Unlike cloud memory services where the provider holds encryption keys, self-hosted deployment means you are the only entity that can decrypt your agent's memories.
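The tamper-detection property is easy to demonstrate. The sketch below uses the third-party `cryptography` package to show how AES-256-GCM behaves in general; it is not Dakera's internal code:

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)  # 96-bit nonce, the standard size for GCM

ciphertext = aesgcm.encrypt(nonce, b"user prefers dark mode", None)

# Round-trip succeeds with the right key and untampered ciphertext
assert aesgcm.decrypt(nonce, ciphertext, None) == b"user prefers dark mode"

# Flipping even a single bit is detected: decryption raises InvalidTag
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
try:
    aesgcm.decrypt(nonce, tampered, None)
except InvalidTag:
    print("tampering detected")
```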
## Hybrid Retrieval: HNSW + BM25
A common mistake in memory systems is relying solely on vector similarity. Vectors excel at semantic matching but fail on exact terms — product names, error codes, specific dates. BM25 excels at exact matching but misses paraphrases and semantic relationships.
Dakera's hybrid retrieval combines both approaches. When a query arrives, it runs in parallel against the HNSW vector index and the BM25 inverted index, then merges results using reciprocal rank fusion:
```python
# Python SDK example: hybrid retrieval
from dakera import Dakera

client = Dakera(base_url="http://localhost:3300")

# Store a memory
client.memory.add(
    namespace="project-alpha",
    content="The deployment to us-east-1 failed at 14:32 UTC due to OOM on node k8s-worker-3",
    metadata={"source": "incident-report", "severity": "high"},
)

# Retrieve with hybrid search (default mode)
results = client.memory.search(
    namespace="project-alpha",
    query="what caused the deployment failure?",
    limit=5,
)

# Results combine semantic understanding ("deployment failure")
# with exact term matching ("us-east-1", "OOM", "k8s-worker-3")
```
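Reciprocal rank fusion itself is simple: each document earns 1/(k + rank) from every result list it appears in, and the fused ranking sorts by the summed score. A minimal sketch (k = 60 is the conventional constant from the original RRF paper; Dakera's value may differ):

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists; rank is 1-based, higher fused score wins."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

vector_hits = ["mem-7", "mem-2", "mem-9"]  # semantic neighbors (HNSW)
bm25_hits = ["mem-2", "mem-4", "mem-7"]    # exact-term matches (BM25)

# mem-2 and mem-7 appear in both lists, so they rise to the top
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# ['mem-2', 'mem-7', 'mem-4', 'mem-9']
```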
## Namespace Isolation
In a self-hosted deployment serving multiple agents or teams, namespace isolation ensures that one agent's memories never leak into another's retrieval results. Each namespace maintains its own HNSW index, BM25 index, and knowledge graph.
```python
# Each agent gets its own namespace
client.memory.add(namespace="agent-support", content="...")
client.memory.add(namespace="agent-sales", content="...")

# Searches are scoped — no cross-contamination
results = client.memory.search(namespace="agent-support", query="...")
# Only returns memories from agent-support
```
For multi-tenant deployments, Dakera also supports company-level isolation, where each company gets a completely separate data directory with its own encryption key.
## Network Security
Since Dakera runs on your infrastructure, you control the network boundaries:
- Bind to localhost if your agents run on the same machine
- Use a private network (VPC, VLAN) for distributed deployments
- Place behind a reverse proxy (nginx, Caddy) for TLS termination
- Firewall rules — only allow connections from known agent IPs
```toml
# Bind only to localhost for same-machine agents
[server]
host = "127.0.0.1"
port = 3300
```

```toml
# Or bind to a private interface
[server]
host = "10.0.1.5"
port = 3300
```
## Operational Monitoring
Self-hosted means you're responsible for uptime. Dakera exposes a health endpoint and metrics for integration with your existing monitoring stack:
```shell
# Health check
curl http://localhost:3300/health
# {"status": "healthy", "version": "0.11.54", "uptime_seconds": 86400}

# Metrics (Prometheus-compatible)
curl http://localhost:3300/metrics
# dakera_memories_total{namespace="agent-support"} 45231
# dakera_searches_total{namespace="agent-support"} 12847
# dakera_search_latency_p99_ms 23
```
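The Prometheus exposition format is line-oriented, so ad-hoc scripting against it is easy. A small parser sketch over the sample output above (the values come from the example, not a live server):

```python
import re

# Sample lines copied from the /metrics example output above
SAMPLE = """\
dakera_memories_total{namespace="agent-support"} 45231
dakera_searches_total{namespace="agent-support"} 12847
dakera_search_latency_p99_ms 23
"""

def parse_metrics(text: str) -> dict[str, float]:
    """Parse simple Prometheus exposition lines into {metric: value}."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        m = re.match(r'([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)', line)
        if m:
            name, labels, value = m.groups()
            metrics[name + (labels or "")] = float(value)
    return metrics

parsed = parse_metrics(SAMPLE)
print(parsed["dakera_search_latency_p99_ms"])  # 23.0
```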
## Backup and Disaster Recovery
Since all data lives in a single directory, backups are straightforward:
```shell
# Stop writes temporarily for a consistent snapshot
curl -X POST http://localhost:3300/admin/freeze

# Backup the data directory
tar -czf dakera-backup-$(date +%Y%m%d).tar.gz /var/lib/dakera/data

# Resume writes
curl -X POST http://localhost:3300/admin/unfreeze
```
The encrypted backup can be stored on any medium — cloud object storage, tape, another server — without compromising privacy, since it's AES-256-GCM encrypted and useless without the key.
## Comparison: Self-Hosted vs. Cloud Memory
| Concern | Cloud Memory (Mem0, Zep) | Self-Hosted (Dakera) |
|---|---|---|
| Data residency | Provider's region | Your hardware, your jurisdiction |
| Encryption keys | Provider-managed | You hold the only copy |
| Embedding privacy | Text sent to external API | On-device ONNX inference |
| Network exposure | Public internet | Private network, your firewall |
| Cost at scale | Per-query pricing | Fixed infrastructure cost |
| Latency | Network round-trip | Local, sub-25ms p99 |
| Compliance (HIPAA, SOC2) | Depends on provider | Your controls, your audit |
## When to Self-Host
Self-hosted AI memory is the right choice when:
- Your agents process PII, PHI, or classified information
- Regulatory requirements mandate data residency (GDPR, HIPAA, FedRAMP)
- You need predictable costs at scale (no per-query fees)
- Low latency is critical (same-network retrieval vs. internet round-trip)
- You want full control over retention policies and data lifecycle
If you're prototyping with non-sensitive data and don't need compliance guarantees, a cloud solution might be simpler to start with. But for production agents handling real user data, self-hosting gives you control that no SLA can match.
## Getting Started in 5 Minutes

Here's the fastest path from zero to a working self-hosted memory server:

```shell
# 1. Install
curl -fsSL https://get.dakera.ai/install.sh | sh

# 2. Start with defaults (in-memory encryption key for dev)
dakera serve

# 3. Store a memory
curl -X POST http://localhost:3300/v1/memory \
  -H "Content-Type: application/json" \
  -d '{"namespace": "test", "content": "The user prefers dark mode and vim keybindings"}'

# 4. Search
curl "http://localhost:3300/v1/memory/search?namespace=test&query=what+are+the+user+preferences"
```
For a full production deployment guide with systemd, TLS, and monitoring, see the quickstart documentation.