High Availability
Dakera supports production-grade multi-node clustering with gossip-based membership (SWIM protocol), lease-based leader election, consistent-hash sharding, configurable replication, and automatic rebalancing. Nodes discover each other via seed addresses and coordinate through fencing-token-protected leases. Cold-tier data is stored on shared S3-compatible object storage.
Cluster environment variables
| Variable | Default | Description |
|---|---|---|
| DAKERA_CLUSTER_MODE | false | Enable multi-node cluster |
| DAKERA_NODE_ID | — | Unique node identifier (e.g. node-1) |
| DAKERA_CLUSTER_SEEDS | — | Comma-separated gossip bootstrap addresses: dakera-2:7946,dakera-3:7946 |
| DAKERA_GOSSIP_PORT | 7946 | SWIM gossip port — must be open between nodes |
| DAKERA_GOSSIP_BIND | 0.0.0.0:7946 | Gossip bind address (useful for multi-NIC hosts) |
| DAKERA_API_ADVERTISE | auto | Advertised API address for client routing |
| DAKERA_REDIS_URL | — | Redis URL for distributed L1.5 cache (redis://redis:6379) |
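For reference, a single node started with these variables might look like the sketch below; the hostnames, ports, and Redis URL are placeholders for your environment.

```bash
# Illustrative node start-up using the variables above (values are placeholders)
docker run -d \
  --name dakera-1 \
  -e DAKERA_CLUSTER_MODE=true \
  -e DAKERA_NODE_ID=node-1 \
  -e DAKERA_CLUSTER_SEEDS=dakera-2:7946,dakera-3:7946 \
  -e DAKERA_REDIS_URL=redis://redis:6379 \
  -e DAKERA_ROOT_API_KEY=$DAKERA_ROOT_API_KEY \
  -p 3300:3300 -p 7946:7946 \
  ghcr.io/dakera-ai/dakera:latest
```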
Failure modes & recovery
Understanding how the cluster handles failures is critical for operations. Here's what happens in each scenario:
| Failure | Impact | Recovery | Downtime |
|---|---|---|---|
| Leader failure | Shard reassignments paused, writes still accepted by owning nodes | Automatic re-election via lease expiry | ~5 seconds |
| Follower failure | Shards on that node unavailable until recovery | Other nodes continue serving. Data re-replicated from replicas | Zero for unaffected shards |
| Network partition | Split-brain risk | Fencing tokens prevent stale leaders from writing. Quorum required for election | Partition duration |
| Full cluster restart | All nodes down | Bootstrap from S3/MinIO cold storage. WAL replay restores last state | Startup time (~10-30s) |
Split-brain prevention
Dakera prevents split-brain using fencing tokens — monotonically increasing integers assigned to each leader lease. When a leader's lease expires and a new leader is elected, the new leader receives a higher fencing token. Any operations from a stale leader with a lower token are rejected by other nodes.
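As a purely conceptual illustration (this is not a Dakera interface), the check each node applies boils down to comparing tokens:

```bash
# Conceptual sketch only: nodes track the highest fencing token they have seen
# and reject writes carrying an older token, e.g. from a leader whose lease expired.
highest_seen_token=7   # token attached to the current leader's lease
incoming_token=6       # token on a write from a stale leader

if [ "$incoming_token" -lt "$highest_seen_token" ]; then
  echo "reject write: stale leader (token $incoming_token < $highest_seen_token)"
else
  highest_seen_token=$incoming_token
  echo "accept write: token $incoming_token is current"
fi
```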
Replication
Data is replicated across nodes based on the replication factor. Each shard's data exists on multiple nodes for durability. The consistency model is eventual consistency with gossip-driven convergence — writes are acknowledged after the primary shard confirms, and replicas converge within milliseconds under normal network conditions.
| Replication factor | Nodes required | Fault tolerance |
|---|---|---|
| 1 (no replication) | 1+ | None — node loss means data loss for that shard |
| 2 | 3+ | 1 node failure per shard |
| 3 (recommended) | 5+ | 2 node failures per shard |
Scaling procedures
Adding a node
```bash
# 1. Start a new node with seed addresses pointing to existing cluster
docker run -d \
--name dakera-4 \
-e DAKERA_CLUSTER_MODE=true \
-e DAKERA_NODE_ID=node-4 \
-e DAKERA_CLUSTER_SEEDS=dakera-1:7946,dakera-2:7946 \
-e DAKERA_ROOT_API_KEY=$DAKERA_ROOT_API_KEY \
-p 3303:3300 -p 7949:7946 \
ghcr.io/dakera-ai/dakera:latest
# 2. Verify node joined the cluster
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
http://localhost:3300/admin/cluster/nodes
# Node 4 appears with state: "Alive"
# 3. Automatic shard rebalancing begins — monitor progress
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
http://localhost:3300/admin/cluster/status
```
Removing a node
```bash
# 1. Enable maintenance mode (drains traffic and migrates shards)
curl -X POST http://dakera-4:3300/admin/maintenance/enable \
-H "Authorization: Bearer $DAKERA_ROOT_API_KEY"
# 2. Wait for drain to complete
curl http://dakera-4:3300/admin/maintenance/status \
-H "Authorization: Bearer $DAKERA_ROOT_API_KEY"
# {"status":"drained","shards_remaining":0}
# 3. Stop the node
docker stop dakera-4 && docker rm dakera-4
# 4. Remove from seed list in remaining nodes (optional — gossip handles it)
```
Rolling upgrade procedure
Upgrade cluster nodes one at a time with zero downtime:
| Step | Command | What happens |
|---|---|---|
| 1. Drain | POST /admin/maintenance/enable | Traffic redirected, shards migrated to other nodes |
| 2. Verify | GET /admin/maintenance/status | Confirm shards_remaining: 0 |
| 3. Upgrade | docker compose pull && docker compose up -d | Pull the new image and recreate the container (restart alone would keep the old image) |
| 4. Health | GET /health | Confirm new version healthy |
| 5. Rejoin | POST /admin/maintenance/disable | Node rejoins cluster, shards rebalanced back |
| 6. Repeat | — | Move to next node |
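The whole procedure can be scripted against the endpoints above. The loop below is a sketch that assumes the node hostnames and Compose service names from the dakera-deploy example in the next section (dakera-1, dakera-2, dakera-3) and the default API port; adapt it to your deployment.

```bash
# Rolling-upgrade sketch; hostnames, service names, and port are assumptions.
for node in dakera-1 dakera-2 dakera-3; do
  # 1-2. Drain the node and wait until no shards remain on it
  curl -X POST -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
    http://$node:3300/admin/maintenance/enable
  until curl -s -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
      http://$node:3300/admin/maintenance/status | grep -q '"shards_remaining":0'; do
    sleep 5
  done

  # 3. Pull the new image and recreate only this container
  docker compose pull "$node" && docker compose up -d "$node"

  # 4-5. Wait for the upgraded node to report healthy, then rejoin the cluster
  until curl -sf http://$node:3300/health > /dev/null; do sleep 2; done
  curl -X POST -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
    http://$node:3300/admin/maintenance/disable
done
```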
Docker Compose cluster
The dakera-deploy repo includes a multi-node Docker Compose configuration that runs a 3-node cluster behind a Traefik load balancer, with a Redis cache and shared MinIO storage:
```bash
# Clone and start a 3-node cluster
git clone https://github.com/dakera-ai/dakera-deploy
cd dakera-deploy/ha
# Generate secrets and start
cp .env.example .env # edit with your keys
docker compose up -d
# Verify cluster health
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
http://localhost:3100/admin/cluster/status
```
Kubernetes HA
Use the Helm chart with dakera.replicaCount set to 3 or more. The chart automatically configures gossip seeds, Redis, and shared storage:
```bash
helm install dakera oci://ghcr.io/dakera-ai/dakera-helm/dakera \
--namespace dakera --create-namespace \
--set dakera.replicaCount=3 \
--set dakera.rootApiKey="$(openssl rand -hex 32)" \
--set minio.rootPassword="$(openssl rand -hex 16)"
```
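To verify the rollout, standard kubectl checks plus the cluster status endpoint are enough; the pod name and port below are assumptions about the chart's defaults, so adjust them to what the chart actually creates.

```bash
# Check that all replicas are running
kubectl get pods -n dakera

# Port-forward one pod (name and port are assumptions) and query cluster status
kubectl -n dakera port-forward dakera-0 3300:3300 &
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  http://localhost:3300/admin/cluster/status
```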
Monitoring HA health
Key metrics to watch in a clustered deployment:
| Metric | Alert threshold | Meaning |
|---|---|---|
| dakera_gossip_members | < expected node count | A node has left the cluster — investigate immediately |
| dakera_replication_lag_ms | > 1000ms | Replicas falling behind — check network or disk I/O |
| dakera_shard_balance | > 0.3 skew | Uneven shard distribution — trigger manual rebalance |
| dakera_leader_lease_remaining_ms | < 5000ms | Leader lease about to expire — potential re-election |
| dakera_forwarded_requests | high ratio | Many requests hitting wrong node — check load balancer config |
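Each node exposes these series on its /metrics endpoint (see the production checklist below), so a quick spot check from the command line can be enough between Prometheus scrapes. This assumes /metrics is served on the API port and, as is typical for Prometheus endpoints, without auth.

```bash
# Spot-check gossip membership and replication lag on one node
curl -s http://localhost:3300/metrics \
  | grep -E 'dakera_gossip_members|dakera_replication_lag_ms'
```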
Performance characteristics
| Operation | Single node | Clustered (3 nodes) | Notes |
|---|---|---|---|
| Local read | <5ms | <5ms | Read hits owning node directly |
| Forwarded read | — | ~8-12ms | Request routed to correct shard owner |
| Write | <10ms | ~15-20ms | Primary ack + async replication overhead |
| Recall (hybrid) | ~30ms | ~35-45ms | Minimal overhead — search is shard-local |
Gossip tuning
For most deployments, the default gossip settings work well. Adjust them when the defaults conflict with your network layout:
| Variable | Default | When to adjust |
|---|---|---|
| DAKERA_GOSSIP_PORT | 7946 | Port conflict with another service |
| DAKERA_GOSSIP_BIND | 0.0.0.0:7946 | Multi-NIC hosts — bind to specific interface |
Disaster recovery
For cross-region disaster recovery:
- Backup to remote S3 — schedule daily backups to an S3 bucket in a different region via the admin API (see Deployment → Backup & Restore).
- Full cluster restore — provision new nodes, start with DAKERA_STORAGE=s3 pointing to the backup bucket, and upload the backup bundle via /admin/backups/restore (a sketch follows this list).
- RTO expectation — cluster bootstrap + restore: 5-15 minutes depending on data size.
- RPO expectation — equal to backup frequency (daily = up to 24h data loss). For lower RPO, increase backup frequency or use cross-region S3 replication on the primary bucket.
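Put together, a restore might look roughly like the sketch below. The S3 settings and the upload payload for /admin/backups/restore are placeholders; see Deployment → Backup & Restore for the exact backup bundle format.

```bash
# Disaster-recovery sketch; bucket settings, bundle name, and upload field are placeholders.
# 1. Provision a fresh node pointed at cold storage in the surviving region
docker run -d \
  --name dakera-restore \
  -e DAKERA_CLUSTER_MODE=true \
  -e DAKERA_NODE_ID=node-1 \
  -e DAKERA_STORAGE=s3 \
  -e DAKERA_ROOT_API_KEY=$DAKERA_ROOT_API_KEY \
  -p 3300:3300 \
  ghcr.io/dakera-ai/dakera:latest
  # ...plus your S3 endpoint/bucket settings from the Deployment docs

# 2. Upload the backup bundle (field name is illustrative; check the restore API docs)
curl -X POST -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  -F "bundle=@dakera-backup.tar.gz" \
  http://localhost:3300/admin/backups/restore
```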
Admin endpoints
| Endpoint | Description |
|---|---|
| GET /admin/cluster/status | Cluster health, leader node, shard distribution |
| GET /admin/cluster/nodes | All nodes with state (Alive/Suspect/Dead), roles, shard assignments |
| POST /admin/cluster/rebalance | Trigger manual shard rebalancing |
| POST /admin/maintenance/enable | Drain node for rolling upgrades |
| POST /admin/maintenance/disable | Rejoin node after maintenance |
| GET /admin/maintenance/status | Check drain progress |
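For example, if dakera_shard_balance reports skew, a rebalance can be triggered and then watched via the status endpoint:

```bash
# Trigger a manual shard rebalance, then watch the distribution settle
curl -X POST -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  http://localhost:3300/admin/cluster/rebalance
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  http://localhost:3300/admin/cluster/status
```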
Production checklist
- Odd node counts — run 3 or 5 nodes for clean quorum math
- Dedicated S3/MinIO — do not co-locate cold storage on Dakera nodes
- External load balancer — place outside the Dakera node pool
- Monitor all nodes — scrape /metrics on every node; alert on gossip membership count and replication lag
- Firewall gossip port 7946 — restrict to cluster nodes only (TCP + UDP)
- Rolling upgrades — use maintenance mode to drain nodes one at a time
- Cross-region backups — replicate to a separate S3 bucket for disaster recovery
- Test failover — periodically stop a node and verify automatic recovery (see the drill below)
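A basic failover drill against the Compose cluster might look like the following; the container name matches the dakera-deploy example, and the port should be whichever surviving node (or the load balancer) exposes the API.

```bash
# Stop one node and confirm the cluster notices and keeps serving
docker stop dakera-2
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  http://localhost:3100/admin/cluster/nodes   # node-2 should move to Suspect, then Dead

# Bring it back and confirm it rejoins and shards rebalance
docker start dakera-2
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  http://localhost:3100/admin/cluster/status
```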