High Availability

Dakera supports production-grade multi-node clustering with gossip-based membership (SWIM protocol), lease-based leader election, consistent-hash sharding, configurable replication, and automatic rebalancing. Nodes discover each other via seed addresses and coordinate through fencing-token-protected leases. Cold-tier data is stored on shared S3-compatible object storage.

[Diagram] 3-node cluster: SWIM gossip between nodes, leader election (Node 1 shown as leader), and consistent-hash sharding of shards A-D across the nodes.

Cluster environment variables

| Variable | Default | Description |
|---|---|---|
| DAKERA_CLUSTER_MODE | false | Enable multi-node cluster |
| DAKERA_NODE_ID | (none) | Unique node identifier (e.g. node-1) |
| DAKERA_CLUSTER_SEEDS | (none) | Comma-separated gossip bootstrap addresses: dakera-2:7946,dakera-3:7946 |
| DAKERA_GOSSIP_PORT | 7946 | SWIM gossip port — must be open between nodes |
| DAKERA_GOSSIP_BIND | 0.0.0.0:7946 | Gossip bind address (useful for multi-NIC hosts) |
| DAKERA_API_ADVERTISE | auto | Advertised API address for client routing |
| DAKERA_REDIS_URL | (none) | Redis URL for distributed L1.5 cache (redis://redis:6379) |
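
For illustration, the first node of a 3-node cluster might be configured like this; the dakera-2 and dakera-3 hostnames are placeholders for your other nodes:

# Example environment for node 1 of a 3-node cluster (hostnames are placeholders)
DAKERA_CLUSTER_MODE=true
DAKERA_NODE_ID=node-1
DAKERA_CLUSTER_SEEDS=dakera-2:7946,dakera-3:7946
DAKERA_GOSSIP_PORT=7946
DAKERA_GOSSIP_BIND=0.0.0.0:7946
DAKERA_API_ADVERTISE=auto
DAKERA_REDIS_URL=redis://redis:6379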

Failure modes & recovery

Understanding how the cluster handles failures is critical for operations. Here's what happens in each scenario:

| Failure | Impact | Recovery | Downtime |
|---|---|---|---|
| Leader failure | Shard reassignments paused, writes still accepted by owning nodes | Automatic re-election via lease expiry | ~5 seconds |
| Follower failure | Shards on that node unavailable until recovery | Other nodes continue serving. Data re-replicated from replicas | Zero for unaffected shards |
| Network partition | Split-brain risk | Fencing tokens prevent stale leaders from writing. Quorum required for election | Partition duration |
| Full cluster restart | All nodes down | Bootstrap from S3/MinIO cold storage. WAL replay restores last state | Startup time (~10-30s) |
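
To watch a failover as it happens (for example while stopping the leader in a test cluster), you can poll the cluster status endpoint described under "Admin endpoints" below; the exact response fields depend on your Dakera version:

# Poll cluster status once per second during a failure test
while true; do
  curl -s -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
    http://localhost:3300/admin/cluster/status
  echo
  sleep 1
done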

Split-brain prevention

Dakera prevents split-brain using fencing tokens — monotonically increasing integers assigned to each leader lease. When a leader's lease expires and a new leader is elected, the new leader receives a higher fencing token. Any operations from a stale leader with a lower token are rejected by other nodes.
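
As a rough sketch of the rule each node applies (illustrative only, not Dakera's internal code), a write is accepted only when its fencing token is at least as high as the newest token the node has seen:

# Illustrative sketch: reject operations carrying a stale fencing token.
# Token values here are made up for the example.
highest_seen=7    # newest leader lease token this node has observed
incoming=5        # token attached to a write from a (possibly stale) leader

if [ "$incoming" -lt "$highest_seen" ]; then
  echo "reject: stale fencing token ($incoming < $highest_seen)"
else
  highest_seen=$incoming
  echo "accept: fencing token now $highest_seen"
fi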

Why odd node counts matter: With 3 nodes, a majority quorum requires 2 nodes, so a single node failure is tolerated. With 5 nodes, 3 are needed for quorum, so two failures are tolerated. An even node count (2, 4) tolerates no more failures than a cluster of one fewer node, so the extra node adds cost without improving availability.
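
The quorum arithmetic can be checked directly; the snippet below is just illustrative math, not a Dakera command:

# Quorum for an N-node cluster is floor(N/2) + 1; tolerated failures = N - quorum.
# Note that 4 nodes tolerate no more failures than 3.
for n in 3 4 5; do
  echo "nodes=$n quorum=$(( n / 2 + 1 )) tolerated_failures=$(( n - (n / 2 + 1) ))"
done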

Replication

Data is replicated across nodes based on the replication factor. Each shard's data exists on multiple nodes for durability. The consistency model is eventual consistency with gossip-driven convergence — writes are acknowledged after the primary shard confirms, and replicas converge within milliseconds under normal network conditions.

| Replication factor | Nodes required | Fault tolerance |
|---|---|---|
| 1 (no replication) | 1+ | None — node loss means data loss for that shard |
| 2 | 3+ | 1 node failure per shard |
| 3 (recommended) | 5+ | 2 node failures per shard |

Scaling procedures

Adding a node

# 1. Start a new node with seed addresses pointing to existing cluster
docker run -d \
  --name dakera-4 \
  -e DAKERA_CLUSTER_MODE=true \
  -e DAKERA_NODE_ID=node-4 \
  -e DAKERA_CLUSTER_SEEDS=dakera-1:7946,dakera-2:7946 \
  -e DAKERA_ROOT_API_KEY=$DAKERA_ROOT_API_KEY \
  -p 3303:3300 -p 7949:7946 \
  ghcr.io/dakera-ai/dakera:latest

# 2. Verify node joined the cluster
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  http://localhost:3300/admin/cluster/nodes
# Node 4 appears with state: "Alive"

# 3. Automatic shard rebalancing begins — monitor progress
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  http://localhost:3300/admin/cluster/status

Removing a node

# 1. Enable maintenance mode (drains traffic and migrates shards)
curl -X POST http://dakera-4:3300/admin/maintenance/enable \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"

# 2. Wait for drain to complete
curl http://dakera-4:3300/admin/maintenance/status \
  -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"
# {"status":"drained","shards_remaining":0}

# 3. Stop the node
docker stop dakera-4 && docker rm dakera-4

# 4. Remove from seed list in remaining nodes (optional — gossip handles it)

Rolling upgrade procedure

Upgrade cluster nodes one at a time with zero downtime:

| Step | Command | What happens |
|---|---|---|
| 1. Drain | POST /admin/maintenance/enable | Traffic redirected, shards migrated to other nodes |
| 2. Verify | GET /admin/maintenance/status | Confirm shards_remaining: 0 |
| 3. Upgrade | docker pull && docker restart | Pull new image, restart container |
| 4. Health | GET /health | Confirm new version healthy |
| 5. Rejoin | POST /admin/maintenance/disable | Node rejoins cluster, shards rebalanced back |
| 6. Repeat | | Move to next node |
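
As a sketch, the loop below applies these steps to each node in turn. It assumes the Docker Compose deployment described in the next section, with services named dakera-1 through dakera-3 whose hostnames resolve from where you run it; adjust names, ports, and wait intervals to your environment.

# Illustrative rolling upgrade across a 3-node Compose cluster (run from dakera-deploy/ha)
for node in dakera-1 dakera-2 dakera-3; do
  # 1. Drain: redirect traffic and migrate shards off the node
  curl -X POST "http://$node:3300/admin/maintenance/enable" \
    -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"

  # 2. Wait until the drain reports no remaining shards
  until curl -s "http://$node:3300/admin/maintenance/status" \
      -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" | grep -q '"shards_remaining":0'; do
    sleep 5
  done

  # 3. Pull the new image and recreate the container so it runs the new version
  docker compose pull "$node"
  docker compose up -d "$node"

  # 4. Wait for the upgraded node to report healthy
  until curl -sf "http://$node:3300/health" > /dev/null; do sleep 2; done

  # 5. Leave maintenance mode so shards rebalance back onto the node
  curl -X POST "http://$node:3300/admin/maintenance/disable" \
    -H "Authorization: Bearer $DAKERA_ROOT_API_KEY"
done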

Docker Compose cluster

The dakera-deploy repo includes a multi-node Docker Compose configuration: a 3-node cluster with a Traefik load balancer, Redis cache, and shared MinIO storage.

# Clone and start a 3-node cluster
git clone https://github.com/dakera-ai/dakera-deploy
cd dakera-deploy/ha

# Generate secrets and start
cp .env.example .env  # edit with your keys
docker compose up -d

# Verify cluster health
curl -H "Authorization: Bearer $DAKERA_ROOT_API_KEY" \
  http://localhost:3100/admin/cluster/status

Kubernetes HA

Use the Helm chart with dakera.replicaCount set to 3 or more. The chart automatically configures gossip seeds, Redis, and shared storage:

helm install dakera oci://ghcr.io/dakera-ai/dakera-helm/dakera \
  --namespace dakera --create-namespace \
  --set dakera.replicaCount=3 \
  --set dakera.rootApiKey="$(openssl rand -hex 32)" \
  --set minio.rootPassword="$(openssl rand -hex 16)"
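
To confirm the rollout, the usual kubectl checks apply; the service name and port below are assumptions about the chart's defaults, so adjust them to your release:

# Check that all replicas are running, then reach the API through a port-forward
kubectl get pods -n dakera
kubectl -n dakera port-forward svc/dakera 3300:3300 &
curl -H "Authorization: Bearer <your-root-api-key>" \
  http://localhost:3300/admin/cluster/status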

Monitoring HA health

Key metrics to watch in a clustered deployment:

| Metric | Alert threshold | Meaning |
|---|---|---|
| dakera_gossip_members | < expected node count | A node has left the cluster — investigate immediately |
| dakera_replication_lag_ms | > 1000ms | Replicas falling behind — check network or disk I/O |
| dakera_shard_balance | > 0.3 skew | Uneven shard distribution — trigger manual rebalance |
| dakera_leader_lease_remaining_ms | < 5000ms | Leader lease about to expire — potential re-election |
| dakera_forwarded_requests | high ratio | Many requests hitting wrong node — check load balancer config |
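
For a quick manual check of these gauges, assuming Dakera exposes Prometheus-format metrics (the /metrics path here is an assumption, not confirmed by this section):

# Grep the HA-related metrics from a node's metrics endpoint
curl -s http://localhost:3300/metrics | \
  grep -E 'dakera_(gossip_members|replication_lag_ms|shard_balance|leader_lease_remaining_ms|forwarded_requests)'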

Performance characteristics

| Operation | Single node | Clustered (3 nodes) | Notes |
|---|---|---|---|
| Local read | <5ms | <5ms | Read hits owning node directly |
| Forwarded read | n/a | ~8-12ms | Request routed to correct shard owner |
| Write | <10ms | ~15-20ms | Primary ack + async replication overhead |
| Recall (hybrid) | ~30ms | ~35-45ms | Minimal overhead — search is shard-local |

Gossip tuning

For most deployments, the default gossip settings work well. Tune these for high-latency or large clusters:

| Variable | Default | When to adjust |
|---|---|---|
| DAKERA_GOSSIP_PORT | 7946 | Port conflict with another service |
| DAKERA_GOSSIP_BIND | 0.0.0.0:7946 | Multi-NIC hosts — bind to specific interface |
Network requirements: Gossip uses both TCP and UDP on the configured port. Ensure both protocols are allowed between cluster nodes. Latency between nodes should be <50ms for optimal failure detection.
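
For example, with ufw on each host (the 10.0.0.0/24 subnet is a placeholder for your cluster network; adapt to your firewall tooling):

# Allow SWIM gossip between cluster nodes on both protocols
sudo ufw allow from 10.0.0.0/24 to any port 7946 proto tcp
sudo ufw allow from 10.0.0.0/24 to any port 7946 proto udp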

Disaster recovery

For cross-region disaster recovery, the same cold-storage bootstrap path used for a full cluster restart applies: replicate the S3/MinIO bucket to the recovery region, then start a cluster there that bootstraps from the replicated cold tier and replays the WAL to restore the last persisted state.

Admin endpoints

| Endpoint | Description |
|---|---|
| GET /admin/cluster/status | Cluster health, leader node, shard distribution |
| GET /admin/cluster/nodes | All nodes with state (Alive/Suspect/Dead), roles, shard assignments |
| POST /admin/cluster/rebalance | Trigger manual shard rebalancing |
| POST /admin/maintenance/enable | Drain node for rolling upgrades |
| POST /admin/maintenance/disable | Rejoin node after maintenance |
| GET /admin/maintenance/status | Check drain progress |

Production checklist

- Run an odd number of nodes (3 or 5) so quorum survives node failures
- Use replication factor 3 (requires 5+ nodes) for two-failure tolerance per shard
- Open TCP and UDP on the gossip port (default 7946) between all nodes
- Keep inter-node latency under 50ms for reliable failure detection
- Configure DAKERA_REDIS_URL so all nodes share the distributed L1.5 cache
- Back the cold tier with shared S3-compatible object storage (e.g. MinIO)
- Alert on dakera_gossip_members, dakera_replication_lag_ms, and dakera_leader_lease_remaining_ms

Related

- Deployment
- Troubleshooting
- dakera-deploy repo