Changelog

Recent Dakera releases. Full history at GitHub Releases and GHCR tags.

Version Date Highlights
v0.11.95 LATEST Jun 2026 bge-reranker-v2-m3 cross-encoder upgrade (CE-C) — Reranker upgraded from Xenova/bge-reranker-base (278M, English-centric, ~49 nDCG@10 BEIR) to onnx-community/bge-reranker-v2-m3-ONNX INT8 (568M, multilingual, 51.8 nDCG@10). Improves ranking precision on multi-hop and temporal queries. Entity-linked memory boosting (CE-D) — off-by-default Mem0-style additive scoring boost for entity-matched candidates with popularity decay. BM25 English stemming regression guard — CI guards prevent silent revert of production English stemming. LoCoMo 87.1% overall (Cat1 86.5% / Cat2 85.0% / Cat3 69.6% / Cat4 90.0%) — all category gates pass.
v0.11.94 Jun 2026 gte-modernbert-base embedding model: DAKERA_MODEL=gte-modernbert-base switches to Alibaba-NLP/gte-modernbert-base (149M params, 768d, 8192-token window, MTEB retrieval 64.38 vs bge-large 64.23). MRL dimension truncation: DAKERA_MRL_DIMENSION=256 post-pool truncates ModernBERT embeddings (64/128/256/512/768d supported); non-MRL models fall back silently. Fusion strategy override: DAKERA_FUSION_STRATEGY=rrf enables live MinMax/RRF A/B without code changes; unset = MinMax (default, byte-identical). Deploy time cut from 12 min to under 5 min via PR image retag on merge.
v0.11.93 Jun 2026 Query decomposition — opt-in DAKERA_QUERY_DECOMP=1 decomposes complex queries into sub-queries with RRF fan-out, improving multi-step recall. Security: quinn-proto 0.11.14 → 0.11.15 (RUSTSEC-2026-0185, CVSS 7.5).
v0.11.92 Jun 2026 Critical entity-extraction fix: POST /v1/extract now defaults entity_types to [person, organization, location] when not supplied, fixing the playground returning 0 entities. GLiNER is cache-gated to avoid cold-start 502s (graceful degradation to rule-based fallback).
v0.11.91 Jun 2026 Admin SDK parity — admin routes now served at /v1/admin/* (SDK-canonical) in addition to /admin/* (back-compat), fixing 404s from client.autopilot_status(), client.decay_config() etc. CE-36: compound recall score weights (DAKERA_SCORE_W_VEC, DAKERA_SCORE_W_IMP, DAKERA_SCORE_W_REC, DAKERA_SCORE_RECENCY_TAU_HOURS) are now runtime-configurable; defaults byte-identical to prior releases.
v0.11.90 Jun 2026 CE-32: list-aware sentence decomposition and supersession demotion both enabled by default (DAK-6507). Set DAKERA_BATCH_SENTENCE_DECOMP=0 or DAKERA_SUPERSEDE_DEMOTE=0 to opt out. LoCoMo 1540Q PASS — Overall 87.0%, Cat3 68.5%.
v0.11.89 Jun 2026 List-aware CE-31 sentence decomposition on batch ingest (DAK-6497) — buried-list gold recovered; LoCoMo byte-identical (session-gated). Supersession demotion hardened: session-coherent messages no longer falsely demoted. DFB-recall wired as type:ce PR gate, replaces LME smoke.
v0.11.88 Jun 2026 Opt-in CE-31 sentence decomposition on batch ingest path (DAK-6477, inert by default). Set DAKERA_CE31_DECOMPOSE=1 to activate.
v0.11.87 Jun 2026 Fix: honor DAKERA_CROSS_SESSION_FETCH_MULT in session-scoped recall branch (DAK-6477) — env override was silently ignored for session queries; now respected without changing LoCoMo defaults.
v0.11.86 Jun 2026 LME CE overhaul (DAK-6440) — 5 architectural recall fixes across Cat1–Cat4. Auto-dispatch LME smoke on PR Docker images. Dependency bumps: opendal 0.57.0, chrono 0.4.45, rusqlite 0.40.1, prost 0.14.4.
v0.11.85 Jun 2026 Env-gated LME fetch depth knobs (DAK-6423, inert by default): DAKERA_HYBRID_FETCH_MULT (Hybrid+rerank coarse multiplier, default 5×) and DAKERA_CROSS_SESSION_FETCH_MULT (cross-session query pool depth). LoCoMo recall path byte-identical when unset.
v0.11.84 Jun 2026 Entity vector search for temporal queries — Cat3 recall +4.4 pp (65.2% → 69.6%) via entity-filtered HNSW supplemental pass on BM25-routed temporal queries. Cross-encoder reranker queue fix — eliminates silent recall drop under concurrent load. Re-embed importance floor 0.0 — all static-tier memories now upgrade to ONNX embeddings.
v0.11.83 Jun 2026 Docker image variants — :cpu (INT8 quantised, CPU-only) and :gpu/:cuda (FP32 with CUDA base image). Deterministic HNSW build order — DAKERA_HNSW_SEED now effective across restarts. Raw-fs fast write path for ObjectStorage::upsert (~9× storage throughput). O(namespace) scan removed from per-batch list — eliminates per-store N² overhead.
v0.11.82 Jun 2026 Model2Vec static-write tier enabled in production (DAKERA_TIERED=1) — 9.7× ingest throughput. New POST /admin/reembed/drain endpoint for synchronous quality drain without waiting for background cycles. GLiNER entity extraction restored (FP32 model + correct span format). SHA-256 integrity guard for ONNX cold-boot with re-fetch-once on mismatch. Redundant legacy ONNX pool freed in tiered mode.
v0.11.81 Jun 2026 OOM hardening — OnnxBackend pool capped to 1 in GPU mode (eliminates BFCArena fragmentation from concurrent CUDA sessions), CPU memory pattern disabled (.with_memory_pattern(false)), OOM retry depth extended to batch=1. GPU forward passes serialised at allocator level via parking_lot::Mutex, replacing coarser GPU_INFERENCE_SEMAPHORE.
v0.11.80 May 2026 SIMD-accelerated HNSW distance (3–8× throughput on x86_64 + aarch64). GPU semaphore fix — OnnxBackend now serialises CUDA forward passes, eliminating CUBLAS_STATUS_ALLOC_FAILED under parallel ingest. CE-TORA8 confidence gate: Cat3 temporal recall 73.9% (+5.0 pp).
v0.11.79 May 2026 Global GPU_INFERENCE_SEMAPHORE (1 permit) serialises all CUDA forward passes across ONNX + Candle backends — prevents BFCArena VRAM fragmentation under concurrent ingest. Semaphore now acquired before TieredEngine branch in embed_text().
v0.11.78 May 2026 TieredEngine pre-warms at server startup — eliminates 7–10 min first-request stall. ReembedJob batch ANN invalidation: one HNSW rebuild per cycle instead of O(N) (18× recall speedup). store_memory_batch() now routes through TieredEngine write path — GPU-free static ingest for batch operations.
v0.11.77 May 2026 SearchMode::Hybrid is now the server default. TieredEngine end-to-end fully wired — static→transformer upgrade pipeline. docker-compose.yml ships DAKERA_TIERED=1 + DAKERA_SEARCH_MODE=hybrid out of the box.
v0.11.76 May 2026 Binary HNSW overselect formula fixed — Recall@10 restored from 54% → ~100% for DAKERA_SEARCH_MODE=hybrid. SearchMode fallback corrected (unknown values → Float).
v0.11.75 May 2026 Pluggable inference backends (EmbeddingBackend trait): ONNX, Candle, GGUF, Static. StaticBackend Model2Vec ~500× ingest. TieredEngine. Binary HNSW. ModernBertEmbedBase 768d. Batch write method in all 4 SDKs (py/js/rs/go).
v0.11.74 May 2026 NER redesign: lowercase entity tags, entity_types honoured by all extractors, entity dedup, byte-offset span detection, MAX_TEXT_WORDS guard for long inputs.
v0.11.73 May 2026 Dual-layer timestamping — event date extracted at store time into _dakera_content_date metadata for temporal proximity scoring. SharesEntity knowledge-graph edge expansion for entity-linked cross-encoder recall.
v0.11.72 May 2026 temporal_rerank_multiplier raised 8×→12×. GLiNER span detection fix for contractions and multi-token entity names.
v0.11.71 May 2026 FP32 model for GPU/CUDA inference (fixes accuracy on CUDA EP). O(1) session lookup replaces O(N) scan. ONNX default batch size 8→32 for CPU deployments.
v0.11.69 May 2026 Parallel HNSW+BM25 in Hybrid mode re-introduced with full 1540Q bench gate — Cat4 tie-breaking regression resolved. 87.1% LoCoMo, Cat3 73.9%, Cat4 88.8%.
v0.11.68 May 2026 Revert parallel HNSW+BM25 (Cat4 RNG tie-breaking regression under tokio task interleaving). GPU CUDA Execution Provider enabled.
v0.11.67 May 2026 Cross-encoder overload gate (RERANKER_MAX_CONCURRENT=6) prevents queue-saturation cascades — graceful degradation to unranked results when gate fires, no client-side timeout storm.
v0.11.66 May 2026 Batch ONNX cross-encoder — mini-batch size RERANKER_ONNX_BATCH_SIZE=16 within each chunk; each mini-batch padded to its own max seq_len, reducing memory waste and improving throughput under concurrent rerank load.
v0.11.65 May 2026 Cross-encoder session pool (N=4) + adaptive chunk splitting — eliminates contention under concurrent recall. Each cross-encoder call gets a pooled session; chunks re-split for long documents exceeding model max seq_len.
v0.11.64 May 2026 Async metric recording off hot-path — pipeline stage metrics now recorded via tokio::spawn, removing synchronous atomic + alloc overhead from every recall. Fixes -0.7pp regression introduced in v0.11.63.
v0.11.63 May 2026 Full 8-stage recall pipeline instrumentation — Prometheus histograms for each stage (query parse → BM25 → vector → rerank → cross-encoder → dedupe → hydrate → response). Adds /metrics observability with no throughput cost.
v0.11.62 May 2026 GLiNER NER fully restored — text_lengths tensor reshaped rank-1→rank-2 so entity extraction works correctly after v0.11.61 deploy.
v0.11.61 May 2026 GLiNER model path fix — HuggingFace repos renamed (hyphen→underscore); updated to onnx-community/gliner_medium-v2.1. No HF token required. Restores dakera_auto_tag entity extraction.
v0.11.60 May 2026 HF_TOKEN support for HuggingFace model downloads — injects auth header on all download requests. Graceful fallback: no token → unauthenticated (existing behaviour for public models).
v0.11.59 May 2026 ONNX batch size 32→1 — eliminates LME 408 timeout caused by one 512-token text padding 31 peers to max seq_len. Each text now gets its own ONNX call. Session pool provides equivalent throughput.
v0.11.58 May 2026 ONNX session pool N=4 + parallel batch storage upsert — eliminates LME throughput bottleneck. Concurrent callers round-robin across pool slots. buffer_unordered(16) for parallel vector writes.
v0.11.57 May 2026 Docker base image rust:1.92→1.95: a dependency requires rustc ≥ 1.95. Fixes release build failures.
v0.11.56 May 2026 SDK engine parity: admin (cluster, maintenance, quotas, slow queries, backups), ops (diagnostics, jobs, compaction, shutdown), health probes, vector bulk ops (update, delete, count), fulltext stats/delete, TTL stats, storage tiers, memory type stats, agent consolidation, namespace entity config/extractor/dimension migration. All 4 SDKs (py/js/rs/go) at full parity.
dakera-mcp v0.10.9 LATEST MCP Jun 2026 v0.10.9 — T-I-F reliability scoring tool (dakera_tif_evaluate), ORCID founder attribution. Maintenance releases v0.10.4–v0.10.8 — stability improvements, updated tool descriptions. See GitHub Releases for full MCP changelog.
dakera-mcp v0.10.2 May 2026 Page size increased from 20 → 100: all profiles now return results in a single page. Reduces round-trips for agents with large memory stores.
dakera-mcp v0.10.0 May 2026 Profile-based tool tiering: 14 core tools by default, power/admin/all profiles for expanded access. Meta-discovery tools (discover_tools, load_tools). ~30K token savings vs loading all tools.
v0.11.55 May 2026 CE-118 temporal hybrid retrieval. Cat2 gate achieved (86.3%). 88.2% LoCoMo overall.
v0.11.54 May 2026 MCP: 14 core tools (86+ available via profiles). DAKERA_ENTITY_VECTOR_SEARCH enabled by default. Cross-encoder pipeline.
v0.11.45 Apr 2026 CE-71: ML routing classifier on by default. TEMPORAL_INFERENCE enabled. 87.1% LoCoMo.
v0.11.27 Apr 2026 HA cluster gossip stability improvements. MinIO throttle tuning.
v0.10.2 Apr 2026 Full stack release: server + py + js + rs + go SDKs simultaneously.
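The v0.11.94 entry describes switching between MinMax and RRF fusion through DAKERA_FUSION_STRATEGY. As a rough illustration of how the two strategies can rank the same candidates differently, here is a minimal Python sketch. It is not Dakera's implementation: the function names, the RRF constant k=60, and the choice to sum normalised scores are all assumptions made for the example. Only the environment variable name and the "unset = MinMax" default come from the changelog.

```python
import os
from typing import Dict, List, Optional

Scores = Dict[str, float]  # document id -> raw retriever score


def minmax_fuse(score_lists: List[Scores]) -> Scores:
    """Rescale each retriever's scores to [0, 1], then sum per document."""
    fused: Scores = {}
    for scores in score_lists:
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc_id, score in scores.items():
            # If every score is equal, treat them all as a full match.
            norm = (score - lo) / span if span > 0 else 1.0
            fused[doc_id] = fused.get(doc_id, 0.0) + norm
    return fused


def rrf_fuse(score_lists: List[Scores], k: int = 60) -> Scores:
    """Reciprocal rank fusion: sum 1 / (k + rank) over retrievers."""
    fused: Scores = {}
    for scores in score_lists:
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


def fuse(score_lists: List[Scores], strategy: Optional[str] = None) -> Scores:
    """Pick a strategy; an unset or unknown value falls back to MinMax."""
    if strategy is None:
        strategy = os.environ.get("DAKERA_FUSION_STRATEGY", "")
    if strategy.lower() == "rrf":
        return rrf_fuse(score_lists)
    return minmax_fuse(score_lists)


if __name__ == "__main__":
    bm25 = {"a": 12.0, "b": 7.0, "c": 1.0}    # hypothetical BM25 scores
    vector = {"b": 0.9, "c": 0.8, "d": 0.1}   # hypothetical cosine scores
    for name in ("minmax", "rrf"):
        fused = fuse([bm25, vector], strategy=name)
        print(name, sorted(fused, key=fused.get, reverse=True))
```

MinMax keeps score magnitudes, so a document with one very strong score can outrank one that is merely consistent. RRF uses only rank positions, which favours documents that appear near the top of several lists.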
SDK versions: SDKs track the server minor version and are now version-locked to the server. Current: Python / JS / Go / Rust SDKs at v0.11.95, server at v0.11.95. See the Introduction for the full packages table.
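The MRL dimension truncation noted under v0.11.94 can be pictured with a short Python sketch. It assumes the usual Matryoshka practice of keeping the leading dimensions and re-normalising to unit length. The changelog only says embeddings are truncated post-pool, so the re-normalisation step, the function name, and the `supports_mrl` flag are illustrative assumptions rather than Dakera's actual code.

```python
import math
from typing import List, Optional, Sequence

# Dimensions listed as supported in the v0.11.94 changelog entry.
SUPPORTED_DIMS = (64, 128, 256, 512, 768)


def truncate_embedding(
    vec: Sequence[float],
    target_dim: Optional[int],
    supports_mrl: bool = True,
) -> List[float]:
    """Keep the first target_dim components and L2-normalise the result.

    Models without MRL support, or an unset target, return the vector
    unchanged, mirroring the "fall back silently" behaviour described.
    """
    if not supports_mrl or target_dim is None:
        return list(vec)
    if target_dim not in SUPPORTED_DIMS or target_dim > len(vec):
        raise ValueError(f"unsupported MRL dimension: {target_dim}")
    head = list(vec[:target_dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Assumption: re-normalise so cosine similarity stays well defined.
    return [x / norm for x in head] if norm > 0 else head


if __name__ == "__main__":
    full = [3.0, 4.0] + [0.0] * 766          # toy 768-d embedding
    short = truncate_embedding(full, 256)
    print(len(short), short[:2])
```

Smaller dimensions reduce index size and distance-computation cost, generally at some cost in retrieval quality. The changelog does not quantify that trade-off.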

Release cadence

Dakera releases frequently — typically multiple versions per week during active development. All releases follow semantic versioning. Patch versions are always safe to upgrade to in-place.
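For automation that should only apply in-place patch upgrades, a check along these lines may help. It is a minimal sketch that assumes plain `vMAJOR.MINOR.PATCH` tags like those in the table above; the helper names are hypothetical and not part of any Dakera SDK.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v0.11.95' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_patch_upgrade(current: str, target: str) -> bool:
    """True when target has the same major.minor and a higher patch."""
    cur, tgt = parse_version(current), parse_version(target)
    return cur[:2] == tgt[:2] and tgt[2] > cur[2]


if __name__ == "__main__":
    print(is_patch_upgrade("v0.11.90", "v0.11.95"))  # same minor, newer patch
    print(is_patch_upgrade("v0.10.2", "v0.11.27"))   # minor bump, not a patch
```

Even for patch upgrades, the per-version highlights are worth reading, since some entries change default behaviour and document an opt-out variable.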

Release artifacts

GitHub Releases, GHCR, PyPI, Helm (ArtifactHub)