Changelog
Recent Dakera releases. Full history at GitHub Releases and GHCR tags.
| Version | Date | Highlights |
|---|---|---|
| v0.11.95 (latest) | Jun 2026 | bge-reranker-v2-m3 cross-encoder upgrade (CE-C): reranker upgraded from Xenova/bge-reranker-base (278M, English-centric, ~49 nDCG@10 BEIR) to onnx-community/bge-reranker-v2-m3-ONNX INT8 (568M, multilingual, 51.8 nDCG@10). Improves ranking precision on multi-hop and temporal queries. Entity-linked memory boosting (CE-D): off-by-default Mem0-style additive scoring boost for entity-matched candidates with popularity decay. BM25 English stemming regression guard: CI guards prevent silent revert of production English stemming. LoCoMo 87.1% overall (Cat1 86.5% / Cat2 85.0% / Cat3 69.6% / Cat4 90.0%); all category gates pass. |
| v0.11.94 | Jun 2026 | gte-modernbert-base embedding model: DAKERA_MODEL=gte-modernbert-base switches to Alibaba-NLP/gte-modernbert-base (149M params, 768d, 8192 token window, MTEB retrieval 64.38 vs bge-large 64.23). MRL dimension truncation: DAKERA_MRL_DIMENSION=256 post-pool truncates ModernBERT embeddings (64/128/256/512/768d supported); non-MRL models fall back silently. Fusion strategy override: DAKERA_FUSION_STRATEGY=rrf enables live MinMax/RRF A/B without code changes; unset = MinMax (default, byte-identical). Deploy time cut from 12 min to <5 min via PR image retag on merge. |
| v0.11.93 | Jun 2026 | Query decomposition: opt-in DAKERA_QUERY_DECOMP=1 decomposes complex queries into sub-queries with RRF fan-out, improving multi-step recall. Security: quinn-proto 0.11.14 → 0.11.15 (RUSTSEC-2026-0185, CVSS 7.5). |
| v0.11.92 | Jun 2026 | Critical entity-extraction fix: POST /v1/extract now defaults entity_types to [person, organization, location] when not supplied, fixing playground returning 0 entities. GLiNER cache-gated to avoid cold-start 502s (graceful degradation to rule-based fallback). |
| v0.11.91 | Jun 2026 | Admin SDK parity: admin routes now served at /v1/admin/* (SDK-canonical) in addition to /admin/* (back-compat), fixing 404s from client.autopilot_status(), client.decay_config() etc. CE-36: compound recall score weights (DAKERA_SCORE_W_VEC, DAKERA_SCORE_W_IMP, DAKERA_SCORE_W_REC, DAKERA_SCORE_RECENCY_TAU_HOURS) are now runtime-configurable; defaults byte-identical to prior releases. |
| v0.11.90 | Jun 2026 | CE-32: list-aware sentence decomposition and supersession demotion both enabled by default (DAK-6507). Set DAKERA_BATCH_SENTENCE_DECOMP=0 or DAKERA_SUPERSEDE_DEMOTE=0 to opt out. LoCoMo 1540Q PASS: overall 87.0%, Cat3 68.5%. |
| v0.11.89 | Jun 2026 | List-aware CE-31 sentence decomposition on batch ingest (DAK-6497): buried-list gold recovered; LoCoMo byte-identical (session-gated). Supersession demotion hardened: session-coherent messages no longer falsely demoted. DFB-recall wired as type:ce PR gate, replaces LME smoke. |
| v0.11.88 | Jun 2026 | Opt-in CE-31 sentence decomposition on batch ingest path (DAK-6477, inert by default). Set DAKERA_CE31_DECOMPOSE=1 to activate. |
| v0.11.87 | Jun 2026 | Fix: honor DAKERA_CROSS_SESSION_FETCH_MULT in session-scoped recall branch (DAK-6477). The env override was silently ignored for session queries; it is now respected without changing LoCoMo defaults. |
| v0.11.86 | Jun 2026 | LME CE overhaul (DAK-6440): 5 architectural recall fixes across Cat1–Cat4. Auto-dispatch LME smoke on PR Docker images. Dependency bumps: opendal 0.57.0, chrono 0.4.45, rusqlite 0.40.1, prost 0.14.4. |
| v0.11.85 | Jun 2026 | Env-gated LME fetch depth knobs (DAK-6423, inert by default): DAKERA_HYBRID_FETCH_MULT (Hybrid+rerank coarse multiplier, default 5×) and DAKERA_CROSS_SESSION_FETCH_MULT (cross-session query pool depth). LoCoMo recall path byte-identical when unset. |
| v0.11.84 | Jun 2026 | Entity vector search for temporal queries: Cat3 recall +4.4 pp (65.2% → 69.6%) via entity-filtered HNSW supplemental pass on BM25-routed temporal queries. Cross-encoder reranker queue fix: eliminates silent recall drop under concurrent load. Re-embed importance floor 0.0: all static-tier memories now upgrade to ONNX embeddings. |
| v0.11.83 | Jun 2026 | Docker image variants: :cpu (INT8 quantised, CPU-only) and :gpu/:cuda (FP32 with CUDA base image). Deterministic HNSW build order: DAKERA_HNSW_SEED now effective across restarts. Raw-fs fast write path for ObjectStorage::upsert (~9× storage throughput). O(namespace) scan removed from per-batch list, eliminating per-store N² overhead. |
| v0.11.82 | Jun 2026 | Model2Vec static-write tier enabled in production (DAKERA_TIERED=1): 9.7× ingest throughput. New POST /admin/reembed/drain endpoint for synchronous quality drain without waiting for background cycles. GLiNER entity extraction restored (FP32 model + correct span format). SHA-256 integrity guard for ONNX cold-boot with re-fetch-once on mismatch. Redundant legacy ONNX pool freed in tiered mode. |
| v0.11.81 | Jun 2026 | OOM hardening: OnnxBackend pool capped to 1 in GPU mode (eliminates BFCArena fragmentation from concurrent CUDA sessions), CPU memory pattern disabled (.with_memory_pattern(false)), OOM retry depth extended to batch=1. GPU forward passes serialised at allocator level via parking_lot::Mutex, replacing coarser GPU_INFERENCE_SEMAPHORE. |
| v0.11.80 | May 2026 | SIMD-accelerated HNSW distance (3–8× throughput on x86_64 + aarch64). GPU semaphore fix: OnnxBackend now serialises CUDA forward passes, eliminating CUBLAS_STATUS_ALLOC_FAILED under parallel ingest. CE-TORA8 confidence gate: Cat3 temporal recall 73.9% (+5.0 pp). |
| v0.11.79 | May 2026 | Global GPU_INFERENCE_SEMAPHORE (1 permit) serialises all CUDA forward passes across ONNX + Candle backends, preventing BFCArena VRAM fragmentation under concurrent ingest. Semaphore now acquired before TieredEngine branch in embed_text(). |
| v0.11.78 | May 2026 | TieredEngine pre-warms at server startup, eliminating the 7–10 min first-request stall. ReembedJob batch ANN invalidation: one HNSW rebuild per cycle instead of O(N) (18× recall speedup). store_memory_batch() now routes through the TieredEngine write path, giving GPU-free static ingest for batch operations. |
| v0.11.77 | May 2026 | SearchMode::Hybrid is now the server default. TieredEngine end-to-end fully wired: static→transformer upgrade pipeline. docker-compose.yml ships DAKERA_TIERED=1 + DAKERA_SEARCH_MODE=hybrid out of the box. |
| v0.11.76 | May 2026 | Binary HNSW overselect formula fixed: Recall@10 restored from 54% to ~100% for DAKERA_SEARCH_MODE=hybrid. SearchMode fallback corrected (unknown values → Float). |
| v0.11.75 | May 2026 | Pluggable inference backends (EmbeddingBackend trait): ONNX, Candle, GGUF, Static. StaticBackend Model2Vec ~500× ingest. TieredEngine. Binary HNSW. ModernBertEmbedBase 768d. Batch write method in all 4 SDKs (py/js/rs/go). |
| v0.11.74 | May 2026 | NER redesign: lowercase entity tags, entity_types honoured by all extractors, entity dedup, byte-offset span detection, MAX_TEXT_WORDS guard for long inputs. |
| v0.11.73 | May 2026 | Dual-layer timestamping: event date extracted at store time into _dakera_content_date metadata for temporal proximity scoring. SharesEntity knowledge-graph edge expansion for entity-linked cross-encoder recall. |
| v0.11.72 | May 2026 | temporal_rerank_multiplier raised 8×→12×. GLiNER span detection fix for contractions and multi-token entity names. |
| v0.11.71 | May 2026 | FP32 model for GPU/CUDA inference (fixes accuracy on CUDA EP). O(1) session lookup replaces O(N) scan. ONNX default batch size 8→32 for CPU deployments. |
| v0.11.69 | May 2026 | Parallel HNSW+BM25 in Hybrid mode re-introduced with full 1540Q bench gate; Cat4 tie-breaking regression resolved. 87.1% LoCoMo, Cat3 73.9%, Cat4 88.8%. |
| v0.11.68 | May 2026 | Revert parallel HNSW+BM25 (Cat4 RNG tie-breaking regression under tokio task interleaving). GPU CUDA Execution Provider enabled. |
| v0.11.67 | May 2026 | Cross-encoder overload gate (RERANKER_MAX_CONCURRENT=6) prevents queue-saturation cascades: graceful degradation to unranked results when the gate fires, no client-side timeout storm. |
| v0.11.66 | May 2026 | Batch ONNX cross-encoder: mini-batch size RERANKER_ONNX_BATCH_SIZE=16 within each chunk; each mini-batch padded to its own max seq_len, reducing memory waste and improving throughput under concurrent rerank load. |
| v0.11.65 | May 2026 | Cross-encoder session pool (N=4) + adaptive chunk splitting: eliminates contention under concurrent recall. Each cross-encoder call gets a pooled session; chunks re-split for long documents exceeding model max seq_len. |
| v0.11.64 | May 2026 | Async metric recording off hot-path: pipeline stage metrics now recorded via tokio::spawn, removing synchronous atomic + alloc overhead from every recall. Fixes -0.7pp regression introduced in v0.11.63. |
| v0.11.63 | May 2026 | Full 8-stage recall pipeline instrumentation: Prometheus histograms for each stage (query parse → BM25 → vector → rerank → cross-encoder → dedupe → hydrate → response). Adds /metrics observability with no throughput cost. |
| v0.11.62 | May 2026 | GLiNER NER fully restored: text_lengths tensor reshaped rank-1→rank-2 so entity extraction works correctly after v0.11.61 deploy. |
| v0.11.61 | May 2026 | GLiNER model path fix: HuggingFace repos renamed (hyphen→underscore); updated to onnx-community/gliner_medium-v2.1. No HF token required. Restores dakera_auto_tag entity extraction. |
| v0.11.60 | May 2026 | HF_TOKEN support for HuggingFace model downloads: injects auth header on all download requests. Graceful fallback: no token → unauthenticated (existing behaviour for public models). |
| v0.11.59 | May 2026 | ONNX batch size 32→1: eliminates LME 408 timeout caused by one 512-token text padding 31 peers to max seq_len. Each text now gets its own ONNX call. Session pool provides equivalent throughput. |
| v0.11.58 | May 2026 | ONNX session pool N=4 + parallel batch storage upsert: eliminates LME throughput bottleneck. Concurrent callers round-robin across pool slots. buffer_unordered(16) for parallel vector writes. |
| v0.11.57 | May 2026 | Docker base image rust:1.92→1.95: a dependency requires rustc ≥ 1.95. Fixes release build failures. |
| v0.11.56 | May 2026 | SDK engine parity: admin (cluster, maintenance, quotas, slow queries, backups), ops (diagnostics, jobs, compaction, shutdown), health probes, vector bulk ops (update, delete, count), fulltext stats/delete, TTL stats, storage tiers, memory type stats, agent consolidation, namespace entity config/extractor/dimension migration. All 4 SDKs (py/js/rs/go) at full parity. |
| dakera-mcp v0.10.9 (latest MCP) | Jun 2026 | v0.10.9: T-I-F reliability scoring tool (dakera_tif_evaluate), ORCID founder attribution. Maintenance releases v0.10.4–v0.10.8: stability improvements, updated tool descriptions. See GitHub Releases for full MCP changelog. |
| dakera-mcp v0.10.2 | May 2026 | Page size increased from 20 to 100: all profiles now return results in a single page. Reduces round-trips for agents with large memory stores. |
| dakera-mcp v0.10.0 | May 2026 | Profile-based tool tiering: 14 core tools by default, power/admin/all profiles for expanded access. Meta-discovery tools (discover_tools, load_tools). ~30K token savings vs loading all tools. |
| v0.11.55 | May 2026 | CE-118 temporal hybrid retrieval. Cat2 gate achieved (86.3%). 88.2% LoCoMo overall. |
| v0.11.54 | May 2026 | MCP: 14 core tools (86+ available via profiles). DAKERA_ENTITY_VECTOR_SEARCH enabled by default. Cross-encoder pipeline. |
| v0.11.45 | Apr 2026 | CE-71: ML routing classifier on by default. TEMPORAL_INFERENCE enabled. 87.1% LoCoMo. |
| v0.11.27 | Apr 2026 | HA cluster gossip stability improvements. MinIO throttle tuning. |
| v0.10.2 | Apr 2026 | Full stack release: server + py + js + rs + go SDKs simultaneously. |
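
The v0.11.94 entry mentions switching between MinMax and RRF fusion through DAKERA_FUSION_STRATEGY. The sketch below illustrates how those two general techniques differ when combining BM25 and vector scores. It is not Dakera's implementation: the function names, the equal weighting of both lists, and the RRF constant k=60 are assumptions chosen for illustration. Only the environment variable name and the MinMax default come from the changelog.

```python
import os


def minmax_fuse(bm25, vec):
    """Min-max normalise each score map to [0, 1], then sum per document."""
    def normalise(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        # A constant list carries no ranking signal, so map it to 0.0.
        return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}

    nb, nv = normalise(bm25), normalise(vec)
    return {d: nb.get(d, 0.0) + nv.get(d, 0.0) for d in set(nb) | set(nv)}


def rrf_fuse(bm25, vec, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over each ranked list."""
    fused = {}
    for scores in (bm25, vec):
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


def fuse(bm25, vec, strategy=None):
    """Select a strategy; when unset, fall back to MinMax (the stated default)."""
    if strategy is None:
        strategy = os.environ.get("DAKERA_FUSION_STRATEGY", "minmax")
    if strategy.lower() == "rrf":
        return rrf_fuse(bm25, vec)
    return minmax_fuse(bm25, vec)


# Toy scores (hypothetical): "a" wins on BM25, "b" wins on vector similarity.
bm25_scores = {"a": 12.0, "b": 7.0, "c": 2.0}
vector_scores = {"a": 0.70, "b": 0.90, "d": 0.40}

for name in ("minmax", "rrf"):
    fused = fuse(bm25_scores, vector_scores, strategy=name)
    print(name, {doc: round(score, 4) for doc, score in sorted(fused.items())})
```

With this toy data MinMax scores "a" above "b" because it keeps score magnitudes, while RRF scores them equally because each is first in one list and second in the other. That difference is the kind of behaviour an A/B comparison between the two strategies would surface.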
SDK versions: SDKs track the server minor version. Current: Python / JS / Go / Rust at v0.11.95, server at v0.11.95. SDKs and server are now version-locked. See the Introduction for the full packages table.
Release cadence
Dakera releases frequently — typically multiple versions per week during active development. All releases follow semantic versioning. Patch versions are always safe to upgrade to in-place.
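
The policy above can be turned into a small pre-upgrade check. The following is a minimal sketch, assuming version tags of the form v0.11.95. The helper names are hypothetical and not part of any Dakera SDK. It treats an upgrade as an in-place patch upgrade only when major and minor match and the patch number increases, and it checks the server/SDK version lock described earlier.

```python
def parse_version(tag):
    """Parse 'v0.11.95' or '0.11.95' into a (major, minor, patch) tuple."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_patch_upgrade(current, target):
    """True when target keeps major.minor and has a higher patch number."""
    cur, tgt = parse_version(current), parse_version(target)
    return cur[:2] == tgt[:2] and tgt[2] > cur[2]


def is_version_locked(server, sdk):
    """True when server and SDK report exactly the same version."""
    return parse_version(server) == parse_version(sdk)


print(is_patch_upgrade("v0.11.82", "v0.11.95"))   # same 0.11 line, higher patch
print(is_patch_upgrade("v0.10.2", "v0.11.95"))    # minor bump, not a patch upgrade
print(is_version_locked("v0.11.95", "v0.11.95"))  # server and SDK match
```

A deployment script could refuse to proceed automatically when is_patch_upgrade returns False and ask for a manual review of the intervening changelog entries instead.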
Release artifacts