Changelog

Recent Dakera releases. Full history at GitHub Releases and GHCR tags.

Version	Date	Highlights
`v0.11.95` LATEST	Jun 2026	bge-reranker-v2-m3 cross-encoder upgrade (CE-C) — Reranker upgraded from `Xenova/bge-reranker-base` (278M, English-centric, ~49 nDCG@10 BEIR) to `onnx-community/bge-reranker-v2-m3-ONNX` INT8 (568M, multilingual, 51.8 nDCG@10). Improves ranking precision on multi-hop and temporal queries. Entity-linked memory boosting (CE-D) — off-by-default Mem0-style additive scoring boost for entity-matched candidates with popularity decay. BM25 English stemming regression guard — CI guards prevent silent revert of production English stemming. LoCoMo 87.1% overall (Cat1 86.5% / Cat2 85.0% / Cat3 69.6% / Cat4 90.0%) — all category gates pass.
`v0.11.94`	Jun 2026	gte-modernbert-base embedding model — `DAKERA_MODEL=gte-modernbert-base` switches to Alibaba-NLP/gte-modernbert-base (149M params, 768d, 8192 token window, MTEB retrieval 64.38 vs bge-large 64.23). MRL dimension truncation — `DAKERA_MRL_DIMENSION=256` post-pool truncates ModernBERT embeddings (64/128/256/512/768d supported); non-MRL models fall back silently. Fusion strategy override — `DAKERA_FUSION_STRATEGY=rrf` enables live MinMax/RRF A/B without code changes; unset = MinMax (default, byte-identical). Deploy time cut from 12 min to <5 min via PR image retag on merge.
`v0.11.93`	Jun 2026	Query decomposition — opt-in `DAKERA_QUERY_DECOMP=1` decomposes complex queries into sub-queries with RRF fan-out, improving multi-step recall. Security: quinn-proto 0.11.14 → 0.11.15 (RUSTSEC-2026-0185, CVSS 7.5).
`v0.11.92`	Jun 2026	Critical entity-extraction fix: `POST /v1/extract` now defaults `entity_types` to `[person, organization, location]` when the field is omitted, fixing the playground returning 0 entities. GLiNER is cache-gated to avoid cold-start 502s, degrading gracefully to the rule-based fallback.
`v0.11.91`	Jun 2026	Admin SDK parity — admin routes now served at `/v1/admin/` (SDK-canonical) in addition to `/admin/` (back-compat), fixing 404s from `client.autopilot_status()`, `client.decay_config()` etc. CE-36: compound recall score weights (`DAKERA_SCORE_W_VEC`, `DAKERA_SCORE_W_IMP`, `DAKERA_SCORE_W_REC`, `DAKERA_SCORE_RECENCY_TAU_HOURS`) are now runtime-configurable; defaults byte-identical to prior releases.
`v0.11.90`	Jun 2026	CE-32: list-aware sentence decomposition and supersession demotion both enabled by default (DAK-6507). Set `DAKERA_BATCH_SENTENCE_DECOMP=0` or `DAKERA_SUPERSEDE_DEMOTE=0` to opt out. LoCoMo 1540Q PASS — Overall 87.0%, Cat3 68.5%.
`v0.11.89`	Jun 2026	List-aware CE-31 sentence decomposition on batch ingest (DAK-6497): buried-list gold recovered; LoCoMo byte-identical (session-gated). Supersession demotion hardened: session-coherent messages are no longer falsely demoted. DFB-recall is now the `type:ce` PR gate, replacing the LME smoke test.
`v0.11.88`	Jun 2026	Opt-in CE-31 sentence decomposition on the batch ingest path (DAK-6477, inert by default). Set `DAKERA_CE31_DECOMPOSE=1` to activate.
`v0.11.87`	Jun 2026	Fix: honor `DAKERA_CROSS_SESSION_FETCH_MULT` in session-scoped recall branch (DAK-6477) — env override was silently ignored for session queries; now respected without changing LoCoMo defaults.
`v0.11.86`	Jun 2026	LME CE overhaul (DAK-6440) — 5 architectural recall fixes across Cat1–Cat4. Auto-dispatch LME smoke on PR Docker images. Dependency bumps: opendal 0.57.0, chrono 0.4.45, rusqlite 0.40.1, prost 0.14.4.
`v0.11.85`	Jun 2026	Env-gated LME fetch depth knobs (DAK-6423, inert by default): `DAKERA_HYBRID_FETCH_MULT` (Hybrid+rerank coarse multiplier, default 5×) and `DAKERA_CROSS_SESSION_FETCH_MULT` (cross-session query pool depth). LoCoMo recall path byte-identical when unset.
`v0.11.84`	Jun 2026	Entity vector search for temporal queries — Cat3 recall +4.4 pp (65.2% → 69.6%) via entity-filtered HNSW supplemental pass on BM25-routed temporal queries. Cross-encoder reranker queue fix — eliminates silent recall drop under concurrent load. Re-embed importance floor 0.0 — all static-tier memories now upgrade to ONNX embeddings.
`v0.11.83`	Jun 2026	Docker image variants — `:cpu` (INT8 quantised, CPU-only) and `:gpu`/`:cuda` (FP32 with CUDA base image). Deterministic HNSW build order — `DAKERA_HNSW_SEED` now effective across restarts. Raw-fs fast write path for `ObjectStorage::upsert` (~9× storage throughput). O(namespace) scan removed from per-batch list — eliminates per-store N² overhead.
`v0.11.82`	Jun 2026	Model2Vec static-write tier enabled in production (`DAKERA_TIERED=1`) — 9.7× ingest throughput. New `POST /admin/reembed/drain` endpoint for synchronous quality drain without waiting for background cycles. GLiNER entity extraction restored (FP32 model + correct span format). SHA-256 integrity guard for ONNX cold-boot with re-fetch-once on mismatch. Redundant legacy ONNX pool freed in tiered mode.
`v0.11.81`	Jun 2026	OOM hardening — `OnnxBackend` pool capped to 1 in GPU mode (eliminates BFCArena fragmentation from concurrent CUDA sessions), CPU memory pattern disabled (`.with_memory_pattern(false)`), OOM retry depth extended to batch=1. GPU forward passes serialised at allocator level via `parking_lot::Mutex`, replacing coarser `GPU_INFERENCE_SEMAPHORE`.
`v0.11.80`	May 2026	SIMD-accelerated HNSW distance (3–8× throughput on x86_64 + aarch64). GPU semaphore fix — `OnnxBackend` now serialises CUDA forward passes, eliminating `CUBLAS_STATUS_ALLOC_FAILED` under parallel ingest. CE-TORA8 confidence gate: Cat3 temporal recall 73.9% (+5.0 pp).
`v0.11.79`	May 2026	Global `GPU_INFERENCE_SEMAPHORE` (1 permit) serialises all CUDA forward passes across ONNX + Candle backends — prevents `BFCArena` VRAM fragmentation under concurrent ingest. Semaphore now acquired before `TieredEngine` branch in `embed_text()`.
`v0.11.78`	May 2026	`TieredEngine` pre-warms at server startup — eliminates 7–10 min first-request stall. `ReembedJob` batch ANN invalidation: one HNSW rebuild per cycle instead of O(N) (18× recall speedup). `store_memory_batch()` now routes through `TieredEngine` write path — GPU-free static ingest for batch operations.
`v0.11.77`	May 2026	`SearchMode::Hybrid` is now the server default. TieredEngine end-to-end fully wired — static→transformer upgrade pipeline. `docker-compose.yml` ships `DAKERA_TIERED=1` + `DAKERA_SEARCH_MODE=hybrid` out of the box.
`v0.11.76`	May 2026	Binary HNSW overselect formula fixed: Recall@10 restored from 54% to ~100% for `DAKERA_SEARCH_MODE=hybrid`. `SearchMode` fallback corrected (unknown values → Float).
`v0.11.75`	May 2026	Pluggable inference backends (`EmbeddingBackend` trait): ONNX, Candle, GGUF, Static. `StaticBackend` (Model2Vec) gives ~500× faster ingest. Also new: `TieredEngine`, binary HNSW, and `ModernBertEmbedBase` (768d). Batch write method in all 4 SDKs (py/js/rs/go).
`v0.11.74`	May 2026	NER redesign: lowercase entity tags, `entity_types` honoured by all extractors, entity dedup, byte-offset span detection, `MAX_TEXT_WORDS` guard for long inputs.
`v0.11.73`	May 2026	Dual-layer timestamping — event date extracted at store time into `_dakera_content_date` metadata for temporal proximity scoring. `SharesEntity` knowledge-graph edge expansion for entity-linked cross-encoder recall.
`v0.11.72`	May 2026	`temporal_rerank_multiplier` raised 8×→12×. GLiNER span detection fix for contractions and multi-token entity names.
`v0.11.71`	May 2026	FP32 model for GPU/CUDA inference (fixes accuracy on CUDA EP). O(1) session lookup replaces O(N) scan. ONNX default batch size 8→32 for CPU deployments.
`v0.11.69`	May 2026	Parallel HNSW+BM25 in Hybrid mode re-introduced with full 1540Q bench gate — Cat4 tie-breaking regression resolved. 87.1% LoCoMo, Cat3 73.9%, Cat4 88.8%.
`v0.11.68`	May 2026	Revert parallel HNSW+BM25 (Cat4 RNG tie-breaking regression under tokio task interleaving). GPU CUDA Execution Provider enabled.
`v0.11.67`	May 2026	Cross-encoder overload gate (`RERANKER_MAX_CONCURRENT=6`) prevents queue-saturation cascades — graceful degradation to unranked results when gate fires, no client-side timeout storm.
`v0.11.66`	May 2026	Batch ONNX cross-encoder — mini-batch size `RERANKER_ONNX_BATCH_SIZE=16` within each chunk; each mini-batch padded to its own max seq_len, reducing memory waste and improving throughput under concurrent rerank load.
`v0.11.65`	May 2026	Cross-encoder session pool (N=4) + adaptive chunk splitting — eliminates contention under concurrent recall. Each cross-encoder call gets a pooled session; chunks re-split for long documents exceeding model max seq_len.
`v0.11.64`	May 2026	Async metric recording off the hot path: pipeline stage metrics are now recorded via `tokio::spawn`, removing synchronous atomic and allocation overhead from every recall. Fixes the -0.7 pp regression introduced in v0.11.63.
`v0.11.63`	May 2026	Full 8-stage recall pipeline instrumentation — Prometheus histograms for each stage (query parse → BM25 → vector → rerank → cross-encoder → dedupe → hydrate → response). Adds `/metrics` observability with no throughput cost.
`v0.11.62`	May 2026	GLiNER NER fully restored — `text_lengths` tensor reshaped rank-1→rank-2 so entity extraction works correctly after v0.11.61 deploy.
`v0.11.61`	May 2026	GLiNER model path fix — HuggingFace repos renamed (hyphen→underscore); updated to `onnx-community/gliner_medium-v2.1`. No HF token required. Restores `dakera_auto_tag` entity extraction.
`v0.11.60`	May 2026	`HF_TOKEN` support for HuggingFace model downloads: injects an auth header on all download requests. Graceful fallback: with no token, downloads are unauthenticated (existing behaviour for public models).
`v0.11.59`	May 2026	ONNX batch size 32→1 — eliminates LME 408 timeout caused by one 512-token text padding 31 peers to max seq_len. Each text now gets its own ONNX call. Session pool provides equivalent throughput.
`v0.11.58`	May 2026	ONNX session pool N=4 + parallel batch storage upsert — eliminates LME throughput bottleneck. Concurrent callers round-robin across pool slots. `buffer_unordered(16)` for parallel vector writes.
`v0.11.57`	May 2026	Docker base image bumped from `rust:1.92` to `rust:1.95` because a dependency now requires rustc ≥ 1.95. Fixes release build failures.
`v0.11.56`	May 2026	SDK engine parity: admin (cluster, maintenance, quotas, slow queries, backups), ops (diagnostics, jobs, compaction, shutdown), health probes, vector bulk ops (update, delete, count), fulltext stats/delete, TTL stats, storage tiers, memory type stats, agent consolidation, namespace entity config/extractor/dimension migration. All 4 SDKs (py/js/rs/go) at full parity.
`dakera-mcp v0.10.9` LATEST MCP	Jun 2026	v0.10.9 — T-I-F reliability scoring tool (`dakera_tif_evaluate`), ORCID founder attribution. Maintenance releases v0.10.4–v0.10.8 — stability improvements, updated tool descriptions. See GitHub Releases for full MCP changelog.
`dakera-mcp v0.10.2`	May 2026	Page size increased from 20 to 100: all profiles now return results in a single page, reducing round-trips for agents with large memory stores.
`dakera-mcp v0.10.0`	May 2026	Profile-based tool tiering: 14 core tools by default, power/admin/all profiles for expanded access. Meta-discovery tools (`discover_tools`, `load_tools`). ~30K token savings vs loading all tools.
`v0.11.55`	May 2026	CE-118 temporal hybrid retrieval. Cat2 gate achieved (86.3%). 88.2% LoCoMo overall.
`v0.11.54`	May 2026	MCP: 14 core tools (86+ available via profiles). `DAKERA_ENTITY_VECTOR_SEARCH` enabled by default. Cross-encoder pipeline.
`v0.11.45`	Apr 2026	CE-71: ML routing classifier on by default. TEMPORAL_INFERENCE enabled. 87.1% LoCoMo.
`v0.11.27`	Apr 2026	HA cluster gossip stability improvements. MinIO throttle tuning.
`v0.10.2`	Apr 2026	Full stack release: server + py + js + rs + go SDKs simultaneously.
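The v0.11.94 `DAKERA_FUSION_STRATEGY` switch chooses between MinMax and RRF fusion of the BM25 and vector result lists (v0.11.93 query decomposition also fans out with RRF). The sketch below shows what the two strategies typically compute; it is not Dakera's implementation. Equal weighting of the normalised scores, the RRF constant k = 60, id-based tie-breaking, treating unrecognised values as MinMax, and the function names are all assumptions.

```python
import os

def minmax_fuse(score_lists):
    """MinMax fusion: rescale each retriever's scores to [0, 1], then sum per doc."""
    fused = {}
    for scores in score_lists:
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc, s in scores.items():
            # A retriever whose scores are all equal contributes 1.0 to each doc.
            fused[doc] = fused.get(doc, 0.0) + ((s - lo) / span if span > 0 else 1.0)
    return sorted(fused, key=lambda d: (-fused[d], d))

def rrf_fuse(score_lists, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every list a doc appears in."""
    fused = {}
    for scores in score_lists:
        ranked = sorted(scores, key=lambda d: (-scores[d], d))  # ties broken by id
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))

def fuse(score_lists):
    # "rrf" selects RRF; unset falls back to MinMax, as the changelog describes.
    # Treating any other value as MinMax is an assumption of this sketch.
    strategy = os.environ.get("DAKERA_FUSION_STRATEGY", "").strip().lower()
    return rrf_fuse(score_lists) if strategy == "rrf" else minmax_fuse(score_lists)
```

With `bm25 = {"a": 12.0, "b": 7.0, "c": 1.0}` and `vec = {"b": 0.91, "c": 0.90, "d": 0.20}`, MinMax ranks `b, a, c, d` while RRF ranks `b, c, a, d`: RRF uses only rank positions, so `a`'s large BM25 margin counts for less than `c` appearing near the top of both lists.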
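`DAKERA_MRL_DIMENSION` (v0.11.94) truncates ModernBERT embeddings after pooling. Matryoshka-style truncation generally keeps the leading components of the vector; the sketch below assumes exactly that. The L2 renormalisation step, falling back for unsupported sizes, and the function name are assumptions, not Dakera's code.

```python
import math

# Truncation sizes listed in the v0.11.94 entry.
MRL_DIMS = {64, 128, 256, 512, 768}

def mrl_truncate(embedding, target_dim, model_supports_mrl=True):
    """Keep the leading `target_dim` components and L2-renormalise them.

    Returns an unchanged copy when truncation does not apply (non-MRL model,
    unsupported size, or a target at least as large as the input).
    """
    if not model_supports_mrl or target_dim not in MRL_DIMS or target_dim >= len(embedding):
        return list(embedding)
    head = embedding[:target_dim]
    norm = math.sqrt(sum(x * x for x in head))
    # Rescale to unit length so cosine and dot-product scores stay comparable.
    return [x / norm for x in head] if norm > 0 else list(head)
```

Returning the full vector for non-MRL models mirrors the "fall back silently" behaviour described in the entry, so callers never receive a vector of an unexpected size from an unsupported model.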
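The v0.11.66 change pads each cross-encoder mini-batch (`RERANKER_ONNX_BATCH_SIZE=16`) to its own longest sequence instead of padding the whole chunk to one maximum. A rough way to see the saving is to count padded token slots. The helper names below are hypothetical, and the count does not model actual inference cost.

```python
def padded_slots(seq_lens, batch_size):
    """Token slots used when each mini-batch is padded to its own longest sequence."""
    total = 0
    for start in range(0, len(seq_lens), batch_size):
        batch = seq_lens[start:start + batch_size]
        total += max(batch) * len(batch)
    return total

def padded_slots_whole_chunk(seq_lens):
    """Token slots used when the whole chunk is padded to a single global maximum."""
    return max(seq_lens) * len(seq_lens) if seq_lens else 0
```

For a 32-pair chunk with one 512-token pair and 31 pairs of 64 tokens (`[64] * 16 + [512] + [64] * 15`), per-mini-batch padding uses 1,024 + 8,192 = 9,216 slots, versus 16,384 when the whole chunk is padded to 512: the long pair only inflates the mini-batch it lands in.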

SDK versions: the Python, JS, Go, and Rust SDKs are version-locked to the server, and all are currently at v0.11.95. See the Introduction for the full packages table.

Release cadence

Dakera releases frequently, typically several versions per week during active development. All releases follow semantic versioning, and patch versions are always safe to upgrade to in place.
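Under this policy, a pinned image or SDK version can be bumped in place when only the PATCH component changes. A minimal check, assuming plain `vMAJOR.MINOR.PATCH` tags like those in the table (pre-release and build suffixes are rejected rather than handled); the helper names are hypothetical:

```python
import re

# Plain release tags such as "v0.11.95" or "0.11.95".
_TAG = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")

def parse_tag(tag):
    """Parse a plain 'vMAJOR.MINOR.PATCH' tag into a tuple of ints."""
    m = _TAG.fullmatch(tag.strip())
    if m is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH tag: {tag!r}")
    return tuple(int(part) for part in m.groups())

def is_patch_upgrade(current, target):
    """True when target keeps MAJOR.MINOR and only raises PATCH."""
    cur, tgt = parse_tag(current), parse_tag(target)
    return cur[:2] == tgt[:2] and tgt[2] > cur[2]
```

For example, `v0.11.94` to `v0.11.95` qualifies, while `v0.10.2` to `v0.11.95` changes the MINOR version and does not.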

Release artifacts

GitHub Releases · GHCR · PyPI · Helm (ArtifactHub)
