How Memory Works
Before writing your first memory, it helps to understand Dakera's core primitives. Once you know how agents, importance, and retrieval fit together, the API will make intuitive sense.
Core primitives
Agents
An agent is a named memory namespace identified by a string you choose. All memories belong to an agent. Agents are isolated by default — a recall call on agent-A never returns memories from agent-B.
You can have as many agents as you need — one per user, one per pipeline stage, one per AI persona. There's no registration step; Dakera creates the namespace automatically the first time you store a memory for it.
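A minimal sketch of both behaviors, assuming a local Dakera server on localhost:8080 (the exact store and recall routes may differ in your deployment, so treat the paths below as illustrative):

```python
import requests

BASE = "http://localhost:8080"  # assumed local Dakera server

# First write for "agent-A" creates its namespace implicitly.
requests.post(f"{BASE}/v1/memories", json={
    "agent_id": "agent-A",
    "content": "User prefers dark mode",
}).raise_for_status()

# Isolation: recalling against "agent-B" never sees agent-A's memories.
resp = requests.post(f"{BASE}/v1/recall", json={
    "agent_id": "agent-B",
    "query": "dark mode",
})
print(resp.json())  # expect an empty result set
```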
Agents can still share knowledge deliberately through the cross-agent knowledge network (/v1/knowledge/network/cross-agent). See Patterns & Recipes →

Memory types
Every memory has a memory_type that controls how it is used and how quickly it decays:
| Type | Use for | Decay rate |
|---|---|---|
| episodic | Specific events, user actions, conversation history | Normal (default) |
| semantic | Facts, knowledge, learned patterns — more durable | Slow |
| procedural | Instructions, skills, how-to knowledge | Slow |
| working | Short-term scratchpad, current task context | Fast |
Importance scoring
Every memory has an importance score from 0.0 to 1.0. Importance affects recall ranking, decay resistance, and batch filtering. Each time a memory is recalled, Dakera automatically gives its importance a small boost.
| Score | When to use | Examples |
|---|---|---|
| 0.9 – 1.0 | Critical facts that must never be lost | User identity, API credentials context, safety rules, core preferences |
| 0.7 – 0.8 | Important context worth preserving long-term | Project goals, learned skills, key decisions, recurring patterns |
| 0.5 | Standard observations (the default) | Routine conversation facts, task completions, general notes |
| 0.3 – 0.4 | Ephemeral or low-value state | Current task scratchpad, temporary context, status updates |
| 0.1 – 0.2 | Noise — will decay quickly | Greeting messages, filler content, duplicates pending consolidation |
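In practice you set both fields when storing a memory. A hedged sketch, using the same assumed local server and illustrative route as above:

```python
import requests

BASE = "http://localhost:8080"  # assumed; route shape is illustrative

# A durable fact: semantic type, high importance, so it resists decay.
requests.post(f"{BASE}/v1/memories", json={
    "agent_id": "support-bot",
    "content": "Alice's account tier is Enterprise",
    "memory_type": "semantic",
    "importance": 0.9,
})

# Transient task state: working type, low importance, so it fades fast.
requests.post(f"{BASE}/v1/memories", json={
    "agent_id": "support-bot",
    "content": "Currently drafting a reply to ticket #4821",
    "memory_type": "working",
    "importance": 0.3,
})
```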
Memory decay
Memories decay over time when not accessed. Rarely-recalled information fades; frequently-retrieved knowledge stays sharp. Configure the half-life via DAKERA_DECAY_HALF_LIFE_SECS (default 7 days). To expire a memory after a deadline, set expires_at to a Unix timestamp.
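For example, to hard-expire a memory one hour from now (expires_at as a Unix timestamp is documented; the store route itself is an assumption):

```python
import time
import requests

# expires_at is a Unix timestamp; this memory is gone in an hour.
requests.post("http://localhost:8080/v1/memories", json={
    "agent_id": "support-bot",
    "content": "One-time verification code sent to user",
    "expires_at": int(time.time()) + 3600,
})
```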
Sessions
A session groups related memories under a single ID — useful for scoping recall to a conversation, retrieving all memories from a specific interaction, and per-session deduplication.
Session-scoped recalls: Pass session_id in a recall request to restrict results to memories stored within that session. This is especially useful for multi-turn conversations where you want to recall only what was discussed in the current interaction, not the agent's entire history.
Sessions also support: auto-generated summaries when ended, metadata attachment (task type, user context), and listing all memories within a session via GET /v1/sessions/{id}/memories or dk session memories {id}.
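A session-scoped recall followed by a full session listing might look like this (session_id and the GET endpoint are documented; the recall route is an assumption):

```python
import requests

BASE = "http://localhost:8080"  # assumed local server

# Scope recall to one conversation by passing session_id.
resp = requests.post(f"{BASE}/v1/recall", json={
    "agent_id": "support-bot",
    "query": "What did we decide about the refund?",
    "session_id": "sess-42",
})

# List everything stored in that session (documented endpoint).
memories = requests.get(f"{BASE}/v1/sessions/sess-42/memories").json()
```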
Namespaces
Namespaces are the low-level isolation unit for vector data. The memory API routes to namespaces automatically via agent IDs — you only interact with namespaces directly when using the low-level vector search API.
Entity extraction
Dakera automatically extracts named entities from stored memories using a multi-provider pipeline. Extracted entities feed into the knowledge graph, enabling entity-centric retrieval and cross-memory linking.
| Provider | How it works | Configuration |
|---|---|---|
| GLiNER (default) | Zero-shot NER via on-device ONNX model — no API calls needed | Built-in, always available |
| Rule-based pre-pass | Regex extraction for UUIDs, URLs, emails, dates, IPs before NER | Built-in, always active |
| OpenAI / Anthropic / Ollama | LLM-powered extraction via configurable provider hierarchy | extractor_set per namespace |
You can configure which entity types each namespace extracts (person, organization, location, technology, or custom types) via entity_types_set. Extracted entities are stored alongside the memory and are queryable via memory_entities.
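The request shape for entity_types_set isn't specified here, but a per-namespace configuration call might look roughly like this (the route and field names are assumptions; only the operation name comes from the docs):

```python
import requests

# Hypothetical payload for entity_types_set: restrict extraction in one
# namespace to a few built-in types plus a custom ticket_id type.
requests.post("http://localhost:8080/v1/namespaces/support-bot/entity-types", json={
    "types": ["person", "organization", "technology", "ticket_id"],
})
```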
Knowledge graph
Every memory participates in a persistent entity graph. When a memory is stored, Dakera extracts entities and links the memory to other memories that share the same entities. The graph supports four edge types:
| Edge type | Created when | Use case |
|---|---|---|
| RelatedTo | Two memories have high cosine similarity | Semantic clustering and association |
| SharesEntity | Two memories mention the same named entity | Entity-centric retrieval ("everything about Alice") |
| Precedes | Temporal ordering detected between events | Timeline reconstruction and event chains |
| LinkedBy | Explicit user-created link via API | Custom associations and annotations |
Traverse the graph via graph_traverse (BFS from a root memory), find paths between memories via graph_path, or export the full graph as JSON/GraphML via kg_export. The cross-agent knowledge network (knowledge_network_cross_agent) spans multiple agents, visualizing shared entities and related memories across your entire system.
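A BFS traversal from a root memory might look like this (graph_traverse and the edge types are documented; the HTTP route and parameter names are assumptions):

```python
import requests

# Walk the graph outward from one memory, following entity and similarity
# edges (max_depth and edge_types are assumed parameter names).
resp = requests.post("http://localhost:8080/v1/graph/traverse", json={
    "root_memory_id": "mem_abc123",
    "max_depth": 2,
    "edge_types": ["SharesEntity", "RelatedTo"],
})
for node in resp.json().get("nodes", []):
    print(node.get("id"), node.get("content"))
```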
Consolidation
Over time, agents accumulate redundant or overlapping memories. Consolidation merges them into concise summaries.
| Method | How it works | When to use |
|---|---|---|
| Manual | Pass specific memory_ids to consolidate | When you know which memories overlap |
| Deduplication | DBSCAN clustering finds near-duplicates (cosine ≥0.93), merges preserving highest importance | After bulk imports or high-volume agents |
| AutoPilot | Background task runs dedup + consolidation on a schedule | Always-on maintenance for production systems |
Trigger deduplication with knowledge_deduplicate (use dry_run: true to preview). AutoPilot runs automatically when enabled via DAKERA_AUTOPILOT_ENABLED=true.
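A preview-then-commit flow might look like this (dry_run is documented; the route shape is an assumption):

```python
import requests

BASE = "http://localhost:8080"  # assumed local server

# Preview what deduplication would merge, without changing anything.
preview = requests.post(f"{BASE}/v1/knowledge/deduplicate", json={
    "agent_id": "support-bot",
    "dry_run": True,
}).json()
print(preview)

# Happy with the preview? Run the merge for real.
requests.post(f"{BASE}/v1/knowledge/deduplicate", json={
    "agent_id": "support-bot",
    "dry_run": False,
})
```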
Decay engine
Memories decay over time when not accessed — this mimics human memory, keeping frequently-used knowledge sharp while letting stale information fade. The decay engine supports six strategies:
| Strategy | Behavior |
|---|---|
| exponential (default) | Importance halves every half_life period. Most natural for general use. |
| linear | Importance decreases by a fixed amount per cycle. Predictable expiry. |
| step | Importance drops at defined thresholds. Good for tiered retention. |
| logarithmic | Fast initial decay, then slows. Keeps important memories longer. |
| none | No decay — memories retain importance forever. |
| custom | Per-type decay curves via memory lifecycle policy. |
Access resets decay — every time a memory is recalled, its last-accessed timestamp resets and importance gets a small boost. Frequently-accessed memories effectively never decay. Configure globally via DAKERA_DECAY_HALF_LIFE_SECS or per-namespace via memory_policy_set.
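To make the default concrete: with exponential decay, an untouched memory's effective importance halves every half-life. A quick sketch of that curve (pure math, no API calls):

```python
def decayed_importance(importance: float, age_secs: float,
                       half_life_secs: float = 7 * 24 * 3600) -> float:
    """Exponential decay: importance halves every half_life_secs.

    The 7-day default mirrors DAKERA_DECAY_HALF_LIFE_SECS.
    """
    return importance * 0.5 ** (age_secs / half_life_secs)

# A 0.5-importance memory left untouched for two weeks drops to 0.125.
print(decayed_importance(0.5, age_secs=14 * 24 * 3600))  # 0.125
```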
Spaced repetition — memories accessed at increasing intervals get additional decay resistance. The spaced repetition factor and base interval are configurable per namespace.
AutoPilot
AutoPilot is a background lifecycle manager that runs on a configurable interval (default: 1 hour). Each cycle performs:
- Deduplication — scans for near-duplicate memories and merges them
- Consolidation — clusters low-importance related memories and creates summaries
- Decay enforcement — applies the decay strategy, archiving memories below the minimum importance threshold to cold storage (L3)
Monitor AutoPilot via autopilot_status (shows last run timestamps, memories processed, dedup count). Force an immediate cycle with autopilot_trigger.
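In HTTP terms that might look like the following (the operation names are documented; the routes are assumptions):

```python
import requests

BASE = "http://localhost:8080"  # assumed local server

# Last run timestamps, memories processed, dedup count.
print(requests.get(f"{BASE}/v1/autopilot/status").json())

# Run a cycle now instead of waiting for the next interval.
requests.post(f"{BASE}/v1/autopilot/trigger")
```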
Memory feedback loop
Explicit feedback signals improve recall quality over time. After recalling a memory, your application can submit upvote, downvote, or flag signals:
- Upvote — increases the memory's importance score, making it rank higher in future recalls
- Downvote — decreases importance, causing the memory to fade faster
- Flag — marks the memory for review (visible in audit log and feedback summary)
Track feedback across an agent with agent_feedback_summary, or check individual memory feedback with memory_feedback_get. The feedback health endpoint (/v1/feedback/health) provides system-wide signal quality metrics.
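A feedback submission might look like this (the signal values and /v1/feedback/health are documented; the per-memory route is an assumption):

```python
import requests

BASE = "http://localhost:8080"  # assumed local server

# Upvote a memory that proved useful after recall.
requests.post(f"{BASE}/v1/memories/mem_abc123/feedback", json={
    "signal": "upvote",
})

# System-wide signal quality metrics (documented endpoint).
print(requests.get(f"{BASE}/v1/feedback/health").json())
```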
The retrieval pipeline
Every recall request flows through an 8-step pipeline designed for maximum relevance:
| Step | What happens | Details |
|---|---|---|
| 1. Classify | ML router categorizes the query | Categories: factual, multi-hop, temporal, comparison. Each routes to a different retrieval strategy optimized for that query type. |
| 2. Embed | Query is embedded on-device | ONNX model (MiniLM, BGE, or E5) generates a dense vector — zero external API calls. |
| 3. Vector search | ANN retrieval via configured index | HNSW, IVF, SPFresh, or Flat. Returns fetch_n candidates (configurable multiplier of top_k). |
| 4. BM25 search | Full-text keyword match | Runs in parallel with vector search. Per-namespace BM25 index. Catches exact-match queries that vectors miss. |
| 5. Entity vector search | Second HNSW pass filtered by extracted entities | Optional (enabled by default). Finds memories sharing entities with the query, merged via RRF. |
| 6. Reciprocal Rank Fusion | Merge results from all retrieval paths | RRF with configurable k-parameter (DAKERA_RRF_K=60). Produces a unified ranking from vector + BM25 + entity results. |
| 7. Temporal scoring | Apply decay weights and recency boost | Multiplicative temporal factor adjusts scores based on memory age and access patterns. |
| 8. Cross-encoder reranking | bge-reranker-base scores top candidates | ONNX cross-encoder model re-scores the top candidates for precision. The final top_k results are returned. |
The pipeline is fully configurable per request — override routing mode, disable reranking, set fusion weights, or scope to a specific session.
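For instance, a recall that scopes to a session, trims the result count, and skips the reranking step might look like this (top_k and session_id are documented above; routing_mode and rerank are assumed field names, so check the recall reference):

```python
import requests

# Per-request pipeline overrides on an assumed /v1/recall route.
resp = requests.post("http://localhost:8080/v1/recall", json={
    "agent_id": "support-bot",
    "query": "timeline of the billing incident",
    "top_k": 5,                  # documented knob
    "session_id": "sess-42",     # documented: scope to one session
    "routing_mode": "temporal",  # assumed name: force the temporal strategy
    "rerank": False,             # assumed name: skip the cross-encoder step
})
print(resp.json())
```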