How Memory Works
Before writing your first memory, it helps to understand Dakera's core primitives. Once you know how agents, importance, and retrieval fit together, the API will make intuitive sense.
Core primitives
Agents
An agent is a named memory namespace identified by a string you choose. All memories belong to an agent. Agents are isolated by default — a recall call on agent-A never returns memories from agent-B.
You can have as many agents as you need — one per user, one per pipeline stage, one per AI persona. There's no registration step; Dakera creates the namespace automatically the first time you store a memory for it.
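A minimal sketch of both behaviors, assuming a local Dakera server on localhost:8080 (the exact store and recall routes may differ in your deployment, so treat the paths below as illustrative):

```python
import requests

BASE = "http://localhost:8080"  # assumed local Dakera server

# First write for "agent-A" creates its namespace implicitly.
requests.post(f"{BASE}/v1/memories", json={
    "agent_id": "agent-A",
    "content": "User prefers dark mode",
}).raise_for_status()

# Isolation: recalling against "agent-B" never sees agent-A's memories.
resp = requests.post(f"{BASE}/v1/recall", json={
    "agent_id": "agent-B",
    "query": "dark mode",
})
print(resp.json())  # expect an empty result set
```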
Agents can still share knowledge deliberately through the cross-agent knowledge network (/v1/knowledge/network/cross-agent). See Patterns & Recipes →

Memory types
Every memory has a memory_type that controls how it is used and how quickly it decays:
| Type | Use for | Decay rate |
|---|---|---|
| episodic | Specific events, user actions, conversation history | Normal (default) |
| semantic | Facts, knowledge, learned patterns — more durable | Slow |
| procedural | Instructions, skills, how-to knowledge | Slow |
| working | Short-term scratchpad, current task context | Fast |
Importance scoring
Every memory has an importance score from 0.0 to 1.0. Importance affects recall ranking, decay resistance, and batch filtering. Each time a memory is recalled, Dakera automatically gives its importance a small boost.
| Score | When to use | Examples |
|---|---|---|
| 0.9 – 1.0 | Critical facts that must never be lost | User identity, API credentials context, safety rules, core preferences |
| 0.7 – 0.8 | Important context worth preserving long-term | Project goals, learned skills, key decisions, recurring patterns |
| 0.5 | Standard observations (the default) | Routine conversation facts, task completions, general notes |
| 0.3 – 0.4 | Ephemeral or low-value state | Current task scratchpad, temporary context, status updates |
| 0.1 – 0.2 | Noise — will decay quickly | Greeting messages, filler content, duplicates pending consolidation |
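In practice you set both fields when storing a memory. A hedged sketch, using the same assumed local server and illustrative route as above:

```python
import requests

BASE = "http://localhost:8080"  # assumed; route shape is illustrative

# A durable fact: semantic type, high importance, so it resists decay.
requests.post(f"{BASE}/v1/memories", json={
    "agent_id": "support-bot",
    "content": "Alice's account tier is Enterprise",
    "memory_type": "semantic",
    "importance": 0.9,
})

# Transient task state: working type, low importance, so it fades fast.
requests.post(f"{BASE}/v1/memories", json={
    "agent_id": "support-bot",
    "content": "Currently drafting a reply to ticket #4821",
    "memory_type": "working",
    "importance": 0.3,
})
```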
Memory decay
Memories decay over time when not accessed. Rarely-recalled information fades; frequently-retrieved knowledge stays sharp. Configure the half-life via DAKERA_DECAY_HALF_LIFE_SECS (default 7 days). To expire a memory after a deadline, set expires_at to a Unix timestamp.
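For example, to hard-expire a memory one hour from now (expires_at as a Unix timestamp is documented; the store route itself is an assumption):

```python
import time
import requests

# expires_at is a Unix timestamp; this memory is gone in an hour.
requests.post("http://localhost:8080/v1/memories", json={
    "agent_id": "support-bot",
    "content": "One-time verification code sent to user",
    "expires_at": int(time.time()) + 3600,
})
```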
Sessions
A session groups related memories under a single ID — useful for scoping recall to a conversation, retrieving all memories from a specific interaction, and per-session deduplication.
Session-scoped recalls: Pass session_id in a recall request to restrict results to memories stored within that session. This is especially useful for multi-turn conversations where you want to recall only what was discussed in the current interaction, not the agent's entire history.
Sessions also support: auto-generated summaries when ended, metadata attachment (task type, user context), and listing all memories within a session via GET /v1/sessions/{id}/memories or dk session memories {id}.
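A session-scoped recall followed by a full session listing might look like this (session_id and the GET endpoint are documented; the recall route is an assumption):

```python
import requests

BASE = "http://localhost:8080"  # assumed local server

# Scope recall to one conversation by passing session_id.
resp = requests.post(f"{BASE}/v1/recall", json={
    "agent_id": "support-bot",
    "query": "What did we decide about the refund?",
    "session_id": "sess-42",
})

# List everything stored in that session (documented endpoint).
memories = requests.get(f"{BASE}/v1/sessions/sess-42/memories").json()
```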
Namespaces
Namespaces are the low-level isolation unit for vector data. The memory API routes to namespaces automatically via agent IDs — you only interact with namespaces directly when using the low-level vector search API.
Entity extraction
Dakera automatically extracts named entities from stored memories using a multi-provider pipeline. Extracted entities feed into the knowledge graph, enabling entity-centric retrieval and cross-memory linking.
| Provider | How it works | Configuration |
|---|---|---|
| GLiNER (default) | Zero-shot NER via on-device ONNX model — no API calls needed | Built-in, always available |
| Rule-based pre-pass | Regex extraction for UUIDs, URLs, emails, dates, IPs before NER | Built-in, always active |
| OpenAI / Anthropic / Ollama | LLM-powered extraction via configurable provider hierarchy | extractor_set per namespace |
You can configure which entity types each namespace extracts (person, organization, location, technology, or custom types) via entity_types_set. Extracted entities are stored alongside the memory and are queryable via memory_entities.
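The request shape for entity_types_set isn't specified here, but a per-namespace configuration call might look roughly like this (the route and field names are assumptions; only the operation name comes from the docs):

```python
import requests

# Hypothetical payload for entity_types_set: restrict extraction in one
# namespace to a few built-in types plus a custom ticket_id type.
requests.post("http://localhost:8080/v1/namespaces/support-bot/entity-types", json={
    "types": ["person", "organization", "technology", "ticket_id"],
})
```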
Knowledge graph
Every memory participates in a persistent entity graph. When a memory is stored, Dakera extracts entities and links the memory to other memories that share the same entities. The graph supports four edge types:
| Edge type | Created when | Use case |
|---|---|---|
| RelatedTo | Two memories have high cosine similarity | Semantic clustering and association |
| SharesEntity | Two memories mention the same named entity | Entity-centric retrieval ("everything about Alice") |
| Precedes | Temporal ordering detected between events | Timeline reconstruction and event chains |
| LinkedBy | Explicit user-created link via API | Custom associations and annotations |
Traverse the graph via graph_traverse (BFS from a root memory), find paths between memories via graph_path, or export the full graph as JSON/GraphML via kg_export. The cross-agent knowledge network (knowledge_network_cross_agent) spans multiple agents, visualizing shared entities and related memories across your entire system.
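A BFS traversal from a root memory might look like this (graph_traverse and the edge types are documented; the HTTP route and parameter names are assumptions):

```python
import requests

# Walk the graph outward from one memory, following entity and similarity
# edges (max_depth and edge_types are assumed parameter names).
resp = requests.post("http://localhost:8080/v1/graph/traverse", json={
    "root_memory_id": "mem_abc123",
    "max_depth": 2,
    "edge_types": ["SharesEntity", "RelatedTo"],
})
for node in resp.json().get("nodes", []):
    print(node.get("id"), node.get("content"))
```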
Consolidation
Over time, agents accumulate redundant or overlapping memories. Consolidation merges them into concise summaries.
| Method | How it works | When to use |
|---|---|---|
| Manual | Pass specific memory_ids to consolidate | When you know which memories overlap |
| Deduplication | DBSCAN clustering finds near-duplicates (cosine ≥0.93), merges preserving highest importance | After bulk imports or high-volume agents |
| AutoPilot | Background task runs dedup + consolidation on a schedule | Always-on maintenance for production systems |
Trigger deduplication with knowledge_deduplicate (use dry_run: true to preview). AutoPilot runs automatically when enabled via DAKERA_AUTOPILOT_ENABLED=true.
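A preview-then-commit flow might look like this (dry_run is documented; the route shape is an assumption):

```python
import requests

BASE = "http://localhost:8080"  # assumed local server

# Preview what deduplication would merge, without changing anything.
preview = requests.post(f"{BASE}/v1/knowledge/deduplicate", json={
    "agent_id": "support-bot",
    "dry_run": True,
}).json()
print(preview)

# Happy with the preview? Run the merge for real.
requests.post(f"{BASE}/v1/knowledge/deduplicate", json={
    "agent_id": "support-bot",
    "dry_run": False,
})
```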
Decay engine
Memories decay over time when not accessed — this mimics human memory, keeping frequently-used knowledge sharp while letting stale information fade. The decay engine supports six strategies:
| Strategy | Behavior |
|---|---|
| exponential (default) | Importance halves every half_life period. Most natural for general use. |
| linear | Importance decreases by a fixed amount per cycle. Predictable expiry. |
| step | Importance drops at defined thresholds. Good for tiered retention. |
| logarithmic | Fast initial decay, then slows. Keeps important memories longer. |
| none | No decay — memories retain importance forever. |
| custom | Per-type decay curves via memory lifecycle policy. |
Access resets decay — every time a memory is recalled, its last-accessed timestamp resets and importance gets a small boost. Frequently-accessed memories effectively never decay. Configure globally via DAKERA_DECAY_HALF_LIFE_SECS or per-namespace via memory_policy_set.
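To make the default concrete: with exponential decay, an untouched memory's effective importance halves every half-life. A quick sketch of that curve (pure math, no API calls):

```python
def decayed_importance(importance: float, age_secs: float,
                       half_life_secs: float = 7 * 24 * 3600) -> float:
    """Exponential decay: importance halves every half_life_secs.

    The 7-day default mirrors DAKERA_DECAY_HALF_LIFE_SECS.
    """
    return importance * 0.5 ** (age_secs / half_life_secs)

# A 0.5-importance memory left untouched for two weeks drops to 0.125.
print(decayed_importance(0.5, age_secs=14 * 24 * 3600))  # 0.125
```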
Spaced repetition — memories accessed at increasing intervals get additional decay resistance. The spaced repetition factor and base interval are configurable per namespace.
AutoPilot
AutoPilot is a background lifecycle manager that runs on a configurable interval (default: 1 hour). Each cycle performs:
- Deduplication — scans for near-duplicate memories and merges them
- Consolidation — clusters low-importance related memories and creates summaries
- Decay enforcement — applies the decay strategy, archiving memories below the minimum importance threshold to cold storage (L3)
Monitor AutoPilot via autopilot_status (shows last run timestamps, memories processed, dedup count). Force an immediate cycle with autopilot_trigger.
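In HTTP terms that might look like the following (the operation names are documented; the routes are assumptions):

```python
import requests

BASE = "http://localhost:8080"  # assumed local server

# Last run timestamps, memories processed, dedup count.
print(requests.get(f"{BASE}/v1/autopilot/status").json())

# Run a cycle now instead of waiting for the next interval.
requests.post(f"{BASE}/v1/autopilot/trigger")
```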
Memory feedback loop
Explicit feedback signals improve recall quality over time. After recalling a memory, your application can submit upvote, downvote, or flag signals:
- Upvote — increases the memory's importance score, making it rank higher in future recalls
- Downvote — decreases importance, causing the memory to fade faster
- Flag — marks the memory for review (visible in audit log and feedback summary)
Track feedback across an agent with agent_feedback_summary, or check individual memory feedback with memory_feedback_get. The feedback health endpoint (/v1/feedback/health) provides system-wide signal quality metrics.
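A feedback submission might look like this (the signal values and /v1/feedback/health are documented; the per-memory route is an assumption):

```python
import requests

BASE = "http://localhost:8080"  # assumed local server

# Upvote a memory that proved useful after recall.
requests.post(f"{BASE}/v1/memories/mem_abc123/feedback", json={
    "signal": "upvote",
})

# System-wide signal quality metrics (documented endpoint).
print(requests.get(f"{BASE}/v1/feedback/health").json())
```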
The retrieval pipeline
Every recall request flows through an 8-step pipeline designed for maximum relevance:
| Step | What happens | Details |
|---|---|---|
| 1. Classify | ML router categorizes the query | Categories: factual, multi-hop, temporal, comparison. Each routes to a different retrieval strategy optimized for that query type. |
| 2. Embed | Query is embedded on-device | ONNX model (MiniLM, BGE, or E5) generates a dense vector — zero external API calls. |
| 3. Vector search | ANN retrieval via configured index | HNSW, IVF, SPFresh, or Flat. Returns fetch_n candidates (configurable multiplier of top_k). |
| 4. BM25 search | Full-text keyword match | Runs in parallel with vector search. Per-namespace BM25 index. Catches exact-match queries that vectors miss. |
| 5. Entity vector search | Second HNSW pass filtered by extracted entities | Optional (enabled by default). Finds memories sharing entities with the query, merged via RRF. |
| 6. Reciprocal Rank Fusion | Merge results from all retrieval paths | RRF with configurable k-parameter (DAKERA_RRF_K=60). Produces a unified ranking from vector + BM25 + entity results. |
| 7. Temporal scoring | Apply decay weights and recency boost | Multiplicative temporal factor adjusts scores based on memory age and access patterns. |
| 8. Cross-encoder reranking | bge-reranker-base scores top candidates | ONNX cross-encoder model re-scores the top candidates for precision. The final top_k results are returned. |
The pipeline is fully configurable per request — override routing mode, disable reranking, set fusion weights, or scope to a specific session.
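For instance, a recall that scopes to a session, trims the result count, and skips the reranking step might look like this (top_k and session_id are documented above; routing_mode and rerank are assumed field names, so check the recall reference):

```python
import requests

# Per-request pipeline overrides on an assumed /v1/recall route.
resp = requests.post("http://localhost:8080/v1/recall", json={
    "agent_id": "support-bot",
    "query": "timeline of the billing incident",
    "top_k": 5,                  # documented knob
    "session_id": "sess-42",     # documented: scope to one session
    "routing_mode": "temporal",  # assumed name: force the temporal strategy
    "rerank": False,             # assumed name: skip the cross-encoder step
})
print(resp.json())
```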