How Memory Works

Before writing your first memory, it helps to understand Dakera's core primitives. Once you understand agents, importance, and retrieval, the API will make intuitive sense.

Memory lifecycle

Store → Active → Decay → Consolidate → Archive (L3) → Forget

Memories flow through lifecycle stages — access resets decay, and AutoPilot consolidates duplicates.

Core primitives

Agents

An agent is a named memory namespace identified by a string you choose. All memories belong to an agent. Agents are isolated by default — a recall call on agent-A never returns memories from agent-B.

You can have as many agents as you need — one per user, one per pipeline stage, one per AI persona. There's no registration step; Dakera creates the namespace automatically the first time you store a memory for it.

Multi-agent access — To share memory across agents, use the cross-agent knowledge network API (/v1/knowledge/network/cross-agent). See Patterns & Recipes →

Memory types

Every memory has a memory_type that controls how it is used and how quickly it decays:

| Type | Use for | Decay rate |
| --- | --- | --- |
| episodic | Specific events, user actions, conversation history | Normal (default) |
| semantic | Facts, knowledge, learned patterns — more durable | Slow |
| procedural | Instructions, skills, how-to knowledge | Slow |
| working | Short-term scratchpad, current task context | Fast |

Importance scoring

Every memory has an importance score from 0.0 to 1.0. Importance affects recall ranking, decay resistance, and batch filtering. When a recalled memory is accessed, Dakera automatically boosts its importance slightly.

| Score | When to use | Examples |
| --- | --- | --- |
| 0.9 – 1.0 | Critical facts that must never be lost | User identity, API credentials context, safety rules, core preferences |
| 0.7 – 0.8 | Important context worth preserving long-term | Project goals, learned skills, key decisions, recurring patterns |
| 0.5 | Standard observations (the default) | Routine conversation facts, task completions, general notes |
| 0.3 – 0.4 | Ephemeral or low-value state | Current task scratchpad, temporary context, status updates |
| 0.1 – 0.2 | Noise — will decay quickly | Greeting messages, filler content, duplicates pending consolidation |

Memory decay

Memories decay over time when not accessed. Rarely-recalled information fades; frequently-retrieved knowledge stays sharp. Configure the half-life via DAKERA_DECAY_HALF_LIFE_SECS (default 7 days). To expire a memory after a deadline, set expires_at to a Unix timestamp.
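The half-life model is ordinary exponential decay. As a sketch (the formula below is the textbook half-life curve, assuming importance halves every `DAKERA_DECAY_HALF_LIFE_SECS`; Dakera's internal implementation may differ in detail):

```python
import math

HALF_LIFE_SECS = 7 * 24 * 3600  # default DAKERA_DECAY_HALF_LIFE_SECS: 7 days

def decayed_importance(importance: float, secs_since_access: float,
                       half_life: float = HALF_LIFE_SECS) -> float:
    """Exponential decay: importance halves every half_life seconds."""
    return importance * math.pow(0.5, secs_since_access / half_life)

# A 0.8-importance memory untouched for 14 days (two half-lives) drops to 0.2.
print(decayed_importance(0.8, 14 * 24 * 3600))  # → 0.2
```

Because recall resets the last-accessed clock, the elapsed time in this formula restarts at zero each time a memory is retrieved.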

Sessions

A session groups related memories under a single ID — useful for scoping recall to a conversation, retrieving all memories from a specific interaction, and per-session deduplication.

Session-scoped recalls: Pass session_id in a recall request to restrict results to memories stored within that session. This is especially useful for multi-turn conversations where you want to recall only what was discussed in the current interaction, not the agent's entire history.

Sessions also support: auto-generated summaries when ended, metadata attachment (task type, user context), and listing all memories within a session via GET /v1/sessions/{id}/memories or dk session memories {id}.
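A session-scoped recall request might be shaped like the payload below. This is illustrative only: `session_id` and `top_k` come from the text above, while the other field names are assumptions, not a documented schema.

```python
import json

# Hypothetical recall payload: restrict results to one conversation's memories.
# Field names other than session_id/top_k are illustrative assumptions.
recall_request = {
    "agent_id": "support-bot",            # agent namespace to search
    "query": "what plan is the user on?",
    "session_id": "sess_checkout_42",     # scope recall to this session only
    "top_k": 5,
}

print(json.dumps(recall_request, indent=2))
```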

Namespaces

Namespaces are the low-level isolation unit for vector data. The memory API routes to namespaces automatically via agent IDs — you only interact with namespaces directly when using the low-level vector search API.

Entity extraction

Dakera automatically extracts named entities from stored memories using a multi-provider pipeline. Extracted entities feed into the knowledge graph, enabling entity-centric retrieval and cross-memory linking.

| Provider | How it works | Configuration |
| --- | --- | --- |
| GLiNER (default) | Zero-shot NER via on-device ONNX model — no API calls needed | Built-in, always available |
| Rule-based pre-pass | Regex extraction for UUIDs, URLs, emails, dates, IPs before NER | Built-in, always active |
| OpenAI / Anthropic / Ollama | LLM-powered extraction via configurable provider hierarchy | extractor_set per namespace |

Configure which entity types to extract per namespace (person, organization, location, technology, or custom types) using entity_types_set. Extracted entities are stored alongside the memory and are queryable via memory_entities.
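The rule-based pre-pass is cheap because structured identifiers match plain regexes before any model runs. A minimal sketch (the patterns below are illustrative, not Dakera's exact rules):

```python
import re

# Illustrative patterns for a rule-based pre-pass: structured identifiers
# are caught with regexes before the NER model runs.
PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
        re.IGNORECASE),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "url": re.compile(r"https?://[^\s,]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def rule_based_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, matched_span) pairs found by the regex pre-pass."""
    return [(etype, m.group()) for etype, rx in PATTERNS.items()
            for m in rx.finditer(text)]

memo = ("Deployed 3f2504e0-4f89-11d3-9a0c-0305e82c3301 "
        "at https://api.example.com, ping ops@example.com")
print(rule_based_entities(memo))
```

Anything these rules catch can be attached deterministically; only the remaining free text needs the NER model.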

Knowledge graph

Every memory participates in a persistent entity graph. When a memory is stored, Dakera extracts entities and links the memory to other memories that share the same entities. The graph supports four edge types:

| Edge type | Created when | Use case |
| --- | --- | --- |
| RelatedTo | Two memories have high cosine similarity | Semantic clustering and association |
| SharesEntity | Two memories mention the same named entity | Entity-centric retrieval ("everything about Alice") |
| Precedes | Temporal ordering detected between events | Timeline reconstruction and event chains |
| LinkedBy | Explicit user-created link via API | Custom associations and annotations |

Traverse the graph via graph_traverse (BFS from a root memory), find paths between memories via graph_path, or export the full graph as JSON/GraphML via kg_export. The cross-agent knowledge network (knowledge_network_cross_agent) spans multiple agents, visualizing shared entities and related memories across your entire system.
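Conceptually, graph_traverse is a breadth-first search from a root memory. A sketch over a toy in-memory adjacency map (the data structure here is illustrative, not Dakera's storage format):

```python
from collections import deque

# Toy entity graph: memory id -> list of (edge_type, neighbour id).
edges = {
    "m1": [("SharesEntity", "m2"), ("RelatedTo", "m3")],
    "m2": [("Precedes", "m4")],
    "m3": [],
    "m4": [("LinkedBy", "m1")],
}

def traverse(root: str, max_depth: int = 2) -> list[str]:
    """BFS from a root memory, visiting each memory at most once."""
    seen, order = {root}, [root]
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for _edge, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order

print(traverse("m1"))  # → ['m1', 'm2', 'm3', 'm4']
```

The max_depth cap is what keeps traversal bounded on densely linked graphs; the cycle back to m1 via LinkedBy is ignored because each node is visited once.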

Consolidation

Over time, agents accumulate redundant or overlapping memories. Consolidation merges them into concise summaries.

| Method | How it works | When to use |
| --- | --- | --- |
| Manual | Pass specific memory_ids to consolidate | When you know which memories overlap |
| Deduplication | DBSCAN clustering finds near-duplicates (cosine ≥ 0.93), merges preserving highest importance | After bulk imports or high-volume agents |
| AutoPilot | Background task runs dedup + consolidation on a schedule | Always-on maintenance for production systems |

Trigger deduplication with knowledge_deduplicate (use dry_run: true to preview). AutoPilot runs automatically when enabled via DAKERA_AUTOPILOT_ENABLED=true.
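The cosine ≥ 0.93 threshold can be illustrated with a naive pairwise pass; the real engine uses DBSCAN clustering over stored embeddings, so this is a sketch of the similarity test only:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def near_duplicates(memories: dict[str, list[float]],
                    threshold: float = 0.93) -> list[tuple[str, str]]:
    """Naive O(n^2) pass: pairs whose embeddings exceed the dedup threshold."""
    ids = list(memories)
    return [(ids[i], ids[j])
            for i in range(len(ids)) for j in range(i + 1, len(ids))
            if cosine(memories[ids[i]], memories[ids[j]]) >= threshold]

embeddings = {
    "m1": [0.99, 0.10, 0.05],   # near-duplicate of m2
    "m2": [0.98, 0.12, 0.06],
    "m3": [0.05, 0.97, 0.20],   # unrelated
}
print(near_duplicates(embeddings))  # → [('m1', 'm2')]
```

On a merge, the text states the highest-importance copy is preserved, so the surviving memory keeps the strongest decay resistance.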

Decay engine

Memories decay over time when not accessed — this mimics human memory, keeping frequently-used knowledge sharp while letting stale information fade. The decay engine supports six strategies:

| Strategy | Behavior |
| --- | --- |
| exponential (default) | Importance halves every half_life period. Most natural for general use. |
| linear | Importance decreases by a fixed amount per cycle. Predictable expiry. |
| step | Importance drops at defined thresholds. Good for tiered retention. |
| logarithmic | Fast initial decay, then slows. Keeps important memories longer. |
| none | No decay — memories retain importance forever. |
| custom | Per-type decay curves via memory lifecycle policy. |

Access resets decay — every time a memory is recalled, its last-accessed timestamp resets and importance gets a small boost. Frequently-accessed memories effectively never decay. Configure globally via DAKERA_DECAY_HALF_LIFE_SECS or per-namespace via memory_policy_set.

Spaced repetition — memories accessed at increasing intervals get additional decay resistance. The spaced repetition factor and base interval are configurable per namespace.
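The strategies differ only in the curve applied to idle memories. A rough comparison of three of them (formulas and cycle parameters are illustrative approximations, not Dakera's exact implementation):

```python
def exponential(imp: float, cycles: float, half_life_cycles: float = 1.0) -> float:
    """Halve importance every half_life_cycles idle cycles."""
    return imp * 0.5 ** (cycles / half_life_cycles)

def linear(imp: float, cycles: float, step: float = 0.1) -> float:
    """Subtract a fixed amount per idle cycle; predictable expiry."""
    return max(0.0, imp - step * cycles)

def stepped(imp: float, cycles: float,
            thresholds: tuple[int, ...] = (3, 6), drop: float = 0.2) -> float:
    """Drop importance by a fixed amount at each crossed threshold."""
    return max(0.0, imp - drop * sum(1 for t in thresholds if cycles >= t))

# Same 0.8-importance memory after 6 idle cycles under each strategy:
for fn in (exponential, linear, stepped):
    print(fn.__name__, fn(0.8, 6))
```

Exponential decays fastest here (0.8 falls below 0.02 after six half-lives), while step retention holds the full value until a threshold is crossed.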

AutoPilot

AutoPilot is a background lifecycle manager that runs on a configurable interval (default: 1 hour). Each cycle performs:

  1. Deduplication — scans for near-duplicate memories and merges them
  2. Consolidation — clusters low-importance related memories and creates summaries
  3. Decay enforcement — applies the decay strategy, archiving memories below the minimum importance threshold to cold storage (L3)

Monitor AutoPilot via autopilot_status (shows last run timestamps, memories processed, dedup count). Force an immediate cycle with autopilot_trigger.

Memory feedback loop

Explicit feedback signals improve recall quality over time. After recalling a memory, your application can submit upvote, downvote, or flag signals.

Track feedback across an agent with agent_feedback_summary, or check individual memory feedback with memory_feedback_get. The feedback health endpoint (/v1/feedback/health) provides system-wide signal quality metrics.

The retrieval pipeline

Every recall request flows through an 8-step pipeline designed for maximum relevance:

| Step | What happens | Details |
| --- | --- | --- |
| 1. Classify | ML router categorizes the query | Categories: factual, multi-hop, temporal, comparison. Each routes to a different retrieval strategy optimized for that query type. |
| 2. Embed | Query is embedded on-device | ONNX model (MiniLM, BGE, or E5) generates a dense vector — zero external API calls. |
| 3. Vector search | ANN retrieval via configured index | HNSW, IVF, SPFresh, or Flat. Returns fetch_n candidates (configurable multiplier of top_k). |
| 4. BM25 search | Full-text keyword match | Runs in parallel with vector search. Per-namespace BM25 index. Catches exact-match queries that vectors miss. |
| 5. Entity vector search | Second HNSW pass filtered by extracted entities | Optional (enabled by default). Finds memories sharing entities with the query, merged via RRF. |
| 6. Reciprocal Rank Fusion | Merge results from all retrieval paths | RRF with configurable k-parameter (DAKERA_RRF_K=60). Produces a unified ranking from vector + BM25 + entity results. |
| 7. Temporal scoring | Apply decay weights and recency boost | Multiplicative temporal factor adjusts scores based on memory age and access patterns. |
| 8. Cross-encoder reranking | bge-reranker-base scores top candidates | ONNX cross-encoder model re-scores the top candidates for precision. The final top_k results are returned. |

The pipeline is fully configurable per request — override routing mode, disable reranking, set fusion weights, or scope to a specific session.
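Step 6 uses the standard Reciprocal Rank Fusion formula: each result contributes 1/(k + rank) for every retrieval path that returned it, with k defaulting to DAKERA_RRF_K=60. A minimal sketch:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

vector_hits = ["m7", "m2", "m9"]   # dense vector search results
bm25_hits   = ["m2", "m7", "m4"]   # keyword search results
entity_hits = ["m2", "m5"]         # entity vector pass results

print(rrf([vector_hits, bm25_hits, entity_hits]))  # m2 appears in all three lists, so it ranks first
```

Because RRF works on ranks rather than raw scores, the vector, BM25, and entity paths need no score normalization before fusion; a larger k flattens the contribution of top-ranked items.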