
RAG + Persistent Memory

Traditional RAG retrieves from a static corpus. Dakera adds a learning layer — corrections stick, context accumulates across sessions, and stale information fades automatically through importance decay.

The problem

Traditional RAG retrieves documents from a static corpus — it doesn't learn from interactions. If a user corrects the system, asks a follow-up, or provides new context, that knowledge is lost after the session ends. The system makes the same mistakes repeatedly.

Consider an internal knowledge base assistant. A user asks about the API rate limit. The RAG system retrieves an outdated document that says "500 requests per minute." The user corrects it: "That changed in v2.3 — it's 1000 now." The correction happens in the chat, but the underlying retrieval system doesn't learn from it. The next user who asks the same question gets the same wrong answer.

The problem compounds over time. As documents age, the gap between what the corpus says and what's actually true widens. Without a mechanism to incorporate user feedback, corrections, and new information into the retrieval pipeline, RAG systems degrade rather than improve.

How Dakera solves it

Dakera sits alongside your existing RAG pipeline as a persistent memory layer. It stores interactions, corrections, and learned context — then includes them in future retrievals alongside your static corpus.

Static RAG vs. RAG + Dakera

Capability                 Static RAG                     RAG + Dakera
Retrieval method           Vector similarity only         Hybrid (HNSW + BM25 + cross-encoder)
Learns from corrections    No                             Yes (stored as high-importance memories)
Stale information          Stays forever                  Fades via importance decay
Cross-session context      Lost after session ends        Persists and accumulates
Embedding dependency       External API (OpenAI, etc.)    Built-in (bge-large, 1024-dim)

Implementation

Here's how to add a persistent memory layer to an existing RAG pipeline:

from dakera import DakeraClient

client = DakeraClient(base_url="http://localhost:3300", api_key="dk-...")

# User provides a correction — store it with high importance
client.store(
    agent_id="rag-assistant",
    content="User corrected: The API rate limit is 1000 req/min, not 500. Updated in v2.3 release notes.",
    importance=0.95,
    tags=["correction", "api-limits", "verified"]
)

# Next query combines RAG corpus + learned memories
memories = client.recall(
    agent_id="rag-assistant",
    query="What is the current API rate limit?",
    top_k=5
)
# Returns the user correction ranked high due to importance + recency

The pattern is straightforward: after your static RAG retrieval, also call client.recall() to pull relevant memories. Merge both result sets into your LLM context. Memories with high importance and recent timestamps naturally surface above stale corpus documents.
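
Here's a minimal sketch of that merge step. The vector_store.search() call and the call_llm() helper are placeholders for your existing retriever and LLM client, and attribute names like doc.text and m.content stand in for whatever your stack actually returns; only the client.recall() call mirrors the example above.

# Sketch: combine static corpus retrieval with Dakera memories before generation
user_query = "What is the current API rate limit?"

corpus_hits = vector_store.search(user_query, top_k=5)   # your existing RAG retrieval
memories = client.recall(
    agent_id="rag-assistant",
    query=user_query,
    top_k=5
)

# Build a single context from both sources; stored corrections carry high
# importance and recency, so they are easy to rank above stale documents
context_blocks = [doc.text for doc in corpus_hits] + [m.content for m in memories]

prompt = (
    "Answer using the context below. Prefer explicit corrections and recent "
    "memories over older corpus documents.\n\n"
    + "\n\n".join(context_blocks)
    + f"\n\nQuestion: {user_query}"
)
answer = call_llm(prompt)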

Storing learned context automatically

You don't need to wait for explicit corrections. Any interaction that produces useful context can be stored:

# After the LLM generates a response, store the Q&A pair.
# user_query, llm_response, and feedback come from the surrounding request
# handler; feedback is whatever signal you collect from the user.
client.store(
    agent_id="rag-assistant",
    content=f"Q: {user_query}\nA: {llm_response}\nUser feedback: {feedback}",
    importance=0.6 if feedback == "helpful" else 0.3,  # weight confirmed-helpful answers higher
    tags=["qa-pair", "auto-captured"]
)

Decay keeps things clean: You don't need to manually curate memories. Low-importance auto-captured context fades naturally. High-importance corrections persist. Memories that are recalled frequently have their decay reset — the system self-curates based on actual usage patterns.

Deploy persistent memory for your agents

Self-hosted, no external API dependencies, production-ready. Add a learning layer to your RAG pipeline in under 10 minutes.