RAG-Augmented Memory

Category: Retrieval

Problem

Traditional RAG retrieves document chunks but lacks awareness of past interactions. Conversely, persistent memory knows user history but has no access to external knowledge bases. Agents need both — recalled memories for personalization and retrieved documents for factual grounding — merged into a single coherent context.

Architecture

This pattern runs two retrieval paths in parallel: Dakera recall fetches relevant memories (preferences, prior answers, facts learned over time) while an external document store provides fresh knowledge chunks. Results are merged, deduplicated, and ranked before injection into the LLM prompt.
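Because the two paths are independent, they can genuinely run concurrently rather than back to back. A minimal sketch with Python's concurrent.futures, where recall_memories and retrieve_documents are hypothetical stand-ins for the Dakera recall call and the document-store query:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two retrieval paths.
def recall_memories(query: str) -> list:
    return [{"content": "prefers TypeScript examples", "score": 0.9}]

def retrieve_documents(query: str) -> list:
    return [{"content": "CORS is configured via response headers", "score": 0.8}]

def parallel_retrieve(query: str) -> list:
    # Run memory recall and document retrieval side by side,
    # then merge into one list ranked by relevance score.
    with ThreadPoolExecutor(max_workers=2) as pool:
        mem_future = pool.submit(recall_memories, query)
        doc_future = pool.submit(retrieve_documents, query)
        merged = mem_future.result() + doc_future.result()
    merged.sort(key=lambda x: x["score"], reverse=True)
    return merged
```

The same shape works with asyncio if both clients expose async methods; the key point is that neither path should block the other.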

Flow

  • Receive user query
  • Recall relevant memories from Dakera (user history, preferences, prior context)
  • Retrieve document chunks from external vector store or search index
  • Merge and rank results by relevance score
  • Inject combined context into the system prompt

Implementation

from dakera import Dakera

client = Dakera(base_url="http://localhost:3300", api_key="dk-...")

def rag_augmented_recall(user_id: str, query: str, doc_chunks: list) -> dict:
    """Merge Dakera memory with external document retrieval."""

    # Step 1: Recall from persistent memory
    memory_results = client.memory.recall(
        query=query,
        namespace=f"user-{user_id}",
        top_k=5
    )

    # Step 2: Combine memory results with document chunks
    combined_context = []

    # Add memories with source tag
    for mem in memory_results["results"]:
        combined_context.append({
            "content": mem["content"],
            "source": "memory",
            "score": mem["score"],
            "metadata": mem.get("metadata", {})
        })

    # Add document chunks with source tag
    for chunk in doc_chunks:
        combined_context.append({
            "content": chunk["text"],
            "source": "document",
            "score": chunk["relevance_score"],
            "metadata": {"doc_id": chunk["doc_id"]}
        })

    # Step 3: Sort by score, then drop entries with duplicate content
    combined_context.sort(key=lambda x: x["score"], reverse=True)
    seen = set()
    deduped = []
    for item in combined_context:
        key = item["content"].strip().lower()
        if key not in seen:
            seen.add(key)
            deduped.append(item)
    combined_context = deduped

    # Step 4: Store the interaction for future recall
    client.memory.store(
        content=f"User asked: {query}",
        namespace=f"user-{user_id}",
        metadata={"type": "interaction", "docs_used": len(doc_chunks)}
    )

    return {"context": combined_context[:10], "memory_count": len(memory_results["results"])}

# Usage
result = rag_augmented_recall("alice", "How do I configure CORS?", external_chunks)
# Combines user's past CORS questions with fresh documentation
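The final flow step, injecting the combined context into the system prompt, is left out of the function above. One possible shape, using [memory] and [document] labels as an illustrative convention so the model can tell personalization apart from grounding:

```python
def build_system_prompt(context_items: list) -> str:
    # Prefix each snippet with its source tag so downstream
    # instructions can weigh memories against documentation.
    lines = ["Use the following context to answer the user:"]
    for item in context_items:
        lines.append(f"[{item['source']}] {item['content']}")
    return "\n".join(lines)

prompt = build_system_prompt([
    {"source": "memory", "content": "User previously enabled CORS for localhost only."},
    {"source": "document", "content": "Set the Access-Control-Allow-Origin header to permit origins."},
])
```

The resulting string would be passed as the system message of the LLM call; the exact labeling scheme is an assumption, not part of the Dakera API.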

When to Use This Pattern

  • Knowledge-base assistants that also remember user-specific context
  • Customer support with both documentation and ticket history
  • Research agents that combine personal notes with external sources
  • Any system where personalization and factual accuracy are both required

Key Considerations

  • Balance memory vs document weight — memories provide personalization, documents provide accuracy
  • Deduplicate overlapping content to avoid wasting context window tokens
  • Store successful retrievals back into memory for faster future recall
  • Use metadata tags to distinguish memory sources in downstream processing
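The first two considerations can be combined into a single ranking pass. A sketch that down-weights memory scores relative to document scores before deduplicating; the 0.8 weight is an arbitrary illustrative value, and weight_and_dedupe is a hypothetical helper, not part of the Dakera SDK:

```python
def weight_and_dedupe(items: list, memory_weight: float = 0.8) -> list:
    # Scale memory scores down (documents keep full weight for
    # factual accuracy), rank by the adjusted score, then drop
    # later items whose content duplicates an earlier one.
    seen = set()
    out = []
    ranked = sorted(
        items,
        key=lambda x: x["score"] * (memory_weight if x["source"] == "memory" else 1.0),
        reverse=True,
    )
    for item in ranked:
        key = item["content"].strip().lower()
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out
```

Tuning memory_weight per use case (higher for personalization-heavy assistants, lower for accuracy-critical ones) is one way to operationalize the memory-versus-document balance.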