RAG-Augmented Memory

Category: Retrieval

Problem

Traditional RAG retrieves document chunks but lacks awareness of past interactions. Conversely, persistent memory knows user history but has no access to external knowledge bases. Agents need both — recalled memories for personalization and retrieved documents for factual grounding — merged into a single coherent context.

Architecture

This pattern runs two retrieval paths in parallel: Dakera recall fetches relevant memories (preferences, prior answers, facts learned over time) while an external document store provides fresh knowledge chunks. Results are merged, deduplicated, and ranked before injection into the LLM prompt.
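The two paths can genuinely run in parallel rather than back to back. A minimal sketch using a thread pool, assuming a Dakera client as shown below and a hypothetical `vector_store.search` standing in for whatever document retriever you use:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_retrieve(client, vector_store, user_id: str, query: str):
    """Run memory recall and document retrieval concurrently.

    `vector_store.search` is a placeholder for your document store's
    query method; swap in the real call for your retriever.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        mem_future = pool.submit(
            client.memory.recall,
            query=query,
            namespace=f"user-{user_id}",
            top_k=5,
        )
        doc_future = pool.submit(vector_store.search, query, 5)
        # Both calls are network-bound, so the total latency is roughly
        # the slower of the two rather than their sum.
        return mem_future.result(), doc_future.result()
```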

Flow

1. A user query arrives.
2. Two retrieval paths run in parallel: Dakera recall fetches relevant memories, while the external document store returns matching chunks.
3. Both result sets are tagged by source, merged, deduplicated, and ranked by score.
4. The top-ranked context is injected into the LLM prompt, and the interaction is stored back into memory for future recall.

Implementation

from dakera import Dakera

client = Dakera(base_url="http://localhost:3300", api_key="dk-...")

def rag_augmented_recall(user_id: str, query: str, doc_chunks: list) -> dict:
    """Merge Dakera memory with external document retrieval."""

    # Step 1: Recall from persistent memory
    memory_results = client.memory.recall(
        query=query,
        namespace=f"user-{user_id}",
        top_k=5
    )

    # Step 2: Combine memory results with document chunks
    combined_context = []

    # Add memories with source tag
    for mem in memory_results["results"]:
        combined_context.append({
            "content": mem["content"],
            "source": "memory",
            "score": mem["score"],
            "metadata": mem.get("metadata", {})
        })

    # Add document chunks with source tag
    for chunk in doc_chunks:
        combined_context.append({
            "content": chunk["text"],
            "source": "document",
            "score": chunk["relevance_score"],
            "metadata": {"doc_id": chunk["doc_id"]}
        })

    # Step 3: Sort by score, then drop exact-duplicate content
    # (keeps the highest-scoring copy of each entry)
    combined_context.sort(key=lambda x: x["score"], reverse=True)
    seen = set()
    deduped = []
    for item in combined_context:
        key = item["content"].strip().lower()
        if key not in seen:
            seen.add(key)
            deduped.append(item)
    combined_context = deduped

    # Step 4: Store the interaction for future recall
    client.memory.store(
        content=f"User asked: {query}",
        namespace=f"user-{user_id}",
        metadata={"type": "interaction", "docs_used": len(doc_chunks)}
    )

    return {"context": combined_context[:10], "memory_count": len(memory_results["results"])}

# Usage
result = rag_augmented_recall("alice", "How do I configure CORS?", external_chunks)
# Combines user's past CORS questions with fresh documentation
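One caveat when sorting the merged list: memory recall scores and document relevance scores usually come from different models, so they are not directly comparable. A common workaround is to min-max normalize each source's scores before merging; this helper is a sketch of that idea, not part of the Dakera API:

```python
def normalize_scores(items: list) -> list:
    """Min-max normalize the 'score' field within one source so that
    scores from different retrievers land on a shared 0..1 scale.
    Returns new dicts; the input list is left untouched."""
    if not items:
        return []
    scores = [item["score"] for item in items]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [
        # When all scores are identical, span is 0; treat them as equally
        # relevant rather than dividing by zero.
        {**item, "score": (item["score"] - lo) / span if span else 1.0}
        for item in items
    ]
```

Apply it separately to the memory results and the document chunks before building `combined_context`, so the final sort ranks across both sources on equal footing.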

When to Use This Pattern

Key Considerations