RAG-Augmented Memory

Category: Retrieval

Problem

Traditional RAG retrieves document chunks but lacks awareness of past interactions. Conversely, persistent memory knows user history but has no access to external knowledge bases. Agents need both — recalled memories for personalization and retrieved documents for factual grounding — merged into a single coherent context.

Architecture

This pattern runs two retrieval paths in parallel: Dakera recall fetches relevant memories (preferences, prior answers, facts learned over time) while an external document store provides fresh knowledge chunks. Results are merged, deduplicated, and ranked before injection into the LLM prompt.
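Because the two paths are independent, they can genuinely run concurrently rather than back to back. A minimal sketch with Python's concurrent.futures, where recall_memories and retrieve_documents are hypothetical stand-ins for the Dakera recall call and the document-store query:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two retrieval paths.
def recall_memories(query: str) -> list:
    return [{"content": "prefers TypeScript examples", "score": 0.9}]

def retrieve_documents(query: str) -> list:
    return [{"content": "CORS is configured via response headers", "score": 0.8}]

def parallel_retrieve(query: str) -> list:
    # Run memory recall and document retrieval side by side,
    # then merge into one list ranked by relevance score.
    with ThreadPoolExecutor(max_workers=2) as pool:
        mem_future = pool.submit(recall_memories, query)
        doc_future = pool.submit(retrieve_documents, query)
        merged = mem_future.result() + doc_future.result()
    merged.sort(key=lambda x: x["score"], reverse=True)
    return merged
```

The same shape works with asyncio if both clients expose async methods; the key point is that neither path should block the other.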

Flow

  • Receive user query
  • Recall relevant memories from Dakera (user history, preferences, prior context)
  • Retrieve document chunks from external vector store or search index
  • Merge and rank results by relevance score
  • Inject combined context into the system prompt

Implementation

from dakera import Dakera

client = Dakera(base_url="http://localhost:3300", api_key="dk-...")

def rag_augmented_recall(user_id: str, query: str, doc_chunks: list) -> dict:
    """Merge Dakera memory with external document retrieval."""

    # Step 1: Recall from persistent memory
    memory_results = client.memory.recall(
        query=query,
        namespace=f"user-{user_id}",
        top_k=5
    )

    # Step 2: Combine memory results with document chunks
    combined_context = []

    # Add memories with source tag
    for mem in memory_results["results"]:
        combined_context.append({
            "content": mem["content"],
            "source": "memory",
            "score": mem["score"],
            "metadata": mem.get("metadata", {})
        })

    # Add document chunks with source tag
    for chunk in doc_chunks:
        combined_context.append({
            "content": chunk["text"],
            "source": "document",
            "score": chunk["relevance_score"],
            "metadata": {"doc_id": chunk["doc_id"]}
        })

    # Step 3: Sort by score, then drop entries with duplicate content
    combined_context.sort(key=lambda x: x["score"], reverse=True)
    seen = set()
    deduped = []
    for item in combined_context:
        key = item["content"].strip().lower()
        if key not in seen:
            seen.add(key)
            deduped.append(item)
    combined_context = deduped

    # Step 4: Store the interaction for future recall
    client.memory.store(
        content=f"User asked: {query}",
        namespace=f"user-{user_id}",
        metadata={"type": "interaction", "docs_used": len(doc_chunks)}
    )

    return {"context": combined_context[:10], "memory_count": len(memory_results["results"])}

# Usage
result = rag_augmented_recall("alice", "How do I configure CORS?", external_chunks)
# Combines user's past CORS questions with fresh documentation
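The final flow step, injecting the combined context into the system prompt, is left out of the function above. One possible shape, using [memory] and [document] labels as an illustrative convention so the model can tell personalization apart from grounding:

```python
def build_system_prompt(context_items: list) -> str:
    # Prefix each snippet with its source tag so downstream
    # instructions can weigh memories against documentation.
    lines = ["Use the following context to answer the user:"]
    for item in context_items:
        lines.append(f"[{item['source']}] {item['content']}")
    return "\n".join(lines)

prompt = build_system_prompt([
    {"source": "memory", "content": "User previously enabled CORS for localhost only."},
    {"source": "document", "content": "Set the Access-Control-Allow-Origin header to permit origins."},
])
```

The resulting string would be passed as the system message of the LLM call; the exact labeling scheme is an assumption, not part of the Dakera API.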

When to Use This Pattern

  • Knowledge-base assistants that also remember user-specific context
  • Customer support with both documentation and ticket history
  • Research agents that combine personal notes with external sources
  • Any system where personalization and factual accuracy are both required

Key Considerations

  • Balance memory vs document weight — memories provide personalization, documents provide accuracy
  • Deduplicate overlapping content to avoid wasting context window tokens
  • Store successful retrievals back into memory for faster future recall
  • Use metadata tags to distinguish memory sources in downstream processing
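The first two considerations can be combined into a single ranking pass. A sketch that down-weights memory scores relative to document scores before deduplicating; the 0.8 weight is an arbitrary illustrative value, and weight_and_dedupe is a hypothetical helper, not part of the Dakera SDK:

```python
def weight_and_dedupe(items: list, memory_weight: float = 0.8) -> list:
    # Scale memory scores down (documents keep full weight for
    # factual accuracy), rank by the adjusted score, then drop
    # later items whose content duplicates an earlier one.
    seen = set()
    out = []
    ranked = sorted(
        items,
        key=lambda x: x["score"] * (memory_weight if x["source"] == "memory" else 1.0),
        reverse=True,
    )
    for item in ranked:
        key = item["content"].strip().lower()
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out
```

Tuning memory_weight per use case (higher for personalization-heavy assistants, lower for accuracy-critical ones) is one way to operationalize the memory-versus-document balance.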