RAG-Augmented Memory
Category: Retrieval
Problem
Traditional RAG retrieves document chunks but lacks awareness of past interactions. Conversely, persistent memory knows user history but has no access to external knowledge bases. Agents need both — recalled memories for personalization and retrieved documents for factual grounding — merged into a single coherent context.
Architecture
This pattern runs two retrieval paths in parallel: Dakera recall fetches relevant memories (preferences, prior answers, facts learned over time) while an external document store provides fresh knowledge chunks. Results are merged, deduplicated, and ranked before injection into the LLM prompt.
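Since the two retrieval paths are independent, they can run concurrently rather than back to back. A minimal sketch using a thread pool, assuming the Dakera `client.memory.recall` call shown below and a hypothetical `vector_store.search(query, top_k=...)` interface for the document side:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_retrieve(client, vector_store, user_id: str, query: str):
    """Run memory recall and document search concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Kick off both retrieval paths without waiting on either
        mem_future = pool.submit(
            client.memory.recall,
            query=query,
            namespace=f"user-{user_id}",
            top_k=5,
        )
        doc_future = pool.submit(vector_store.search, query, top_k=5)
        # Block until both complete; total latency is max(), not sum()
        return mem_future.result(), doc_future.result()
```

The payoff is latency: end-to-end retrieval time becomes the slower of the two paths instead of their sum.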
Flow
- Receive user query
- Recall relevant memories from Dakera (user history, preferences, prior context)
- Retrieve document chunks from external vector store or search index
- Merge and rank results by relevance score
- Inject combined context into the system prompt
Implementation
from dakera import Dakera

client = Dakera(base_url="http://localhost:3300", api_key="dk-...")

def rag_augmented_recall(user_id: str, query: str, doc_chunks: list) -> dict:
    """Merge Dakera memory with external document retrieval."""
    # Step 1: Recall from persistent memory
    memory_results = client.memory.recall(
        query=query,
        namespace=f"user-{user_id}",
        top_k=5
    )

    # Step 2: Combine memory results with document chunks
    combined_context = []

    # Add memories with source tag
    for mem in memory_results["results"]:
        combined_context.append({
            "content": mem["content"],
            "source": "memory",
            "score": mem["score"],
            "metadata": mem.get("metadata", {})
        })

    # Add document chunks with source tag
    for chunk in doc_chunks:
        combined_context.append({
            "content": chunk["text"],
            "source": "document",
            "score": chunk["relevance_score"],
            "metadata": {"doc_id": chunk["doc_id"]}
        })

    # Step 3: Sort by score, then drop exact-duplicate content
    combined_context.sort(key=lambda x: x["score"], reverse=True)
    seen = set()
    deduped = []
    for item in combined_context:
        key = item["content"].strip().lower()
        if key not in seen:
            seen.add(key)
            deduped.append(item)

    # Step 4: Store the interaction for future recall
    client.memory.store(
        content=f"User asked: {query}",
        namespace=f"user-{user_id}",
        metadata={"type": "interaction", "docs_used": len(doc_chunks)}
    )

    return {"context": deduped[:10], "memory_count": len(memory_results["results"])}
# Usage
result = rag_augmented_recall("alice", "How do I configure CORS?", external_chunks)
# Combines user's past CORS questions with fresh documentation
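The final Flow step, injecting the combined context into the system prompt, is not shown above. One way to sketch it, assuming context items shaped like the dicts `rag_augmented_recall` returns (`content`, `source`, `score` keys); the prompt layout itself is illustrative, not a Dakera convention:

```python
def build_system_prompt(context: list, base_instructions: str) -> str:
    """Format merged context items into a system prompt, tagged by source."""
    lines = [base_instructions, "", "Relevant context:"]
    for item in context:
        # Tag each item so the model can weigh memories vs. documents
        tag = "USER MEMORY" if item["source"] == "memory" else "DOCUMENT"
        lines.append(f"[{tag}] {item['content']}")
    return "\n".join(lines)
```

Tagging each line by source lets the model treat personalization hints and factual documentation differently, and makes it easy to cite document-backed claims downstream.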
When to Use This Pattern
- Knowledge-base assistants that also remember user-specific context
- Customer support with both documentation and ticket history
- Research agents that combine personal notes with external sources
- Any system where personalization and factual accuracy are both required
Key Considerations
- Balance memory vs document weight — memories provide personalization, documents provide accuracy
- Deduplicate overlapping content to avoid wasting context window tokens
- Store successful retrievals back into memory for faster future recall
- Use metadata tags to distinguish memory sources in downstream processing
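The first two considerations can be combined into one re-ranking pass: apply a per-source weight before sorting, then drop near-duplicates by normalized content. The 0.8/1.0 weights below are illustrative starting points, not recommended values:

```python
def weighted_merge(items: list, memory_weight: float = 0.8,
                   doc_weight: float = 1.0) -> list:
    """Re-rank combined context with per-source weights, then dedupe."""
    for item in items:
        # Favor documents for accuracy; memories still surface when strong
        w = memory_weight if item["source"] == "memory" else doc_weight
        item["weighted_score"] = item["score"] * w
    items.sort(key=lambda x: x["weighted_score"], reverse=True)

    # Drop later items whose content matches an earlier one after
    # normalizing case and whitespace (a cheap near-duplicate check)
    seen, out = set(), []
    for item in items:
        key = " ".join(item["content"].lower().split())
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out
```

Because the list is sorted before deduplication, the highest-weighted copy of any overlapping content is the one that survives, so duplicates never cost context-window tokens.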