Building Multi-Agent Memory Systems: Architecture Patterns

How to design memory architectures where multiple AI agents share context, maintain isolation, and collaborate through a unified memory layer.

The Multi-Agent Memory Problem

Modern AI applications rarely consist of a single agent. A customer support system might have a triage agent, a technical agent, and a billing agent. A coding assistant might have a planning agent, an implementation agent, and a review agent. Each agent needs its own working memory, but they also need to share relevant context.

[Figure: Multi-agent shared memory architecture with namespace isolation between specialized agents]

The naive approach — giving every agent access to every memory — creates noise. A billing agent doesn't need to see debugging sessions. A planning agent doesn't need to know about code formatting preferences. The challenge is designing a memory architecture that enables collaboration without creating an undifferentiated soup of context.

Pattern 1: Namespace Isolation with Shared Layers

The most common pattern uses separate namespaces for each agent's private memory, plus a shared namespace for cross-agent context:

from dakera import DakeraClient

client = DakeraClient(base_url="http://localhost:3300")

# Each agent has a private namespace
TRIAGE_NS = "agent-triage"
TECH_NS = "agent-technical"
BILLING_NS = "agent-billing"

# Plus a shared namespace for cross-agent context
SHARED_NS = "shared-customer-context"

# Triage agent stores its observations
client.store_memory(
    agent_id=TRIAGE_NS,
    content="Customer expressed frustration about repeated billing errors",
    metadata={"customer_id": "cust_42", "sentiment": "negative"}
)

# Triage also writes to shared for other agents
client.store_memory(
    agent_id=SHARED_NS,
    content="Customer cust_42 has ongoing billing issue, handle with care",
    metadata={"customer_id": "cust_42", "source_agent": "triage"}
)

When the billing agent picks up the conversation, it searches both its own namespace and the shared one:

# Billing agent retrieves context from both namespaces
private_context = client.search_memories(
    agent_id=BILLING_NS,
    query="customer cust_42 billing history",
    top_k=5
)

shared_context = client.search_memories(
    agent_id=SHARED_NS,
    query="customer cust_42",
    top_k=3,
    metadata_filter={"customer_id": "cust_42"}
)
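The two result sets still have to be combined before they go into the billing agent's prompt. The sketch below shows one way to do that with plain Python. It assumes each hit can be reduced to a dict with `content` and `score` keys; those field names, the `merge_contexts` helper, and the sample refund and email-preference hits are illustrative and not part of the Dakera SDK. Scores from different namespaces may not be directly comparable, so treat the ranking as a heuristic.

```python
from typing import Iterable


def merge_contexts(private: Iterable[dict], shared: Iterable[dict],
                   limit: int = 6) -> list[dict]:
    """Combine private and shared hits into one ranked, de-duplicated list."""
    best: dict[str, dict] = {}
    for origin, hits in (("private", private), ("shared", shared)):
        for hit in hits:
            # Normalise content so trivially different copies collapse together.
            key = hit["content"].strip().lower()
            candidate = {**hit, "origin": origin}
            # Keep only the highest-scoring copy of identical content.
            if key not in best or candidate["score"] > best[key]["score"]:
                best[key] = candidate
    ranked = sorted(best.values(), key=lambda h: h["score"], reverse=True)
    return ranked[:limit]


# Hypothetical sample hits (scores and the first two contents are made up).
private_hits = [
    {"content": "Refund issued for invoice INV-5678", "score": 0.71},
    {"content": "Customer prefers email contact", "score": 0.40},
]
shared_hits = [
    {"content": "Customer cust_42 has ongoing billing issue, handle with care",
     "score": 0.88},
    {"content": "customer prefers email contact", "score": 0.55},
]

merged = merge_contexts(private_hits, shared_hits, limit=3)
for hit in merged:
    print(f'{hit["score"]:.2f} [{hit["origin"]}] {hit["content"]}')
```

Keeping the `origin` tag on each hit lets the prompt builder show the agent which context came from its own history and which came from another agent.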

When to Use This Pattern

Pattern 2: Session-Based Memory with Agent Scoping

When multiple agents collaborate on a single task in sequence (like a pipeline), sessions provide a natural boundary. Each session represents one unit of work, and agents contribute memories within that session:

# Create a session for this customer interaction
session = client.start_session(
    agent_id="support-pipeline",
    metadata={"customer_id": "cust_42", "ticket_id": "TKT-1234"}
)

# Triage agent adds to the session
client.store_memory(
    agent_id="support-pipeline",
    session_id=session.id,
    content="Initial diagnosis: billing discrepancy on invoice INV-5678",
    metadata={"agent": "triage", "step": 1}
)

# Technical agent picks up the session and adds its findings
client.store_memory(
    agent_id="support-pipeline",
    session_id=session.id,
    content="Root cause: proration calculation error during plan upgrade on March 3",
    metadata={"agent": "technical", "step": 2}
)

# Billing agent resolves using full session context
session_memories = client.search_memories(
    agent_id="support-pipeline",
    session_id=session.id,
    query="what happened with this billing issue",
    top_k=10
)
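Because every memory in the session carries `agent` and `step` metadata, the downstream agent can rebuild the pipeline's history in order rather than relying on similarity ranking alone. The sketch below assumes each retrieved memory can be viewed as a dict with `content` and `metadata` keys; that shape and the `build_timeline` helper are assumptions for illustration, not SDK features.

```python
def build_timeline(memories: list[dict]) -> str:
    """Order session memories by their 'step' metadata and format them."""
    # Memories without a step sort last instead of raising.
    ordered = sorted(
        memories,
        key=lambda m: m["metadata"].get("step", float("inf")),
    )
    lines = []
    for memory in ordered:
        meta = memory["metadata"]
        step = meta.get("step", "?")
        agent = meta.get("agent", "unknown")
        lines.append(f"{step}. [{agent}] {memory['content']}")
    return "\n".join(lines)


# The two session memories from the example above, deliberately out of order.
session_hits = [
    {"content": "Root cause: proration calculation error during plan upgrade on March 3",
     "metadata": {"agent": "technical", "step": 2}},
    {"content": "Initial diagnosis: billing discrepancy on invoice INV-5678",
     "metadata": {"agent": "triage", "step": 1}},
]

timeline = build_timeline(session_hits)
print(timeline)
```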

When to Use This Pattern

Pattern 3: Knowledge Graph as Shared Context

For complex multi-agent systems where relationships between entities matter more than individual memories, the knowledge graph becomes the shared layer. Each agent contributes entities and relationships, building a collective understanding:

# Research agent discovers a relationship
client.knowledge_graph.add_edge(
    namespace="company-intelligence",
    source={"type": "person", "name": "Alice Chen"},
    target={"type": "company", "name": "Acme Corp"},
    edge_type="works_at",
    metadata={"discovered_by": "research-agent", "confidence": 0.95}
)

# Sales agent adds deal context
client.knowledge_graph.add_edge(
    namespace="company-intelligence",
    source={"type": "company", "name": "Acme Corp"},
    target={"type": "deal", "name": "Enterprise Plan Q2"},
    edge_type="considering",
    metadata={"discovered_by": "sales-agent", "stage": "evaluation"}
)

# Any agent can traverse the graph
connections = client.knowledge_graph.traverse(
    namespace="company-intelligence",
    start={"type": "person", "name": "Alice Chen"},
    max_depth=2
)
# Returns: Alice -> works_at -> Acme Corp -> considering -> Enterprise Plan Q2

Dakera's knowledge graph ships with four edge types that cover most agent collaboration scenarios: relates_to, works_at, part_of, and depends_on. Custom edge types, such as the considering edge used in the example above, can be defined for domain-specific relationships.
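To make the effect of `max_depth` concrete, here is a minimal in-memory sketch of a depth-bounded breadth-first traversal over the same edges. It is not Dakera's implementation; the `traverse` function, the tuple-based edge format, and the third edge (`depends_on` to "Security Review") are assumptions added purely for illustration.

```python
from collections import deque

Edge = tuple[str, str, str]  # (source, edge_type, target)


def traverse(edges: list[Edge], start: str, max_depth: int) -> list[list[str]]:
    """Return every path from start that follows at most max_depth edges."""
    outgoing: dict[str, list[tuple[str, str]]] = {}
    for source, edge_type, target in edges:
        outgoing.setdefault(source, []).append((edge_type, target))

    paths: list[list[str]] = []
    queue = deque([(start, [start], 0)])
    while queue:
        node, path, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth budget used up on this branch
        for edge_type, target in outgoing.get(node, []):
            if target in path[::2]:  # nodes sit at even positions; skip cycles
                continue
            new_path = path + [edge_type, target]
            paths.append(new_path)
            queue.append((target, new_path, depth + 1))
    return paths


edges = [
    ("Alice Chen", "works_at", "Acme Corp"),
    ("Acme Corp", "considering", "Enterprise Plan Q2"),
    ("Enterprise Plan Q2", "depends_on", "Security Review"),  # hypothetical edge
]

paths = traverse(edges, "Alice Chen", max_depth=2)
for path in paths:
    print(" -> ".join(path))
```

With a depth of 2 the hypothetical third edge is never reached, which is the trade-off the parameter controls: deeper traversals surface more context but also more loosely related entities.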

When to Use This Pattern

Pattern 4: Event-Sourced Memory

In systems where agents need to react to each other's discoveries in real time, an event-sourced pattern works well. Each agent publishes memories as events to a namespace, and other agents watch the namespaces relevant to them. The example below does this by polling:

# Monitoring agent detects an anomaly
client.store_memory(
    agent_id="system-events",
    content="CPU usage on prod-server-3 exceeded 95% for 5 minutes",
    metadata={
        "event_type": "anomaly",
        "severity": "high",
        "source_agent": "monitor",
        "timestamp": "2026-05-16T14:32:00Z"
    }
)

# Diagnosis agent polls for new high-severity events
recent_events = client.search_memories(
    agent_id="system-events",
    query="high severity anomaly",
    metadata_filter={"severity": "high"},
    top_k=5,
    recency_weight=0.8  # Heavily weight recent events
)
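A polling consumer will see the same events on every poll unless it tracks what it has already handled. The sketch below shows one way to do that. It assumes each event can be viewed as a dict with a unique `id` plus the `metadata.timestamp` field from the example above; the `id` field, the `poll_new_events` helper, and the second sample event are assumptions, not SDK features. The `fetch` callable stands in for the `search_memories` call.

```python
from typing import Callable


def poll_new_events(fetch: Callable[[], list[dict]],
                    seen_ids: set[str]) -> list[dict]:
    """Return only events not handled before, oldest first, and mark them seen."""
    fresh = [event for event in fetch() if event["id"] not in seen_ids]
    # ISO 8601 UTC timestamps in one format sort correctly as strings.
    fresh.sort(key=lambda event: event["metadata"]["timestamp"])
    seen_ids.update(event["id"] for event in fresh)
    return fresh


# In-memory stand-in for the "system-events" namespace.
store = [
    {"id": "evt-2", "content": "Disk latency spike on prod-server-3",  # made-up event
     "metadata": {"severity": "high", "timestamp": "2026-05-16T14:35:00Z"}},
    {"id": "evt-1", "content": "CPU usage on prod-server-3 exceeded 95% for 5 minutes",
     "metadata": {"severity": "high", "timestamp": "2026-05-16T14:32:00Z"}},
]

seen: set[str] = set()
first = poll_new_events(lambda: list(store), seen)
second = poll_new_events(lambda: list(store), seen)
print([event["id"] for event in first])
print(len(second))
```

In a long-running agent the `seen` set would need to be bounded or persisted; that is left out here to keep the sketch short.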

Isolation Guarantees

Regardless of which pattern you choose, multi-agent memory requires strong isolation guarantees. A bug in one agent shouldn't corrupt another's context. Dakera provides isolation at multiple levels:

Level      Scope                      Isolation
Company    Entire tenant              Separate data directory + encryption key
Namespace  Agent or domain            Separate HNSW index + BM25 index
Session    Single task/conversation   Filtered within namespace
Metadata   Custom scoping             Query-time filtering

Production Deployment Patterns

Theory and production diverge quickly. Here are the patterns that work in practice for teams running multi-agent systems at scale.

The Hub-and-Spoke Topology

The most common production topology has a single Dakera instance serving all agents, with each agent type owning a dedicated namespace. Agents write to their private namespace and optionally to a shared "hub" namespace for cross-agent communication. This is simple to operate, easy to monitor, and scales to dozens of agents without coordination overhead:

from dakera import DakeraClient
import os

DAKERA_URL = os.environ["DAKERA_URL"]
DAKERA_KEY = os.environ["DAKERA_KEY"]

def get_client() -> DakeraClient:
    return DakeraClient(base_url=DAKERA_URL, api_key=DAKERA_KEY)

class ResearchAgent:
    PRIVATE_NS = "research-private"
    HUB_NS = "hub-intelligence"

    def __init__(self):
        self.client = get_client()

    def store_finding(self, content: str, customer_id: str):
        # Always write to private namespace
        result = self.client.store_memory(
            agent_id=self.PRIVATE_NS,
            content=content,
            metadata={"customer_id": customer_id, "agent": "research"}
        )
        # Promote high-importance findings to the shared hub
        if result.importance >= 0.7:
            self.client.store_memory(
                agent_id=self.HUB_NS,
                content=content,
                importance=result.importance,
                metadata={"customer_id": customer_id, "source_agent": "research",
                          "promoted": True}
            )

Importance-Weighted Promotion

Noisy shared namespaces are one of the most common causes of retrieval degradation in multi-agent systems. Use Dakera's importance scoring as a natural filter — only memories above a threshold are promoted to shared namespaces:

def promote_if_important(client: DakeraClient, content: str,
                          private_ns: str, shared_ns: str,
                          metadata: dict, threshold: float = 0.7):
    """Store privately always; promote to shared only if important enough."""
    result = client.store_memory(
        agent_id=private_ns,
        content=content,
        metadata=metadata
    )
    # Dakera computes importance based on content salience and novelty
    if result.importance >= threshold:
        client.store_memory(
            agent_id=shared_ns,
            content=content,
            importance=result.importance,
            metadata={**metadata, "promoted_from": private_ns}
        )
    return result

TypeScript SDK: Cross-Agent Coordination

For teams building with the TypeScript SDK in Node.js or Next.js agent frameworks:

import { DakeraClient } from '@dakera/sdk';

const client = new DakeraClient({
  baseUrl: process.env.DAKERA_URL!,
  apiKey: process.env.DAKERA_KEY!,
});

// Triage agent writes a handoff note to shared namespace
async function triageHandoff(sessionId: string, customerId: string,
                              summary: string): Promise<void> {
  await client.storeMemory({
    agentId: 'shared-handoffs',
    content: summary,
    sessionId,
    metadata: {
      customer_id: customerId,
      source_agent: 'triage',
      handoff_time: new Date().toISOString(),
    },
    importance: 0.85,
  });
}

// Downstream agent reads the handoff before starting work
async function loadHandoffContext(sessionId: string, customerId: string) {
  const results = await client.searchMemories({
    agentId: 'shared-handoffs',
    query: `customer context for ${customerId}`,
    sessionId,
    topK: 5,
    metadataFilter: { customer_id: customerId },
  });
  return results.memories;
}

Failure Modes and Recovery Strategies

Multi-agent memory systems fail in specific, predictable ways. Understanding these failure modes in advance lets you build systems that degrade gracefully rather than catastrophically.

Failure Mode 1: Context Poisoning

A misbehaving agent writes incorrect or misleading information to a shared namespace. Downstream agents retrieve this poisoned context and produce wrong outputs, potentially cascading across the pipeline.

Prevention: Tag every memory with its source agent via metadata. Rate-limit writes to shared namespaces. Use importance scoring so low-confidence agent outputs don't pollute the shared layer.
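Rate limiting can live in the agent code, in front of the write call. The sketch below is a sliding-window limiter keyed by source agent. It is an application-side illustration only: the `SharedWriteLimiter` class and its parameters are hypothetical, and nothing here relies on a server-side rate limit feature.

```python
import time
from collections import defaultdict, deque


class SharedWriteLimiter:
    """Allow at most max_writes per source agent within a sliding time window."""

    def __init__(self, max_writes: int, window_seconds: float,
                 clock=time.monotonic):
        self.max_writes = max_writes
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._writes: dict[str, deque] = defaultdict(deque)

    def allow(self, source_agent: str) -> bool:
        now = self.clock()
        recent = self._writes[source_agent]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_writes:
            return False
        recent.append(now)
        return True


# Simulated clock readings: three writes in quick succession, one a minute later.
ticks = iter([0.0, 1.0, 2.0, 61.0])
limiter = SharedWriteLimiter(max_writes=2, window_seconds=60,
                             clock=lambda: next(ticks))
decisions = [limiter.allow("research") for _ in range(4)]
print(decisions)
```

An agent would call `limiter.allow(...)` before each write to the shared namespace and skip or queue the write when it returns `False`. A runaway agent can then still fill its private namespace, but not the shared layer.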

Recovery: Dakera's batch forget API allows targeted deletion by metadata filter. If an agent produced bad output between timestamps T1 and T2, purge only those memories:

# Remove poisoned memories from a specific agent during an incident window
deleted = client.batch_forget(
    agent_id="shared-hub",
    metadata_filter={
        "source_agent": "malfunctioning-research-agent",
        "created_after": "2026-05-20T14:00:00Z",
        "created_before": "2026-05-20T16:30:00Z"
    }
)
print(f"Removed {deleted.count} poisoned memories")

Failure Mode 2: Namespace Sprawl

Teams add namespaces for every new agent, feature, or experiment without a cleanup strategy. After several months, the instance has hundreds of namespaces, many abandoned. Retrieval quality degrades because engineers lose track of which namespace contains what.

Prevention: Establish a naming convention early. Use a hierarchical format: {environment}/{domain}/{agent-role} — for example prod/support/triage or dev/research/web-scraper. Audit namespaces quarterly using the metrics endpoint to identify ones with zero searches in the past 30 days.
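A convention holds up better when namespace names are built by a helper instead of typed by hand. The sketch below validates the `{environment}/{domain}/{agent-role}` format. The allowed environment list (including `staging`), the lowercase-and-hyphen rule, and the `make_namespace` helper are assumptions for illustration; adjust them to your own convention.

```python
import re

# One lowercase slug: letters and digits, with single hyphens between groups.
_SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"
NAMESPACE_RE = re.compile(rf"(?:prod|staging|dev)/{_SLUG}/{_SLUG}")


def make_namespace(environment: str, domain: str, agent_role: str) -> str:
    """Build a namespace name and reject anything that breaks the convention."""
    name = f"{environment}/{domain}/{agent_role}"
    if not NAMESPACE_RE.fullmatch(name):
        raise ValueError(f"namespace {name!r} does not follow "
                         "{environment}/{domain}/{agent-role}")
    return name


print(make_namespace("prod", "support", "triage"))
print(make_namespace("dev", "research", "web-scraper"))
```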

Failure Mode 3: Memory Accumulation Without Decay

A multi-agent system running without decay configuration will accumulate memories indefinitely. Search latency increases as the HNSW index grows. More critically, old irrelevant context competes with fresh context during ranking, degrading retrieval quality.

Prevention: Configure decay strategies per namespace from the start. Ephemeral working memory should use exponential decay with a 24-hour half-life. Long-term shared knowledge should use logarithmic decay or no decay at all. See the memory decay documentation for strategy selection guidance.
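To build intuition for what a 24-hour half-life means, the sketch below applies the standard half-life formula, weight = 0.5 ^ (age / half_life). This is the textbook definition of exponential decay; how Dakera combines such a weight with similarity scores is not described here, so treat the function as an illustration and not as the engine's actual scoring code.

```python
def exponential_decay_weight(age_hours: float,
                             half_life_hours: float = 24.0) -> float:
    """Relevance multiplier that halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


for age in (0, 24, 48, 72):
    print(f"{age:>3}h old -> weight {exponential_decay_weight(age):.3f}")
```

After three days a memory in such a namespace carries one eighth of its original weight, which is why this setting suits scratch memory and not long-term shared knowledge.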

Failure Mode 4: Session Boundary Leakage

Reusing the same session ID across multiple distinct tasks blends memories from separate contexts. Always generate a fresh session ID (UUID v4) per distinct task unit — never reuse session IDs across customers or work units:

import uuid

# Generate a fresh session ID for each distinct task
session_id = str(uuid.uuid4())

result = client.store_memory(
    agent_id="support-pipeline",
    session_id=session_id,  # Fresh ID per customer interaction
    content="Customer complaint about invoice INV-9921",
    metadata={"customer_id": "cust_88"}
)

Failure Mode 5: Memory Server Unavailability

Agents that call memory operations while Dakera is briefly unavailable will receive connection errors. Without proper handling, this can halt agent pipelines entirely. Wrap memory calls in a fallback, ideally backed by a circuit breaker, so agents degrade to operating without memory (using only the context window) rather than failing completely:

def safe_recall(client: DakeraClient, agent_id: str, query: str,
                top_k: int = 5) -> list:
    """Recall with graceful fallback when Dakera is unavailable."""
    try:
        return client.search_memories(
            agent_id=agent_id, query=query, top_k=top_k
        ).memories
    except (ConnectionError, TimeoutError) as e:
        # Log but don't raise — agent continues without memory context
        print(f"WARNING: Memory unavailable ({e}). Continuing without context.")
        return []
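The fallback above still attempts a call on every recall, so during an outage each agent step pays for a connection timeout. A circuit breaker avoids that by skipping calls for a cool-down period after repeated failures. The sketch below is a minimal version of that idea. The `MemoryCircuitBreaker` class, its thresholds, and the choice of exceptions to catch are assumptions; check which exceptions your client version actually raises.

```python
import time


class MemoryCircuitBreaker:
    """Skip memory calls for a cool-down period after repeated failures."""

    def __init__(self, failure_threshold: int = 3,
                 reset_after_seconds: float = 30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def is_open(self) -> bool:
        return self._opened_at is not None

    def call(self, operation, fallback):
        """Run operation() unless the circuit is open; return fallback on failure."""
        if self._opened_at is not None:
            if self.clock() - self._opened_at < self.reset_after_seconds:
                return fallback  # open: do not even try
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = operation()
        except (ConnectionError, TimeoutError):
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def unavailable():
    raise ConnectionError("memory server unreachable")


breaker = MemoryCircuitBreaker(failure_threshold=2, reset_after_seconds=30.0)
for _ in range(3):
    context = breaker.call(unavailable, fallback=[])
print(context, breaker.is_open)
```

In an agent, `operation` would be a small lambda wrapping the search call, so the breaker stays independent of any particular client method.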

Scaling Multi-Agent Memory

Concurrent Access

Dakera handles concurrent reads and writes from multiple agents without locking across namespaces. Each namespace maintains its own write-ahead log, so agents writing to different namespaces never contend. Agents writing to the same namespace experience serialized writes but parallel reads, which matches the typical access pattern where many agents read shared context but fewer write to it.

Memory Pruning

In multi-agent systems, memory accumulates fast. If 5 agents each store 100 memories per hour, you have 12,000 new memories per day. Dakera's decay strategies automatically reduce the relevance of old memories, and you can configure per-namespace retention:

# Configure aggressive decay for ephemeral agent working memory
client.namespace.configure(
    namespace="agent-triage-scratch",
    decay_strategy="exponential",
    decay_half_life_hours=24,
    max_memories=10000
)

# Configure slow decay for long-term shared knowledge
client.namespace.configure(
    namespace="shared-customer-context",
    decay_strategy="logarithmic",
    decay_half_life_hours=720,  # 30 days
    max_memories=500000
)
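The write-rate arithmetic above is useful for sizing `max_memories`. The sketch below reproduces it and estimates how long a namespace takes to reach its cap. It assumes, as a worst case, that every write lands in the namespace in question and that nothing is removed in the meantime; both helper functions are illustrative.

```python
def memories_per_day(agents: int, writes_per_agent_per_hour: int) -> int:
    """Total new memories written per day across all agents."""
    return agents * writes_per_agent_per_hour * 24


def days_until_cap(max_memories: int, daily_writes: int) -> float:
    """Days until a namespace reaches max_memories at a constant write rate."""
    return max_memories / daily_writes


daily = memories_per_day(agents=5, writes_per_agent_per_hour=100)
print(daily)                                      # 12000
print(round(days_until_cap(10_000, daily), 2))    # 0.83
print(round(days_until_cap(500_000, daily), 2))   # 41.67
```

Under these assumptions a 10,000-memory scratch namespace fills in under a day, while the 500,000-memory shared namespace lasts about six weeks.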

Real-World Example: Multi-Agent Code Review

Here's a complete example of three agents collaborating on code review using Dakera's memory:

from dakera import DakeraClient

client = DakeraClient(base_url="http://localhost:3300")

class SecurityAgent:
    NS = "review-security"

    def review(self, pr_diff: str, session_id: str):
        # Check memory for known vulnerability patterns
        past_vulns = client.search_memories(
            agent_id=self.NS,
            query=f"vulnerability patterns in {pr_diff[:200]}",
            top_k=5
        )

        # Store findings for other agents
        client.store_memory(
            agent_id="review-shared",
            session_id=session_id,
            content="Security review: no SQL injection risks, "
                    "but found hardcoded timeout of 30s in retry logic",
            metadata={"agent": "security", "risk_level": "low"}
        )

class PerformanceAgent:
    NS = "review-performance"

    def review(self, pr_diff: str, session_id: str):
        # Reference shared findings from security agent
        security_notes = client.search_memories(
            agent_id="review-shared",
            session_id=session_id,
            query="security findings",
            top_k=3
        )

        # Add performance perspective
        client.store_memory(
            agent_id="review-shared",
            session_id=session_id,
            content="Performance review: the hardcoded 30s timeout "
                    "(flagged by security) will cause connection pool exhaustion "
                    "under load. Recommend configurable timeout with 5s default.",
            metadata={"agent": "performance", "risk_level": "medium"}
        )

class SummaryAgent:
    def summarize(self, session_id: str):
        # Pull all findings from the review session
        all_findings = client.search_memories(
            agent_id="review-shared",
            session_id=session_id,
            query="review findings and recommendations",
            top_k=20
        )
        return all_findings

Anti-Patterns to Avoid

Choosing the Right Pattern

Most production multi-agent systems combine patterns. A typical architecture uses:

  1. Namespace isolation for each agent's private working memory
  2. Sessions for pipeline-style collaboration within a single task
  3. Knowledge graph for long-lived entity relationships that span tasks
  4. Shared namespace with metadata filtering for cross-agent coordination

Start with the simplest pattern that meets your needs (usually Pattern 1), and add complexity only when you observe specific limitations. The memory architecture should reflect the communication topology of your agents — if two agents never need to share context, don't create infrastructure for it.

Next Steps

Once you have a multi-agent memory architecture designed, the operational details matter as much as the patterns. If you are self-hosting Dakera to serve your agent fleet, the complete self-hosting guide covers security hardening, TLS setup, Prometheus monitoring, and high-availability configuration. If your agents connect via MCP (Claude Desktop, Claude Code, Cursor, or Windsurf), the MCP setup guide walks through connecting multiple MCP clients to a single shared Dakera instance — enabling exactly the cross-tool memory sharing described in the patterns above.

For teams still evaluating whether a purpose-built memory engine like Dakera is the right choice compared to building on top of a vector database, the vector database vs. agent memory comparison covers the architectural trade-offs with concrete code examples showing the difference in implementation complexity.

Benchmark Context

Memory system performance matters significantly in multi-agent deployments where many agents compete for the same underlying store. Dakera's 88.2% LoCoMo score on the 2026 AI agent memory benchmark was achieved under concurrent multi-agent write load — not just single-threaded sequential read access. If your architecture has 10 or more agents writing to shared namespaces simultaneously, review the benchmark methodology to understand how retrieval quality is measured and independently validated under real concurrency conditions.

