Playbook v1.8
Chapter 13: The Stack · 15 min read

Memory Architecture — How Your Agents Remember Everything

The Problem No One Talks About

Most "AI agent" demos show you a chatbot that answers one question, forgets everything, and starts from zero next time. That's not an operating system. That's a goldfish with a nice UI.

Real operations demand institutional memory. Your CS agent needs to remember that customer who complained three times last month. Your finance agent needs to remember the payment term negotiation from February. Your ops agent needs to remember the supplier lead time change from last quarter.

Without memory, your agents are perpetual interns — capable, but always asking questions they should already know the answers to.

The Three-Layer Memory Stack

We run a three-layer architecture that mirrors how good organizations actually retain knowledge:

┌────────────────────────────────────────────────────┐
│  Layer 3: WORKING MEMORY                           │
│  memory/YYYY-MM-DD.md (daily session logs)         │
│  What happened today. Raw. Detailed.               │
├────────────────────────────────────────────────────┤
│  Layer 2: SEMANTIC MEMORY                          │
│  SuperMemory (cloud-native vector search)          │
│  Auto-indexed. Cross-session. Searchable.          │
├────────────────────────────────────────────────────┤
│  Layer 1: INSTITUTIONAL KNOWLEDGE                  │
│  brain/knowledge/ (Context Tree)                   │
│  Organized by domain. Curated. Authoritative.      │
└────────────────────────────────────────────────────┘

Each layer serves a different purpose:

| Layer | What It Stores | How It Gets There | Who Uses It |
|-------|----------------|-------------------|-------------|
| Working Memory | Today's sessions, decisions, conversations | Pre-compaction flush (automatic) | Any agent reviewing recent history |
| Semantic Memory | Key facts, preferences, patterns | Auto-indexed per session | Any agent needing contextual recall |
| Institutional Knowledge | Policies, APIs, team info, playbooks | Curated (manual + automated mining) | All agents, always |
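A minimal sketch of how one session prompt could be assembled from the stack. The file paths follow the conventions in this chapter; the `assemble_context` function itself is illustrative, not the actual runtime code:

```python
from datetime import date
from pathlib import Path

def assemble_context(workspace: Path, today: date) -> str:
    """Stack the layers into one system-prompt preamble.

    MEMORY.md is the distilled summary, the Context Tree root
    _index.md is the Layer 1 map, and today's daily log is Layer 3.
    Layer 2 retrieval happens at query time, so it isn't read here.
    """
    parts = []
    memory = workspace / "MEMORY.md"  # executive summary, always injected
    if memory.exists():
        parts.append(memory.read_text())
    index = workspace / "brain" / "knowledge" / "_index.md"  # Layer 1 map
    if index.exists():
        parts.append("## Knowledge map\n" + index.read_text())
    daily = workspace / "memory" / f"{today.isoformat()}.md"  # Layer 3
    if daily.exists():
        parts.append("## Today so far\n" + daily.read_text())
    return "\n\n---\n\n".join(parts)
```

Missing files are simply skipped, so a fresh workspace degrades gracefully instead of failing.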

Layer 1: The Context Tree

This is the foundation. Instead of dumping 100+ files in a flat directory and hoping the agent finds the right one, we organize knowledge as a domain hierarchy:

brain/knowledge/
├── _index.md                    ← Root map (auto-generated)
│
├── the-brand/                      ← Business domain
│   ├── _index.md
│   ├── finance/                 ← Salaries, P&L, equity, invoices
│   │   ├── _index.md
│   │   └── [15 files]
│   ├── strategy/                ← BP, competitive, legal, buying
│   │   └── [9 files]
│   ├── operations/              ← Shopify, 3PL, inventory, CS
│   │   └── [10 files]
│   ├── team/                    ← Org chart, meetings
│   │   └── [10 files]
│   └── marketing/               ← Weekly reports, SEO, PR
│       └── [13 files]
│
├── platform/                    ← Technical/infra domain
│   ├── agents/                  ← Agent configs, comms, setup
│   ├── auth/                    ← Credentials, access
│   ├── setup/                   ← Browser, bots, troubleshooting
│   └── config/                  ← Rules, lessons, hygiene
│
├── personal/                    ← Founder/operator context
└── projects/                    ← Side projects, R&D

Why This Matters

Without it: Agent gets a finance question → searches 100+ files → might find the right one, might hallucinate.

With it: Agent gets a finance question → knows to look in the-brand/finance/ → finds authoritative answer in seconds.

The _index.md files are auto-generated summaries that propagate upward. Change a file in finance/, the _index.md for finance/ and the-brand/ update automatically. This means every agent always has an accurate map of what knowledge exists and where.
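The upward propagation reduces to a simple path walk. This sketch (our illustration, not the actual generator) lists which `_index.md` files must be rebuilt after a change, assuming the changed file lives under the Context Tree root:

```python
from pathlib import Path

def indexes_to_rebuild(changed: Path, root: Path) -> list[Path]:
    """When a knowledge file changes, return every _index.md that must
    be regenerated: the file's own directory first, then each ancestor
    up to the Context Tree root (the upward propagation described above).
    Assumes `changed` is somewhere under `root`."""
    chain = []
    d = changed.parent
    while True:
        chain.append(d / "_index.md")
        if d == root:
            break
        d = d.parent
    return chain
```

For a change in `the-brand/finance/`, this yields the finance index, then the `the-brand` index, then the root map, in rebuild order.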

Implementation

Every _index.md contains:

- Domain description
- File count
- File listing with the first heading extracted
- Last update timestamp

# Finance Knowledge
**Files:** 15 | **Domain:** the-brand/finance

| File | Topic |
|------|-------|
| salarios-2026.md | Salary structure and ranges |
| bonus-policy.md | Performance bonus framework |
| invoice-pipeline.md | AP/AR automation pipeline |
...

Layer 2: Semantic Memory (SuperMemory)

⚠️ Production Update (March 2026): After 6 months in production, we removed SuperMemory from all agents. The reasons:

- Memory pollution was severe despite nightly dedup (removing 90+ duplicates/week/agent wasn't enough)
- The vector search often surfaced irrelevant memories, degrading response quality
- The Context Tree (Layer 1) plus Working Memory (Layer 3) proved sufficient for operations
- The cost/benefit didn't justify the maintenance overhead

Our recommendation: Start without SuperMemory. Add it later only if you find agents lacking cross-session context that the Context Tree doesn't cover. Most operational tasks don't need it — they need structured, current knowledge, not fuzzy recall.

For reference, here's how the original setup worked. SuperMemory is a cloud-native vector store that automatically indexes important information from every session. When an agent starts a new conversation, it gets a contextual recall of relevant memories.

What gets stored:

- Key decisions ("We decided to use Amphora as our 3PL")
- User preferences ("Diego prefers Slack for async, WhatsApp for urgent")
- Entity facts ("The brand's gross margin is 67%")
- Patterns ("Returns spike on Mondays after weekend orders")

How agents use it:

- Every new session gets a supermemory_profile injection — what the system knows about this user
- Agents can call supermemory_search for specific context ("what did we decide about the Canada pricing?")
- Old memories can be removed with supermemory_forget when outdated

Key advantage: Works across all agents, all sessions, no local state. Agent on Server A can recall what Agent on Server B learned last week.

Multi-Agent SuperMemory

While it was live, every agent in the ecosystem had SuperMemory installed:

Aurelio  → SuperMemory (Hetzner)       ✓
Mafalda  → SuperMemory (Mac Mini)      ✓
Ferland  → SuperMemory (Mac Mini)      ✓
Gala     → SuperMemory (Mac Mini)      ✓
Donatello → SuperMemory (Mac Mini)     ✓
Olivia   → SuperMemory (Mac Mini)      ✓

This meant the CS agent could remember that a VIP customer had a bad experience, even if the original complaint was handled by the hub agent three weeks ago.

Layer 2 (Replacement): ByteRover Native Memory Plugin

After removing SuperMemory, we evaluated several alternatives. ByteRover is now fully operational on all agents (brv CLI v2.6.0), running as a ContextEngine slot-exclusive plugin with safeguard compaction mode. A heartbeat every 30 minutes confirms health across the fleet. It solved the exact problems SuperMemory had — without the noise.

Why ByteRover Over SuperMemory

| Problem with SuperMemory | ByteRover Solution |
|--------------------------|--------------------|
| Vector store accumulates garbage | File-based Markdown hierarchy — human-readable, diffable, auditable |
| Dedup cron removed 90+/week but couldn't keep up | LLM-driven curation with UPSERT/MERGE operations — duplicates never enter |
| Semantic search surfaced irrelevant memories | 4-tier retrieval: Cache → MiniSearch → LLM search → Recursive LLM (92.2% accuracy) |
| Opaque embeddings, no auditability | Plain Markdown files organized in a semantic hierarchy |
| Broke when models changed | Model-agnostic — works with any LLM provider |

How It Works

ByteRover integrates natively into OpenClaw's context assembly pipeline (via PR #50848). Instead of injecting memories as a separate step, it becomes part of the agent's prompt construction:

Agent receives message
  → ByteRover retrieves relevant knowledge (4-tier retrieval)
  → Injects into system prompt via ContextEngine.assemble()
  → Agent responds with full context
  → ByteRover curates new knowledge from the session

The Three Layers (ByteRover's Architecture)

ByteRover maintains three layers that map directly to our existing stack:

| ByteRover Layer | Our Equivalent | What It Stores |
|-----------------|----------------|----------------|
| Context Tree | brain/knowledge/ (1,341 docs) | Deep structured knowledge, organized by domain |
| Workspace Memory | MEMORY.md + SOUL.md | Core rules, preferences, business snapshot |
| Daily Memory | memory/YYYY-MM-DD.md | Session notes, operational logs |

The key insight: ByteRover didn't replace our architecture — it automated and improved what we were doing manually. Our Context Tree becomes ByteRover's Context Tree. Our MEMORY.md becomes Workspace Memory. Our daily logs become Daily Memory. Same structure, native integration.

Curation Operations

Instead of our custom Knowledge Mining cron extracting patterns with regex fallback, ByteRover provides five atomic operations with per-operation feedback:

| Operation | What It Does | Example |
|-----------|--------------|---------|
| ADD | Create new knowledge entry | "DHL Express rates changed to €4.50/package" |
| UPDATE | Replace existing content | Update salary ranges for 2026 |
| UPSERT | Add if new, update if exists | Customer VIP threshold changed |
| MERGE | Intelligently combine entries | Combine two partial supplier records |
| DELETE | Remove stale entries | Remove last season's product catalog |

Crash safety: All writes use writeFileAtomic() — if the process crashes mid-write, no corruption.

Feedback loop: If an operation fails, the LLM gets the error and can retry with a different approach. Our custom Knowledge Mining script had no feedback — it wrote and hoped.
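The crash-safe write pattern is worth spelling out. This is our stand-in sketch of the technique, not ByteRover's actual `writeFileAtomic()` implementation:

```python
import os
import tempfile

def write_file_atomic(path: str, content: str) -> None:
    """Write to a temp file in the same directory, fsync, then rename.
    os.replace() is atomic on POSIX and Windows, so a crash mid-write
    leaves either the old file or the new one, never a torn mix."""
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before the swap
        os.replace(tmp, path)     # the atomic swap
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)        # clean up the temp file on failure
        raise
```

The temp file must live in the same directory as the target, because `os.replace()` is only atomic within a single filesystem.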

4-Tier Retrieval Pipeline

This is the biggest upgrade over both SuperMemory (probabilistic vector search) and our brain_search (single-tier QMD):

| Tier | Method | Speed | When Used |
|------|--------|-------|-----------|
| 0 | Cache lookup | <1ms | Repeated queries, recent context |
| 1 | MiniSearch full-text | <100ms | Keyword matches, exact terms |
| 2 | LLM-powered search | ~1-2s | Semantic understanding needed |
| 3 | Recursive LLM search | ~3-5s | Complex, multi-hop questions |

Result: 92.2% retrieval accuracy on the LoCoMo and LongMemEval benchmarks. Most queries resolve at Tiers 0-1, in under 100ms.

Out-of-domain detection: When a query falls outside stored knowledge, ByteRover explicitly says "I don't have information about this" instead of hallucinating. Critical for financial and CS responses.
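The cascade logic itself is simple. Tier roles come from the table above; the dispatcher below (its name, signature, and return shape) is our assumption, not ByteRover's API:

```python
def retrieve(query, tiers):
    """Try each retrieval tier in order (cache, full-text, LLM,
    recursive LLM) and stop at the first one that returns hits.
    Returns (tier_index, hits). (None, []) signals out-of-domain,
    so the agent can answer "I don't have information about this"
    instead of hallucinating."""
    for i, search in enumerate(tiers):
        hits = search(query)
        if hits:
            return i, hits
    return None, []
```

Because cheaper tiers run first, the expensive LLM tiers only fire when keyword search comes up empty, which is what keeps typical latency low.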

Installation

Requires OpenClaw v2026.3.22+:

# Install globally
npm install -g @byterover/byterover

# Add to openclaw.json plugins
{
  "plugins": {
    "entries": {
      "@byterover/byterover": {
        "enabled": true,
        "config": {
          "ownsCompaction": false,
          "contextTree": {
            "basePath": "brain/knowledge",
            "enabled": true
          }
        }
      }
    }
  }
}

# Restart gateway
openclaw gateway run --force

ownsCompaction: false — delegates memory compaction to OpenClaw's native runtime. ByteRover handles curation; OpenClaw handles session management.

Multi-Agent Deployment

We deployed ByteRover across all 6 agents:

| Agent | Host | ByteRover Status |
|-------|------|------------------|
| Aurelio | Hetzner VPS | ✅ Fully operational |
| Mafalda | Mac Mini | ✅ Fully operational |
| Ferland | Mac Mini | ✅ Fully operational |
| Gala | Mac Mini | ✅ Fully operational |
| Donatello | Mac Mini | ✅ Fully operational |
| Olivia | Mac Mini | ✅ Fully operational |

All agents share the same Context Tree via brain-sync (rsync every 30 minutes). ByteRover curates locally; rsync propagates globally. This means knowledge learned by the CS agent (Mafalda) becomes available to the Finance agent (Ferland) within 30 minutes.

What About Claude Code and Claude Desktop?

ByteRover is an OpenClaw plugin — it doesn't run inside Claude Code or Claude Desktop. But the improvement flows to all Claudes indirectly:

  1. Better brain = better MCP queries. ByteRover curates the Context Tree that Claudes access via brain_search/brain_read. Higher quality knowledge → better answers.

  2. Smarter agents = smarter agent_send. When you ask Ferland for a P&L via Claude Code, Ferland now has 4-tier retrieval backing its response.

  3. Knowledge Mining → brain → MCP → everyone. The cycle: agents learn → ByteRover curates → brain-sync propagates → MCP serves → any Claude can query.

Claude Code has its own memory system (CLAUDE.md, conversation context) that serves the power-user interface well. ByteRover serves the always-on operational agents.

Cost

€0/month additional. ByteRover is a free npm package. The only incremental cost is the LLM tokens for curation operations (~€0.05-0.10/day per agent for Tier 2-3 retrieval). At scale across 6 agents: ~€10-18/month — negligible compared to the accuracy improvement.

Layer 3: Working Memory (Daily Logs)

Every session's work is captured in memory/YYYY-MM-DD.md files via pre-compaction flush. These are the raw operational logs:

# Memory — 2026-03-11

## Session 1: CS Ticket Review (~08:00 UTC)
- Processed 14 tickets: 9 auto-resolved, 3 escalated to the CS lead, 2 flagged for quality
- New pattern: 3 complaints about SS26 sizing on "Mia Blazer" → logged for product team
- Updated shipping policy knowledge (new DHL Express rates)

## Session 2: Weekly P&L (~10:00 UTC)
- Revenue W10: +12% vs W09
- COGS anomaly: leather supplier invoice 15% above PO → escalated to the finance manager
...

The Compaction Cycle (Live in Production)

Every day at 06:00 UTC, the Knowledge Mining cron job (/scripts/knowledge-mining.sh):

  1. Reads the latest memory/YYYY-MM-DD.md file
  2. Sends content to Claude Sonnet for intelligent extraction (with regex fallback if API unavailable)
  3. Extracts durable patterns — categorized by domain: the-brand/finance/, the-brand/operations/, platform/agents/, etc.
  4. Routes each finding to the correct folder, creating or appending to files
  5. At 06:10 UTC, generate-index.sh rebuilds all 61 _index.md files across the Context Tree

Real output from first run: 8 durable findings extracted from a single day's memory log.

Before: Session logs pile up. Knowledge stays trapped in chat history. Nobody reads old logs.

After: Every day's operational intelligence gets distilled into the institutional knowledge base. The Context Tree grows organically from real operations.

memory/2026-03-11.md
  → extracts: "DHL Express rates changed"
    → routes to: the-brand/operations/shipping-rates.md
  → extracts: "Mia Blazer sizing complaints (3x)"
    → routes to: the-brand/operations/quality-alerts.md
  → updates: the-brand/operations/_index.md
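The routing step above can be sketched in a few lines. The routing table and category names here are illustrative; in the real cron they come from the LLM extraction step, not a hard-coded dict:

```python
from pathlib import Path

# Illustrative routing table; real categories come from the LLM extraction.
ROUTES = {
    "shipping": "the-brand/operations/shipping-rates.md",
    "quality": "the-brand/operations/quality-alerts.md",
}

def route_finding(base: Path, category: str, finding: str) -> Path:
    """Append one mined finding to its domain file, creating the
    folder and file on first use (step 4 of the cron)."""
    target = base / ROUTES[category]
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a") as f:
        f.write(f"- {finding}\n")
    return target
```

Appending rather than overwriting keeps a running log per topic; the index regeneration at 06:10 UTC then picks up any new files automatically.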

The Shared Brain

In a multi-agent setup, you need all agents reading from the same knowledge base. We solve this with a shared brain via symlinks:

Server (Aurelio):
  /root/aurelio-v2/brain/knowledge/    ← Source of truth

Mac Mini (all other agents):
  /Users/Shared/shared-brain/knowledge/  ← Synced copy

  /Users/mafalda/mafalda/brain/ → /Users/Shared/shared-brain/
  /Users/ferland/ferland/brain/ → /Users/Shared/shared-brain/
  /Users/gala/gala/brain/ → /Users/Shared/shared-brain/
  ...

Sync mechanism: Bidirectional rsync via brain-sync.sh, running every 30 minutes:

1. Pull from Mac Mini first (captures agent-written knowledge)
2. Push from Hetzner (Aurelio as source of truth)
3. --update flag: most recent file wins conflicts
4. Filters: only .md/.json/.txt/.yaml — excludes credentials and PDFs

Health monitoring: brain-sync-health.sh runs every 2 hours, checking:

- Mac Mini reachability via SSH
- Sync freshness (alert if >2h stale)
- Total doc count (currently 1,341 docs, 15MB)
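The freshness check is easy to replicate. This Python sketch is our approximation of what brain-sync-health.sh does (the script itself isn't shown in this chapter):

```python
import time
from pathlib import Path

def brain_health(root: Path, max_stale_hours: float = 2.0) -> dict:
    """Report doc count and hours since the newest .md file changed;
    flag an alert when the tree looks stale (no sync in >2h)."""
    docs = list(root.rglob("*.md"))
    newest = max((p.stat().st_mtime for p in docs), default=0.0)
    stale_hours = (time.time() - newest) / 3600
    return {
        "doc_count": len(docs),
        "stale_hours": round(stale_hours, 2),
        "alert": stale_hours > max_stale_hours,
    }
```

Wire the `alert` flag to whatever channel your agents already use for escalations; a silent health check is worse than none.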

Result: Update a policy on the hub agent → all agents see the change within minutes. No manual copying. No drift.

MEMORY.md — The Executive Summary

Every agent has a MEMORY.md injected into every prompt. This is the distilled, always-current summary:

# MEMORY.md — Distilled Knowledge
*Last updated: 11 Mar 2026*

## Business Snapshot
- Revenue trend: +51% YoY
- Growth target: +57% YoY
- Gross Margin: 67%
- Team: 20 people

## Knowledge Base
→ See brain/knowledge/_index.md for full map
- the-brand/finance/ — 15 files
- the-brand/operations/ — 10 files
- platform/agents/ — 8 files
...

## Key Decisions (Last 30 Days)
- Migrated 3PL to Amphora (Feb 2026)
- Launched wholesale agent (Mar 2026)
- Changed CS model from Opus → Sonnet (cost optimization)

This ensures every agent, every session, starts with current context — not a blank slate.

What Makes This Different

vs. RAG (Retrieval-Augmented Generation)

Most AI tools use RAG: chunk documents, embed them, retrieve on query. That works for search. It doesn't work for operations.

Our Context Tree is structured by domain, not by embedding similarity. When the finance agent needs salary data, it doesn't search a vector store and hope the right chunk surfaces. It goes to the-brand/finance/salarios-2026.md. Deterministic, not probabilistic.

vs. Fine-Tuning

Fine-tuning bakes knowledge into model weights. It's expensive, slow to update, and you lose it when you upgrade models. Our knowledge layer is model-agnostic — switch from Claude to GPT to Gemini and your institutional memory stays intact.

vs. Chat History

Most tools "remember" by stuffing old chat messages into context. That works until the context window fills up (and it fills up fast with operational data). Our three-layer stack means agents always have the right amount of context: MEMORY.md for essentials, SuperMemory for recall, Context Tree for deep lookup.

vs. ByteRover / Other Memory Tools

We evaluated ByteRover (popular on ClawHub) and took inspiration from its Context Tree concept. Before adopting the plugin itself (see the Layer 2 replacement above), we built the same ideas natively:

- No external daemon dependency
- Integrated with our multi-agent sync (shared brain)
- Domain-organized for business operations, not just code
- A Knowledge Mining cron tailored to operational patterns

Setup Guide

Step 1: Create the Context Tree Structure

mkdir -p brain/knowledge/{your-brand}/{finance,operations,team,marketing,strategy}
mkdir -p brain/knowledge/platform/{agents,auth,setup,config}

Step 2: Populate Core Knowledge

Start with these critical files:

| File | Content | Priority |
|------|---------|----------|
| your-brand/operations/products.md | Product catalog | Day 1 |
| your-brand/operations/policies.md | Returns, shipping, warranty | Day 1 |
| your-brand/team/org-chart.md | Who does what | Day 1 |
| your-brand/operations/faq.md | Top 30 customer questions | Day 2 |
| your-brand/marketing/brand-voice.md | Tone guidelines | Day 2 |
| platform/agents/agents.md | Agent roles and configs | Day 3 |
| platform/config/rules.md | Operational rules and guardrails | Day 3 |

Step 3: Generate Index Files

# Auto-generate _index.md for every directory
for dir in $(find brain/knowledge -type d); do
  count=$(find "$dir" -maxdepth 1 -name "*.md" ! -name "_index.md" | wc -l)
  echo "# $(basename $dir)" > "$dir/_index.md"
  echo "**Files:** $count" >> "$dir/_index.md"
  echo "" >> "$dir/_index.md"
  for f in "$dir"/*.md; do
    [ "$(basename $f)" = "_index.md" ] && continue
    heading=$(head -1 "$f" | sed 's/^# //')
    echo "- [$(basename $f)]($(basename $f)) — $heading" >> "$dir/_index.md"
  done
done

Step 4: Install SuperMemory (Optional)

Per the production update in Layer 2, we later removed SuperMemory entirely. Skip this step unless you've confirmed you need semantic recall that the Context Tree doesn't cover.

# On each agent
openclaw extensions install openclaw-supermemory
openclaw restart

Step 5: Set Up Knowledge Mining Cron

Create a daily cron job that:

1. Reads the latest memory/YYYY-MM-DD.md
2. Extracts durable decisions, new info, changed processes
3. Creates or updates files in the appropriate domain folder
4. Regenerates affected _index.md files

Step 6: Configure Shared Brain (Multi-Agent)

# On shared server
mkdir -p /shared/brain/knowledge
# Copy your Context Tree here

# On each agent's workspace
ln -s /shared/brain brain

Sync with: rsync -avzu source/ shared/brain/knowledge/. The -u (--update) flag lets the most recent file win, matching the brain-sync behavior described above; avoid --delete in a bidirectional setup, or agent-written knowledge gets wiped on the next push.

Memory Hygiene: Automated Deduplication

Here's the dirty secret of AI memory: it accumulates garbage. Every agent, every session, generates memories. Many are duplicates. "Diego is the CEO" gets stored 15 times with slight variations. API access facts repeat across sessions. Troubleshooting states from months ago linger like digital cobwebs.

Left unchecked, memory pollution degrades agent performance:

- Irrelevant memories crowd out useful context
- Duplicate facts consume recall slots
- Stale operational state causes confusion

The Automated Solution

We run a nightly dedup cron on every agent. It's dead simple:

Schedule: 03:45 UTC daily (staggered per agent)
Type: Isolated agent turn (no human interaction)
Task: Search SuperMemory for duplicate/redundant entries,
      delete noise, keep most informative version

Cron job definition:

{
  "name": "SuperMemory Dedup",
  "schedule": { "kind": "cron", "expr": "45 3 * * *" },
  "sessionTarget": "isolated",
  "payload": {
    "kind": "agentTurn",
    "message": "Execute automated SuperMemory cleanup: Search for duplicate memories with high overlap. Focus on: multiple versions of same facts about people/roles/tools, temporal noise like troubleshooting states, redundant access permission statements. Delete duplicates keeping most informative version. Report stats briefly."
  },
  "delivery": { "mode": "none" }
}
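The cron delegates the actual judgment to the LLM, but the core selection rule ("keep the most informative version") can be approximated with a crude heuristic. This stand-in normalizes near-identical facts and keeps the longest variant of each; it is our illustration, not the agent's real logic:

```python
def dedup_memories(memories: list[str]) -> list[str]:
    """Crude stand-in for the LLM cleanup: group memories that are
    identical after normalization (case, whitespace, trailing
    punctuation) and keep the longest variant of each group,
    treating length as a rough proxy for informativeness."""
    best: dict[str, str] = {}
    for m in memories:
        key = " ".join(m.lower().rstrip(".!").split())
        if key not in best or len(m) > len(best[key]):
            best[key] = m
    return list(best.values())
```

A heuristic like this catches exact-duplicate noise cheaply; the LLM pass is still needed for paraphrases ("Diego runs the company" vs. "Diego is the CEO").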

Fleet Deployment Pattern

For multi-agent setups, deploy the dedup cron to all agents:

# For each agent on the shared host
for AGENT in mafalda gala ferland donatello olivia; do
  PORT=$(get_agent_port $AGENT)
  TOKEN=$(get_agent_token $AGENT)

  OPENCLAW_ALLOW_INSECURE_PRIVATE_WS=1 \
  OPENCLAW_GATEWAY_URL=ws://<host-ip>:$PORT \
  openclaw cron add \
    --name "SuperMemory Dedup" \
    --cron "45 3 * * *" \
    --session isolated \
    --message "Execute automated SuperMemory cleanup..." \
    --no-deliver \
    --token $TOKEN
done

Key gotcha: When deploying via SSH to agents on a shared host:

- Use sudo -u <agent> bash -l -c '...' to inherit the correct PATH
- Set OPENCLAW_ALLOW_INSECURE_PRIVATE_WS=1 for private ws:// connections
- Agents bind to the Tailnet IP, not localhost — use the correct IP:port

Results in Production

After 2 weeks of automated dedup across 6 agents:

- ~90+ duplicate memories removed per agent per week
- Common duplicates: identity facts (15+), API permissions (10+), troubleshooting logs (15+)
- Agent recall accuracy improved — less noise means better signal
- Zero human maintenance required

Device Pairing for Fleet Deployment

When deploying cron jobs remotely, you may need to pair the CLI with each agent's gateway. The pairing protocol uses Ed25519 signatures:

  1. Each agent has identity/device.json with a keypair
  2. The gateway maintains devices/paired.json with authorized devices
  3. The CLI authenticates by signing a challenge with its private key

Critical detail: The publicKey in paired.json must be the raw 32-byte Ed25519 key (base64url, no padding) — NOT the full DER/PEM encoding. To extract:

import base64

# Assumes `pem` holds the public key in SPKI PEM format.
pem_b64 = "".join(line for line in pem.splitlines() if "-----" not in line)
der_bytes = base64.b64decode(pem_b64)  # DER = 12-byte SPKI header + 32-byte raw key
raw_key = der_bytes[-32:]
raw_b64url = base64.urlsafe_b64encode(raw_key).rstrip(b'=').decode()

This is a common gotcha — using the full DER key (44 bytes) instead of the raw key (32 bytes) causes "pairing required" errors even though the device appears paired.

Cost

The entire memory architecture costs €0/month in additional infrastructure. It's files on disk, organized intelligently. (SuperMemory, while we ran it, was included with OpenClaw; ByteRover is a free npm package.) The Knowledge Mining cron uses the same LLM you're already paying for (~€0.10/day in API tokens for the daily digest).

Compare that to:

- Pinecone: $70+/month
- Weaviate Cloud: $25+/month
- Custom RAG pipeline: engineering time plus vector DB costs


Summary

| Component | Purpose | Cost | Setup Time |
|-----------|---------|------|------------|
| Context Tree | Structured institutional knowledge | €0 | 2-3 hours |
| ~~SuperMemory~~ | ~~Semantic cross-session recall~~ | ~~Deprecated~~ | — |
| ByteRover | Native 4-tier retrieval + LLM curation | €0 (npm package) | 10 minutes |
| Working Memory | Daily operational logs | €0 | Automatic |
| Knowledge Mining | Auto-distill logs → knowledge | ~€3/month | 30 minutes |
| Shared Brain | Multi-agent knowledge sync | €0 | 1 hour |
| MEMORY.md | Executive context injection | €0 | 30 minutes |

Total: ~€3/month for a complete institutional memory system.

The brands that build memory from day one will have an insurmountable advantage by month six. Their agents won't just be tools — they'll be the institutional memory of the organization, compounding every day.

Ready to deploy this yourself?

The Implementation Kit has production templates, scripts, and a 30-day deployment calendar. Everything in this playbook — packaged to build with.

Get the Kit — €299 →