Playbook v1.8
Chapter 16: Implementation · 16 min read

Implementation — Three Ways to Start

You've read the architecture. You've seen the agent blueprints. Now: how do you actually deploy this?

We offer three paths, designed as a progression. Start wherever matches your team today. Upgrade when you're ready.

How To Build With AI — The 3-Phase Rule

Before you touch a single config file, a note on methodology. The most important lesson from 6+ months of building this system has nothing to do with OpenClaw or Claude or Shopify APIs. It's about how humans work with AI during the build.

After trying every variation, one pattern consistently produces working systems faster than anything else: the 3-phase rule — Research → Plan → Implement.

Credit: this methodology is adapted from Boris Tane's Claude Code workflow, battle-tested over months of real builds.

Phase 1 — Research

Before writing a single line of code or config, read deeply. Read the existing codebase. Read the relevant docs. Read the brain. Read every file that could possibly be affected. Document your findings in a research.md file.

The AI is good at reading fast. Use that. Ask it to summarize modules, trace dependencies, list assumptions. The written artifact forces the AI to verify its understanding before you trust it to plan.

Anti-pattern: skipping research because "it's a small change". The AI will happily generate plausible-looking code for a system it doesn't fully understand, and you'll spend 10× the saved time debugging.

Phase 2 — Plan

Based on the research, write a detailed plan.md. It should contain:

  • The approach (not the code)
  • File paths that will be touched
  • Code snippets for the critical paths
  • Trade-offs considered and rejected
  • Open questions that need human input

Critical: do not implement yet. Send the plan to your human reviewer. Let them annotate it inline — corrections, constraints, domain knowledge the AI didn't have. Iterate 1-6 times until the plan is approved.

This is where the real value happens. The annotation cycle is where human judgment meets AI speed. Skip it and you'll get working code that solves the wrong problem.

Phase 3 — Implement

Once the plan is approved, execute everything without stopping. Mark tasks completed as you go. No unnecessary comments, no jsdocs nobody asked for, no scope creep. Implementation should be boring — the creative work happened in Phase 2.

If something fails: revert and re-scope. Don't patch. Don't improvise. Go back to Phase 2, update the plan, then re-implement.

When To Skip The 3 Phases

  • Trivial fixes (typos, config tweaks, small bug fixes)
  • Urgent production issues (fix first, document the fix after)
  • Direct instructions with zero ambiguity ("change variable X from 5 to 10")

For everything else — especially new features, refactors, and multi-file changes — follow the full loop. It feels slower on the first step and is measurably faster by the third.

Why This Matters For Your Deployment

Every chapter of this playbook was written using the 3-phase rule. Every agent in production was deployed using the 3-phase rule. Every post-mortem in Ch.11b was the direct result of not following the 3-phase rule on some specific day.

When you start building your own deployment, resist the urge to go straight to "install OpenClaw". Start with a research doc. Write the plan. Get it annotated. Then build. You'll sidestep most of the painful lessons in Ch.11b on your first pass.


Choose Your Path

┌──────────────────────────────────────────────────────────────────────────────┐
│                                                                              │
│  Path A           Path B            Path C              Path D               │
│  ENTROPIC         ALPHACLAW         SELF-HOSTED CLI     MANAGED              │
│  (Desktop)        (Web Dashboard)   (Full Control)      (Full Service)       │
│                                                                              │
│  ⏱ 10 minutes     ⏱ 30 minutes      ⏱ 30 days            ⏱ We handle it     │
│  💰 $5 credits     💰 €20/mo VPS      💰 €60/month          💰 Custom           │
│  👤 Non-technical  👤 Semi-technical  👤 Tech-capable       👤 Any team         │
│  🤖 1 agent        🤖 1 agent         🤖 2-6+ agents        🤖 Full stack       │
│  🧠 Basic memory   🧠 Basic + Git     🧠 Full memory stack  🧠 Managed          │
│                                                                              │
│  "Try it"    →    "Deploy it"   →   "Own it"        →   "Scale it"           │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
| | Path A: Entropic | Path B: AlphaClaw | Path C: Self-Hosted CLI | Path D: Managed |
|---|---|---|---|---|
| Best for | Founders, solo operators, proof of concept | Small teams who want a VPS but no CLI | Teams with a tech-capable person | Brands that want results, not infrastructure |
| Time to first agent | ~10 minutes | ~30 minutes | 2-7 days (production-ready in 30) | We deploy in 1-2 weeks |
| Infrastructure | Your laptop | VPS (Railway/Render 1-click, or any Docker host) | VPS (€20-40/mo) or Mac Mini | We provide and manage |
| Setup experience | Desktop app installer | Web-based setup wizard (no CLI needed) | Terminal + SSH + config files | We handle onboarding |
| LLM billing | Pay-as-you-go credits | Your own API keys | Your own API keys | Included |
| Multi-agent | Single agent | Single agent | Full multi-agent orchestration | Full stack, pre-configured |
| Memory stack | SuperMemory (basic) | SuperMemory + Git-backed workspace | Context Tree + SuperMemory + Knowledge Mining + Dedup | Fully managed |
| Self-healing | N/A | ✅ Watchdog + crash recovery + auto-repair | Manual (or custom scripts) | We monitor 24/7 |
| Observability | In-app | Web dashboard (files, logs, usage, webhooks) | CLI tools + custom dashboards | Full reporting |
| Messaging | WhatsApp, Discord, Telegram, iMessage | All channels + guided Telegram Topics setup | All channels + cross-agent routing | All channels |
| Ongoing maintenance | Self-serve | Auto Git sync + watchdog | Self-managed (with our playbook) | We handle it |
| When to upgrade | When you need a persistent server | When you need multi-agent or full memory stack | When you want to focus on business, not ops | — |

Path A: Entropic (Quick Start) — 10 Minutes to Your First Agent

Entropic is a desktop app built on the same OpenClaw engine that powers our full production stack. No terminal. No server. No API keys required (though you can bring your own).

Step 1: Download & Install (2 minutes)

Download from entropic.qu.ai/download. Available for macOS, Windows, and Linux.

Create an account. You get $0.50 in free credits — enough for your first conversation.

Step 2: Load Your Knowledge (5 minutes)

Create a workspace folder with the essentials:

my-brand-agent/
├── SOUL.md          # "You are [Brand]'s customer service agent..."
├── TOOLS.md         # Shopify token, any API keys you have
└── brain/
    └── knowledge/
        └── my-brand/
            ├── products.md      # Your catalog basics
            ├── policies.md      # Return/shipping/warranty
            └── faq.md           # Top 20 customer questions

You can write these manually or copy from your website. Even three files make a dramatic difference vs. a blank agent.
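If you'd rather script the scaffold, here's a minimal sketch. The folder and file names come from the tree above; the `scaffold` function and the placeholder contents are our illustration, not part of Entropic:

```python
from pathlib import Path

# Workspace skeleton matching the tree above.
# Placeholder contents are illustrative -- replace with your real knowledge.
FILES = {
    "SOUL.md": "# SOUL\nYou are [Brand]'s customer service agent...\n",
    "TOOLS.md": "# TOOLS\n<!-- Shopify token, any API keys you have -->\n",
    "brain/knowledge/my-brand/products.md": "# Products\n",
    "brain/knowledge/my-brand/policies.md": "# Policies\n",
    "brain/knowledge/my-brand/faq.md": "# FAQ\n",
}

def scaffold(root: str) -> None:
    """Create the workspace folder structure, never overwriting existing files."""
    for rel, content in FILES.items():
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # don't clobber knowledge you've already written
            path.write_text(content, encoding="utf-8")

scaffold("my-brand-agent")
```

Run it once, then fill the files in with real content.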

Step 3: Connect a Channel (3 minutes)

Connect WhatsApp (scan QR code), Discord, Telegram, or iMessage. Now your agent responds where your team already works.

Step 4: Test & Iterate

Send it real questions from your last 20 customer tickets. Refine the knowledge files based on what it gets wrong.

When you're ready: Upgrade to Path B to add multiple specialized agents, the full memory stack, and production infrastructure.

What You Get

  • ✅ One AI agent answering customer questions
  • ✅ Connected to your messaging channels
  • ✅ Running on your machine (your data stays local)
  • ✅ Proof of concept to justify the full deployment

What You Don't Get (Yet)

  • ❌ Multiple specialized agents (CS + Ops + Finance)
  • ❌ Full Context Tree with Knowledge Mining
  • ❌ Automated memory dedup across agents
  • ❌ Cross-agent routing and orchestration
  • ❌ Shared brain across agent fleet
  • ❌ Cron jobs and scheduled automation

Path B: AlphaClaw (Web Dashboard) — 30 Minutes to a Persistent Agent

AlphaClaw wraps OpenClaw in a web-based setup wizard with built-in observability. No terminal after first deploy. Everything managed from a browser dashboard.

Think of it as "Entropic but on a server" — your agent runs 24/7 on a VPS instead of only when your laptop is open.

Step 1: One-Click Deploy (5 minutes)

Deploy to Railway or Render in one click:

  • Railway: one-click deploy — free tier available, scales with usage
  • Render: one-click deploy — free tier available

Or any Docker host:

npm install @chrysb/alphaclaw
npx alphaclaw start

Set a SETUP_PASSWORD at deploy time. Visit your deployment URL. Done.

Step 2: Setup Wizard (10 minutes)

The web wizard walks you through:

  1. Model selection — Choose Claude Sonnet, GPT-4o, Gemini, etc.
  2. Provider credentials — Paste your API key
  3. Channel pairing — QR code for WhatsApp, token for Telegram/Discord
  4. Workspace setup — Upload or create your SOUL.md and knowledge files

Step 3: Load Knowledge (15 minutes)

Use the built-in File Explorer (no SSH needed) to create and edit:

  • SOUL.md — Agent personality and rules
  • TOOLS.md — API credentials
  • Knowledge files in brain/knowledge/

The file explorer has inline editing, diff view, and Git-aware sync.

What You Get Over Entropic

| Feature | Entropic | AlphaClaw |
|---|---|---|
| Runs 24/7 | ❌ Only when laptop open | ✅ Server-based |
| Self-healing | — | ✅ Watchdog + crash recovery + auto-repair |
| Git backup | — | ✅ Hourly auto-commits to GitHub |
| Webhooks | — | ✅ Named endpoints with transforms |
| Google Workspace | Manual setup | ✅ Guided OAuth wizard (Gmail, Calendar, Drive) |
| File editing | Local files | ✅ Browser-based explorer with diffs |
| Prompt hardening | — | ✅ Anti-drift bootstrap prompts injected automatically |
| Observability | In-app | ✅ Dashboard with usage, logs, health monitoring |
| Deploy options | Desktop only | ✅ Railway, Render, any Docker host |

What You Don't Get (Yet)

  • ❌ Multiple specialized agents (CS + Ops + Finance)
  • ❌ Full Context Tree with Knowledge Mining + nightly dedup
  • ❌ Cross-agent routing and orchestration
  • ❌ Shared brain across agent fleet

When you're ready: Move to Path C for multi-agent production. AlphaClaw wraps OpenClaw without locking you in — remove it and your agent keeps running. Everything transfers.


Path C: Self-Hosted CLI (Production) — The 30-Day Plan

This is the full Compound Operations Model™. Multiple specialized agents, the complete memory stack, multi-channel routing, and autonomous optimization. Everything in this playbook.

Week 1: Foundation (Days 1-7)

Day 1-2: Infrastructure Setup

# 1. Get a server (Hetzner, DigitalOcean, or any VPS)
# Minimum: 4 vCPU, 8GB RAM, 80GB SSD, EU location
# Cost: €20-40/month

# 2. Install OpenClaw
npm install -g openclaw

# 3. Set up your first channel (WhatsApp recommended for testing)
openclaw init
# Follow the QR code pairing process

# 4. Get your LLM API key
# Recommended: Start with Claude Sonnet (best price/performance)
# https://console.anthropic.com/

# 5. Configure the basic agent
openclaw config
# Set model, channel, workspace directory

Day 3-4: Knowledge Base & Memory Architecture

This is the most important step and the one most people rush. Your agent is only as good as its knowledge.

Create the Context Tree in your workspace:

workspace/
├── SOUL.md                    # Agent personality and rules
├── TOOLS.md                   # API credentials and integration configs
├── MEMORY.md                  # Distilled knowledge (injected every prompt)
├── memory/                    # Daily session logs (auto-generated)
├── brain/
│   └── knowledge/
│       ├── _index.md              # Auto-generated root map
│       ├── your-brand/
│       │   ├── _index.md
│       │   ├── operations/
│       │   │   ├── products.md        # Product catalog
│       │   │   ├── policies.md        # Return, shipping, warranties
│       │   │   └── faq.md             # Top 30 customer Q&As
│       │   ├── team/
│       │   │   └── org-chart.md       # Who handles what
│       │   ├── marketing/
│       │   │   └── brand-voice.md     # Tone, vocabulary, do's/don'ts
│       │   ├── finance/               # Populated later (Week 3)
│       │   └── strategy/              # Populated later
│       └── platform/
│           ├── agents/
│           │   └── agents.md          # Agent roles and escalation
│           └── config/
│               └── rules.md           # Operational guardrails

Why the Context Tree matters: Without it, your agent searches 100+ flat files and might hallucinate. With it, a finance question goes straight to your-brand/finance/. Deterministic, not probabilistic. See Chapter 10b for the full architecture.

Day 3 — Populate core knowledge:

  • Export your product catalog from Shopify as structured markdown
  • Copy your website's FAQ page and format it as Q&A pairs
  • Write your policies as clear rules, not legal language
  • Include examples of good and bad responses in brand-voice.md

Day 4 — Set up memory layers:

  • Install SuperMemory: openclaw extensions install openclaw-supermemory
  • Generate _index.md for every directory (script in Ch. 10b)
  • Write your first MEMORY.md with business snapshot + team + key decisions
  • Test: ask your agent a domain question → verify it finds the right file
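The Ch. 10b script isn't reproduced here, but the core idea — one `_index.md` per directory listing its children — can be sketched in a few lines. The index file format below is our assumption, not the playbook's exact output:

```python
from pathlib import Path

def write_indexes(root: str) -> int:
    """Write an _index.md into every directory under root, listing its
    subdirectories and markdown files. Returns how many indexes were written."""
    # Materialize the directory list up front so the _index.md files we
    # create don't get picked up mid-walk.
    directories = [Path(root), *Path(root).rglob("*")]
    count = 0
    for directory in directories:
        if not directory.is_dir():
            continue
        entries = sorted(
            p.name for p in directory.iterdir()
            if p.name != "_index.md" and (p.is_dir() or p.suffix == ".md")
        )
        lines = [f"# Index: {directory.name}", ""] + [f"- {name}" for name in entries]
        (directory / "_index.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
        count += 1
    return count
```

Re-run it whenever the tree changes so the maps never drift from the files.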

Day 5-7: First Agent (CS) in Shadow Mode

Deploy your CS agent in shadow mode: it processes every ticket but doesn't send anything. Instead, it drafts responses for human review.

What to monitor:

  • Are the responses accurate? (Product info, pricing, policies)
  • Is the tone right? (Brand voice, empathy, professionalism)
  • Are escalation decisions correct? (Does it know when to ask for help?)
  • What's it getting wrong? (These become knowledge base updates)

Week 2: Calibration (Days 8-14)

Day 8-10: Knowledge Base Refinement

Based on Week 1's shadow mode results:

  1. Fix every inaccuracy in the knowledge base
  2. Add missing information that caused failures
  3. Refine the SOUL.md based on tone misses
  4. Add edge cases you hadn't thought of

This is the step that separates a good deployment from a great one. Don't rush it.

Day 11-14: Gradual Autonomy

Start enabling autonomous responses for the simplest categories:

| Day | Autonomous Categories |
|---|---|
| 11 | "Where is my order?" (tracking queries only) |
| 12 | + Product availability / size questions |
| 13 | + Standard return requests (within policy) |
| 14 | + General FAQs |

Monitor closely. Check every autonomous response for the first 2 days. If accuracy is > 95%, proceed. If not, go back to shadow mode for that category.
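The 95% gate described above is mechanical enough to script. A sketch — the function shape and messages are ours; only the threshold and the proceed/shadow rule come from the text:

```python
def next_action(category: str, correct: int, total: int,
                threshold: float = 0.95) -> str:
    """Decide whether a ticket category graduates to autonomy or returns
    to shadow mode, per the >95% accuracy rule."""
    if total == 0:
        return f"{category}: stay in shadow mode (no data yet)"
    accuracy = correct / total
    if accuracy > threshold:
        return f"{category}: proceed to autonomy ({accuracy:.0%})"
    return f"{category}: back to shadow mode ({accuracy:.0%})"
```

Run it per category after the two-day review window, not across all tickets pooled — a strong category shouldn't mask a weak one.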

Week 3: Second Agent + Optimization (Days 15-21)

Day 15-17: Deploy Second Agent (Ops or Finance)

Choose based on your biggest pain point:

  • Ops Agent if inventory management is causing problems
  • Finance Agent if you're always behind on reporting

Follow the same shadow → calibrate → autonomy process.

Day 18-21: CS Agent Optimization

Your CS agent now has 2+ weeks of data. Analyze:

  • What % of tickets are handled autonomously?
  • What categories still need human review?
  • What's the CSAT score on auto-handled vs. human-handled?
  • Are there new patterns to add to the knowledge base?

Target by end of Week 3: 50-60% autonomous CS with >95% accuracy.

Week 4: Scale + Measure (Days 22-30)

Day 22-25: Connect Additional Integrations

Now that your core agents work, connect the data sources that make them smarter:

  • Google Analytics 4 (for marketing insights)
  • Klaviyo (for customer segments in CS)
  • Stripe (for financial data)
  • Google Sheets (for reporting dashboards)

Day 26-28: Reporting Framework

Set up automated reporting:

  • Daily: CS ticket summary, inventory alerts
  • Weekly: P&L (if Finance agent deployed), marketing performance
  • Monthly: Executive summary with all KPIs

Day 29-30: ROI Assessment

Calculate your actual ROI:

Hours saved per week:  ___ hours
× Equivalent hourly cost: €___/hour
= Weekly savings: €___

Monthly AI system cost: €___
Monthly savings: €___
ROI ratio: ___:1
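The worksheet reduces to a few lines of arithmetic. A sketch — the function name and the ~4.33 weeks-per-month factor are our choices, and the example numbers are purely illustrative:

```python
def roi(hours_saved_per_week: float, hourly_cost_eur: float,
        monthly_system_cost_eur: float) -> tuple[float, float]:
    """Return (monthly savings in EUR, ROI ratio) per the worksheet above.
    Assumes an average of ~4.33 weeks per month (52 weeks / 12 months)."""
    weekly_savings = hours_saved_per_week * hourly_cost_eur
    monthly_savings = weekly_savings * 52 / 12
    return monthly_savings, monthly_savings / monthly_system_cost_eur

# Illustrative only: 20 h/week saved at EUR 40/h, EUR 300/month system cost
savings, ratio = roi(20, 40, 300)
```

Plug in your own measured hours, not estimates — the business case is only as credible as the inputs.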

Document everything. These numbers become your business case for expanding the system.

Common Implementation Mistakes

1. Skipping the Knowledge Base

"I'll just let the AI figure it out." No. Without accurate product data, policies, and brand voice, the agent will hallucinate. Garbage in, garbage out.

2. Going Full Autonomy Too Fast

Shadow mode feels slow. But one bad automated response to a customer costs more than a week of manual review. Build trust incrementally.

3. Not Updating the Knowledge Base

Your agent's knowledge was perfect on Day 1. By Day 30, you've launched new products, changed a policy, and hired a new person. If you don't update the knowledge base, the agent falls behind.

4. Using the Cheapest Model

Starting with the cheapest LLM to "save money" is false economy. Start with the best model, prove it works, then optimize costs. A bad customer response costs more than €0.50 in API tokens.

5. No Human Review Process

Even with 90% autonomy, someone needs to review the 10% of escalated tickets and periodically audit the autonomous ones. Build this into your team's workflow.

Autoresearch: The Autonomous Optimization Loop

Once your agents are running, the next frontier is autonomous iteration. Inspired by Karpathy's autoresearch — where an AI agent runs hundreds of ML training experiments overnight — we've adapted the pattern for operations.

The idea: give an agent a target file, a metric, and permission to iterate. It makes one change, measures the result, keeps improvements, discards regressions, and repeats. You wake up to a log of experiments and a better system.

How it works:

1. Define objective + metric (the "program.md")
2. Agent modifies target file (ONE hypothesis per iteration)
3. Run evaluation → measure metric
4. Better? Keep. Worse? Revert.
5. Repeat N times → produce report
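The five steps above can be sketched end-to-end. `evaluate` and `mutate` are stand-ins for your real metric and change generator — everything here is an illustrative skeleton, not the shipped skill:

```python
import random

def autoresearch(state, evaluate, mutate, iterations: int = 10):
    """Keep-if-better / revert-if-worse loop from steps 1-5 above.
    Returns the best state found and a log of every experiment."""
    best_score = evaluate(state)
    log = []
    for i in range(iterations):
        candidate = mutate(state)        # ONE hypothesis per iteration
        score = evaluate(candidate)
        kept = score > best_score
        if kept:                         # better? keep it
            state, best_score = candidate, score
        # worse? we simply don't adopt the candidate -- that's the revert
        log.append({"iteration": i, "score": score, "kept": kept})
    return state, log

# Toy example: "optimize" a number toward a target of 100
final, log = autoresearch(
    state=0.0,
    evaluate=lambda s: -abs(100 - s),   # higher is better
    mutate=lambda s: s + random.uniform(-5, 10),
    iterations=50,
)
```

The log is the deliverable as much as the final state — it's the report you wake up to.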

Real applications we use:

| Domain | What it optimizes | Metric |
|---|---|---|
| Email flows | Klaviyo template HTML | Open rate ↑ |
| Landing pages | Page HTML/CSS | Lighthouse score ↑ |
| Agent prompts | SOUL.md / system prompts | Task completion ↑ |
| Ad copy | Creative variants | CTR ↑ |
| Shopify theme | Liquid sections | Core Web Vitals ↓ |

The skill includes a program.md template and the full iteration loop logic. It's the same optimize→measure→keep pattern from ML research, applied to your operations.

→ See the included autoresearch skill in your Implementation Kit.

After Day 30: What's Next

If your first agent is working well, the path forward is clear:

  • Month 2: Deploy agents 2-3 (probably Ops + Finance)
  • Month 3: Deploy agents 4-5 (Marketing + Wholesale or Retail)
  • Month 4-6: Full stack, optimization, cross-agent intelligence
  • Month 6+: The system is self-improving. Your job is strategic oversight.

Path D: OperAI Managed — We Deploy It For You

Not everyone has 30 days and a tech-capable person to spare. That's what our managed service is for.

What We Do

  1. Discovery call — We understand your brand, channels, team, and pain points
  2. Knowledge extraction — We build your Context Tree from your existing docs, Shopify data, and team interviews
  3. Agent deployment — We configure, deploy, and calibrate your agent fleet
  4. Shadow period — We monitor every response for 2 weeks, refining the knowledge base daily
  5. Handoff — Your agents are production-ready. We stay on for ongoing maintenance or you take over

AutoResearch: Self-Improving Agents

Once agents are in production, they can optimize themselves. The AutoResearch system runs weekly:

  1. Mutate: Generate variations of the agent's core prompt (different phrasing, different emphasis, additional context)
  2. Evaluate: Test each variation against a fixed test set using a superior model as judge
  3. Deploy: If any mutation scores >5% better than the current prompt, deploy it automatically
  4. Safeguard: The mutator never sees evaluation criteria, preventing gaming
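The deploy rule in step 3 is just a relative-improvement check. The 5% threshold comes from the text; the function itself is our sketch, and the scores would come from your judge model:

```python
def should_deploy(current_score: float, candidate_score: float,
                  min_improvement: float = 0.05) -> bool:
    """Deploy a mutated prompt only if it beats the current prompt's
    judge score by strictly more than min_improvement (default 5%)."""
    return candidate_score > current_score * (1 + min_improvement)
```

The strict inequality matters: a mutation that merely matches the incumbent plus noise stays on the bench.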

In production: the CS agent's accuracy improved to 94.7% from its initial-deployment baseline through automated evolution. Cost: ~€35/week for 4 active optimization campaigns.

This means agents get better every week without human prompt engineering. The system learns what works and what doesn't from actual production performance.

LLM Council for Strategic Decisions

For high-stakes decisions (pricing strategy, market entry, store openings), a single AI opinion isn't enough. The LLM Council implements Karpathy's multi-model deliberation:

  1. 6 domain-expert AI advisors (strategy, CS, finance, retail, marketing, merchandising) each answer independently
  2. 3 blind peer reviewers score the anonymized responses — they don't know which expert wrote what
  3. A chairman model synthesizes all perspectives and scores into a final recommendation with confidence level

Cost: ~€1 per deliberation. Already used for real decisions in production. The blind review prevents "yes-man" bias — each expert argues for their domain's perspective.
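The council's scoring stage can be sketched as blind aggregation: reviewers score anonymized answers, and authorship is only recovered afterwards. The data shapes below are our assumption based on the description above, not the production implementation:

```python
import random
from statistics import mean

def blind_review(answers, reviewers):
    """Score anonymized expert answers.

    answers:   dict mapping expert name -> answer text
    reviewers: list of scoring functions that see ONLY the text
    Returns experts ranked by mean reviewer score, best first.
    """
    items = list(answers.items())
    random.shuffle(items)  # hide authorship order from any positional bias
    scores = {
        expert: mean(reviewer(text) for reviewer in reviewers)
        for expert, text in items
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A chairman step would then take this ranking plus the full texts and synthesize the final recommendation; keeping the reviewers blind to authorship is what prevents the "yes-man" bias.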

Timeline

| Week | What Happens |
|---|---|
| 1 | Discovery + knowledge extraction + infrastructure setup |
| 2 | First agent (CS) deployed in shadow mode |
| 3 | CS goes autonomous + second agent deployed |
| 4 | Full stack calibrated + team onboarding |

What's Included

  • Full multi-agent stack (CS, Ops, Finance, Marketing — as many as you need)
  • Complete memory architecture (Context Tree, SuperMemory, Knowledge Mining, nightly dedup)
  • All messaging channels configured (WhatsApp, Slack, email)
  • Shared brain across all agents
  • Cron jobs for reporting, maintenance, optimization
  • Monthly optimization reviews
  • Direct support channel

Who This Is For

Brands doing €2M–50M in revenue, running on Shopify, with a team of 10-50 people. You've seen the playbook, you know the ROI is there, but you don't want to build it yourself.

Contact: operai.ai — We'll get back to you within 24 hours.


The Upgrade Path

Most customers follow this natural progression:

Path A (Day 1)       Path B (Week 1)         Path C (Month 1-3)        Path D (Anytime)
──────────────   →   ──────────────────  →   ──────────────────────  →  ──────────────────
Desktop app          VPS with dashboard      Multi-agent full stack     We handle everything
Prove the concept    24/7 + self-healing     Memory stack + routing     Focus on your brand
$5                   €20/mo + API keys       €60/mo + API costs         Custom pricing

The beautiful thing about this architecture: everything you build carries forward. Your knowledge files, your SOUL.md, your SuperMemory data, your channel configs — they all transfer from one path to the next. Nothing wasted.

  • A → B: Copy your workspace folder to the VPS. AlphaClaw picks it up.
  • B → C: AlphaClaw wraps OpenClaw without lock-in. Remove it, your agent keeps running. Add more agents with CLI.
  • Any → D: We take your existing workspace and deploy the full stack around it.

And if you start with Path D, you get the same system described in this entire playbook, deployed and managed by the team that built it.

Ready to deploy this yourself?

The Implementation Kit has production templates, scripts, and a 30-day deployment calendar. Everything in this playbook — packaged to build with.

Get the Kit — €299 →