Implementation — Three Ways to Start
You've read the architecture. You've seen the agent blueprints. Now: how do you actually deploy this?
We offer three paths, designed as a progression. Start wherever matches your team today. Upgrade when you're ready.
How To Build With AI — The 3-Phase Rule
Before you touch a single config file, a note on methodology. The most important lesson from 6+ months of building this system has nothing to do with OpenClaw or Claude or Shopify APIs. It's about how humans work with AI during the build.
After trying every variation, one pattern consistently produces working systems faster than anything else: the 3-phase rule — Research → Plan → Implement.
Credit: this methodology is adapted from Boris Tane's Claude Code workflow, battle-tested over months of real builds.
Phase 1 — Research
Before writing a single line of code or config, read deeply. Read the existing codebase. Read the relevant docs. Read the brain. Read every file that could possibly be affected. Document your findings in a research.md file.
The AI is good at reading fast. Use that. Ask it to summarize modules, trace dependencies, list assumptions. The written artifact forces the AI to verify its understanding before you trust it to plan.
Anti-pattern: skipping research because "it's a small change". The AI will happily generate plausible-looking code for a system it doesn't fully understand, and you'll spend 10× the saved time debugging.
Phase 2 — Plan
Based on the research, write a detailed plan.md. It should contain:
- The approach (not the code)
- File paths that will be touched
- Code snippets for the critical paths
- Trade-offs considered and rejected
- Open questions that need human input
Critical: do not implement yet. Send the plan to your human reviewer. Let them annotate it inline — corrections, constraints, domain knowledge the AI didn't have. Iterate 1-6 times until the plan is approved.
This is where the real value happens. The annotation cycle is where human judgment meets AI speed. Skip it and you'll get working code that solves the wrong problem.
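As a starting point, a minimal plan.md skeleton covering the five bullets above might look like this (the headings and the example file path are suggestions, not a required format):

```markdown
# Plan: <feature name>

## Approach
One paragraph describing the strategy, not the code.

## Files Touched
- src/agents/cs/handler.ts   (hypothetical path; list your real ones)

## Critical Snippets
Only the risky or non-obvious parts, as short code blocks.

## Trade-offs Considered
- Option B was rejected because ...

## Open Questions
- [ ] Needs human input: ...
```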
Phase 3 — Implement
Once the plan is approved, execute everything without stopping. Mark tasks completed as you go. No unnecessary comments, no JSDoc nobody asked for, no scope creep. Implementation should be boring — the creative work happened in Phase 2.
If something fails: revert and re-scope. Don't patch. Don't improvise. Go back to Phase 2, update the plan, then re-implement.
When To Skip The 3 Phases
- Trivial fixes (typos, config tweaks, small bug fixes)
- Urgent production issues (fix first, document the fix after)
- Direct instructions with zero ambiguity ("change variable X from 5 to 10")
For everything else — especially new features, refactors, and multi-file changes — follow the full loop. It feels slower on the first step and is measurably faster by the third.
Why This Matters For Your Deployment
Every chapter of this playbook was written using the 3-phase rule. Every agent in production was deployed using the 3-phase rule. Every post-mortem in Ch.11b was the direct result of not following the 3-phase rule on some specific day.
When you start building your own deployment, resist the urge to go straight to "install OpenClaw". Start with a research doc. Write the plan. Get it annotated. Then build. You'll avoid learning most of Ch. 11b's lessons the hard way on your first pass.
Choose Your Path
```
┌──────────────────────────────────────────────────────────────────────────────┐
│                                                                              │
│   Path A            Path B              Path C               Path D          │
│   ENTROPIC          ALPHACLAW           SELF-HOSTED CLI      MANAGED         │
│   (Desktop)         (Web Dashboard)     (Full Control)       (Full Service)  │
│                                                                              │
│   ⏱ 10 minutes      ⏱ 30 minutes        ⏱ 30 days            ⏱ We handle it  │
│   💰 $5 credits     💰 €20/mo VPS       💰 €60/month         💰 Custom       │
│   👤 Non-technical  👤 Semi-technical   👤 Tech-capable      👤 Any team     │
│   🤖 1 agent        🤖 1 agent          🤖 2-6+ agents       🤖 Full stack   │
│   🧠 Basic memory   🧠 Basic + Git      🧠 Full memory stack 🧠 Managed      │
│                                                                              │
│   "Try it"     →    "Deploy it"    →    "Own it"        →    "Scale it"      │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```
| | Path A: Entropic | Path B: AlphaClaw | Path C: Self-Hosted CLI | Path D: Managed |
|---|---|---|---|---|
| Best for | Founders, solo operators, proof of concept | Small teams who want a VPS but no CLI | Teams with a tech-capable person | Brands that want results, not infrastructure |
| Time to first agent | ~10 minutes | ~30 minutes | 2-7 days (production-ready in 30) | We deploy in 1-2 weeks |
| Infrastructure | Your laptop | VPS (Railway/Render 1-click, or any Docker host) | VPS (€20-40/mo) or Mac Mini | We provide and manage |
| Setup experience | Desktop app installer | Web-based setup wizard (no CLI needed) | Terminal + SSH + config files | We handle onboarding |
| LLM billing | Pay-as-you-go credits | Your own API keys | Your own API keys | Included |
| Multi-agent | Single agent | Single agent | Full multi-agent orchestration | Full stack, pre-configured |
| Memory stack | SuperMemory (basic) | SuperMemory + Git-backed workspace | Context Tree + SuperMemory + Knowledge Mining + Dedup | Fully managed |
| Self-healing | N/A | ✅ Watchdog + crash recovery + auto-repair | Manual (or custom scripts) | We monitor 24/7 |
| Observability | In-app | Web dashboard (files, logs, usage, webhooks) | CLI tools + custom dashboards | Full reporting |
| Messaging | WhatsApp, Discord, Telegram, iMessage | All channels + guided Telegram Topics setup | All channels + cross-agent routing | All channels |
| Ongoing maintenance | Self-serve | Auto Git sync + watchdog | Self-managed (with our playbook) | We handle it |
| When to upgrade | When you need a persistent server | When you need multi-agent or full memory stack | When you want to focus on business, not ops | — |
Path A: Entropic (Quick Start) — 10 Minutes to Your First Agent
Entropic is a desktop app built on the same OpenClaw engine that powers our full production stack. No terminal. No server. No API keys required (though you can bring your own).
Step 1: Download & Install (2 minutes)
Download from entropic.qu.ai/download. Available for macOS, Windows, and Linux.
Create an account. You get $0.50 in free credits — enough for your first conversation.
Step 2: Load Your Knowledge (5 minutes)
Create a workspace folder with the essentials:
```
my-brand-agent/
├── SOUL.md              # "You are [Brand]'s customer service agent..."
├── TOOLS.md             # Shopify token, any API keys you have
└── brain/
    └── knowledge/
        └── my-brand/
            ├── products.md   # Your catalog basics
            ├── policies.md   # Return/shipping/warranty
            └── faq.md        # Top 20 customer questions
```
You can write these manually or copy from your website. Even three files make a dramatic difference vs. a blank agent.
Step 3: Connect a Channel (3 minutes)
Connect WhatsApp (scan QR code), Discord, Telegram, or iMessage. Now your agent responds where your team already works.
Step 4: Test & Iterate
Send it real questions from your last 20 customer tickets. Refine the knowledge files based on what it gets wrong.
When you're ready: Upgrade to Path B to add multiple specialized agents, the full memory stack, and production infrastructure.
What You Get
- ✅ One AI agent answering customer questions
- ✅ Connected to your messaging channels
- ✅ Running on your machine (your data stays local)
- ✅ Proof of concept to justify the full deployment
What You Don't Get (Yet)
- ❌ Multiple specialized agents (CS + Ops + Finance)
- ❌ Full Context Tree with Knowledge Mining
- ❌ Automated memory dedup across agents
- ❌ Cross-agent routing and orchestration
- ❌ Shared brain across agent fleet
- ❌ Cron jobs and scheduled automation
Path B: AlphaClaw (Web Dashboard) — 30 Minutes to a Persistent Agent
AlphaClaw wraps OpenClaw in a web-based setup wizard with built-in observability. No terminal after first deploy. Everything managed from a browser dashboard.
Think of it as "Entropic but on a server" — your agent runs 24/7 on a VPS instead of only when your laptop is open.
Step 1: One-Click Deploy (5 minutes)
Deploy to Railway or Render with their one-click deploy buttons, or run it on any Docker host:

```bash
npm install @chrysb/alphaclaw
npx alphaclaw start
```

Set a SETUP_PASSWORD at deploy time. Visit your deployment URL. Done.
Step 2: Setup Wizard (10 minutes)
The web wizard walks you through:

1. Model selection — Choose Claude Sonnet, GPT-4o, Gemini, etc.
2. Provider credentials — Paste your API key
3. Channel pairing — QR code for WhatsApp, token for Telegram/Discord
4. Workspace setup — Upload or create your SOUL.md and knowledge files
Step 3: Load Knowledge (15 minutes)
Use the built-in File Explorer (no SSH needed) to create and edit:
- SOUL.md — Agent personality and rules
- TOOLS.md — API credentials
- Knowledge files in brain/knowledge/
The file explorer has inline editing, diff view, and Git-aware sync.
What You Get Over Entropic
| Feature | Entropic | AlphaClaw |
|---|---|---|
| Runs 24/7 | ❌ Only when laptop open | ✅ Server-based |
| Self-healing | ❌ | ✅ Watchdog + crash recovery + auto-repair |
| Git backup | ❌ | ✅ Hourly auto-commits to GitHub |
| Webhooks | ❌ | ✅ Named endpoints with transforms |
| Google Workspace | Manual setup | ✅ Guided OAuth wizard (Gmail, Calendar, Drive) |
| File editing | Local files | ✅ Browser-based explorer with diffs |
| Prompt hardening | ❌ | ✅ Anti-drift bootstrap prompts injected automatically |
| Observability | In-app | ✅ Dashboard with usage, logs, health monitoring |
| Deploy options | Desktop only | ✅ Railway, Render, any Docker host |
What You Don't Get (Yet)
- ❌ Multiple specialized agents (CS + Ops + Finance)
- ❌ Full Context Tree with Knowledge Mining + nightly dedup
- ❌ Cross-agent routing and orchestration
- ❌ Shared brain across agent fleet
When you're ready: Move to Path C for multi-agent production. AlphaClaw wraps OpenClaw without locking you in — remove it and your agent keeps running. Everything transfers.
Path C: Self-Hosted CLI (Production) — The 30-Day Plan
This is the full Compound Operations Model™. Multiple specialized agents, the complete memory stack, multi-channel routing, and autonomous optimization. Everything in this playbook.
Week 1: Foundation (Days 1-7)
Day 1-2: Infrastructure Setup
```bash
# 1. Get a server (Hetzner, DigitalOcean, or any VPS)
#    Minimum: 4 vCPU, 8GB RAM, 80GB SSD, EU location
#    Cost: €20-40/month

# 2. Install OpenClaw
npm install -g openclaw

# 3. Set up your first channel (WhatsApp recommended for testing)
openclaw init
# Follow the QR code pairing process

# 4. Get your LLM API key
#    Recommended: Start with Claude Sonnet (best price/performance)
#    https://console.anthropic.com/

# 5. Configure the basic agent
openclaw config
# Set model, channel, workspace directory
```
Day 3-4: Knowledge Base & Memory Architecture
This is the most important step and the one most people rush. Your agent is only as good as its knowledge.
Create the Context Tree in your workspace:
```
workspace/
├── SOUL.md                     # Agent personality and rules
├── TOOLS.md                    # API credentials and integration configs
├── MEMORY.md                   # Distilled knowledge (injected every prompt)
├── memory/                     # Daily session logs (auto-generated)
├── brain/
│   └── knowledge/
│       ├── _index.md           # Auto-generated root map
│       ├── your-brand/
│       │   ├── _index.md
│       │   ├── operations/
│       │   │   ├── products.md     # Product catalog
│       │   │   ├── policies.md     # Return, shipping, warranties
│       │   │   └── faq.md          # Top 30 customer Q&As
│       │   ├── team/
│       │   │   └── org-chart.md    # Who handles what
│       │   ├── marketing/
│       │   │   └── brand-voice.md  # Tone, vocabulary, do's/don'ts
│       │   ├── finance/            # Populated later (Week 3)
│       │   └── strategy/           # Populated later
│       └── platform/
│           ├── agents/
│           │   └── agents.md       # Agent roles and escalation
│           └── config/
│               └── rules.md        # Operational guardrails
```
Why the Context Tree matters: Without it, your agent searches 100+ flat files and might hallucinate. With it, a finance question goes straight to your-brand/finance/. Deterministic, not probabilistic. See Chapter 10b for the full architecture.
Day 3 — Populate core knowledge:

- Export your product catalog from Shopify as structured markdown
- Copy your website's FAQ page and format it as Q&A pairs
- Write your policies as clear rules, not legal language
- Include examples of good and bad responses in brand-voice.md
Day 4 — Set up memory layers:
- Install SuperMemory: openclaw extensions install openclaw-supermemory
- Generate _index.md for every directory (script in Ch. 10b)
- Write your first MEMORY.md with business snapshot + team + key decisions
- Test: ask your agent a domain question → verify it finds the right file
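To make the index-generation step concrete, here is a minimal sketch of what an `_index.md` generator can look like. This is a stand-in with our own naming, not the production script from Ch. 10b: it walks every directory under the knowledge root and writes a simple map of its children.

```python
from pathlib import Path

def generate_indexes(root: Path) -> None:
    """Write a simple _index.md into every directory under root,
    listing child folders (with a trailing /) and files.
    Illustrative sketch; the production script lives in Ch. 10b."""
    directories = [root] + [p for p in root.rglob("*") if p.is_dir()]
    for directory in directories:
        entries = sorted(p for p in directory.iterdir() if p.name != "_index.md")
        lines = [f"# Index: {directory.name}", ""]
        for entry in entries:
            suffix = "/" if entry.is_dir() else ""
            lines.append(f"- {entry.name}{suffix}")
        (directory / "_index.md").write_text("\n".join(lines) + "\n")
```

Run it after every knowledge-base edit (or from a cron job) so the maps never drift from the actual files.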
Day 5-7: First Agent (CS) in Shadow Mode
Deploy your CS agent in shadow mode: it processes every ticket but doesn't send anything. Instead, it drafts responses for human review.
What to monitor:

- Are the responses accurate? (Product info, pricing, policies)
- Is the tone right? (Brand voice, empathy, professionalism)
- Are escalation decisions correct? (Does it know when to ask for help?)
- What's it getting wrong? (These become knowledge base updates)
Week 2: Calibration (Days 8-14)
Day 8-10: Knowledge Base Refinement
Based on Week 1's shadow mode results:
- Fix every inaccuracy in the knowledge base
- Add missing information that caused failures
- Refine the SOUL.md based on tone misses
- Add edge cases you hadn't thought of
This is the step that separates a good deployment from a great one. Don't rush it.
Day 11-14: Gradual Autonomy
Start enabling autonomous responses for the simplest categories:
| Day | Autonomous Categories |
|---|---|
| 11 | "Where is my order?" (tracking queries only) |
| 12 | + Product availability / size questions |
| 13 | + Standard return requests (within policy) |
| 14 | + General FAQs |
Monitor closely. Check every autonomous response for the first 2 days. If accuracy is > 95%, proceed. If not, go back to shadow mode for that category.
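If you log your manual reviews, the proceed-or-rollback rule is easy to script. A sketch with illustrative names (not an OpenClaw feature):

```python
def autonomy_gate(review_results: list[bool], threshold: float = 0.95) -> str:
    """review_results: one bool per manually checked autonomous response
    in a category (True = correct). Returns the Week-2 decision:
    "proceed" if accuracy > 95%, otherwise back to "shadow" mode."""
    if not review_results:
        return "shadow"  # no data yet: stay in shadow mode
    accuracy = sum(review_results) / len(review_results)
    return "proceed" if accuracy > threshold else "shadow"
```

For example, 97 correct out of 100 reviewed responses clears the gate; 90 out of 100 sends that category back to shadow mode.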
Week 3: Second Agent + Optimization (Days 15-21)
Day 15-17: Deploy Second Agent (Ops or Finance)
Choose based on your biggest pain point:

- Ops Agent if inventory management is causing problems
- Finance Agent if you're always behind on reporting
Follow the same shadow → calibrate → autonomy process.
Day 18-21: CS Agent Optimization
Your CS agent now has 2+ weeks of data. Analyze:

- What % of tickets are handled autonomously?
- What categories still need human review?
- What's the CSAT score on auto-handled vs. human-handled?
- Are there new patterns to add to the knowledge base?
Target by end of Week 3: 50-60% autonomous CS with >95% accuracy.
Week 4: Scale + Measure (Days 22-30)
Day 22-25: Connect Additional Integrations
Now that your core agents work, connect the data sources that make them smarter:
- Google Analytics 4 (for marketing insights)
- Klaviyo (for customer segments in CS)
- Stripe (for financial data)
- Google Sheets (for reporting dashboards)
Day 26-28: Reporting Framework
Set up automated reporting:

- Daily: CS ticket summary, inventory alerts
- Weekly: P&L (if Finance agent deployed), marketing performance
- Monthly: Executive summary with all KPIs
Day 29-30: ROI Assessment
Calculate your actual ROI:
```
Hours saved per week:      ___ hours
× Equivalent hourly cost:  €___/hour
= Weekly savings:          €___

Monthly AI system cost:    €___
Monthly savings:           €___
ROI ratio:                 ___:1
```
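The same worksheet as a small script, if you'd rather compute it than fill in blanks. Field names are ours, and the 4.33 weeks-per-month convention is an assumption you can change:

```python
def roi_summary(hours_saved_per_week: float,
                hourly_cost_eur: float,
                monthly_system_cost_eur: float,
                weeks_per_month: float = 4.33) -> dict:
    """Fill in the ROI worksheet: weekly savings, monthly savings,
    and the savings-to-cost ratio."""
    weekly_savings = hours_saved_per_week * hourly_cost_eur
    monthly_savings = weekly_savings * weeks_per_month
    return {
        "weekly_savings_eur": round(weekly_savings, 2),
        "monthly_savings_eur": round(monthly_savings, 2),
        "roi_ratio": round(monthly_savings / monthly_system_cost_eur, 1),
    }
```

For instance, 10 hours/week saved at €30/hour against a €300/month system works out to roughly a 4.3:1 ratio.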
Document everything. These numbers become your business case for expanding the system.
Common Implementation Mistakes
1. Skipping the Knowledge Base
"I'll just let the AI figure it out." No. Without accurate product data, policies, and brand voice, the agent will hallucinate. Garbage in, garbage out.
2. Going Full Autonomy Too Fast
Shadow mode feels slow. But one bad automated response to a customer costs more than a week of manual review. Build trust incrementally.
3. Not Updating the Knowledge Base
Your agent's knowledge was perfect on Day 1. By Day 30, you've launched new products, changed a policy, and hired a new person. If you don't update the knowledge base, the agent falls behind.
4. Using the Cheapest Model
Starting with the cheapest LLM to "save money" is false economy. Start with the best model, prove it works, then optimize costs. A bad customer response costs more than €0.50 in API tokens.
5. No Human Review Process
Even with 90% autonomy, someone needs to review the 10% of escalated tickets and periodically audit the autonomous ones. Build this into your team's workflow.
Autoresearch: The Autonomous Optimization Loop
Once your agents are running, the next frontier is autonomous iteration. Inspired by Karpathy's autoresearch — where an AI agent runs hundreds of ML training experiments overnight — we've adapted the pattern for operations.
The idea: give an agent a target file, a metric, and permission to iterate. It makes one change, measures the result, keeps improvements, discards regressions, and repeats. You wake up to a log of experiments and a better system.
How it works:
1. Define objective + metric (the "program.md")
2. Agent modifies target file (ONE hypothesis per iteration)
3. Run evaluation → measure metric
4. Better? Keep. Worse? Revert.
5. Repeat N times → produce report
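The five steps above compile down to a short keep-or-revert loop. This is a generic sketch (`mutate`, `evaluate`, and the state type are placeholders we chose, not an OpenClaw API):

```python
def autoresearch_loop(state, mutate, evaluate, iterations=20):
    """Run N one-hypothesis experiments; keep improvements, revert
    regressions. Returns the best state, its score, and a report log."""
    best_score = evaluate(state)
    log = []
    for i in range(iterations):
        candidate = mutate(state)      # step 2: ONE hypothesis per iteration
        score = evaluate(candidate)    # step 3: measure the metric
        kept = score > best_score      # step 4: better? keep. worse? revert.
        if kept:
            state, best_score = candidate, score
        log.append({"iteration": i, "score": score, "kept": kept})
    return state, best_score, log      # step 5: the report
```

The sketch assumes "higher is better"; flip the comparison for metrics like Core Web Vitals where lower wins.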
Real applications we use:
| Domain | What it optimizes | Metric |
|---|---|---|
| Email flows | Klaviyo template HTML | Open rate ↑ |
| Landing pages | Page HTML/CSS | Lighthouse score ↑ |
| Agent prompts | SOUL.md / system prompts | Task completion ↑ |
| Ad copy | Creative variants | CTR ↑ |
| Shopify theme | Liquid sections | Core Web Vitals ↓ |
The skill includes a program.md template and the full iteration loop logic. It's the same optimize→measure→keep pattern from ML research, applied to your operations.
→ See the included autoresearch skill in your Implementation Kit.
After Day 30: What's Next
If your first agent is working well, the path forward is clear:
- Month 2: Deploy agents 2-3 (probably Ops + Finance)
- Month 3: Deploy agents 4-5 (Marketing + Wholesale or Retail)
- Month 4-6: Full stack, optimization, cross-agent intelligence
- Month 6+: The system is self-improving. Your job is strategic oversight.
Path D: OperAI Managed — We Deploy It For You
Not everyone has 30 days and a tech-capable person to spare. That's what our managed service is for.
What We Do
- Discovery call — We understand your brand, channels, team, and pain points
- Knowledge extraction — We build your Context Tree from your existing docs, Shopify data, and team interviews
- Agent deployment — We configure, deploy, and calibrate your agent fleet
- Shadow period — We monitor every response for 2 weeks, refining the knowledge base daily
- Handoff — Your agents are production-ready. We stay on for ongoing maintenance or you take over
AutoResearch: Self-Improving Agents
Once agents are in production, they can optimize themselves. The AutoResearch system runs weekly:
- Mutate: Generate variations of the agent's core prompt (different phrasing, different emphasis, additional context)
- Evaluate: Test each variation against a fixed test set using a superior model as judge
- Deploy: If any mutation scores >5% better than the current prompt, deploy it automatically
- Safeguard: The mutator never sees evaluation criteria, preventing gaming
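The deploy rule is essentially a one-line comparison. A sketch with our own naming, assuming the ">5% better" threshold is relative to the incumbent's score:

```python
def should_deploy(current_score: float, candidate_score: float,
                  margin: float = 0.05) -> bool:
    """Deploy a mutated prompt only if it beats the incumbent by more
    than the 5% relative margin described above."""
    return candidate_score > current_score * (1 + margin)
```

A candidate that merely matches the current prompt, or beats it inside the margin, stays on the bench, which keeps noisy evaluation runs from churning production prompts.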
In production: the CS agent's accuracy climbed to 94.7% through automated evolution, up from its initial-deployment baseline. Cost: ~€35/week for 4 active optimization campaigns.
This means agents get better every week without human prompt engineering. The system learns what works and what doesn't from actual production performance.
LLM Council for Strategic Decisions
For high-stakes decisions (pricing strategy, market entry, store openings), a single AI opinion isn't enough. The LLM Council implements Karpathy's multi-model deliberation:
- 6 domain-expert AI advisors (strategy, CS, finance, retail, marketing, merchandising) each answer independently
- 3 blind peer reviewers score the anonymized responses — they don't know which expert wrote what
- A chairman model synthesizes all perspectives and scores into a final recommendation with confidence level
Cost: ~€1 per deliberation. Already used for real decisions in production. The blind review prevents "yes-man" bias — each expert argues for their domain's perspective.
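To make the blind-review step concrete, here is a minimal aggregation sketch. The data shapes and names are ours, not the production implementation: answers are keyed by anonymous ids, so reviewers never see which expert wrote what.

```python
def council_pick(answers: dict[str, str],
                 blind_scores: dict[str, list[float]]) -> tuple[str, float]:
    """answers: anonymous id -> expert answer.
    blind_scores: anonymous id -> scores from the blind reviewers.
    Returns the top answer and its mean score, ready for the
    chairman model to synthesize into a final recommendation."""
    means = {aid: sum(s) / len(s) for aid, s in blind_scores.items()}
    best = max(means, key=means.get)
    return answers[best], means[best]
```

In practice the chairman model sees all answers and all scores, not just the winner; this sketch only shows where the blind ranking comes from.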
Timeline
| Week | What Happens |
|---|---|
| 1 | Discovery + knowledge extraction + infrastructure setup |
| 2 | First agent (CS) deployed in shadow mode |
| 3 | CS goes autonomous + second agent deployed |
| 4 | Full stack calibrated + team onboarding |
What's Included
- Full multi-agent stack (CS, Ops, Finance, Marketing — as many as you need)
- Complete memory architecture (Context Tree, SuperMemory, Knowledge Mining, nightly dedup)
- All messaging channels configured (WhatsApp, Slack, email)
- Shared brain across all agents
- Cron jobs for reporting, maintenance, optimization
- Monthly optimization reviews
- Direct support channel
Who This Is For
Brands doing €2M–50M in revenue, running on Shopify, with a team of 10-50 people. You've seen the playbook, you know the ROI is there, but you don't want to build it yourself.
Contact: operai.ai — We'll get back to you within 24 hours.
The Upgrade Path
Most customers follow this natural progression:
```
Path A (Day 1)       Path B (Week 1)        Path C (Month 1-3)       Path D (Anytime)
──────────────   →   ─────────────────  →   ──────────────────   →   ────────────────
Desktop app          VPS with dashboard     Multi-agent full stack   We handle everything
Prove the concept    24/7 + self-healing    Memory stack + routing   Focus on your brand
$5                   €20/mo + API keys      €60/mo + API costs       Custom pricing
```
The beautiful thing about this architecture: everything you build carries forward. Your knowledge files, your SOUL.md, your SuperMemory data, your channel configs — they all transfer from one path to the next. Nothing wasted.
- A → B: Copy your workspace folder to the VPS. AlphaClaw picks it up.
- B → C: AlphaClaw wraps OpenClaw without lock-in. Remove it, your agent keeps running. Add more agents with CLI.
- Any → D: We take your existing workspace and deploy the full stack around it.
And if you start with Path D, you get the same system described in this entire playbook, deployed and managed by the team that built it.
The Implementation Kit has production templates, scripts, and a 30-day deployment calendar. Everything in this playbook — packaged to build with.
Get the Kit — €299 →