How We Built This

How we built our company a brain.
And what changed afterwards.

Not a pitch deck. Not a Twitter thread. Not a ChatGPT screenshot with a magic prompt. Just what happens when you build a company brain — 1,341 documents, 167 skills, 44 tools — and then let every employee, plus seven AI agents, run on top of it.

No vanity metrics. No "10x productivity." Just what works, what broke, and what we've learned after six months in production at an anonymized 8-figure consumer brand.

April 2026 · ~12 min read

Context: Why We Started

We run a consumer brand based in southern Europe. Around 40 people on the team. Omnichannel. Growing 50%+ per year. The classic scenario where every department needs one more hire and the budget doesn't stretch.

In late 2025 I built the first agent almost out of desperation: a personal assistant to help with daily chaos. It answered on WhatsApp. It knew who was who on the team. It had calendar access. Nothing sophisticated, but it saved me 30 minutes a day of micro-decisions.

6+ months later we have 7 domain agents plus Claude Code connected to representative systems like Shopify, Klaviyo, and a helpdesk, plus the rest of the operating stack. A shared brain of 1,341 documents. 44 MCP tools. And a total cost of €352 per month.

The honest ROI, with every assumption on the table: 18:1.

What We Learned

1. Each Agent Solves One Problem, Not "Everything"

The most tempting mistake when you start with agents: give everything to one. "I want an agent that handles CS, generates financial reports, and also monitors inventory."

Doesn't work. It's like hiring one person and asking them to do accounting, customer service, and visual merchandising at the same time.

What works: one agent per domain. Each with its own personality, its own tools, its own escalation rules, and its own autonomy threshold. The CS agent knows nothing about finance. The finance agent doesn't touch tickets. The retail agent has no access to Meta Ads.

The separation is what makes it robust. When something fails, you know exactly where to look.
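One way to picture that separation is a per-agent tool allowlist. This is only a sketch: the AgentSpec class, the authorize helper, and every tool name below are hypothetical stand-ins, not the system's real configuration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of one domain agent."""
    name: str
    tools: frozenset[str] = field(default_factory=frozenset)

    def can_use(self, tool: str) -> bool:
        # An agent may only call tools on its own allowlist.
        return tool in self.tools


# Illustrative tool names; the article does not list the real ones.
AGENTS = {
    "cs": AgentSpec("cs", frozenset({"helpdesk_tickets", "order_tracking"})),
    "finance": AgentSpec("finance", frozenset({"invoices", "weekly_pnl"})),
    "retail": AgentSpec("retail", frozenset({"pos_reports", "foot_traffic"})),
}


def authorize(agent: str, tool: str) -> None:
    """Raise PermissionError if the agent reaches outside its domain."""
    if not AGENTS[agent].can_use(tool):
        raise PermissionError(f"{agent} agent is not allowed to call {tool}")
```

With a gate like this, a failed call names both the agent and the tool, which is the "you know exactly where to look" property in miniature.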

2. Treat Them Like Employees, Not Software

Sounds cheesy but it's the most practical insight we have: each agent has an onboarding identical to a new hire.

It has a soul.md (its "employee handbook"): who it is, what it does, what it does NOT do, how it communicates, when it escalates. It has an assigned human manager. It has a 2-week trial period where everything it produces is reviewed before sending.

Autonomy is graduated:

Week 1-2: shadow mode (agent proposes, human decides)
Week 3: partial autonomy on routine tasks (tracking queries, stock checks)
Week 4+: full autonomy on everything above 95% confidence

After 6+ months, the ratio is 91% autonomous, 9% requires human review. And that 9% is genuinely complex: legal issues, angry VIP customers, situations that need human judgment.
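The graduated schedule can be sketched as a small routing function. The function name, the task labels, and the choice to ignore confidence during week 3 are assumptions made for illustration; only the week boundaries and the 95% threshold come from the schedule above.

```python
# Hypothetical labels for the routine tasks named in the schedule.
ROUTINE_TASKS = {"tracking_query", "stock_check"}


def decide(week: int, task: str, confidence: float) -> str:
    """Return "auto" if the agent may act alone, otherwise "review"."""
    if week <= 2:
        # Shadow mode: the agent proposes, a human decides.
        return "review"
    if week == 3:
        # Partial autonomy: routine tasks only.
        return "auto" if task in ROUTINE_TASKS else "review"
    # Week 4+: autonomous only above the 95% confidence threshold.
    return "auto" if confidence > 0.95 else "review"
```

For example, decide(3, "refund", 0.99) still goes to review, because week 3 only unlocks routine tasks.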

3. The Brain Came First. Everything Else Followed.

The biggest lesson is the one we didn't see coming. The agents weren't the hard part. The hard part was extracting eighteen months of operational knowledge — refund policies, pricing exceptions, incident playbooks, vendor quirks, tone-of-voice patterns — out of email, Slack, and people's heads, and turning it into something a model could actually read.

That extraction became the brain: 1,341 markdown documents organized by domain (finance, operations, marketing, team, strategy), 167 executable skills, 44 integrated tools. Every day at 6am, a cron extracts patterns from the previous day's operations and writes them back into the brain. Indexes regenerate automatically. Knowledge syncs between hosts every 30 minutes.
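As a rough illustration of what regenerating an index could look like, here is a sketch that walks a folder of markdown files and groups them by top-level domain folder. The directory layout (one folder per domain) and the build_index name are assumptions; the article does not describe the actual index format.

```python
from pathlib import Path


def build_index(brain_root: Path) -> str:
    """Build a plain-text index of markdown files grouped by top-level folder."""
    by_domain: dict[str, list[str]] = {}
    for doc in sorted(brain_root.rglob("*.md")):
        rel = doc.relative_to(brain_root)
        if len(rel.parts) < 2:
            continue  # skip files at the root, such as the index itself
        by_domain.setdefault(rel.parts[0], []).append(rel.as_posix())

    lines: list[str] = []
    for domain in sorted(by_domain):
        lines.append(f"{domain}: {len(by_domain[domain])}")
        lines.extend(f"  {path}" for path in by_domain[domain])
    return "\n".join(lines)
```

A scheduled job could write the returned text to an index file at the root of the brain, so agents read one short file before opening individual documents.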

Once the brain existed, two things happened. First, every employee suddenly had a useful AI — not because they got better at prompting, but because the model finally had context. Second, building agents became cheap. Each new agent reads from the brain, writes to the brain, contributes patterns the brain keeps.

Result: when the finance agent generates Monday's P&L, it already knows that last week's shipping spend went up 12% and has the context to explain why. When the CS agent receives a fit complaint, it already knows there have been 5 similar complaints this week and can alert the product team.
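The complaint example boils down to counting similar tickets in a recent window. A minimal sketch, assuming tickets arrive as (date, category) pairs and that five in seven days is the alert threshold; the function name and data shape are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def repeated_complaints(tickets, today, threshold=5, window_days=7):
    """Return {category: count} for categories seen at least `threshold`
    times in the last `window_days` days (including today)."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(cat for day, cat in tickets if cutoff < day <= today)
    return {cat: n for cat, n in counts.items() if n >= threshold}
```

An agent could run this on each new ticket and post to the product channel only when the returned dict is non-empty.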

The brain is the moat. An agent that's been running for six months knows things no new hire would know in their first month — and that knowledge keeps compounding.

4. Expensive Models Aren't Always the Best Ones

We started with the most expensive model for everything. Opus for CS, Opus for finance, Opus for everything. €500/month just on API.

Then we tried free models (Qwen, Kimi). They worked fine for 3 weeks until rate limits destroyed us during peak hours.

The solution we have now: 5 of 7 agents run on GPT-5.4 at €0 cost, piggybacking on ChatGPT subscriptions the team already had. The 2 agents that touch customers (CS and HR) use Anthropic's Sonnet for tone quality.

The insight: before buying API tokens, check what subscriptions your team already pays for. Three people with ChatGPT Plus = three free agent slots on the best available model.

5. Agents Live Where Your Team Talks

We didn't build a special dashboard. There's no "agent console". Agents live in Slack, the same place the team already works.

When the retail agent publishes the daily store report, it posts in #retail. When the finance agent detects an overdue invoice, it alerts in #finance. When the CS agent escalates a tricky ticket, it sends it to #cs with full context.

The team doesn't have to learn anything new. Agents are just another @mention in their existing channels.

This radically changes adoption. It's not "the CEO's AI system". It's a tool for the entire team.

The Agents

What each one actually does. No exaggeration.

🧠

strategy agent (the hub)

GPT-5.4 via ChatGPT OAuth · EU host

The COO of the swarm. Coordinates the rest, generates the daily morning briefing, runs competitive intelligence, and mines patterns from the brain. Also my personal assistant: manages calendar, prioritizes tasks, and tells me when an idea I have is mediocre.

This week: detected a competitor launching an adjacent product line and suggested moving up our next collection launch. Cross-referenced Exa search data with the marketing calendar and proposed 3 alternative dates.

💬

CS agent

Claude Sonnet 4 · Anthropic API · secondary macOS host

Processes tickets from Richpanel + WhatsApp. Automatic triage, response drafts in the brand voice, pattern detection across complaints. Never responds directly to customers: it generates drafts as internal notes that the CS lead reviews and sends.

20 hours/week of CS work absorbed. 94.7% accuracy on tracking responses. 60% of tickets auto-resolved without human intervention.

📊

finance agent

GPT-5.4 via ChatGPT OAuth · secondary macOS host

Generates the weekly P&L every Monday at 8am. Monitors accounts receivable. Reconciles invoices. Alerts when cash position drops below threshold. Processes incoming invoices automatically (email → OCR → Drive → Google Sheets).

Caught that shipping costs had silently crept up 1.5% over 6 weeks. Cumulative impact: €28K/year. Nobody would have seen that in a monthly report.

📣

marketing agent

GPT-5.4 via ChatGPT OAuth · secondary macOS host

Analyzes Klaviyo campaigns, Meta Ads, Pinterest. Generates segmentation recommendations. Audits campaigns with 46 Meta checks + 74 Google checks (scored A-F). Monitors visibility in AI search engines (ChatGPT, Perplexity).

After analyzing 1,114 email campaigns: ALL CAPS subject lines generate 2.7x more revenue per recipient — but only when used in less than 15% of sends.

🏪

retail agent

GPT-5.4 via ChatGPT OAuth · secondary macOS host

Daily foot traffic + POS reports per store. Staffing recommendations based on traffic predictions. Inventory transfer alerts between locations.

Store A converts 2x better than store B, but store B has a 22% higher average ticket. The agent detected that the problem in store B is visual merchandising, not product, because the traffic-to-browse ratio was misaligned. We wouldn't have seen that without cross-referencing TC analytics with Shopify POS daily.

🧶

merch agent

GPT-5.4 via ChatGPT OAuth · secondary macOS host

Sell-through by category, inventory distribution by variant, markdown candidates, price positioning vs competitors. Also handles wholesale: orders, invoicing, payment follow-ups.

Recommended markdown on 3 products with 9+ weeks of cover and declining velocity. The discount freed €1,892 in cash, which was reinvested in restocking top sellers. Net positive.
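Weeks of cover is stock on hand divided by the average weekly sales rate. The sketch below applies the "9+ weeks of cover and declining velocity" rule under two assumptions that are not from the article: velocity is averaged over the whole sales history passed in, and "declining" means the latest week sold fewer units than the earliest week.

```python
def weeks_of_cover(units_on_hand: int, weekly_sales: list[int]) -> float:
    """Stock on hand divided by average weekly unit sales."""
    avg = sum(weekly_sales) / len(weekly_sales)
    return float("inf") if avg == 0 else units_on_hand / avg


def is_markdown_candidate(units_on_hand: int, weekly_sales: list[int],
                          min_cover: float = 9.0) -> bool:
    """Flag products with high cover and falling sales velocity."""
    # Assumed definition of "declining": latest week below the earliest week.
    declining = weekly_sales[-1] < weekly_sales[0]
    return weeks_of_cover(units_on_hand, weekly_sales) >= min_cover and declining
```

For instance, 90 units on hand with weekly sales of 12, 10, and 8 gives an average of 10 units a week, so exactly 9 weeks of cover with falling velocity.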

👥

HR agent

Claude Sonnet 4 · Anthropic API · secondary macOS host

Who's out today, payroll prep, vacation balances, expense categorization. Never sends emails directly. Never approves time off without CEO confirmation.

Since the HR system doesn't have an absence API, we built a microservice that scrapes the web UI with cookie sessions. Hacky but it works. The MCP wraps it as a clean tool — users call holded_leaves("today") and get JSON back, never knowing about the scraper underneath.
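The wrapping idea can be sketched without any MCP library: a factory takes whatever scraper function exists and returns a holded_leaves callable that always answers with JSON. The make_leaves_tool name, the scraper signature, and the JSON shape are all assumptions for illustration; the real microservice and tool schema are not shown in the article.

```python
import json
from typing import Callable


def make_leaves_tool(scrape: Callable[[str], list[dict]]) -> Callable[[str], str]:
    """Wrap a scraper function as a tool that returns JSON text."""

    def holded_leaves(day: str) -> str:
        try:
            rows = scrape(day)
        except Exception as exc:
            # Callers see a structured error, never scraper internals.
            return json.dumps({"ok": False, "error": str(exc)})
        return json.dumps({"ok": True, "day": day, "absences": rows})

    return holded_leaves
```

Because the scraper is injected, it can be replaced by a proper API client later without changing what callers of holded_leaves("today") see.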

The Cost

What we pay per month. Everything included. No fine print.

item	€/mo
claude pro max (founder interface)	185
anthropic API (CS + HR + fallbacks + haiku crons)	80
secondary macOS host M4 amortized (€800 / 36 months)	22
chatgpt subs (3 employees already had them)	0
EU host (8 vCPU, 16 GB)	15
tailscale premium	17
cloudflare tunnel + vercel	0
total	352

Annual cost: €4,224. Value in hours saved: €77,584 (62 hours/week × loaded labor cost). Ratio: 18:1.

We don't include revenue impact because it's hard to defend in a serious conversation. Do the optimized email campaigns generate more? Yes. Does better-distributed inventory prevent stockouts? Yes. But attributing exact euros to those improvements is speculative. So the official ROI is just hours saved. 18:1 is enough.
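The arithmetic behind the ratio can be checked in a few lines. The monthly cost, the annual value, and the 62 hours a week come from the figures above; the 52 working weeks and the resulting hourly rate are assumptions used only to back out what loaded labor cost the numbers imply.

```python
MONTHLY_COST_EUR = 352
ANNUAL_VALUE_EUR = 77_584
HOURS_PER_WEEK = 62
WEEKS_PER_YEAR = 52  # assumption: the article does not state the weeks used

annual_cost = MONTHLY_COST_EUR * 12
roi = ANNUAL_VALUE_EUR / annual_cost
# Implied loaded hourly rate, derived rather than stated in the article.
implied_rate = ANNUAL_VALUE_EUR / (HOURS_PER_WEEK * WEEKS_PER_YEAR)

print(f"annual cost: €{annual_cost:,}")
print(f"ROI: {roi:.1f}:1")
print(f"implied loaded rate: €{implied_rate:.2f}/hour")
```

Under that 52-week assumption, the hours-saved figure corresponds to a loaded rate of roughly €24 an hour, which is the number to swap for your own when adapting the calculation.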

What Broke

A selection from the 32 documented production lessons:

Day 1: 4 of 6 agents went down simultaneously. One agent burned through a shared token's rate limit and the cascade took down the others. Rule: never share tokens between agents.
Week 3: the CS agent started degrading response quality. Semantic memory was accumulating garbage (the same info stored 15 times with slight variations). We removed SuperMemory from all agents and replaced it with a curated markdown file system.
Month 2: tried free-tier models. They worked for 3 weeks until rate limits destroyed us. Pivoted to ChatGPT OAuth: €0 and no rate limits.
A random Tuesday: systemd + OpenClaw = infinite restart loop. OpenClaw forks a child process and the parent exits with code 1. systemd interprets exit 1 as failure, kills the child, and restarts. Loop. Rule: on Linux use cron @reboot, never systemd.
Two weeks ago: Richpanel (our helpdesk) started returning 403 on all API calls. Same headers, same code, nothing changed on our end. The vendor silently changed something. Regenerated the key and it worked. Lesson: every external API needs a periodic health check.
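The last lesson, a periodic health check for every external API, can be sketched as a loop over probe functions. Each probe is assumed to make one cheap authenticated request and return the HTTP status code; the run_health_checks name and the report format are hypothetical.

```python
from typing import Callable


def run_health_checks(checks: dict[str, Callable[[], int]]) -> dict[str, str]:
    """Call each probe and report "ok" or a failure reason per service."""
    report: dict[str, str] = {}
    for name, probe in checks.items():
        try:
            status = probe()
        except Exception as exc:  # network errors, timeouts, and so on
            report[name] = f"error: {exc}"
            continue
        report[name] = "ok" if 200 <= status < 300 else f"http {status}"
    return report
```

Run on a schedule, a report entry like "http 403" for the helpdesk would surface a silently revoked key before the first ticket fails.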

All 32 lessons are documented in the free playbook: useoperai.com/playbook

What Comes Next

The Playbook Is Free and Updates Daily

Everything we've learned is at useoperai.com/playbook — 32 chapters, 32 production lessons, 15+ advanced capabilities. It is operating documentation backed by a live system, not a static PDF.

Pattern Library: The Network Effect

Every deployment contributes anonymized operational patterns to a shared library. The first brand that deploys the system takes 3 months to reach 91% autonomy. The fifth one starts at 70% from day one. The twentieth at 85%. The real moat isn't the technology — it's the accumulated operational intelligence.

The Implementation Kit

If you want to build this for your brand: templates for all 7 domain agents, production scripts, reference architecture, deployment contract, and the activation path. €299, lifetime access. useoperai.com/#pricing

Why We're Sharing This

Most "AI case studies" are demos that never make it to production. ChatGPT screenshots with perfect prompts that don't survive the real world. Startups selling vaporware with made-up metrics.

This is different because the product is downstream from the system we use every day to run a real company. The numbers are auditable. The failures are documented. The dashboard is public.

If you run a beauty, home, food, or any DTC/retail brand, and your team is growing faster than your operational capacity: this works. It's not magic. It's six months of debugging, 32 lessons learned the hard way, and a brain that keeps getting smarter.

Whoever stops, loses.

Start with the Playbook. Free.

32 chapters. 32 production lessons. 15+ advanced capabilities. Free online. No email required.

