I spent 6 days building 30 AI plugins that run my entire business. Email triage, CRM sync, invoice processing, proposal writing, meeting prep, content creation, all automated. Not by coding faster, but by building a system that builds plugins.
This is the story of Founder OS: the AI chief of staff that lives inside Claude Code. What worked, what I threw away, and why the boring bugs were the real ones.
Key Takeaways
- 30 AI plugins built in 6 days by one founder using Claude Code and a repeatable factory system
- 82+ slash commands across 32 namespaces covering email, CRM, invoices, proposals, content, and more
- 96.4% first-pass UAT test rate, with failures caused by database naming bugs, not AI hallucinations
- Four agent team patterns (Pipeline, Parallel Gathering, Pipeline+Batch, Competing Hypotheses) powered every plugin
- One command to install:
npx founder-os@latest --init
Why Would Anyone Need 30 AI Plugins?
41% of solopreneurs report time management as their biggest challenge, more than any other issue (Founder Reports, 2026). I was one of them. Every hour I spent triaging email, updating my CRM, chasing follow-ups, or formatting invoices was an hour I wasn't building the product.
The realization hit me while building a set of LinkedIn automation agents. I had a lead enrichment agent, an ice-breaker agent, a contact finder. Four stages. Four tools. And then I thought: founders don't need one AI agent. They need a suite.
So I set out to build one. Not a single tool, but an entire operating system. Thirty plugins across four pillars:
- Daily Work: email triage, calendar briefings, meeting prep, follow-ups, weekly reviews
- Code Without Coding: reports, invoices, proposals, SOWs, contracts, competitive intel
- MCP & Integrations: Notion, Google Drive, Slack, CRM sync, knowledge base
- Meta & Growth: time savings tracking, prompt library, workflow automation, goal tracking
Why not just use Claude Code directly? Because raw Claude Code is a blank terminal. Founder OS adds 82 pre-built commands with domain knowledge: email patterns, invoice schemas, CRM logic, meeting prep frameworks. It's the difference between a text editor and an IDE.
Only 17% of a solo founder's time goes to actual product work. The rest is operational overhead. That's what I set out to automate.
What Does "Build the Factory, Not the Product" Actually Mean?
Businesses using AI report saving 20-120 hours per employee per year on repetitive tasks alone (Thryv, 2026). But I didn't save that time by building 30 individual tools. I saved it by building a system that builds tools, then running it 30 times.
Here's the distinction that changed everything: I didn't write 30 plugins. I wrote a plugin factory. The /build-plugin command and a set of scaffold templates turned plugin creation into a repeatable 6-step process.
The Plugin Lifecycle
Every single plugin followed this exact sequence:
- Fetch spec from Notion: each plugin had a full specification page in my project database
- Explore codebase patterns: examine templates, infrastructure, existing conventions
- Architecture design: pick the right agent team pattern (more on this below)
- Build: two modes, a fast single-agent default and a --team flag for the full agent pipeline
- Validate: integration tests covering happy path, edge cases, and graceful degradation
- Document: README, INSTALL, QUICKSTART, plugin.json manifest
The /plan-plugin and /build-plugin commands made this process mechanical. The 24th plugin shipped as smoothly as the 2nd. No ramp-up time. No "let me figure out how this works again." Just: spec in, plugin out.
Each plugin lands in a clean structure: .claude-plugin/plugin.json for the manifest, commands/ for slash commands, skills/ for domain knowledge, and agents/ for team definitions. Everything is markdown, no complex APIs, no build process. You write markdown and Claude understands it.
That's the factory pattern. You don't get faster by typing more code. You get faster by making the process so repeatable that speed becomes a side effect.
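The plugin layout described above can be sketched as a small scaffold step. This is a minimal illustration, not the actual /build-plugin implementation: the function name `scaffold_plugin` and the manifest fields (`name`, `version`, `description`) are assumptions made for the example, and only the directory names come from the article.

```python
import json
import tempfile
from pathlib import Path


def scaffold_plugin(root: Path, name: str) -> Path:
    """Create the per-plugin layout: manifest, commands/, skills/, agents/."""
    plugin_dir = root / name

    # The manifest lives under .claude-plugin/, as described in the article.
    manifest_dir = plugin_dir / ".claude-plugin"
    manifest_dir.mkdir(parents=True, exist_ok=True)

    # Slash commands, domain knowledge, and team definitions each get a folder.
    for sub in ("commands", "skills", "agents"):
        (plugin_dir / sub).mkdir(exist_ok=True)

    # Assumption: these manifest fields are illustrative, not a documented schema.
    manifest = {"name": name, "version": "0.1.0", "description": f"{name} plugin"}
    (manifest_dir / "plugin.json").write_text(json.dumps(manifest, indent=2))
    return plugin_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_plugin(Path(tmp), "inbox-zero")
        print(sorted(p.name for p in created.iterdir()))
```

Because every directory is created with `exist_ok=True` and the manifest is simply rewritten, running the scaffold twice leaves the same structure in place.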
How Did 4 Agent Patterns Power 30 Different Plugins?
73% of engineering teams now use AI coding tools daily, up from 41% in 2025 (Pragmatic Engineer, 2026). But most teams use them for code completion, not orchestration. Every Founder OS plugin uses one of four agent team patterns. Picking the right pattern was 80% of the architecture work.
Pipeline: agents work in sequence. One finishes, the next starts. Inbox Zero uses this. Triage Agent scans your inbox, Action Agent extracts tasks, Response Agent drafts replies, Archive Agent files everything. Input flows through the chain like an assembly line.
Parallel Gathering: all agents fetch data simultaneously, then a lead agent merges the results. Daily Briefing works this way. Gmail, Calendar, Notion, and Slack agents all fire at once. The Synthesis Agent combines everything into a single morning summary. Much faster than sequential.
Pipeline + Batch: the pipeline pattern, but run once per item in a batch. Invoice Processor handles a folder of PDFs this way. Each invoice goes through the extraction pipeline independently, then a batch agent generates the summary table across all of them.
Competing Hypotheses: multiple agents propose different solutions, then a lead agent picks the best elements from each. SOW Generator uses this. Three agents draft scope options (conservative, balanced, ambitious), then the Synthesis Agent combines the strongest elements into a final Statement of Work.
Why only four patterns? Because constraints breed speed. When you sit down to architect a new plugin, you don't start from scratch. You ask: "Is this sequential, parallel, batch, or multi-perspective?" That question gets you 80% of the way there. The remaining 20% is domain logic, the skills and commands specific to that plugin's job.
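To make the four shapes concrete, here is a rough sketch that models each pattern as a plain Python function. Agents are stand-in callables, and every name here (`pipeline`, `parallel_gathering`, `pipeline_batch`, `competing_hypotheses`) is hypothetical. In Founder OS the agents are defined in markdown, so this only illustrates the control flow, not the real implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def pipeline(agents, data):
    """Pipeline: each agent's output becomes the next agent's input."""
    for agent in agents:
        data = agent(data)
    return data


def parallel_gathering(gatherers, lead):
    """Parallel Gathering: run all gatherers concurrently, then merge."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(gather) for gather in gatherers]
        results = [f.result() for f in futures]  # keeps gatherer order
    return lead(results)


def pipeline_batch(agents, items, summarize):
    """Pipeline + Batch: run the pipeline once per item, then summarize."""
    return summarize([pipeline(agents, item) for item in items])


def competing_hypotheses(proposers, brief, lead):
    """Competing Hypotheses: several drafts of one brief, merged by a lead."""
    return lead([propose(brief) for propose in proposers])


if __name__ == "__main__":
    # Toy agents standing in for real ones.
    print(pipeline([str.strip, str.upper], "  triage me  "))
    print(parallel_gathering([lambda: "gmail", lambda: "calendar"], ", ".join))
    print(pipeline_batch([lambda amount: amount * 2], [1, 2, 3], sum))
    print(competing_hypotheses(
        [lambda b: f"{b} (conservative)", lambda b: f"{b} (ambitious)"],
        "SOW",
        " | ".join,
    ))
```

The point of the sketch is that the orchestration code stays the same across plugins. Only the agents passed in change, which is the domain-specific part.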
What Went Wrong on Day 1?
I threw away everything I built on the first day. All of it. And it was the best decision of the entire project.
My initial approach was to build all 30 plugins simultaneously. Spawn parallel sub-agents. One for each plugin. Architecturally elegant. A sophisticated multi-phase workflow with infrastructure blocking all work, phases running in sequence, each plugin following a 7-task chain. It was ambitious and complex. And completely wrong.
The result? Tangled, untestable code. Plugins that couldn't be validated independently. An architecture that looked clean on paper but fell apart the moment you tried to run any single piece of it.
So I reset. Commit e20d0a4: "Fresh start - clean up plugins and backlog, build one by one." All implementation code? Deleted. All backlog items? Cleared. Only the templates and infrastructure survived.
My finding: Over-engineering isn't about complexity. It's about building before you've validated the pattern. The one-at-a-time approach shipped 30x faster than the "elegant" parallel approach because each plugin could be tested, fixed, and documented before moving to the next.
The new strategy was embarrassingly simple: build one plugin. Make sure it works. Build the next one using the same process. Repeat 29 times. The first plugin (Inbox Zero) established the lifecycle. Every plugin after that was a refinement of the same six steps.
Isn't it tempting to build everything at once? Always. But validation comes before optimization. You can't optimize a process you haven't proven works.

How Do You Test 30 AI Plugins?
96.4% of Founder OS plugins passed UAT on the first run. Pillar 1 (Daily Work): 53 out of 56 tests passed, zero failures (internal UAT results). The 3.6% that didn't pass? Database naming inconsistencies. Not AI hallucinations. Not logic errors. Just database naming. The boring bugs are always the real ones.
67% of developers say they'd trust AI-generated tests, but only with human review (Rainforest QA, 2025). That matched my experience exactly. The AI built the plugins fast. Catching the edge cases still required a human running the tests.
Every plugin ran through seven test categories:
- Happy Path: does the core functionality work end-to-end?
- Database Discovery: can it find the right Notion database using the three-step search pattern?
- Type Consistency: does it write the correct Type value to shared databases?
- Idempotency: does running it twice produce the same result, not duplicates?
- Company Relations: does it properly link records to the CRM Companies hub?
- Graceful Degradation: does it handle missing optional services without crashing?
- Edge Cases: what happens with vague inputs, long transcripts, or empty data?
The 7 Failures That Made Everything Better
All seven initial failures in Pillar 2 came from the same root cause: database discovery code searched for the wrong naming pattern. Seventeen files across four plugins referenced "Founder OS HQ - Briefings" when the actual database was named [FOS] Briefings. Semantic search partially compensated, but the inconsistency flagged a systemic issue.
The fix wasn't glamorous. Find and replace across 17 files. Update the discovery pattern to search [FOS] X first, then fall back to legacy names. After the fix: 55 out of 56 tests passed, zero failures, one skipped.
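The corrected discovery order (try the [FOS] name first, then fall back to legacy names) might look roughly like this. It is a simplified sketch: `discover_database` and the `legacy_names` mapping are hypothetical, and a plain list of database titles stands in for whatever a real Notion search would return.

```python
from typing import Optional


def discover_database(
    logical_name: str,
    available: list[str],
    legacy_names: dict[str, list[str]],
) -> Optional[str]:
    """Return the first matching database title, preferring the [FOS] name."""
    # Current convention first, then any legacy names known for this database.
    candidates = [f"[FOS] {logical_name}"] + legacy_names.get(logical_name, [])
    for candidate in candidates:
        if candidate in available:
            return candidate
    return None  # caller decides how to degrade when nothing is found


if __name__ == "__main__":
    legacy = {"Briefings": ["Founder OS HQ - Briefings"]}
    workspace = ["[FOS] Tasks", "[FOS] Briefings"]
    print(discover_database("Briefings", workspace, legacy))
```

Keeping the candidate list in one place means a future rename touches a single function rather than 17 files.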
62% of SMEs that adopted AI tools report significant productivity improvements within six months (McKinsey, 2025). But that only happens when the tools actually work reliably. Testing isn't the exciting part. It's the part that makes everything else possible.
Why Did 33 Plugins Become 1?
Monorepo consolidation reduces dependency conflicts by up to 60% compared to multi-repo setups (Spectro Cloud, 2026). I discovered the same principle applied to plugin architecture, just at a different scale.
33 separate plugin directories. Each with its own manifest, its own .mcp.json, its own skills and commands. And Claude Code's plugin system ignored most of them.
Symlinked directories, the installation mechanism I'd built, were invisible to the plugin loader. The fix was radical: make the entire repository one plugin.
The single-plugin restructure moved everything under one root plugin.json. Commands got organized into namespace directories: commands/inbox/triage.md invoked as /founder-os:inbox:triage. Skills, agents, and templates followed the same pattern. 393 files reorganized. The result? Net -229 lines of code. The restructure actually reduced complexity.
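As an illustration of the namespace convention, the mapping from a command file to its slash command can be written as a small helper. The article gives only the single example (commands/inbox/triage.md becomes /founder-os:inbox:triage), so the general rule below is an inference from that example, and `slash_command` is a hypothetical name.

```python
from pathlib import PurePosixPath


def slash_command(path: str, plugin: str = "founder-os") -> str:
    """Map commands/<namespace>/<command>.md to /<plugin>:<namespace>:<command>."""
    parts = PurePosixPath(path).with_suffix("").parts
    if len(parts) != 3 or parts[0] != "commands":
        raise ValueError(f"expected commands/<namespace>/<command>.md, got {path!r}")
    _, namespace, command = parts
    return f"/{plugin}:{namespace}:{command}"


if __name__ == "__main__":
    print(slash_command("commands/inbox/triage.md"))
```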
But that was only half the problem. The 30 plugins had created 32 separate Notion databases. Users would need a database to manage their databases. Not great.
The Notion consolidation was just as dramatic. 32 databases became 22 interconnected ones under a single "Founder OS HQ" workspace. The trick? A Type column.
Instead of separate databases for Email Tasks, Action Items, and Follow-Ups, there's one [FOS] Tasks database with a Type field. Email triage writes Type = "Email Task". Action item extraction writes Type = "Action Item". Same database, different views.
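A toy in-memory model shows how one shared table with a Type field can replace several databases. This is only a sketch: `TasksDatabase` is hypothetical, and treating (Title, Type) as the duplicate key is an assumption made to illustrate the idempotency check from the test categories, not the schema Founder OS actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class TasksDatabase:
    """In-memory stand-in for a shared [FOS] Tasks database."""

    rows: list = field(default_factory=list)

    def upsert(self, title: str, type_: str) -> dict:
        """Add a row unless the same (Title, Type) already exists."""
        for row in self.rows:
            if row["Title"] == title and row["Type"] == type_:
                return row  # second run finds the existing row: no duplicate
        row = {"Title": title, "Type": type_}
        self.rows.append(row)
        return row

    def view(self, type_: str) -> list:
        """A 'view' is just a filter on the Type column."""
        return [row for row in self.rows if row["Type"] == type_]


if __name__ == "__main__":
    db = TasksDatabase()
    db.upsert("Reply to Anna", "Email Task")
    db.upsert("Reply to Anna", "Email Task")  # rerun of the same command
    db.upsert("Send revised deck", "Action Item")
    print(len(db.rows), [r["Title"] for r in db.view("Email Task")])
```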
The Companies database is the central CRM hub. Every client-facing database links back to it. Hub-and-spoke. Simple, queryable, and zero chance of orphaned records floating in forgotten databases.
Future additions now require only creating a new namespace directory. No manifest updates. No installation scripts. The architecture got out of the way.
What Can You Actually Do With Founder OS Today?
Claude Code is the most loved AI coding tool at 46%, ahead of Cursor at 19% and GitHub Copilot at 9% (Pragmatic Engineer, 2026). Founder OS turns Claude Code from a coding assistant into a full business operating system. One command to install. 82+ commands ready to go.
Here's what that looks like in practice:
- /founder-os:inbox:triage: scans your Gmail, categorizes emails by urgency, extracts action items, and drafts responses
- /founder-os:prep:today: generates meeting prep documents for every meeting on today's calendar
- /founder-os:crm:sync-email: matches email threads to CRM contacts and logs them automatically
- /founder-os:invoice:batch: processes a folder of invoices, extracts line items, and generates a summary table
- /founder-os:sow:generate: creates a Statement of Work with three scope options from a project brief
- /founder-os:briefing:briefing: pulls data from Gmail, Calendar, Notion, and Slack into a single morning briefing
- /founder-os:engine:plan: generates a monthly content plan with keyword research and social cascade
Every command works in two modes: a fast single-agent default for daily use, and a --team flag that activates the full multi-agent pipeline for deeper analysis. You choose the speed-quality tradeoff.
It's free. Open source. MIT license. Install it:
npx founder-os@latest --init
Sub-5-second install. Idempotent, safe to run again when updates ship. No runtime dependencies beyond Claude Code itself.
Frequently Asked Questions
Do I need to know how to code to use Founder OS?
No. Every command uses natural language through Claude Code's slash command interface. Type /founder-os:inbox:triage and it runs. Claude Code handles all the execution. 68% of US small businesses now use AI regularly (ColorWhistle, 2026), and most of them aren't writing code.
How is this different from using Claude Code directly?
Founder OS adds 82 pre-built commands across 33 namespaces with domain-specific knowledge: email triage patterns, invoice extraction schemas, CRM sync logic, meeting prep frameworks. Raw Claude Code is powerful but generic. Founder OS is trained for the specific workflows solo founders run every day. It's the difference between a blank terminal and a chief of staff who knows your business.
Can I build my own plugins on top of Founder OS?
Yes. The scaffold templates and /build-plugin workflow are included. Create a namespace directory under commands/, add your skill files, and start building. The same factory system that built the original 30 plugins is available to extend them. Everything is markdown, no complex APIs.
What external tools does Founder OS connect to?
Seven MCP integrations: Notion CLI (21 namespaces), gws CLI for Gmail, Calendar, and Drive (20 namespaces), Filesystem (8), Slack (2), and WebSearch (1). Only Notion is strongly recommended. Everything else degrades gracefully. If a service isn't configured, the command still runs with reduced functionality instead of failing.
Lukas Halicki is a software architect with over 15 years of experience in enterprise systems, including senior leadership roles at Swiss banking institutions. He built Founder OS to solve the operational overhead he experienced firsthand as a solo founder, and now helps other founders automate their businesses through NaluForge.
The Real Lesson
Don't build 30 things. Build a system that builds 30 things.
Speed doesn't come from coding faster. It comes from making the process so repeatable that every iteration is predictable. A factory, not a workshop.
And when things break, and they will, the failures will be boring. Database naming. Discovery patterns. Schema mismatches. Not the dramatic AI failures you read about. The mundane ones that only surface when you actually test everything.
30 plugins. 82+ commands. 4 agent patterns. 6 days. One install command:
npx founder-os@latest --init
Or if you want someone to build custom AI automations for your business with the same approach and speed, book a call with NaluForge.
Next week: a deep dive into what each of those 82+ commands actually does, and how they work together as a system.
