Three Layers
Somewhere in the last few months, the architecture settled into three distinct layers. I didn't plan this separation — like most of the harness.os design, it came from trying to explain what I'd built to someone else.
The first layer is harness.os itself — the methodology. Universal principles that don't prescribe any specific tool or stack. Four knowledge types. Internal structure for each type: knowledge, rules, workflows, learnings. Session lifecycle. Knowledge flow patterns. Mesh connectivity. You could implement this with sticky notes on a wall if you wanted to. You'd be slower, but the principles would still apply.
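That internal structure is concrete enough to sketch as code. A minimal Python sketch, assuming the four type names used later in this post (build, product, operations, domain); the class and field names are illustrative, not a real harness.os API:

```python
from dataclasses import dataclass, field
from enum import Enum

class HarnessType(Enum):
    # The four knowledge types, as applied to legal tech later in this post
    BUILD = "build"
    PRODUCT = "product"
    OPERATIONS = "operations"
    DOMAIN = "domain"

@dataclass
class Harness:
    """One harness instance: a single knowledge type with its internal structure."""
    type: HarnessType
    knowledge: list[str] = field(default_factory=list)  # facts and context
    rules: list[str] = field(default_factory=list)      # constraints the AI must follow
    workflows: list[str] = field(default_factory=list)  # repeatable processes
    learnings: list[str] = field(default_factory=list)  # insights captured from sessions

# A project's outer harness: one instance per type, same internal shape for each
outer_harness = {t: Harness(type=t) for t in HarnessType}
outer_harness[HarnessType.BUILD].rules.append("TDD by default")
```

The point of the sketch is the symmetry: every type gets the same four internal buckets, which is what makes the structure implementable with sticky notes, files, or a database alike.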
The second layer is harness config — a specific application of the methodology. My config includes an 8-phase development workflow, TDD as a default, hexagonal architecture, and specific domains like skydiving, fitness, and finance. Someone else's config would look entirely different. A marketing team might have a 5-phase content workflow, A/B testing as a default, and domains like brand voice, audience segments, and campaign performance. Configs are portable. Forkable. You could take mine, strip out the domains, keep the dev workflow, and have a working starting point in an afternoon.
The third layer is the mesh — a running instance. My mesh is six apps connected through their harnesses, with knowledge flowing between them. Your mesh would be your apps, your connections, your knowledge. Apps are on a mesh, not the mesh. The mesh is the running network. It's what makes cross-domain queries possible — asking "can I afford this skydive camp?" and having the system check both financial data and scheduling data across different apps.
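The "can I afford this skydive camp?" query can be sketched in a few lines. This is a toy illustration, not the actual mesh protocol — the app names come from this series, but the functions, fields, and numbers are invented:

```python
from dataclasses import dataclass

@dataclass
class App:
    """One app on a mesh, exposing its own harness data to queries."""
    name: str
    data: dict

def can_afford(finance: App, schedule: App, event: str, cost: float) -> bool:
    """Cross-domain query: combine financial data and scheduling data from two apps."""
    has_budget = finance.data["available_budget"] >= cost
    no_conflict = event not in schedule.data["conflicts"]
    return has_budget and no_conflict

# Toy data: way2save holds the budget, way2fly holds the jump schedule
way2save = App("way2save", {"available_budget": 1200.0})
way2fly = App("way2fly", {"conflicts": ["competition week"]})
print(can_afford(way2save, way2fly, "skydive camp", 950.0))  # True under these toy numbers
```

Neither app knows about the other; the mesh-level query is what joins them. That separation is what "apps are on a mesh, not the mesh" means in practice.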
Methodology is universal. Config is personal. Mesh is operational. Change any layer without breaking the others.
Scale Tiers
The methodology should work at every scale. If it only works with expensive infrastructure, it's not a methodology — it's a product. So I mapped out four tiers, each with real economics.
Tier 1: Files
CLAUDE.md + rules files. One person, 1–3 projects. This is where most people working with AI are today, whether they know it or not. If you've ever edited a system prompt or saved a set of instructions for your AI tool, you're at Tier 1.
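A Tier 1 harness can be as small as one file. A hypothetical CLAUDE.md organized by the four types (build, product, operations, domain); the project details are invented for illustration:

```markdown
# CLAUDE.md — example Tier 1 harness (hypothetical project)

## Build
- Stack: Flutter, Riverpod for state management (not Bloc)
- TDD by default; run tests before every commit

## Product
- way2fly tracks skydive progression for licensed jumpers

## Operations
- Jump logs must record altitude, exit type, and canopy size

## Domain
- Students can't self-supervise until they hold an A-license
```

Even at this scale, the four headings do the conceptual work: the AI stops mixing up how you build with what you're building.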
Tier 2: Database + MCP
PostgreSQL + MCP server per harness. One person, 3–10 projects. Structured knowledge, queryable rules, persistent learnings. This is where I am. The jump from files to database is where the compounding starts.
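A minimal sketch of what "structured, queryable" means at this tier, using in-memory SQLite in place of the PostgreSQL setup described above; the schema is invented for illustration, not the actual harness.os schema:

```python
import sqlite3

# In-memory SQLite standing in for PostgreSQL; one table holding all four
# knowledge types with their internal structure as a 'kind' column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE knowledge (
        id INTEGER PRIMARY KEY,
        harness_type TEXT CHECK (harness_type IN ('build', 'product', 'operations', 'domain')),
        kind TEXT CHECK (kind IN ('knowledge', 'rule', 'workflow', 'learning')),
        project TEXT,
        content TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO knowledge (harness_type, kind, project, content) VALUES (?, ?, ?, ?)",
    ("build", "rule", "way2fly", "Use Riverpod for state management, not Bloc"),
)

# Queryable rules: an MCP tool could expose exactly this kind of lookup to the agent
rows = conn.execute(
    "SELECT content FROM knowledge WHERE harness_type = 'build' AND kind = 'rule'"
).fetchall()
print(rows)  # [('Use Riverpod for state management, not Bloc',)]
```

The jump from files to this is what makes compounding possible: the agent can query for exactly the rules that apply, instead of re-reading an ever-growing markdown file.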
Tier 3: Remote MCP
Hosted databases, authentication, team access. 2–20 people sharing harness knowledge. The config becomes a team artifact. The mesh becomes collaborative.
Tier 4: Federated Mesh
Cross-organization, cross-mesh learning. Enterprise scale. Knowledge flows between teams, between departments, between partner organizations. This tier is theoretical for me — I haven't built it. But the architecture supports it.
The methodology stays the same at every tier. Same four types, same internal structure, same session lifecycle. Only the implementation changes. A Tier 1 user organizing markdown files is doing the same conceptual work as a Tier 4 enterprise running federated databases.
Teaching It
The real test of a methodology is whether you can teach it to someone who didn't invent it.
I'm running workshops at my company — a legal tech firm that automates wills, trusts, and powers of attorney. The domain couldn't be more different from skydiving apps, which is exactly the point. If harness.os only works for my personal projects, it's a workflow, not a methodology.
The workshop progression:
- Workshop 1 (done): Inner harness basics. How AI tools actually work. What a context window is. Why prompts degrade over long sessions. This is foundation-setting — most people have never thought about why their AI conversations go sideways after 30 messages.
- Workshop 2 (in progress): Outer harness concepts. Why the knowledge layer matters more than the tools. The four types applied to legal tech: build (how we develop software), product (what we're building), operations (how wills and trusts actually work), domain (client data, case records). People start seeing the categories once they have the framework.
- Workshop 3 (planned): Live demo. I show my actual harness — all six apps, the mesh, the MCP connections, real queries flowing through real knowledge. This is where it becomes concrete instead of abstract.
- Workshop 4+: The team builds their own outer harness together. Starting at Tier 1 with files, with a path to Tier 2 if it sticks.
The end goal: company-wide harness.os adoption, from the dev team through to the legal operations team. Every department managing their AI knowledge through the same four-type structure, with configs tailored to their work.
I'll be honest: it's early. The first workshop landed well. The second is generating good questions. But we haven't shipped anything with it yet. Teaching a framework is not the same as proving a framework.
The Honest Scorecard
Validated
- Architecture runs real products (6 apps, 2 cortex.ai tenants)
- MCP bet was correct (now an industry standard, 97M+ downloads)
- Four types held across 18 harness instances
- Outer harness > inner harness is proven in practice
- Three-layer separation (methodology / config / mesh) is clean
Unproven
- Compound learning across meshes (no metrics pipeline yet)
- Economics at scale (only 2 tenants paying)
- Methodology portability (no second practitioner yet)
- Relational/governance knowledge gaps at scale
- Team adoption beyond workshops
A methodology should be teachable, repeatable, and produce similar results for different practitioners. harness.os has only been used by its creator. Until someone else follows it and succeeds, calling it a methodology is aspirational. I know that. The architecture works, the concepts are sound, and I'd rather share something honest than wait for perfection.
What's Different Here
I've looked at what's out there. CrewAI, LangChain, AutoGen, Semantic Kernel — none of them separate the knowledge layer from the execution layer the way harness.os does. None have a type system for knowledge. None make configs portable and forkable. None distinguish methodology from implementation from running instance.
The three-layer separation, the four-type knowledge system, configs as portable AI strategies — I haven't seen these elsewhere. Whether "different" means "useful to others" is a question I can't answer alone.
Where the Industry Is Heading
The ideas behind harness.os are not contrarian. The industry is moving in the same direction.
KPMG calls it "knowledge engineering" and published The Knowledge Engineering Imperative. MCP's adoption curve (97 million downloads and counting) validates the protocol choice. Every major AI lab is investing in tool use, persistent memory, and multi-agent orchestration.
The question isn't whether organized AI knowledge matters. Everyone's figuring that out. The question is whether a solo developer's methodology has anything to offer that the big platforms won't build on their own. I don't know the answer to that yet.
The Point
I started with a Claude subscription and three projects. I found that AI tools need organized knowledge to be useful beyond surface-level help. I built the knowledge layer. I organized it into four types. I separated the methodology from the implementation from the running instance. And now I'm teaching it to others.
Whether harness.os becomes a widely used framework or just something that helped a few teams work better with AI — either way, the knowledge persists. The four types still hold. The three-layer separation still works. The scale tiers still provide an on-ramp from a markdown file to a federated mesh.
That was the point. Not a platform. Not revenue. Just organized, structured knowledge that makes AI tools better at helping you do your work.
Eight posts. The technical deep dive is in the full harness.os documentation. If any of this is useful to you, start with Tier 1. Open a CLAUDE.md file. Write down what your AI needs to know. Organize it into four types. See what happens.
In my experience, once you give AI structured knowledge, the results improve noticeably.
AI Knowledge Engineering as a Practice
AI Engineering is well-defined: prompt engineering, agent frameworks, model deployment, tool-use orchestration. Courses exist. Job titles exist. A body of knowledge exists. It's the practice of building the execution engine — the inner harness.
AI Knowledge Engineering barely has a name. No courses. No job titles. No established body of knowledge. But it's the practice of organizing the context that makes AI tools actually useful in specific domains.
The other engineering fields aren't going anywhere — ML engineering is pushing into edge deployment and multimodal models, software engineering is evolving with AI-assisted development, data engineering keeps scaling. They're all still essential, still innovating. But none of them focus on organizing the domain knowledge that makes AI tools effective in a specific context. That's the gap. Not because the other fields are weak, but because this layer didn't exist until AI agents needed structured knowledge to operate.
KPMG published "The Knowledge Engineering Imperative" in early 2026. Martin Fowler wrote about "harness engineering" — the outer harness where knowledge lives. Meta built a swarm of 50+ agents to extract and organize tribal knowledge. Anthropic published "Effective Context Engineering for AI Agents". The concept is showing up everywhere under different names.
I've been calling it AI Knowledge Engineering: the practice of organizing, structuring, and curating knowledge for AI agents. Four types of knowledge. Three layers of implementation. Scale tiers from files to federated mesh. Cross-cutting concerns that span all types.
harness.os is my attempt to define this practice. It's early. It's personal. But the pattern has held up in my work, and others seem to be arriving at similar ideas from different directions.
The inner harness keeps getting better on its own — that's what AI companies do. The outer harness is where your domain expertise lives. AI Knowledge Engineering is about making that expertise accessible to AI.
This Blog as a Case Study
I mentioned at the beginning of this series that the blog itself would serve as an example. Here's what it drew from.
- way2fly — full mobile app for skydive progression (Flutter, iOS + Android)
- way2move — training and wellness tracker (Flutter, iOS + Android)
- way2save — personal finance manager (Flutter + Firebase backend)
- build.ai — full-stack AI orchestration platform (React, Express, WebSocket, Neon Postgres, 37 tables)
- marco.ai — personal life management assistant hub
- cortex.ai — multi-tenant SaaS for AI-powered business processes
- harness-os-mcp — MCP server with 39 tools powering the entire mesh
- This blog — 8-part series with full historic research, git data from 1,375+ commits across 11 repos in 6 weeks, design evolution, and framework analysis against 11 established methodologies
One developer. All of it. I don't think I could have managed this many projects without the knowledge layer. Not because of faster coding — that helps — but because the harness carries context that would otherwise live only in my head and get lost between projects.
This blog was created using two layers, and the difference between them is what this series is about.
AI Engineering built it. Claude Code agents — the inner harness — wrote, styled, and assembled these 8 posts. Multiple agents ran in parallel. The connector handled the model calls, the tool routing, the file writes. That's AI Engineering: plugging in a capable runtime.
AI Knowledge Engineering shaped it. The outer harness is why the content is accurate and consistent. The agents had access to:
- Real git history across 11 repositories and 1,375+ commits
- Architecture decisions logged over months
- Process definitions, rules, and workflows in CLAUDE.md files, .claude/rules/ directories, and CNS databases
- Workshop plans, scale tier definitions, and framework stress-test results
Without the knowledge layer, the agents would have produced generic content — hallucinated dates, invented project names, plausible-sounding but inaccurate stories. The structured knowledge accumulated since the first CLAUDE.md file is what kept the output grounded in what actually happened.
AI Knowledge Engineering is still a new term. The boundaries aren't clear yet, and the best practices will take time to emerge. But the harness.os methodology provides a starting point: four types of knowledge, three layers of implementation, scale tiers from a markdown file to a federated mesh.
I've been doing this since day one. Not because I planned to invent a methodology, but because I had three projects and needed the AI to remember what I'd already decided. Everything since then has been conceptualizing what I was already doing, understanding its scales, and giving it a name.
Knowledge Compounding in Practice
Here's what I've observed across projects:
Without AI: one developer builds one product. Maybe. It takes months. You need a team for anything serious.
With AI but no knowledge layer: one developer builds faster, but quality degrades across projects. Every session starts from zero. The AI suggests Bloc when you use Riverpod. It mixes up your skydiving domain with your finance domain. You spend half your time re-explaining. You might manage 1–2 products, painfully.
With a knowledge layer: one developer manages 6+ products, and quality improves over time. Each product's harness feeds the next. The build harness that learned patterns from way2fly makes way2save faster. The operations harness structure that works for skydiving works for hospitality. The dev workflow that got refined over 1,375+ commits applies to every new project on day one.
Product #7 takes less setup than product #6. Your build harness already knows your architecture. Your dev workflow is already structured. Your coding standards are already codified. The effort for each new product drops because the knowledge layer grows with each one you ship.
This is also what I'm betting on with cortex.ai. Each tenant gets structured operations and domain harnesses. The effort to onboard tenant #10 should be less than tenant #1, because the templates and cross-industry patterns carry forward. That's the theory — with only 2 tenants so far, it's still early.