Part 5 of 8 · Building harness.os · May 2026 · 11 min read

The Outer Harness Always Wins

I kept making my custom agent smaller and smaller — until I realized the agent isn't the point

The agent that kept shrinking

I built my first custom agent from scratch. Raw Anthropic API calls. Hand-crafted system prompts. A bespoke tool-use framework with JSON schema definitions for every operation. An orchestration layer that managed multi-step reasoning chains. Error recovery. Session persistence. The works.

It was a lot of code. Hundreds of lines of orchestration. Custom retry logic. A homegrown context-window manager that would summarize old messages when the conversation got too long. I had tool definitions for reading files, writing files, running shell commands, querying databases. Every tool was hand-wired with its own validation, error handling, and output formatting.
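To make the maintenance burden concrete, here is a minimal reconstruction of what one hand-wired tool looked like. The names are illustrative rather than my exact code, but the Anthropic Messages API does take tool schemas in roughly this shape, and every single tool needed its own copy of this pattern:

```python
# Illustrative reconstruction of ONE hand-wired tool: schema, validation,
# execution, and output formatting, all maintained by hand.

from pathlib import Path

# Tool definition in the shape the Anthropic Messages API expects.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def execute_read_file(tool_input: dict) -> dict:
    """Validate the model's input, run the operation, format the result."""
    path = tool_input.get("path")
    if not isinstance(path, str):
        return {"is_error": True, "content": "path must be a string"}
    try:
        text = Path(path).read_text(encoding="utf-8")
    except OSError as exc:
        return {"is_error": True, "content": f"read failed: {exc}"}
    # Naive truncation so one big file doesn't blow the context window.
    return {"is_error": False, "content": text[:100_000]}
```

Multiply that by a dozen tools, plus the retry logic and the context manager, and you have the hundreds of lines I was babysitting.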

It was also an absolute nightmare to maintain.

Every time Anthropic shipped a new model version, something subtle would break. Every time I wanted to add a capability, I had to write the tool definition, the execution logic, the error handling, and the tests. I was spending more time maintaining the agent infrastructure than using it to build anything.

Then I tried Claude Code.


The uncomfortable comparison

Claude Code is Anthropic's CLI agent. You point it at a codebase, give it a task, and it figures out the rest. It reads files. It writes files. It runs commands. It reasons through multi-step problems. It handles errors gracefully. It manages its own context window.

Sound familiar? It did everything I'd hand-built, but better, and I didn't have to maintain any of it.

I started running the same tasks through both systems. My custom agent and Claude Code, side by side. And the same thing kept showing up: Claude Code was better at the mechanics. Better at file operations. Better at multi-step reasoning. Better at recovering from mistakes. It had a team of engineers at Anthropic optimizing the exact problems I was solving alone in my spare time.

So I started removing code.

First, I ripped out my custom file-reading tools. Claude Code already does that. Then the shell execution layer. Claude Code does that too. Then the session management. Then the context-window optimization. Then the error recovery logic.

My "agent" went from a complex orchestration system to a thinner orchestration system, to a wrapper, to — honestly — a script that assembled a prompt and called Claude Code with the right context.

Every time I compared "code I wrote to make the agent work" vs. "code I wrote to give the agent knowledge," only the knowledge code was actually mine. The agent infrastructure was work I was duplicating — and existing tools already did it better.


The pattern across tools

I tested GitHub Copilot. I tested Cursor. I tested other API-based agents people were building. And I noticed the same thing with all of them.

Every one of these tools is capable. And every one of them starts each session knowing nothing about you.

Claude Code can reason its way through any codebase — but it doesn't know that your team uses hexagonal architecture, that your CI pipeline requires specific test patterns, that you learned last month that a particular database migration strategy causes downtime. It figures it out every single time, from first principles, burning tokens and time on knowledge you've already earned.

Copilot can autocomplete anything — but it doesn't know your domain rules, your business logic, the decisions you made six months ago and why. It sees code. It doesn't see the thinking behind the code.

Custom API agents can do whatever you wire them to do — but you're spending all your energy on the wiring instead of on what flows through the wires.

The inner harness — the thin runtime connector — is a solved problem. Not perfectly solved. Not finished. But solved well enough that building your own from scratch is like writing your own web server in 2026. You can do it. You'll learn a lot. You shouldn't ship it.

The outer harness — the full intelligence layer — is the unsolved problem. And it's the only thing that was actually mine.


Inner vs. outer

Let me make this concrete. The inner harness is the thin runtime connector: the model API call, the tool routing, the message transport. It's the minimum interface needed to execute. Deliberately thin so it's trivially swappable.
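In code, "deliberately thin" can mean a single method behind an interface. The naming here is mine, for illustration; the point is that everything below the interface is replaceable:

```python
# A sketch of how thin the inner harness can be. Names are illustrative.

import subprocess
from typing import Protocol

class ExecutionEngine(Protocol):
    """The inner harness: the minimum surface the knowledge layer sees."""

    def run(self, prompt: str) -> str: ...

class ClaudeCodeEngine:
    """One possible engine; swapping in another never touches the knowledge."""

    def run(self, prompt: str) -> str:
        proc = subprocess.run(
            ["claude", "-p"], input=prompt,
            capture_output=True, text=True, check=True,
        )
        return proc.stdout
```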

The outer harness is the full intelligence: the rules, the workflows, the learnings, the domain expertise, the architectural decisions, the process definitions — everything that makes the agent smart and domain-specific. Not just what it knows, but how work should be done.

Every custom agent I'd built was 90% inner harness and 10% outer harness. I had the ratio exactly backwards. The valuable part was the 10% — the cortex files, the spine configs, the database-backed knowledge chunks I'd been building since the early days. That was the differentiation. That was the thing no off-the-shelf tool could provide.


A practice without a name

Everyone talks about AI Engineering — building the execution engine, prompt engineering, agent frameworks, model fine-tuning. That's the inner harness. It's important and it's becoming more standardized every month.

What almost nobody is naming is the other side: AI Knowledge Engineering — the practice of organizing, structuring, and maintaining the knowledge that makes AI agents effective. The outer harness.

Knowledge engineering existed in the 1980s expert systems era, but it was about encoding rules into brittle IF-THEN systems. The modern version is different: structuring knowledge for probabilistic AI agents that can reason with it, learn from it, and improve from it.

AI Engineering is about making agents smarter. AI Knowledge Engineering is about making the knowledge they work with smarter.

The first is a moving target that improves every quarter. The second is a compounding asset that gets more valuable over time.

My experience is that AI Knowledge Engineering deserves more attention than it gets. The inner harness will keep getting better on its own. The outer harness — the knowledge layer — is where human expertise actually matters.


The industry is proving this

This isn't just my experience. The broader ecosystem is moving in the same direction.

MCP — the Model Context Protocol — has crossed 97 million monthly SDK downloads. What MCP actually is: a standard interface between inner and outer harness. Any MCP-compatible client (Claude Code, Copilot, custom agents) can connect to any MCP server (your knowledge, your tools, your data). The protocol standardizes the connection. The value is in what's connected to.

The inner harness is becoming standardized. The outer harness is where the knowledge lives. MCP is the plug that connects them.
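Here is roughly what that plug looks like using the official MCP Python SDK's FastMCP helper. The knowledge store below is a stand-in dict; in practice it would be the database-backed chunks:

```python
# A minimal MCP server exposing a knowledge store. FastMCP comes from the
# official MCP Python SDK; the CHUNKS dict is a stand-in for a real store.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("outer-harness")

CHUNKS = {
    "architecture": "This team uses hexagonal architecture; adapters live in /adapters.",
    "ci": "Every service must pass the contract-test suite before deploy.",
}

@mcp.tool()
def get_knowledge(topic: str) -> str:
    """Return the codified knowledge for a topic, if any exists."""
    return CHUNKS.get(topic, f"No chunk recorded for '{topic}'.")

if __name__ == "__main__":
    mcp.run()  # any MCP-compatible client can now connect
```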

The replacement test: if Claude Code disappears tomorrow, my outer harness — every knowledge chunk, every rule, every learning, every workflow — survives intact. I connect a different execution engine via MCP and it picks up where the previous one left off. The knowledge outlives the model, the agent, and the company that built the agent.


What this means for what I was building

Once I saw this clearly, everything I'd built fit into a simpler frame.

The cortex files from the early days? Outer harness. The spine configs that structured each project's knowledge? Outer harness. The database-backed knowledge system with domains and chunks and rules? Outer harness. The learnings that accumulated across sessions? Outer harness.

All the custom agent code, the orchestration logic, the tool definitions, the session management? Inner harness. Replaceable. And I'd already replaced most of it with Claude Code calls.

I wasn't building an agent platform. I wasn't building a CLI tool. I wasn't building a SaaS. I was building a knowledge layer — an outer harness that any execution engine could plug into.

The inner harness always gets replaced. Models improve. Agents get better. New tools ship. That's the nature of the execution layer — it's a moving target that moves in one direction: better.

The outer harness always wins because knowledge compounds. Every rule you codify makes the next session smarter. Every learning you log prevents the next mistake. Every workflow you structure makes the next execution faster. The outer harness doesn't deprecate. It accumulates.


Data organization for AI

The simplest way I can describe what harness.os became: data organization for AI. That's AI Knowledge Engineering in four words. Structuring knowledge so that this new element in the development process can read it, use it, learn from it, and improve from it.

Humans have always organized knowledge for themselves — documentation, wikis, runbooks, playbooks. But we organized it for human consumption: narrative text, visual diagrams, conversational explanations. AI needs something different. Not harder. Not more complex. Just differently structured. Chunks instead of chapters. Rules instead of guidelines. Explicit relationships instead of implied ones.
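As a sketch of what "differently structured" means in practice (the field names here are hypothetical, not a real schema):

```python
# An illustrative shape for a knowledge chunk: explicit statement, explicit
# rationale, explicit relationships. Field names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class KnowledgeChunk:
    id: str
    kind: str          # "rule" | "learning" | "workflow" | "decision"
    statement: str     # explicit and checkable, not narrative
    rationale: str     # the thinking behind the code
    relates_to: list[str] = field(default_factory=list)  # links stated, not implied

chunk = KnowledgeChunk(
    id="db-041",
    kind="rule",
    statement="Never run blocking migrations on hot tables during business hours.",
    rationale="A past migration locked writes long enough to cause downtime.",
    relates_to=["db-017", "workflow-migrations"],
)
```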

The outer harness is the practice of organizing knowledge for both humans and AI simultaneously. And it turns out that organizing for AI often makes it better for humans too, because it forces you to be explicit about things you'd otherwise leave vague.


But knowledge for what?

At this point I understood what I was building: a knowledge layer that any execution engine could plug into.

But what was the knowledge for? What categories existed? What was the taxonomy? I had cortex files and database chunks, but they were organized by project, not by purpose. Some chunks were about how to build software. Others were about what was being built. Others were about entirely different domains — skydiving operations, fitness programming, financial rules.

I needed a deeper organizing principle. A way to categorize not just the knowledge itself, but the types of knowledge that AI needs to participate effectively in any process.

That question led to the next piece of the puzzle.