The harness becomes the shared brain that any interface plugs into.
From Theory to Implementation
The previous post designed the architecture: knowledge chunks in a database, an MCP tool to query them, concern-based routing instead of loading everything. This post documents building it. Not a future plan — the actual implementation, done in the same session as the design.
That's not an accident. The harness enforces this loop: improve, log the improvement, write about what you built. Design and implementation happen together because the system won't let you leave a design sitting on a shelf. If you designed it, you build it. If you built it, you document it. Same session.
Everything described here is live and queryable right now: database tables created, 18 knowledge chunks migrated and stored, MCP tool registered in the harness server, and any agent that connects to the harness can call it. Here's how it came together.
The Database
Two tables. One holds the knowledge, the other tracks how it gets used.
knowledge_chunks
```sql
CREATE TABLE knowledge_chunks (
  slug           text PRIMARY KEY,   -- e.g. "flutter-testing" or "way2save-domain"
  title          text NOT NULL,      -- human-readable: "Flutter Testing Conventions"
  content        text NOT NULL,      -- the actual rules, concise prose
  concerns       text[],             -- ["testing"], ["domain-logic", "multicurrency"]
  project_scope  text[],             -- ["way2save"], or NULL for shared chunks
  token_estimate integer,            -- cost tracking: 180, 220, etc.
  version        integer DEFAULT 1,  -- bump on updates for cache invalidation
  created_at     timestamptz,
  updated_at     timestamptz
);
```
knowledge_usage
```sql
CREATE TABLE knowledge_usage (
  id           serial PRIMARY KEY,
  session_id   uuid,         -- links to claude_sessions
  chunk_slug   text,         -- which chunk was served
  served_at    timestamptz,  -- when
  task_context text          -- what the agent was doing
);
```
The schema is deliberately simple. No categories, no hierarchies, no tagging taxonomies. Chunks have concerns (what topics they cover) and project_scope (which projects they apply to). That's the entire routing model.
The key design choice: project_scope NULL means "shared across all projects." One copy of Flutter testing rules serves way2fly, way2move, and way2save. Project-specific chunks like way2save's multi-currency logic only load for way2save sessions. One source of truth. Zero drift.
The knowledge_usage table exists for Stage 4 — learned routing. Every time a chunk gets served to a session, we log it. Over time, that data shows which chunks are always served together (merge candidates), which are never used (removal candidates), and which correlate with successful outcomes (high-value knowledge). But that's future work. Right now it just records.
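The recording step is small enough to sketch. Here is a minimal stand-in, assuming an in-memory list in place of the knowledge_usage table; `record_usage` and the placeholder IDs are illustrative, not the harness API:

```python
from datetime import datetime, timezone

usage_log = []  # in-memory stand-in for the knowledge_usage table

def record_usage(session_id, chunk_slug, task_context):
    """Append one usage row per chunk served, mirroring the columns above."""
    usage_log.append({
        "session_id": session_id,
        "chunk_slug": chunk_slug,
        "served_at": datetime.now(timezone.utc),
        "task_context": task_context,
    })

# One row per chunk served to the session -- the raw material for Stage 4.
record_usage("session-1", "flutter-testing", "fix failing widget test")
record_usage("session-1", "way2save-domain", "fix failing widget test")
```

The only design requirement is that every served chunk leaves a row; all the analysis comes later.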
18 Chunks, Not 12 Files
The old system had 12 rule files across the three Flutter apps. Verbose, full of code examples the model already knows, duplicated between projects. The migration wasn't a copy-paste — it was a distillation.
Claude already knows Flutter patterns. It knows how to write a Widget test, how to structure a Provider, how to configure Firebase. What it doesn't know is our conventions — the specific choices we've made about how those patterns get applied in our codebase. So the chunks strip the tutorials and keep the decisions.
9 shared chunks
| Chunk | Concerns | Tokens |
|---|---|---|
| flutter-architecture | architecture, flutter-ui | 210 |
| flutter-state-management | state-management, architecture | 185 |
| flutter-navigation | navigation, flutter-ui | 140 |
| flutter-testing | testing | 195 |
| firebase-data | firebase-data, backend | 170 |
| auth-conventions | auth, security | 130 |
| security-baseline | security | 120 |
| docker-ci | ci-cd, docker | 145 |
| workflow-conventions | workflow, git | 110 |
6 app-specific chunks
| Chunk | Scope | Concerns | Tokens |
|---|---|---|---|
| way2save-domain | way2save | domain-logic | 180 |
| way2save-multicurrency | way2save | domain-logic, multicurrency | 220 |
| way2save-provenance | way2save | domain-logic, data-integrity | 155 |
| way2fly-domain | way2fly | domain-logic | 165 |
| way2move-domain | way2move | domain-logic | 150 |
| cross-app-assistant-ingest | way2fly, way2move, way2save | assistant, data-ingest | 185 |
The numbers
| Metric | Before (files) | After (chunks) |
|---|---|---|
| Total knowledge stored | ~5,000 tokens | 2,660 tokens |
| Loaded per session (old: everything) | ~5,000 tokens | — |
| Loaded per typical task (3–5 chunks) | — | ~800–1,100 tokens |
| Effective reduction per session | — | ~78% |
The total stored knowledge is smaller because we stripped the noise. But the real win is that an agent never loads all 2,660 tokens — it loads only the few chunks relevant to its current task. A "fix a test in way2save" session loads around 1,265 tokens of knowledge instead of 5,000.
The MCP Tool
Two operations. One to query, one to browse.
```
get_knowledge(
  concerns = ["testing", "domain-logic"],
  project  = "way2save"
)
→ Returns matching chunks: content + metadata

list_knowledge()
→ Returns all available chunks: slug, title, concerns, token_estimate
```
get_knowledge does the routing. It takes the task's concerns and the project scope, queries the knowledge_chunks table for anything that matches, and returns the content. The query logic: return chunks where concerns overlap with the requested concerns AND either project_scope includes the requested project or project_scope IS NULL (shared chunks).
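That selection rule is compact enough to express directly. Here is a sketch of the predicate in plain Python, assuming chunks are rows shaped like the schema above; `matches` and the sample catalog are illustrative, not the actual server code:

```python
def matches(chunk, concerns, project):
    """A chunk is served when its concerns overlap the request AND it is
    either shared (project_scope is None) or scoped to the requested project."""
    topical = bool(set(chunk["concerns"]) & set(concerns))
    in_scope = chunk["project_scope"] is None or project in chunk["project_scope"]
    return topical and in_scope

def get_knowledge(chunks, concerns, project):
    return [c for c in chunks if matches(c, concerns, project)]

# Tiny sample catalog with the same shape as knowledge_chunks rows.
catalog = [
    {"slug": "flutter-testing", "concerns": ["testing"], "project_scope": None},
    {"slug": "way2save-multicurrency",
     "concerns": ["domain-logic", "multicurrency"], "project_scope": ["way2save"]},
    {"slug": "way2fly-domain", "concerns": ["domain-logic"],
     "project_scope": ["way2fly"]},
]

hits = get_knowledge(catalog, ["testing", "multicurrency"], "way2save")
# way2fly-domain drops out: no concern overlap with the request, and its
# project_scope does not include way2save.
```

The NULL-means-shared rule falls out of a single `or`: shared chunks match every project, scoped chunks match only their own.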
list_knowledge is for exploration. An agent that isn't sure what concerns to request can browse the full catalog with token estimates before deciding what to load.
Both tools are registered in the harness MCP server alongside the existing tools — start_session, log_learning, log_decision, and the rest. No separate server, no additional configuration. If you can talk to the harness, you can query the Knowledge API.
A real query
Task: "Fix a failing test in way2save's multi-currency module." The agent classifies the concerns as testing, domain-logic, and multicurrency, then queries:
```
get_knowledge(
  concerns = ["testing", "domain-logic", "multicurrency"],
  project  = "way2save"
)
→ 7 chunks returned:
  flutter-testing           (shared)    195 tokens
  flutter-architecture      (shared)    210 tokens
  flutter-state-management  (shared)    185 tokens
  way2save-domain           (way2save)  180 tokens
  way2save-multicurrency    (way2save)  220 tokens
  way2save-provenance       (way2save)  155 tokens
  security-baseline         (shared)    120 tokens
  ___________
  Total: ~1,265 tokens
```
Roughly 1,265 tokens vs. 5,000 from the old file-based system. The agent gets exactly the conventions it needs for this specific task — testing patterns, the way2save domain model, multi-currency rules, provenance tracking — and nothing it doesn't.
Any interface that connects to the harness gets the Knowledge API automatically. Claude Code sessions, build.ai agents, cortex.ai tenants — they all call the same MCP tool and get the same knowledge. No per-tool configuration. No syncing. One API, every consumer.
The build.ai Connection
Here's the insight that makes this more than a developer convenience: the Knowledge API works exactly like build.ai's request pipeline.
A build.ai agent running a QA task calls get_knowledge(["testing"]) and gets the same testing conventions as a Claude Code session doing the same work. A cortex.ai tenant agent handling a legal-tech question calls get_knowledge(["domain-logic"], project="cortex-aluminex") and gets that tenant's specific rules. The knowledge doesn't care which interface is asking. It cares about what the task needs.
Same pattern. Same knowledge source. Same precision.
This is why the Knowledge API isn't just a developer tool. It's infrastructure. It's the shared brain that build.ai, cortex.ai, Claude Code, and any future interface all plug into. Update a testing convention once, and every agent across every interface picks it up on its next query. No deployment. No sync. No drift.
This is what makes harness.os an operating system, not a config file. Config files are local, static, copied. An operating system provides services that any process can call. The Knowledge API is a service. Any agent is a process. The harness serves them all from one source of truth.
What's Next: Learned Routing
The knowledge_usage table is already recording every chunk served to every session. That data is the foundation for Stage 4: the system that optimizes its own knowledge delivery.
The questions it will answer:
- Which chunks are always served together? If `flutter-testing` and `flutter-architecture` appear in the same query 90% of the time, merge them into one chunk. Fewer queries, fewer round trips.
- Which chunks are never used? If a chunk hasn't been served in 30 days, it's dead weight. Archive it. Keep the catalog lean.
- Which tasks produce errors after receiving certain chunks? If sessions that load `security-baseline` for auth tasks consistently produce bugs, the chunk's content might be outdated or misleading. Flag it for review.
- Which concern combinations are common enough to pre-bundle? If "testing + domain-logic" is the most frequent query pattern, build a composite chunk that combines them — one query instead of loading five separate pieces.
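The first of those questions can already be answered with a few lines over the logged rows. Here is a sketch, assuming usage rows shaped like the knowledge_usage columns; `cooccurrence` is an illustrative helper, not part of the harness:

```python
from collections import Counter
from itertools import combinations

def cooccurrence(usage_rows):
    """Count how often each pair of chunks is served to the same session."""
    by_session = {}
    for row in usage_rows:
        by_session.setdefault(row["session_id"], set()).add(row["chunk_slug"])
    pairs = Counter()
    for chunks in by_session.values():
        for a, b in combinations(sorted(chunks), 2):
            pairs[(a, b)] += 1
    return pairs

# Sample rows standing in for a knowledge_usage extract.
usage = [
    {"session_id": "s1", "chunk_slug": "flutter-testing"},
    {"session_id": "s1", "chunk_slug": "flutter-architecture"},
    {"session_id": "s2", "chunk_slug": "flutter-testing"},
    {"session_id": "s2", "chunk_slug": "flutter-architecture"},
    {"session_id": "s3", "chunk_slug": "flutter-testing"},
]

pairs = cooccurrence(usage)
# The (flutter-architecture, flutter-testing) pair shows up in 2 of 3
# sessions -- a merge candidate once the ratio crosses a chosen threshold.
```

The other three questions are variations on the same move: group the log by session or chunk, count, and compare against a threshold.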
None of this requires new infrastructure. The tables are there. The logging is running. Stage 4 is an analysis layer on top of data we're already collecting. When the usage data reaches critical mass, the patterns will emerge on their own.
A system that watches how its own knowledge gets used, prunes what doesn't work, and reinforces what does. That's not configuration management. That's a knowledge system that learns.