The harness becomes the shared brain that any interface plugs into.
From Theory to Implementation
The previous post designed the architecture: knowledge chunks in a database, an MCP tool to query them, concern-based routing instead of loading everything. This post documents building it. Not a future plan — the actual implementation, done in the same session as the design.
That's not an accident. The harness enforces this loop: improve, log the improvement, write about what you built. Design and implementation happen together because the system won't let you leave a design sitting on a shelf. If you designed it, you build it. If you built it, you document it. Same session.
Everything described here is live and queryable right now: database tables created, 18 knowledge chunks migrated and stored, MCP tool registered in the harness server, and any agent that connects to the harness can call it. Here's how it came together.
The Database
Two tables. One holds the knowledge, the other tracks how it gets used.
knowledge_chunks
```sql
CREATE TABLE knowledge_chunks (
  slug           text PRIMARY KEY,   -- e.g. "flutter-testing" or "way2save-domain"
  title          text NOT NULL,      -- human-readable: "Flutter Testing Conventions"
  content        text NOT NULL,      -- the actual rules, concise prose
  concerns       text[],             -- ["testing"], ["domain-logic", "multicurrency"]
  project_scope  text[],             -- ["way2save"], or NULL for shared chunks
  token_estimate integer,            -- cost tracking: 180, 220, etc.
  version        integer DEFAULT 1,  -- bump on updates for cache invalidation
  created_at     timestamptz,
  updated_at     timestamptz
);
```
knowledge_usage
```sql
CREATE TABLE knowledge_usage (
  id           serial PRIMARY KEY,
  session_id   uuid,         -- links to claude_sessions
  chunk_slug   text,         -- which chunk was served
  served_at    timestamptz,  -- when
  task_context text          -- what the agent was doing
);
```
The schema is deliberately simple. No categories, no hierarchies, no tagging taxonomies. Chunks have concerns (what topics they cover) and project_scope (which projects they apply to). That's the entire routing model.
The key design choice: project_scope NULL means "shared across all projects." One copy of Flutter testing rules serves way2fly, way2move, and way2save. Project-specific chunks like way2save's multi-currency logic only load for way2save sessions. One source of truth. Zero drift.
The knowledge_usage table exists for Stage 4 — learned routing. Every time a chunk gets served to a session, we log it. Over time, that data shows which chunks are always served together (merge candidates), which are never used (removal candidates), and which correlate with successful outcomes (high-value knowledge). But that's future work. Right now it just records.
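The recording step is small enough to sketch. Here is a minimal stand-in, assuming an in-memory list in place of the knowledge_usage table; `record_usage` and the placeholder IDs are illustrative, not the harness API:

```python
from datetime import datetime, timezone

usage_log = []  # in-memory stand-in for the knowledge_usage table

def record_usage(session_id, chunk_slug, task_context):
    """Append one usage row per chunk served, mirroring the columns above."""
    usage_log.append({
        "session_id": session_id,
        "chunk_slug": chunk_slug,
        "served_at": datetime.now(timezone.utc),
        "task_context": task_context,
    })

# One row per chunk served to the session -- the raw material for Stage 4.
record_usage("session-1", "flutter-testing", "fix failing widget test")
record_usage("session-1", "way2save-domain", "fix failing widget test")
```

The only design requirement is that every served chunk leaves a row; all the analysis comes later.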
18 Chunks, Not 12 Files
The old system had 12 rule files across the three Flutter apps. Verbose, full of code examples the model already knows, duplicated between projects. The migration wasn't a copy-paste — it was a distillation.
Claude already knows Flutter patterns. It knows how to write a Widget test, how to structure a Provider, how to configure Firebase. What it doesn't know is our conventions — the specific choices we've made about how those patterns get applied in our codebase. So the chunks strip the tutorials and keep the decisions.
9 shared chunks
| Chunk | Concerns | Tokens |
|---|---|---|
| flutter-architecture | architecture, flutter-ui | 210 |
| flutter-state-management | state-management, architecture | 185 |
| flutter-navigation | navigation, flutter-ui | 140 |
| flutter-testing | testing | 195 |
| firebase-data | firebase-data, backend | 170 |
| auth-conventions | auth, security | 130 |
| security-baseline | security | 120 |
| docker-ci | ci-cd, docker | 145 |
| workflow-conventions | workflow, git | 110 |
6 app-specific chunks
| Chunk | Scope | Concerns | Tokens |
|---|---|---|---|
| way2save-domain | way2save | domain-logic | 180 |
| way2save-multicurrency | way2save | domain-logic, multicurrency | 220 |
| way2save-provenance | way2save | domain-logic, data-integrity | 155 |
| way2fly-domain | way2fly | domain-logic | 165 |
| way2move-domain | way2move | domain-logic | 150 |
| cross-app-assistant-ingest | way2fly, way2move, way2save | assistant, data-ingest | 185 |
The numbers
| Metric | Before (files) | After (chunks) |
|---|---|---|
| Total knowledge stored | ~5,000 tokens | 2,660 tokens |
| Loaded per session (old: everything) | ~5,000 tokens | — |
| Loaded per typical task (3–5 chunks) | — | ~800–1,100 tokens |
| Effective reduction per session | — | ~78% |
The total stored knowledge is smaller because we stripped the noise. But the real win is that an agent never loads all 2,660 tokens — it loads only the few chunks relevant to its current task. A "fix a test in way2save" session loads around 1,265 tokens of knowledge instead of 5,000.
The MCP Tool
Two operations. One to query, one to browse.
```
get_knowledge(
  concerns = ["testing", "domain-logic"],
  project  = "way2save"
)
→ Returns matching chunks: content + metadata

list_knowledge()
→ Returns all available chunks: slug, title, concerns, token_estimate
```
get_knowledge does the routing. It takes the task's concerns and the project scope, queries the knowledge_chunks table for anything that matches, and returns the content. The query logic: return chunks where concerns overlap with the requested concerns AND either project_scope includes the requested project or project_scope IS NULL (shared chunks).
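That selection rule is compact enough to express directly. Here is a sketch of the predicate in plain Python, assuming chunks are rows shaped like the schema above; `matches` and the sample catalog are illustrative, not the actual server code:

```python
def matches(chunk, concerns, project):
    """A chunk is served when its concerns overlap the request AND it is
    either shared (project_scope is None) or scoped to the requested project."""
    topical = bool(set(chunk["concerns"]) & set(concerns))
    in_scope = chunk["project_scope"] is None or project in chunk["project_scope"]
    return topical and in_scope

def get_knowledge(chunks, concerns, project):
    return [c for c in chunks if matches(c, concerns, project)]

# Tiny sample catalog with the same shape as knowledge_chunks rows.
catalog = [
    {"slug": "flutter-testing", "concerns": ["testing"], "project_scope": None},
    {"slug": "way2save-multicurrency",
     "concerns": ["domain-logic", "multicurrency"], "project_scope": ["way2save"]},
    {"slug": "way2fly-domain", "concerns": ["domain-logic"],
     "project_scope": ["way2fly"]},
]

hits = get_knowledge(catalog, ["testing", "multicurrency"], "way2save")
# way2fly-domain drops out: no concern overlap with the request, and its
# project_scope does not include way2save.
```

The NULL-means-shared rule falls out of a single `or`: shared chunks match every project, scoped chunks match only their own.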
list_knowledge is for exploration. An agent that isn't sure what concerns to request can browse the full catalog with token estimates before deciding what to load.
Both tools are registered in the harness MCP server alongside the existing tools — start_session, log_learning, log_decision, and the rest. No separate server, no additional configuration. If you can talk to the harness, you can query the Knowledge API.
A real query
Task: "Fix a failing test in way2save's multi-currency module." The agent classifies the concerns as testing, domain-logic, and multicurrency, then queries:
```
get_knowledge(
  concerns = ["testing", "domain-logic", "multicurrency"],
  project  = "way2save"
)
→ 7 chunks returned:
  flutter-testing           (shared)    195 tokens
  flutter-architecture      (shared)    210 tokens
  flutter-state-management  (shared)    185 tokens
  way2save-domain           (way2save)  180 tokens
  way2save-multicurrency    (way2save)  220 tokens
  way2save-provenance       (way2save)  155 tokens
  security-baseline         (shared)    120 tokens
  ___________
  Total: ~1,265 tokens
```
Roughly 1,265 tokens vs. 5,000 from the old file-based system. The agent gets exactly the conventions it needs for this specific task — testing patterns, the way2save domain model, multi-currency rules, provenance tracking — and nothing it doesn't.
Any interface that connects to the harness gets the Knowledge API automatically. Claude Code sessions, build.ai agents, cortex.ai tenants — they all call the same MCP tool and get the same knowledge. No per-tool configuration. No syncing. One API, every consumer.
The build.ai Connection
Here's the insight that makes this more than a developer convenience: the Knowledge API works exactly like build.ai's request pipeline.
A build.ai agent running a QA task calls get_knowledge(["testing"]) and gets the same testing conventions as a Claude Code session doing the same work. A cortex.ai tenant agent handling a legal-tech question calls get_knowledge(["domain-logic"], project="cortex-aluminex") and gets that tenant's specific rules. The knowledge doesn't care which interface is asking. It cares about what the task needs.
Same pattern. Same knowledge source. Same precision.
This is why the Knowledge API isn't just a developer tool. It's infrastructure. It's the shared brain that build.ai, cortex.ai, Claude Code, and any future interface all plug into. Update a testing convention once, and every agent across every interface picks it up on its next query. No deployment. No sync. No drift.
This is what makes harness.os an operating system, not a config file. Config files are local, static, copied. An operating system provides services that any process can call. The Knowledge API is a service. Any agent is a process. The harness serves them all from one source of truth.
What's Next: Learned Routing
The knowledge_usage table is already recording every chunk served to every session. That data is the foundation for Stage 4: the system that optimizes its own knowledge delivery.
The questions it will answer:
- Which chunks are always served together? If `flutter-testing` and `flutter-architecture` appear in the same query 90% of the time, merge them into one chunk. Fewer queries, fewer round trips.
- Which chunks are never used? If a chunk hasn't been served in 30 days, it's dead weight. Archive it. Keep the catalog lean.
- Which tasks produce errors after receiving certain chunks? If sessions that load `security-baseline` for auth tasks consistently produce bugs, the chunk's content might be outdated or misleading. Flag it for review.
- Which concern combinations are common enough to pre-bundle? If "testing + domain-logic" is the most frequent query pattern, build a composite chunk that combines them — one query instead of loading five separate pieces.
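The first of those questions can already be answered with a few lines over the logged rows. Here is a sketch, assuming usage rows shaped like the knowledge_usage columns; `cooccurrence` is an illustrative helper, not part of the harness:

```python
from collections import Counter
from itertools import combinations

def cooccurrence(usage_rows):
    """Count how often each pair of chunks is served to the same session."""
    by_session = {}
    for row in usage_rows:
        by_session.setdefault(row["session_id"], set()).add(row["chunk_slug"])
    pairs = Counter()
    for chunks in by_session.values():
        for a, b in combinations(sorted(chunks), 2):
            pairs[(a, b)] += 1
    return pairs

# Sample rows standing in for a knowledge_usage extract.
usage = [
    {"session_id": "s1", "chunk_slug": "flutter-testing"},
    {"session_id": "s1", "chunk_slug": "flutter-architecture"},
    {"session_id": "s2", "chunk_slug": "flutter-testing"},
    {"session_id": "s2", "chunk_slug": "flutter-architecture"},
    {"session_id": "s3", "chunk_slug": "flutter-testing"},
]

pairs = cooccurrence(usage)
# The (flutter-architecture, flutter-testing) pair shows up in 2 of 3
# sessions -- a merge candidate once the ratio crosses a chosen threshold.
```

The other three questions are variations on the same move: group the log by session or chunk, count, and compare against a threshold.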
None of this requires new infrastructure. The tables are there. The logging is running. Stage 4 is an analysis layer on top of data we're already collecting. When the usage data reaches critical mass, the patterns will emerge on their own.
A system that watches how its own knowledge gets used, prunes what doesn't work, and reinforces what does. That's not configuration management. That's a knowledge system that learns.