Intelligence isn't knowing everything. It's knowing what matters right now.
The Firehose Problem
Every Claude Code session on way2save loaded 12 rule files. 1,234 lines of instructions, injected into the context window before a single line of work began. Navigation rules. Docker setup. Security patterns. Codemagic CI/CD configuration. Feature flag conventions. All of it, every time, regardless of what the task actually needed.
A simple test fix — change one assertion in one file — loaded every rule the project had ever accumulated. Navigation architecture? Irrelevant. Docker multi-stage build patterns? Irrelevant. OAuth security flows? Irrelevant. But they were all there, consuming tokens, diluting the context that actually mattered.
Scale that across a cross-app session touching all three Flutter apps — way2fly, way2move, way2save — and the numbers get ugly: 5,975 lines of rules loaded. Roughly 24,000 tokens. Most of them irrelevant to the task at hand.
This is like reading an entire encyclopedia to answer one question. The answer is in there somewhere, buried under thousands of pages you didn't need. The cost isn't just the reading — it's the dilution. The signal drowns in noise.
The Principle
The realization came from a simple question: what is the harness actually supposed to do? The answer reframed everything.
The harness isn't a knowledge store; it's a knowledge router. Its job isn't to add context. Its job is to filter it: serve precisely what's needed for this task, this app, this moment. Nothing more.
Every line of context you load that isn't relevant to the current task has a cost. It uses tokens. It competes for attention in the model's context window. It increases the chance that the model fixates on an irrelevant instruction instead of the relevant one. Context pollution is real, and we were doing it to ourselves.
A knowledge router that serves three precise rules beats a knowledge dump that serves twelve generic ones. Every time.
The Fix: Task-Based Routing
The fix was structural, not incremental. We didn't trim the rule files or make them shorter. We changed when they load.
Before: Force-Load Everything
Each app's CLAUDE.md used @-import references to pull in every rule file on session start. way2save's was 313 lines long, mostly import directives and duplicated instructions. The model had no choice — all 12 rules loaded before work began.
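The force-load pattern looked roughly like this. The rule-file names below are taken from the routing table later in the post, but the exact file list and extensions are illustrative, not a reproduction of the real CLAUDE.md:

```markdown
<!-- CLAUDE.md (before): every rule imported unconditionally -->
@.claude/rules/navigation.md
@.claude/rules/docker.md
@.claude/rules/security.md
@.claude/rules/codemagic.md
@.claude/rules/ff-manager.md
<!-- …and the rest, loaded on every session regardless of task -->
```

With @-imports, the model has no say in the matter: every referenced file is pulled into context at session start.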
After: Route by Task
The new CLAUDE.md is 66 lines. All @-import references removed. In their place: a routing table that maps task types to the 2–3 rule files that actually matter for that task. The rules still live in .claude/rules/ — nothing was deleted. They're just loaded on demand instead of by default.
| Task Type | Before (12 rules) | After (2–3 rules) |
|---|---|---|
| Test fix | navigation, docker, security, codemagic, ff-manager, …all 12 | testing, architecture |
| UI feature | docker, security, codemagic, testing, …all 12 | navigation, architecture, design-system |
| CI/CD deploy | navigation, design-system, ff-manager, …all 12 | codemagic, testing |
| Security review | navigation, docker, design-system, …all 12 | security, architecture, auth |
| Feature flags | docker, security, codemagic, …all 12 | ff-manager, architecture |
The model reads the routing table, identifies the task type, and loads only the rules it needs. Everything else stays on disk, available if the task evolves, but not polluting the context by default.
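The routing table is plain markdown that the model reads, not executable code, but the mapping it encodes can be sketched as a lookup. Task names and rule-file stems come from the table above; the dict itself is just an illustration:

```python
# Task-type → rule files, mirroring the routing table above.
# The real routing is done by the model reading CLAUDE.md;
# this is only a sketch of the mapping it encodes.
ROUTES = {
    "test fix":        ["testing", "architecture"],
    "ui feature":      ["navigation", "architecture", "design-system"],
    "ci/cd deploy":    ["codemagic", "testing"],
    "security review": ["security", "architecture", "auth"],
    "feature flags":   ["ff-manager", "architecture"],
}

def rules_for(task_type: str) -> list[str]:
    """Return only the rule files relevant to this task type."""
    return ROUTES[task_type.lower()]
```

A test fix loads two files instead of twelve; the other ten stay on disk until a task type actually routes to them.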
The Numbers
The reduction was immediate and dramatic.
| App | Before | After | Tokens Saved |
|---|---|---|---|
| way2save | 1,547 lines | 66 lines | ~5,924 |
| way2fly | 2,178 lines | 68 lines | ~8,440 |
| way2move | 2,250 lines | 68 lines | ~8,728 |
| Total (cross-app session) | 5,975 lines | 202 lines | ~23,092 (97% reduction) |
That's roughly 23,000 tokens freed up in every cross-app session. Tokens that can now be used for actual reasoning, longer context about the code being modified, or simply running cheaper.
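The per-app token figures line up with a simple lines-saved estimate. The 4-tokens-per-line ratio below is an assumption inferred from the table, not a measured value; real token counts depend on the tokenizer and the content of each line:

```python
# (before_lines, after_lines) per app, from the table above.
apps = {
    "way2save": (1547, 66),
    "way2fly":  (2178, 68),
    "way2move": (2250, 68),
}

TOKENS_PER_LINE = 4  # assumed average; tokenizer-dependent in practice

saved = {name: (before - after) * TOKENS_PER_LINE
         for name, (before, after) in apps.items()}
total_saved = sum(saved.values())
reduction = 1 - sum(a for _, a in apps.values()) / sum(b for b, _ in apps.values())
```

Under that assumption the totals reproduce the table: ~23,092 tokens saved, a ~97% reduction in lines loaded.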
The Deeper Pattern
This isn't just about rule files. It's the same problem that shows up in every AI system that uses context windows: what you don't load matters as much as what you do.
The precision principle applies at every layer of the harness:
Layer 1: Rule Precision
Task-based routing. Load 2–3 rules instead of 12. That's what this post documents.
Layer 2: Schema Precision
The harness schema_reference table. One query returns all table schemas and project IDs for the databases you need. Before this existed, every session started with 4–6 discovery queries: list projects, list databases, describe tables, check which branch is active. Now it's one lookup. The knowledge is pre-indexed.
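A minimal sketch of the pre-indexing idea, using an in-memory SQLite stand-in. Only the `schema_reference` table name comes from the post; the column names and the SQLite backing are assumptions for illustration:

```python
import sqlite3

# Stand-in for the harness database; the schema here is illustrative.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE schema_reference (
    project_id TEXT, database_name TEXT, table_name TEXT, columns TEXT)""")
db.executemany(
    "INSERT INTO schema_reference VALUES (?, ?, ?, ?)",
    [("way2save", "main", "accounts", "id, user_id, balance"),
     ("way2save", "main", "transfers", "id, account_id, amount")],
)

# One lookup replaces the old discovery loop
# (list projects → list databases → describe each table).
rows = db.execute(
    "SELECT table_name, columns FROM schema_reference "
    "WHERE project_id = ? ORDER BY table_name",
    ("way2save",),
).fetchall()
```

The point is the shape, not the storage engine: discovery work is done once, at index time, instead of 4–6 queries at the start of every session.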
Layer 3: Work Decomposition
The orchestrator pattern. Instead of one mega-session that touches three apps (loading context for all three simultaneously), spawn scoped agents — one per app, each with only the rules and schema for that app. The orchestrator coordinates; the agents stay focused.
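The decomposition can be sketched as pure data: the orchestrator builds one scoped config per app, and each agent sees only its own slice. The app names follow the post; the per-app rule assignments and the config shape are invented for illustration:

```python
# Per-app rule sets (illustrative; in practice these come from
# each app's own task-based routing).
APP_RULES = {
    "way2fly":  ["architecture", "testing"],
    "way2move": ["architecture", "navigation"],
    "way2save": ["architecture", "security"],
}

def scoped_agents(app_names: list[str]) -> list[dict]:
    """One focused agent per app instead of one mega-session:
    each config carries only that app's rules."""
    return [{"app": app, "rules": APP_RULES[app]} for app in app_names]

agents = scoped_agents(["way2fly", "way2move", "way2save"])
```

No agent ever sees another app's context, so a three-app session costs three small context windows rather than one window holding everything at once.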
Three layers of precision: what rules to load, what schemas to know, how to decompose work. Each layer compounds. Get all three right and a session runs faster, cheaper, and with higher quality output — because the model's attention is on the task, not on irrelevant instructions.
The Enforcement Gap
This improvement almost didn't get documented.
The rule that says "publish a blog post when you improve the harness" existed — but only as a memory. A note in the auto-memory file. Memories are suggestions. They depend on the model noticing them, prioritizing them, and acting on them. That's behavioral enforcement. It's unreliable.
So we fixed it. Added a mandatory publish rule to harness-mandatory.md — the same global rule file that forces harness connection on every session. The enforcement chain is now structural:
1. Improve the harness (rule change, schema update, process fix)
2. Log the learning to the harness database
3. Write the blog post, in the same session
If you improved the harness but didn't write about it, the session is incomplete. Not "it would be nice to document this." Incomplete. The rule says so.
The system now enforces: improve, log, publish. Not as a suggestion. Not as a memory. As a mandatory rule that fires on every session. The enforcement gap is closed.
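The completeness rule lives as prose in harness-mandatory.md, but its logic reduces to a three-step gate. The function and flag names below are hypothetical, a sketch of the rule rather than anything the harness actually runs:

```python
def session_complete(improved_harness: bool,
                     logged_learning: bool,
                     published_post: bool) -> bool:
    """A session that improves the harness is only complete once
    the learning is logged AND the post is written, same session."""
    if not improved_harness:
        return True  # nothing to enforce this session
    return logged_learning and published_post
```

Encoding it as a gate rather than a reminder is the whole point: a memory can be skipped; a completeness condition either holds or the session isn't done.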
The Takeaway
The harness was getting smarter — accumulating more rules, more patterns, more knowledge. But it was also getting heavier. Every improvement added weight to the context window. The system was compounding knowledge and compounding costs at the same rate.
The fix wasn't to stop improving. It was to add precision. Route knowledge to where it's needed. Keep it out of where it isn't. Make the system lighter as it gets smarter, not heavier.
Next in the series: whatever breaks next.