Intelligence isn't knowing everything. It's knowing what matters right now.
The Firehose Problem
Every Claude Code session on way2save loaded 12 rule files. 1,234 lines of instructions, injected into the context window before a single line of work began. Navigation rules. Docker setup. Security patterns. Codemagic CI/CD configuration. Feature flag conventions. All of it, every time, regardless of what the task actually needed.
A simple test fix — change one assertion in one file — loaded every rule the project had ever accumulated. Navigation architecture? Irrelevant. Docker multi-stage build patterns? Irrelevant. OAuth security flows? Irrelevant. But they were all there, consuming tokens, diluting the context that actually mattered.
Scale that across a cross-app session touching all three Flutter apps — way2fly, way2move, way2save — and the numbers get ugly: 5,975 lines of rules loaded. Roughly 24,000 tokens. Most of them irrelevant to the task at hand.
This is like reading an entire encyclopedia to answer one question. The answer is in there somewhere, buried under thousands of pages you didn't need. The cost isn't just the reading — it's the dilution. The signal drowns in noise.
The Principle
The realization came from a simple question: what is the harness actually supposed to do? The answer reframed everything.
The harness isn't a knowledge store; it's a knowledge router. Its job isn't to add context. Its job is to filter it: serve precisely what's needed for this task, this app, this moment. Nothing more.
Every line of context you load that isn't relevant to the current task has a cost. It uses tokens. It competes for attention in the model's context window. It increases the chance that the model fixates on an irrelevant instruction instead of the relevant one. Context pollution is real, and we were doing it to ourselves.
A knowledge router that serves three precise rules beats a knowledge dump that serves twelve generic ones. Every time.
The Fix: Task-Based Routing
The fix was structural, not incremental. We didn't trim the rule files or make them shorter. We changed when they load.
Before: Force-Load Everything
Each app's CLAUDE.md used @-import references to pull in every rule file on session start. way2save's was 313 lines long, mostly import directives and duplicated instructions. The model had no choice — all 12 rules loaded before work began.
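The force-load pattern looked roughly like this. The rule-file names below are taken from the routing table later in the post, but the exact file list and extensions are illustrative, not a reproduction of the real CLAUDE.md:

```markdown
<!-- CLAUDE.md (before): every rule imported unconditionally -->
@.claude/rules/navigation.md
@.claude/rules/docker.md
@.claude/rules/security.md
@.claude/rules/codemagic.md
@.claude/rules/ff-manager.md
<!-- …and the rest, loaded on every session regardless of task -->
```

With @-imports, the model has no say in the matter: every referenced file is pulled into context at session start.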
After: Route by Task
The new CLAUDE.md is 66 lines. All @-import references removed. In their place: a routing table that maps task types to the 2–3 rule files that actually matter for that task. The rules still live in .claude/rules/ — nothing was deleted. They're just loaded on demand instead of by default.
| Task Type | Before (12 rules) | After (2–3 rules) |
|---|---|---|
| Test fix | navigation, docker, security, codemagic, ff-manager, …all 12 | testing, architecture |
| UI feature | docker, security, codemagic, testing, …all 12 | navigation, architecture, design-system |
| CI/CD deploy | navigation, design-system, ff-manager, …all 12 | codemagic, testing |
| Security review | navigation, docker, design-system, …all 12 | security, architecture, auth |
| Feature flags | docker, security, codemagic, …all 12 | ff-manager, architecture |
The model reads the routing table, identifies the task type, and loads only the rules it needs. Everything else stays on disk, available if the task evolves, but not polluting the context by default.
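The routing table is plain markdown that the model reads, not executable code, but the mapping it encodes can be sketched as a lookup. Task names and rule-file stems come from the table above; the dict itself is just an illustration:

```python
# Task-type → rule files, mirroring the routing table above.
# The real routing is done by the model reading CLAUDE.md;
# this is only a sketch of the mapping it encodes.
ROUTES = {
    "test fix":        ["testing", "architecture"],
    "ui feature":      ["navigation", "architecture", "design-system"],
    "ci/cd deploy":    ["codemagic", "testing"],
    "security review": ["security", "architecture", "auth"],
    "feature flags":   ["ff-manager", "architecture"],
}

def rules_for(task_type: str) -> list[str]:
    """Return only the rule files relevant to this task type."""
    return ROUTES[task_type.lower()]
```

A test fix loads two files instead of twelve; the other ten stay on disk until a task type actually routes to them.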
The Numbers
The reduction was immediate and dramatic.
| App | Before | After | Tokens Saved |
|---|---|---|---|
| way2save | 1,547 lines | 66 lines | ~5,924 |
| way2fly | 2,178 lines | 68 lines | ~8,440 |
| way2move | 2,250 lines | 68 lines | ~8,728 |
| Total (cross-app session) | 5,975 lines | 202 lines | ~23,092 (97% reduction) |
That's roughly 23,000 tokens freed up in every cross-app session. Tokens that can now be used for actual reasoning, longer context about the code being modified, or simply running cheaper.
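The per-app token figures line up with a simple lines-saved estimate. The 4-tokens-per-line ratio below is an assumption inferred from the table, not a measured value; real token counts depend on the tokenizer and the content of each line:

```python
# (before_lines, after_lines) per app, from the table above.
apps = {
    "way2save": (1547, 66),
    "way2fly":  (2178, 68),
    "way2move": (2250, 68),
}

TOKENS_PER_LINE = 4  # assumed average; tokenizer-dependent in practice

saved = {name: (before - after) * TOKENS_PER_LINE
         for name, (before, after) in apps.items()}
total_saved = sum(saved.values())
reduction = 1 - sum(a for _, a in apps.values()) / sum(b for b, _ in apps.values())
```

Under that assumption the totals reproduce the table: ~23,092 tokens saved, a ~97% reduction in lines loaded.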
The Deeper Pattern
This isn't just about rule files. It's the same problem that shows up in every AI system that uses context windows: what you don't load matters as much as what you do.
The precision principle applies at every layer of the harness:
Layer 1: Rule Precision
Task-based routing. Load 2–3 rules instead of 12. That's what this post documents.
Layer 2: Schema Precision
The harness schema_reference table. One query returns all table schemas and project IDs for the databases you need. Before this existed, every session started with 4–6 discovery queries: list projects, list databases, describe tables, check which branch is active. Now it's one lookup. The knowledge is pre-indexed.
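A minimal sketch of the pre-indexing idea, using an in-memory SQLite stand-in. Only the `schema_reference` table name comes from the post; the column names and the SQLite backing are assumptions for illustration:

```python
import sqlite3

# Stand-in for the harness database; the schema here is illustrative.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE schema_reference (
    project_id TEXT, database_name TEXT, table_name TEXT, columns TEXT)""")
db.executemany(
    "INSERT INTO schema_reference VALUES (?, ?, ?, ?)",
    [("way2save", "main", "accounts", "id, user_id, balance"),
     ("way2save", "main", "transfers", "id, account_id, amount")],
)

# One lookup replaces the old discovery loop
# (list projects → list databases → describe each table).
rows = db.execute(
    "SELECT table_name, columns FROM schema_reference "
    "WHERE project_id = ? ORDER BY table_name",
    ("way2save",),
).fetchall()
```

The point is the shape, not the storage engine: discovery work is done once, at index time, instead of 4–6 queries at the start of every session.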
Layer 3: Work Decomposition
The orchestrator pattern. Instead of one mega-session that touches three apps (loading context for all three simultaneously), spawn scoped agents — one per app, each with only the rules and schema for that app. The orchestrator coordinates; the agents stay focused.
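The decomposition can be sketched as pure data: the orchestrator builds one scoped config per app, and each agent sees only its own slice. The app names follow the post; the per-app rule assignments and the config shape are invented for illustration:

```python
# Per-app rule sets (illustrative; in practice these come from
# each app's own task-based routing).
APP_RULES = {
    "way2fly":  ["architecture", "testing"],
    "way2move": ["architecture", "navigation"],
    "way2save": ["architecture", "security"],
}

def scoped_agents(app_names: list[str]) -> list[dict]:
    """One focused agent per app instead of one mega-session:
    each config carries only that app's rules."""
    return [{"app": app, "rules": APP_RULES[app]} for app in app_names]

agents = scoped_agents(["way2fly", "way2move", "way2save"])
```

No agent ever sees another app's context, so a three-app session costs three small context windows rather than one window holding everything at once.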
Three layers of precision: what rules to load, what schemas to know, how to decompose work. Each layer compounds. Get all three right and a session runs faster, cheaper, and with higher quality output — because the model's attention is on the task, not on irrelevant instructions.
The Enforcement Gap
This improvement almost didn't get documented.
The rule that says "publish a blog post when you improve the harness" existed — but only as a memory. A note in the auto-memory file. Memories are suggestions. They depend on the model noticing them, prioritizing them, and acting on them. That's behavioral enforcement. It's unreliable.
So we fixed it. Added a mandatory publish rule to harness-mandatory.md — the same global rule file that forces harness connection on every session. The enforcement chain is now structural:
1. Improve the harness (rule change, schema update, process fix)
2. Log the learning to the harness database
3. Write the blog post, in the same session
If you improved the harness but didn't write about it, the session is incomplete. Not "it would be nice to document this." Incomplete. The rule says so.
The system now enforces: improve, log, publish. Not as a suggestion. Not as a memory. As a mandatory rule that fires on every session. The enforcement gap is closed.
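The completeness rule lives as prose in harness-mandatory.md, but its logic reduces to a three-step gate. The function and flag names below are hypothetical, a sketch of the rule rather than anything the harness actually runs:

```python
def session_complete(improved_harness: bool,
                     logged_learning: bool,
                     published_post: bool) -> bool:
    """A session that improves the harness is only complete once
    the learning is logged AND the post is written, same session."""
    if not improved_harness:
        return True  # nothing to enforce this session
    return logged_learning and published_post
```

Encoding it as a gate rather than a reminder is the whole point: a memory can be skipped; a completeness condition either holds or the session isn't done.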
The Takeaway
The harness was getting smarter — accumulating more rules, more patterns, more knowledge. But it was also getting heavier. Every improvement added weight to the context window. The system was compounding knowledge and compounding costs at the same rate.
The fix wasn't to stop improving. It was to add precision. Route knowledge to where it's needed. Keep it out of where it isn't. Make the system lighter as it gets smarter, not heavier.
Next in the series: whatever breaks next.