Series 2 · Post 6 · May 8, 2026 · ~10 min

Token Economics at Scale

Why context cost grows linearly with projects, how a harness creates a crossover point, and the 6-stage progression from expensive to self-funding.

The question nobody asks early enough

Every team that starts using AI for development eventually hits the same wall. The first project works great — you feed the AI your rules and it writes quality code. The second project doubles the context. The third triples it. By the time you have five projects sharing an AI session, you're spending more tokens on explaining what to do than on doing it.

I hit this wall at three projects. A single Claude Code session orchestrating work across way2fly, way2move, and way2save was compacting its context 3–5 times per session. Each compaction lost detail. Each recovery re-discovered harness schemas. The AI was spending its intelligence budget on re-orientation instead of creation.

Token cost doesn’t just increase with projects — it compounds. More context means more compactions, which means more recovery, which means more context.

This is the token economics problem. And it matters even if you only work on one project once a week, because the overhead of that single session still scales with how much knowledge the AI needs to load before it can be useful.

The math: context cost without optimization

Let’s make this concrete with real numbers from before we optimized.

| Metric | 1 project | 3 projects | 10 projects |
| --- | --- | --- | --- |
| Rule files loaded | 12–14 | 36–42 | 120–140 |
| Context tokens (rules only) | ~6K | ~18K | ~60K |
| Schema bootstrap queries | 3–5 | 9–15 | 30–50 |
| Compactions per session | 0–1 | 3–5 | 10–15 |
| % of tokens spent on orientation | ~15% | ~40% | ~65% |

At 3 projects, nearly half the token budget went to orientation — reading rules, discovering schemas, reconstructing context after compaction. At 10 projects, two-thirds of your spending produces zero useful output. The AI is just loading up, over and over.

Context cost scales linearly with projects, but productive output doesn’t. Without optimization, you hit a ceiling where adding projects costs more than the work they produce.

The 6-stage harness progression

The fix isn’t one optimization — it’s a progression. Each stage reduces the ratio of orientation tokens to productive tokens. Here’s the path from “dump everything” to “self-optimizing knowledge.”

Stage 0 · No harness — raw prompting
Every session starts from zero. You re-explain conventions, paste rules into chat, re-discover project structure. Knowledge lives in your head. Status: most expensive per task.

Stage 1 · Firehose — CLAUDE.md loads everything
Rules live in files. Every session auto-loads all of them via @.claude/rules/ imports. Quality improves dramatically, but context bloats with every new rule file. Status: where most teams stop.

Stage 2 · Manual routing — task-based rule loading
CLAUDE.md contains a routing table: "for Flutter UI tasks, read these 3 files; for Firebase tasks, read these 2." The AI loads only what the current task needs. 97% context reduction. Status: our current state (files). See the sketch after this list.

Stage 3 · Knowledge API — database-backed serving
Rules migrate from files to a database with concern tags and project scope. An MCP tool serves only the matching chunks. Same precision as Stage 2, but works across projects from a single source of truth. Status: our current state (DB).

Stage 4 · Learned routing — usage-based optimization
The knowledge_usage table tracks which chunks are served per session and per task type. After enough data, the system suggests which chunks to load before you ask, or retires chunks that are never used. Status: next target.

Stage 5 · Predictive knowledge — context anticipation
The harness analyzes the task request, project state, and historical patterns to pre-load exactly the right context before the agent starts working. Zero manual routing. The harness becomes a knowledge compiler. Status: long-term vision.
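To make Stages 2 and 3 concrete, here is a minimal sketch of what task-based routing looks like once it is mechanized. The concern names, file paths, and `ROUTING_TABLE` are hypothetical; at Stage 2 the same table simply lives as markdown in CLAUDE.md and the AI follows it by hand.

```python
from pathlib import Path

# Hypothetical routing table: task concern -> rule files to load.
# At Stage 2 this lives as markdown in CLAUDE.md; mechanizing it
# (the first step toward a Knowledge API) looks roughly like this.
ROUTING_TABLE = {
    "flutter-ui": ["flutter/architecture.md", "flutter/widgets.md", "flutter/theming.md"],
    "firebase":   ["firebase/security-rules.md", "firebase/functions.md"],
    "testing":    ["testing/unit.md", "testing/integration.md"],
}

RULES_DIR = Path(".claude/rules")  # assumed layout

def load_rules_for(concern: str) -> str:
    """Load only the rule files relevant to the current task's concern."""
    files = ROUTING_TABLE.get(concern, [])
    return "\n\n".join((RULES_DIR / f).read_text() for f in files)

# A Flutter UI task now loads 3 files instead of all 36-42.
context = load_rules_for("flutter-ui")
```

Stage 3 replaces the dict with database rows, which is what makes the same chunks shareable across projects.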

The crossover point

Here’s the part that matters for anyone deciding whether to invest in building this infrastructure: there’s a crossover point where the harness starts saving more tokens than it costs.

Without a harness (Stage 0–1)

Token cost scales linearly with projects and session frequency. If you work on 3 projects doing 5 sessions each per week, that’s 15 sessions, each paying the full orientation tax. Add a project, add 5 more sessions of overhead.

With optimization (Stage 2+)

Each session loads a fraction of the knowledge. The orientation cost becomes nearly constant regardless of how many projects exist in the ecosystem — because each session only touches one project’s relevant subset.

Relative context cost per session (rule tokens):

| Projects | Without optimization (firehose) | With Knowledge API (concern-based) |
| --- | --- | --- |
| 1 | ~6K | ~200 |
| 3 | ~18K | ~600 |
| 10 | ~60K | ~800 |

The crossover point for our setup was around 2 projects. At 1 project, the firehose works fine — 6K tokens of rules is cheap. But the moment you add a second project and start cross-referencing, the compounding kicks in. By 3 projects, the optimized path uses 30x fewer context tokens for rules alone.

The harness investment pays for itself almost immediately at the multi-project scale. Two projects is enough for the crossover. The more projects you add, the wider the gap becomes.
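To see the gap in weekly terms, here is a toy calculation using the per-session numbers from the table above and the five-sessions-per-project cadence from the earlier example; the figures are ours, but treat the model as illustrative rather than general.

```python
# Context cost per session (rules only), from the table above.
COST_PER_SESSION = {
    1:  {"firehose": 6_000,  "knowledge_api": 200},
    3:  {"firehose": 18_000, "knowledge_api": 600},
    10: {"firehose": 60_000, "knowledge_api": 800},
}

def weekly_orientation_tokens(projects: int, sessions_per_project: int = 5) -> dict:
    """Orientation tokens per week under each strategy. The firehose
    pays the full rule load every session; the Knowledge API pays
    only the one-project subset the session actually touches."""
    sessions = projects * sessions_per_project
    return {k: v * sessions for k, v in COST_PER_SESSION[projects].items()}

print(weekly_orientation_tokens(3))
# {'firehose': 270000, 'knowledge_api': 9000}  -> the 30x gap at 3 projects
```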

The orchestrator multiplier

The economics get even more interesting when you combine knowledge precision with the orchestrator pattern.

Instead of one mega-session that loads everything and works across all projects, a thin orchestrator session delegates work to scoped sub-agents. Each sub-agent gets only the knowledge subset for its own project and task, nothing else.

The orchestrator itself carries almost no project knowledge — it's a router. The comparison:

| Metric | Mega-session | Orchestrator + sub-agents |
| --- | --- | --- |
| Context per agent | Full ecosystem | One project subset |
| Compactions | 3–5 per session | 0–1 per sub-agent |
| Knowledge loss from compaction | Cumulative | Isolated per agent |
| Parallelism | Sequential | Concurrent sub-agents |
| Single point of failure | Yes | No — agents are independent |

A compaction in a mega-session loses all project context. A compaction in a sub-agent loses only that sub-agent’s context — the orchestrator and other agents are unaffected. The blast radius of context loss shrinks from “everything” to “one task.”
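In code, the delegation shape is simple. `serve_knowledge` and `spawn_subagent` below are hypothetical stand-ins for whatever your harness exposes (MCP tools, an agent SDK); the point is what the orchestrator does and does not hold.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the harness's real MCP tools / agent API.
def serve_knowledge(project: str, concern: str) -> str: ...
def spawn_subagent(task: str, context: str) -> str: ...

def orchestrate(tasks: list[dict]) -> list[str]:
    """The orchestrator only routes; it loads no project knowledge.
    Each sub-agent receives one project's relevant subset, so a
    compaction in one agent cannot touch the others."""
    def run(task: dict) -> str:
        context = serve_knowledge(task["project"], task["concern"])
        return spawn_subagent(task["description"], context)

    with ThreadPoolExecutor() as pool:  # sub-agents run concurrently
        return list(pool.map(run, tasks))

results = orchestrate([
    {"project": "way2fly",  "concern": "flutter-ui", "description": "refactor booking screen"},
    {"project": "way2save", "concern": "firebase",   "description": "tighten security rules"},
])
```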

What happens at 10+ projects

This is where the long-term vision matters. Without optimization, 10 projects is nearly unworkable — 65% of your token budget is overhead. With the Knowledge API, 10 projects costs barely more than 3, because each session only touches one project’s relevant subset.

But the real leverage comes from shared knowledge chunks. Our 3 apps share 9 chunks (Flutter architecture, testing, navigation, state management, Firebase patterns) that are maintained once and served to any project that needs them. Add a fourth Flutter app and the marginal knowledge cost is near zero — the chunks already exist.

The compounding returns

  1. Shared chunks amortize across projects. Write the Flutter testing rules once, serve them to every Flutter project forever.
  2. Usage data improves routing. After 100 sessions, the system knows which chunks are actually used for each task type. Unused chunks get pruned. The knowledge base gets tighter, not larger.
  3. New projects bootstrap instantly. A new Flutter + Firebase project doesn’t start from scratch — it inherits the shared knowledge and only needs project-specific chunks.
  4. Cross-project patterns emerge. When the same learning appears in multiple projects, it becomes a shared chunk automatically. Knowledge consolidates upward (one possible mechanism is sketched after this list).
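One possible mechanism for point 4, as a sketch only: it assumes a hypothetical `chunks` table and treats "the same learning" as byte-identical content, which real consolidation would relax to fuzzy matching.

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, project TEXT, concern TEXT, content TEXT)")

def promote_shared_chunks(db: sqlite3.Connection) -> None:
    """Re-scope a rule chunk as 'shared' when its content appears
    in two or more projects, and drop the duplicates."""
    by_content = defaultdict(list)
    for chunk_id, project, content in db.execute("SELECT id, project, content FROM chunks"):
        by_content[content].append((chunk_id, project))

    for content, copies in by_content.items():
        if len({project for _, project in copies}) >= 2:
            keep, *dupes = copies
            db.execute("UPDATE chunks SET project = 'shared' WHERE id = ?", (keep[0],))
            db.executemany("DELETE FROM chunks WHERE id = ?", [(c[0],) for c in dupes])
    db.commit()
```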

At scale, the harness doesn’t just reduce cost — it inverts the curve. Each new project makes the existing knowledge more valuable, while adding minimal marginal cost.

What the industry is doing

For context, here’s how other approaches handle the multi-project knowledge problem:

RAG-based systems

Most enterprise AI coding tools use embedding-based retrieval to find relevant code and documentation. This works for finding information, but it’s imprecise for rules — you don’t want the AI to find a rule sometimes. You want it to follow the rule every time it applies. Semantic search is probabilistic; concern-based tagging is deterministic.

Fine-tuning

Some teams fine-tune models on their codebase. This bakes knowledge into the model’s weights, which eliminates context cost entirely — but makes the knowledge impossible to update without retraining, expensive to maintain, and model-locked. When the next model generation arrives, you start over.

Monorepo + single context

The simplest approach: keep everything in one repo, load one CLAUDE.md. This works until the context window fills up; then it breaks abruptly. No graceful degradation.

The harness approach

Treat knowledge as a queryable service, not a static file or embedded weight. Tag it with concerns, scope it to projects, track usage, and serve only what each task needs. The knowledge stays external to the model (so it works with any model), stays updatable (so it improves continuously), and stays auditable (so you can see what was served and why).

The harness is a deterministic RAG system for development rules. Same concept — retrieve relevant knowledge for a task — but with tags instead of embeddings, guaranteeing consistency over probability.

Practical implications

If you’re building an AI development workflow and thinking about token economics, here are the takeaways:

Start with routing tables (Stage 2)

If you have more than one project, stop loading all rules and add a routing table to your CLAUDE.md. This costs nothing to implement and gives immediate results. The 97% reduction we saw is typical.

Move to a database at 3+ projects (Stage 3)

When projects start sharing rules, file-based routing creates duplication. A database with concern tags and project scope eliminates the duplication and gives you a single source of truth.
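A minimal sketch of that shape, assuming nothing beyond one table with concern tags and project scope (the column names are illustrative, not our exact schema):

```python
import sqlite3

db = sqlite3.connect("knowledge.db")  # hypothetical single source of truth
db.execute("""
CREATE TABLE IF NOT EXISTS chunks (
    id      INTEGER PRIMARY KEY,
    project TEXT NOT NULL,   -- 'way2fly', 'way2move', ... or 'shared'
    concern TEXT NOT NULL,   -- 'flutter-ui', 'firebase', 'testing', ...
    content TEXT NOT NULL
)
""")

def serve(project: str, concern: str) -> list[str]:
    """What an MCP serving tool boils down to: deterministic selection
    of the chunks scoped to this project (plus shared) and concern."""
    rows = db.execute(
        "SELECT content FROM chunks WHERE concern = ? AND project IN (?, 'shared')",
        (concern, project),
    )
    return [content for (content,) in rows]
```

Because selection is a WHERE clause rather than a similarity score, the same task always gets the same rules, which is the determinism argument from earlier.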

Track usage from the start

Even if you don’t use the data yet, logging which knowledge chunks are served per session costs almost nothing and enables Stage 4 (learned routing) later. It’s the cheapest investment with the highest future payoff.

Design for the orchestrator pattern

Even if you’re running single sessions today, structure your knowledge so it can be sliced per-project and per-concern. When you eventually need the orchestrator pattern (and you will, at scale), the knowledge is already ready.

The mistake is waiting until token costs are painful. By then, your rules are tangled, your context is bloated, and untangling is harder than building it right from Stage 2.

Where this is going

The trajectory is clear: knowledge management is becoming the core discipline of AI-assisted development. The teams that treat their rules as queryable, measurable infrastructure will outpace those who keep dumping text into prompts.

The next post in this series will cover Stage 4 in practice — how usage data actually improves routing, and what “learned” knowledge selection looks like when you have enough signal.

For now, the bottom line is this:

The harness isn’t overhead. It’s the thing that makes everything else affordable.