Series 2 · Post 4 · May 2026 · ~8 min read

From Files to a Knowledge API

Why your AI rules don't scale past one project — and what replaces them

A knowledge system should make sessions cheaper, not more expensive.

The File Trap

When you start with AI coding tools — Claude Code, Cursor, Copilot — you write a CLAUDE.md or a rules file. You describe how the project works, how tests should be written, what conventions to follow. It works great. The agent reads it, follows the rules, produces good work.

Then you add a second project. You copy the rules over. The second project is similar enough — same framework, same testing conventions — so why not reuse what worked? Then comes the third project. Another copy.

Now you have three copies of "how to write Flutter tests" that slowly drift apart. Someone updates the test patterns in project A but forgets projects B and C. A new convention gets added to project C but never makes it back to A. The rules that started identical are now three slightly different versions of the truth, and none of them is the canonical one.

Meanwhile, every session loads ALL rules regardless of what you're doing. A one-line bug fix loads 1,200 lines of architecture patterns, navigation rules, CI/CD configuration, security conventions, and design system guidelines. The agent reads them all, pays for them all, and uses maybe 5% of them.

This is where most people are today. It works until it doesn't.

The Breaking Point

Here are real numbers from our setup. Three Flutter apps — way2fly, way2move, way2save — each with 12–14 rule files covering architecture, testing, navigation, security, CI/CD, feature flags, and more.

Metric                                        Value
Total rule lines loaded (cross-app session)   5,975 lines
Estimated tokens wasted per session           ~24,000 tokens
Context compaction frequency                  Every 15–20 minutes
Schema re-discovery after each compaction     ~800 tokens per cycle
Net effect                                    The harness was making work slower, not faster

Context compaction every 15–20 minutes means the model is summarizing its own memory to free up space — and losing detail each time. Schema re-discovery after every compaction means it has to re-learn the database structure it already knew, burning another 800 tokens each cycle.

The tools themselves — the harness, the rules, the knowledge system — were making the work slower, not faster. The very thing built to help was becoming the bottleneck.

That was the moment it clicked: the knowledge system should make sessions cheaper, not more expensive. If every improvement to the rules adds weight to the context window, you've built a system that degrades as it learns. That's the opposite of intelligence.

The Progression

Knowledge management for AI agents evolves through four stages. Most teams are at Stage 1. The fix isn't to jump straight to Stage 4 — it's to know which stage you're at and what the next one looks like.

Stage 1: Firehose (where most people are)
Load everything, always. Every rule file gets injected into every session regardless of the task. Works fine for one project with a handful of rules. Breaks the moment you scale.

Stage 2: Manual Routing (quick win — 30 min to implement)
A task-based routing table in CLAUDE.md tells the agent which 2–3 rules to read based on the task type. Our first fix — a 97% token reduction. Still file-based, still per-project, but dramatically more efficient. A sketch of such a table follows this list.

Stage 3: Knowledge API (the target architecture)
Rules stored in a database as tagged chunks. An MCP tool serves only the relevant chunks based on task concerns and project scope. One source of truth across all projects. No more drifting copies.

Stage 4: Learned Routing (self-optimizing — future state)
Track which chunks actually get used per task type. Prune unused ones. Merge frequently co-served ones. The system observes its own usage patterns and optimizes itself.
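
To make Stage 2 concrete, here is roughly what such a routing table can look like inside CLAUDE.md. The task types and file names below are illustrative placeholders, not a prescription:

## Rule routing

Read ONLY the rules mapped to your current task type:

| Task type            | Read these rules                            |
|----------------------|---------------------------------------------|
| Bug fix              | rules/testing.md                            |
| UI feature           | rules/flutter-ui.md, rules/design-system.md |
| Data / schema change | rules/firebase-data.md, rules/testing.md    |
| CI/CD change         | rules/ci-cd.md                              |
| Unsure               | ask before loading more than two rule files |

The agent reads the table itself (a few dozen tokens) and pulls in two or three files on demand instead of loading a dozen by default.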

Each stage builds on the previous one. You don't need to rip out files to build the API — Stage 2 buys you time while you build Stage 3. The important thing is to stop treating "load everything" as the permanent solution.

The Architecture

Stage 3 — the Knowledge API — has a concrete design. Here's how it works.

The Flow

Agent receives task
  ↓
Classifies concerns: ["testing", "domain-logic"]
  ↓
Calls MCP tool:
  get_knowledge(
    concerns = ["testing", "domain-logic"],
    project  = "way2save"
  )
  ↓
MCP server queries knowledge_chunks table
  WHERE concerns overlap ["testing", "domain-logic"]
  AND (project_scope includes "way2save" OR project_scope IS NULL)
  ↓
Returns ~300 lines instead of 1,200
  ↓
Agent works with precise, relevant context
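
Expressed as TypeScript types, the tool contract stays small. This is a sketch only: the field names mirror the flow above, while the exact shapes are an assumption, not a published interface.

// Sketch of the get_knowledge tool contract (assumed shapes, not a spec).
interface GetKnowledgeRequest {
  concerns: string[];    // task concerns, e.g. ["testing", "domain-logic"]
  project: string;       // requesting project, e.g. "way2save"
}

interface KnowledgeChunk {
  slug: string;            // unique identifier
  title: string;           // human-readable name
  content: string;         // the actual rules/knowledge
  tokenEstimate: number;   // approximate serving cost
}

// Only chunks whose concerns overlap the request and whose scope
// includes the project (or is NULL, i.e. shared) come back.
interface GetKnowledgeResponse {
  chunks: KnowledgeChunk[];
}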

The Schema

knowledge_chunks
  slug             text PRIMARY KEY    -- unique identifier
  title            text               -- human-readable name
  content          text               -- the actual rules/knowledge
  concerns         text[]             -- tags: testing, flutter-ui, firebase-data...
  project_scope    text[]             -- which projects (NULL = all projects)
  token_estimate   integer            -- cost tracking per chunk
  updated_at       timestamptz        -- when this knowledge was last revised

The key design decision: project_scope NULL means "shared across all projects." This is how you have ONE copy of "how to write Flutter tests" that serves all three apps, plus project-specific chunks like "way2save multi-currency rules" that only load for way2save sessions. One source of truth. No drift.

The agent doesn't need to know which files exist in which project's .claude/rules/ directory. It asks for knowledge by concern and scope, and the API returns exactly what's relevant. The routing logic lives in the query, not in the agent's head.
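
Here is a minimal sketch of what the server side could look like, assuming the official TypeScript MCP SDK (@modelcontextprotocol/sdk), zod, and node-postgres. The table and column names follow the schema above; the tool wiring, connection setup, and ordering are illustrative assumptions.

// Minimal server-side sketch: the get_knowledge tool backed by Postgres.
// Table and columns follow the schema above; the rest is illustrative.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import pg from "pg";

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
const server = new McpServer({ name: "knowledge-api", version: "0.1.0" });

server.tool(
  "get_knowledge",
  {
    concerns: z.array(z.string()),   // e.g. ["testing", "domain-logic"]
    project: z.string(),             // e.g. "way2save"
  },
  async ({ concerns, project }) => {
    // && is the Postgres array-overlap operator. A NULL project_scope
    // marks a chunk as shared across all projects (the key design
    // decision above), so it matches every project.
    const { rows } = await pool.query(
      `SELECT slug, title, content, token_estimate AS "tokenEstimate"
         FROM knowledge_chunks
        WHERE concerns && $1::text[]
          AND (project_scope IS NULL OR $2 = ANY(project_scope))
        ORDER BY slug`,
      [concerns, project]
    );
    return {
      content: [
        { type: "text" as const, text: JSON.stringify({ chunks: rows }) },
      ],
    };
  }
);

await server.connect(new StdioServerTransport());

A GIN index on concerns keeps the overlap query fast as the table grows; at a few hundred chunks, though, even a sequential scan is cheap.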

What Files Can't Do

This isn't about files being bad. Files are the right starting point. But they have structural limitations that become blockers at scale.

Capability         Files                                        Knowledge API
Source of truth    N copies that drift apart                    One canonical source
Retrieval          Load everything or nothing                   Query by concern + scope
Usage tracking     No visibility into what helps                Track which rules get used per task type
Updates            Git commit + push + pull in every project    Update once, served everywhere instantly
Multi-interface    Only works where files exist on disk         Any agent, any interface (CLI, web, custom)
Scoping            Same blob for every task                     Per-project, per-concern filtering
Cost awareness     Unmeasured token impact                      Token estimate per chunk

The comparison isn't theoretical. Every row in that table is a problem we hit. Drifting copies caused bugs where one app followed updated patterns and another followed stale ones. Loading everything caused compaction cycles that lost important context. No usage tracking meant we couldn't tell which rules were worth their token cost and which were dead weight.

When to Make the Switch

Don't over-engineer this. The right stage depends on where you are.

One project: stay with files

If you have a single project with a single CLAUDE.md and a handful of rules, files are fine. The overhead of a database-backed system isn't justified. Write good rules, keep them maintained, and move on. You have real work to do.

Two to three projects with shared patterns: Stage 2

This is the breaking point. If you're copying rules between projects, or noticing that the same convention is described differently in different places, build a routing table. It's 30 minutes of work: remove the @-import directives from your CLAUDE.md, replace them with a task-to-rules mapping table, and let the agent load on demand instead of by default. We documented how to do this in the previous post.

Three or more projects, multiple interfaces: Stage 3

If you have three or more projects, agents running from multiple interfaces (CLI, web platform, custom tools), or cross-project workflows that need shared knowledge — build the Knowledge API. The investment pays for itself in the first week through reduced token waste and eliminated drift.

The signal that you've outgrown files: you're copying rules between projects, your sessions keep hitting compaction, or you're fixing the same rule in multiple places. Any one of these means it's time to move up a stage.

The Bigger Picture

This isn't just about saving tokens. It's about a fundamental shift in how you think about AI knowledge.

Configuration lives in files next to code. Infrastructure lives in services that multiple consumers share. Your AI knowledge is infrastructure — treat it that way.

When rules live in files, they're configuration. They're local to one project, versioned with its code, loaded by its tools. That's fine for project-specific settings. But patterns, conventions, architectural decisions, testing strategies — these aren't project-specific. They're organizational knowledge. They apply across projects, across tools, across interfaces.

Organizational knowledge doesn't belong in config files scattered across repositories. It belongs in a knowledge service that any consumer can query. The same way you wouldn't copy your production database into every project's repo, you shouldn't copy your development knowledge into every project's rules directory.

This is what we call AI Knowledge Engineering — the discipline of organizing what AI agents know so they can do better work with less waste. It's not about writing more rules. It's about building the infrastructure that serves the right rules to the right agent at the right time.

The best knowledge system is one where the agent never loads something it doesn't need, and never misses something it does. Files can't get you there. A queryable, scoped, tracked knowledge API can.

Start where you are. If you're at Stage 1, move to Stage 2 — it's a 30-minute fix with a 97% token reduction. If you're already feeling the pain of multiple projects with drifting rules, build Stage 3. The progression is clear, the architecture is proven, and each step makes the next one easier.

Your AI agents are only as good as the knowledge they work with. Make that knowledge precise, shared, and queryable — and everything downstream gets better.