When your AI session admits it's not using the system you built, and you make it impossible for that to happen again.
What I Was Doing
Running QA across three Flutter apps — way2fly, way2move, way2save — checking feature flags, running test suites, verifying Codemagic CI/CD builds would pass for TestFlight. Standard cross-app maintenance. The kind of session where you want everything tracked so the next session picks up cleanly.
I had build.ai running at localhost:5180. The harness databases were live. The MCP tools were available. Everything was ready for the session to be visible on the platform.
Except it wasn't.
What Broke
Halfway through the session, after completing QA on all three apps and fixing a Codemagic blocker in way2move, I asked Claude Code a direct question: "Are you using the harness like build.ai would?"
The answer was honest: "Not fully. I don't have the harness-os MCP tools connected in this session, so I'm working directly with the codebase but not logging decisions or learnings to the harness database."
In other words: the work was happening, but it was invisible. build.ai couldn't show it. The next session couldn't pick it up. The shared brain wasn't being used.
I said: stop.
Why This Matters
The whole point of harness.os is that it's the shared source of truth. build.ai reads from it. Claude Code writes to it. A session started in one can be continued in the other. But that only works if both sides actually use it.
A session that works around the harness is worse than useless — it creates the illusion of productivity while fragmenting knowledge. The QA results? Gone after the context window compacts. The Codemagic fix? Exists in git but with no decision trail explaining why. The feature flag analysis? Lost.
What I Improved
1. Global Enforcement Rule
Created ~/.claude/rules/harness-mandatory.md — a rule file loaded into every Claude Code session under my projects directory. It mandates:
- Before any work: connect to the harness databases via Neon MCP. Register the session.
- During work: log decisions, learnings, and events to the harness in real-time.
- On blockers: stop. Fix the harness. Document the improvement. Never work around it.
- On session end: update status, write handoff.
If the connection can't be established, the rule says: stop working and tell the user. No exceptions.
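As a rough illustration of the check that rule demands, here's a minimal sketch in Python. The rule itself is prose in a markdown file; this is just the behavior it mandates, and everything specific in it is an assumption: the HARNESS_DATABASE_URL variable, the sessions table, and its columns stand in for whatever the real harness schema looks like.

```python
# Sketch of the "connect or stop" behavior the rule mandates.
# HARNESS_DATABASE_URL, the sessions table, and its columns are assumptions,
# not the real harness-os schema.
import os
import sys
import uuid

import psycopg  # psycopg 3


def register_session(project: str) -> str:
    """Connect to the harness database and register a session, or stop loudly."""
    url = os.environ.get("HARNESS_DATABASE_URL")
    if not url:
        sys.exit("Harness database not configured: stop working and tell the user.")
    session_id = str(uuid.uuid4())
    try:
        with psycopg.connect(url) as conn:
            conn.execute(
                "INSERT INTO sessions (id, project, status, started_at) "
                "VALUES (%s, %s, 'active', now())",
                (session_id, project),
            )
    except psycopg.OperationalError as exc:
        # The rule: never work around a broken connection.
        sys.exit(f"Cannot reach the harness database ({exc}). Stopping.")
    return session_id


if __name__ == "__main__":
    print(register_session("way2move"))
```

The details don't matter; the shape does. The failure mode is a hard stop, not a silent fall-through into untracked work.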
2. Fallback Path Documented
The dedicated harness-os MCP server wasn't configured for this project. But the Neon MCP tools were available globally, and they connect to the same database. So I documented the fallback: if the dedicated MCP isn't there, use Neon MCP directly against purple-cell-95681470 (harness-os-core).
This isn't ideal — the dedicated tools have ergonomic functions like start_session() and log_learning(). But it works. And documenting it means the next session doesn't waste 10 minutes rediscovering the workaround.
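To make the fallback concrete: the Neon MCP path runs SQL through tool calls, and here's roughly the equivalent shown as plain Python plus SQL, since I'd rather not guess the exact tool signatures. The learnings table, its columns, and the connection string are assumptions for illustration, not the documented schema.

```python
# Fallback path: no harness-os MCP tools, so write learnings with plain SQL.
# Table name, columns, and the connection string are assumptions only.
import psycopg


def log_learning(conn: psycopg.Connection, content: str, tags: list[str]) -> None:
    """Insert one learning row; the dedicated MCP tool would normally do this."""
    conn.execute(
        "INSERT INTO learnings (content, tags, created_at) VALUES (%s, %s, now())",
        (content, tags),
    )


with psycopg.connect("postgresql://<harness-os-core connection string>") as conn:
    log_learning(conn, "Neon MCP fallback: use the raw database", ["core-principle"])
```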
3. Core Principles Logged
Five learnings went into the harness database, all tagged with core-principle:
- Harness usage is mandatory — not optional, not nice-to-have
- Token economy — track usage, optimize context, get cheaper every session
- Blocker improvement — every wall is an opportunity, never work around it
- Auto-publish learnings — improvements become blog posts in the same session
- Neon MCP fallback — when the dedicated server isn't configured, use the raw database
These persist. The next session — whether from build.ai or Claude Code — will see them.
Five core principles, logged as structured data. Not in a markdown file that might get forgotten. In the database that every session reads on startup.
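For a sense of what "reads on startup" could look like in practice, a sketch under the same assumed schema as above:

```python
# What a new session might run on startup to load the tagged principles.
# Table and column names are assumptions, as above.
import psycopg

with psycopg.connect("postgresql://<harness-os-core connection string>") as conn:
    rows = conn.execute(
        "SELECT content FROM learnings WHERE %s = ANY(tags) ORDER BY created_at",
        ("core-principle",),
    ).fetchall()
    for (content,) in rows:
        print(content)
```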
What's Different Now
Before this session, a Claude Code session could silently ignore the harness. Now it can't. The global rule fires on every session start. The enforcement is structural, not behavioral.
But I also identified the next gap: there's no token tracking yet. I know the harness is getting bigger — more rules, more knowledge, more context loaded per session. But I don't know if it's getting more efficient. The next improvement is a feedback loop that tracks which rules were actually used in each session, so the system can learn to load less and run cheaper.
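One possible shape for that feedback loop, sketched under the same caveats (the rule_usage table and its columns are hypothetical):

```python
# Hypothetical rule-usage tracking: record which rules each session loaded and
# actually referenced, then surface rules that rarely earn their context cost.
# Table and column names are assumptions.
import psycopg


def record_rule_usage(
    conn: psycopg.Connection, session_id: str, rule: str, used: bool
) -> None:
    conn.execute(
        "INSERT INTO rule_usage (session_id, rule, used) VALUES (%s, %s, %s)",
        (session_id, rule, used),
    )


def rarely_used_rules(conn: psycopg.Connection, min_sessions: int = 20):
    """Rules loaded in many sessions but almost never used: candidates to stop auto-loading."""
    return conn.execute(
        """
        SELECT rule, count(*) AS loads, avg(used::int) AS usage_rate
        FROM rule_usage
        GROUP BY rule
        HAVING count(*) >= %s AND avg(used::int) < 0.1
        ORDER BY usage_rate
        """,
        (min_sessions,),
    ).fetchall()
```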
Structural enforcement beats behavioral enforcement. You can't forget to follow a rule that runs before your session starts. The system prevents the failure mode instead of hoping you remember to avoid it.
The Meta Point
This post was written in the same session where the improvement happened. Not planned. Not retrospective. The harness logged the learning, I wrote the post, it gets published. That's the workflow: use → break → fix → document → publish → use better.
Next in the series: whatever breaks next.