MCP Server Reference
The harness.os MCP server is the Scale 2 implementation of the methodology: a Python MCP server that exposes the CNS schema as tools, enabling any MCP-compatible client to act as an inner harness connecting to the outer harness.
Core Principle: Connecting IS Participating
The outer harness enforces behavior on any client that connects. The client does not choose to participate in session tracking, event logging, or guardrails -- connecting IS participating.
This means Claude Code connecting to harness-os MCP gets the same tracking as build.ai's agent pipeline connecting to the same MCP. The outer harness is the same, so the process enforcement is the same. This is the proof that the inner harness is interchangeable.
If tracking lived in client-side hooks (Claude Code settings, Copilot extensions, custom agent config), swapping to a different inner harness would lose tracking. With tracking in the outer harness (MCP server), any client that connects gets it automatically.
Scale 1: Achieving This With Files
Not everyone has MCP. If you are using Copilot, Cursor, or any AI tool that reads project files, you can implement the same harness.os principles with markdown files and conventions. No server, no database.
File structure
```
your-project/
  CLAUDE.md                    # or .copilot-instructions.md, or AGENTS.md
  .claude/rules/               # or .github/copilot-instructions/
    coding-standards.md
    testing.md
    architecture.md
  docs/
    decisions/
      001-state-management.md
      002-database-choice.md
    domain/
      entities.md
      glossary.md
    specs/
      feature-a.md
  CHANGELOG.md                 # manual session log
```
How the four harness types map to files
| Harness Type | File Location | Content |
|---|---|---|
| Build | CLAUDE.md + .claude/rules/ | Coding standards, workflow rules, architecture constraints |
| Product | docs/specs/ + docs/decisions/ | Feature specs, ADRs, roadmap |
| Operations | docs/domain/ | Domain knowledge, process descriptions, terminology |
| Domain | domain/ or app database | Structured domain data — can be YAML/JSON files or a database |
Domain data as files
The domain harness does not require a database. For small teams or solo developers, structured files work well for tracking domain data that would otherwise live in a database.
```
domain/
  time-tracking/
    marco/
      way2fly.yaml             # hours per project per person
      way2move.yaml
    pedro/
      lakedeck.yaml
  project-health/
    way2fly.yaml               # status, blockers, last deploy
    way2move.yaml
  team/
    contributors.yaml          # who works on what, availability
```
Each file follows a defined schema:
```yaml
# domain/time-tracking/marco/way2fly.yaml
project: way2fly
contributor: marco
entries:
  - date: 2026-05-05
    hours: 3.5
    activity: feature/jump-logbook
  - date: 2026-05-06
    hours: 2.0
    activity: bugfix/voice-recording
```
The rules are the same as database-backed domain harnesses: define the schema (what fields, what types), use predictable paths (one file per entity), and let the agent read and write them. Git provides version history and merge conflict handling for multi-contributor scenarios.
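As a concrete illustration, a minimal loader for the schema above might look like the following sketch. It is not part of harness.os: it assumes PyYAML is installed, and the function name and error handling are illustrative.

```python
# Minimal sketch: load and validate a time-tracking file against the
# schema above. Assumes PyYAML (pip install pyyaml); names illustrative.
from pathlib import Path

import yaml

REQUIRED_ENTRY_FIELDS = {"date", "hours", "activity"}

def load_time_tracking(path: Path) -> dict:
    data = yaml.safe_load(path.read_text())
    # Top-level fields defined by the schema
    for field in ("project", "contributor", "entries"):
        if field not in data:
            raise ValueError(f"{path}: missing required field '{field}'")
    # Every entry must carry the same three fields
    for i, entry in enumerate(data["entries"]):
        missing = REQUIRED_ENTRY_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"{path}: entry {i} is missing {sorted(missing)}")
    return data

# Usage: total hours logged on way2fly
data = load_time_tracking(Path("domain/time-tracking/marco/way2fly.yaml"))
print(sum(e["hours"] for e in data["entries"]))
```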
File-based domain data works when last-write-wins is acceptable (solo dev, small team), you don't need real-time queries across entities, and the data volume fits comfortably in a directory tree. Move to a database when you need concurrent writes, aggregation queries, or the data grows past what files handle well.
How to enforce behavior with files
Files alone cannot force behavior the way an MCP server can. But some AI tools provide hooks that get you closer:
What works without MCP
- Instruction files: CLAUDE.md, .cursorrules, .github/copilot-instructions.md — rules the agent reads at session start. Write in imperative voice: "Always use X. Never use Y."
- Decision records: docs/decisions/001-state-management.md — "Decision: Riverpod. Rationale: testability. Alternatives rejected: Bloc." Prevents the AI from re-suggesting rejected alternatives.
- Claude Code hooks (.claude/hooks.json): Shell commands that run before/after specific tool calls. Can validate outputs, log actions, even reject operations. This is the closest you get to guardrails without MCP.
- Claude Code skills (.claude/skills/): Structured instruction files the agent loads on demand. Like lightweight tools — but they're instructions, not enforced interfaces.
- Session log: A manual CHANGELOG.md or SESSION_LOG.md. The manual version of start_session/end_session.
What does NOT work without MCP
- Universal interceptor: Hooks are Claude Code-specific. Cursor, Copilot, Windsurf, and other agents don't have them. MCP works with any agent that supports the protocol.
- Mediated access: Without MCP, the agent reads and writes files directly. It can skip validation, ignore rules, or write malformed data. MCP puts a server between the agent and the data.
- Automatic session tracking: No hook can reliably track every action across an entire session. MCP's log_tool_call interceptor wraps every tool call automatically.
- Cross-agent consistency: Each AI tool has its own hook/instruction system. MCP is the standard — one server works for Claude Code, Cursor, custom agents, or any MCP client.
Always use MCP when you can. The only choice is what storage backend sits behind it — files or database. MCP is the universal adapter layer. Hooks and instruction files are useful supplements, not replacements.
Limitations at Scale 1 (files without MCP)
- No automatic tracking. You must manually log sessions and decisions.
- No compound learning. Insights do not accumulate automatically across sessions.
- No cross-project queries. You cannot ask "what did I learn about testing across all projects?"
- No enforcement. The AI reads the files but nothing prevents it from ignoring them. There is no interceptor.
- No mesh. Each project is isolated. Cross-domain reasoning requires manual context.
- Agent-specific. Claude Code hooks don't work in Cursor. Cursor rules don't work in Copilot. Each tool has its own conventions.
Scale 1 + MCP: File-Backed Tools
Scale 1 has a key weakness: the agent reads files directly. It can ignore rules, forget to log decisions, or write data in the wrong format. There's no interceptor, no guardrails.
The fix: put an MCP server in front of the same files. The agent gets the same tools as Scale 2 (start_session, log_decision, get_rules, search_knowledge) but the backend reads and writes structured files instead of PostgreSQL.
```
Scale 1 (no MCP):
  Agent --reads--> CLAUDE.md, .claude/rules/, docs/decisions/
  (no mediation, no tracking, agent can ignore rules)

Scale 1 + MCP (file-backed tools):
  Agent --calls--> MCP Server --reads/writes--> harness/ folder
  (same tools as Scale 2, guardrails enforce behavior)
  (agent never touches files directly)

Scale 2 (database-backed tools):
  Agent --calls--> MCP Server --queries--> PostgreSQL
  (same tools, same guardrails, full search + mesh)
```
How it works
The MCP server starts with a HARNESS_PATH instead of a DATABASE_URL. It expects a harness directory with a known structure:
```
harness/
  rules/                        # spine_rules equivalent
    coding-standards.md         # YAML frontmatter: slug, triggers, priority
    testing.md
    architecture.md
  workflows/                    # spine_workflows equivalent
    feature-development.yaml    # steps[], triggers[]
    bug-fix.yaml
  knowledge/                    # cortex_chunks equivalent
    dev-workflow/
      git-conventions.md
      ci-pipeline.md
    design/
      color-system.md
      typography.md
  decisions/                    # decisions table equivalent
    001-state-management.yaml
    002-database-choice.yaml
  learnings/                    # learnings table equivalent
    testing-patterns.yaml
    performance-traps.yaml
  sessions/                     # session_handoffs equivalent
    latest-handoff.yaml         # written by end_session
    log/                        # session history
      2026-05-07T14-30.yaml
  domain/                       # domain data — structured files
    time-tracking/
      marco/way2fly.yaml
    project-health/
      way2fly.yaml
```
Same tools, same workflow
The agent calls the exact same tools. The implementation is different — file I/O instead of SQL — but the interface is identical:
| Tool call | File-backed implementation |
|---|---|
| start_session(project) | Reads sessions/latest-handoff.yaml, scans rules/ for matching triggers, returns rules + handoff |
| end_session(summary, ...) | Writes sessions/latest-handoff.yaml, appends to sessions/log/ |
| get_rules(context) | Scans rules/*.md frontmatter for matching triggers[], returns content |
| log_decision(title, rationale) | Appends numbered YAML to decisions/ |
| log_learning(title, content) | Appends YAML to learnings/ |
| search_knowledge(query) | Keyword search across knowledge/**/*.md content |
| get_knowledge_by_domain(domain) | Reads all files in knowledge/{domain}/ |
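As a sketch of the file-backed side, here is what start_session could look like under the assumptions above (PyYAML available, HARNESS_PATH set); the structure is illustrative, not the actual server code. Trigger matching against frontmatter is sketched in the next section.

```python
# Sketch of a file-backed start_session: read the last handoff,
# collect rules, return both.
import os
from pathlib import Path

import yaml

HARNESS = Path(os.environ["HARNESS_PATH"])

def start_session(project: str) -> dict:
    # Last handoff, if a previous session wrote one
    handoff_file = HARNESS / "sessions" / "latest-handoff.yaml"
    handoff = yaml.safe_load(handoff_file.read_text()) if handoff_file.exists() else None
    # Rules are plain markdown files; trigger matching is sketched below
    rules = [p.read_text() for p in sorted((HARNESS / "rules").glob("*.md"))]
    return {"project": project, "handoff": handoff, "rules": rules}
```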
Rule files with frontmatter
Each rule file has YAML frontmatter that the MCP server reads to match triggers:
harness/rules/testing.md:

```markdown
---
slug: testing-standards
name: Testing Standards
triggers: [testing, tdd, unit-test, integration-test]
priority: 10
---
Always write the test first, then the implementation.
Every feature must have tests before it ships.
Never hit real databases in unit tests — use fakes or the emulator.
```
When the agent calls get_rules("testing"), the MCP server scans all files in rules/, matches the frontmatter triggers array, and returns the full content of matching rules — just like the database-backed version queries spine_rules WHERE triggers && ARRAY['testing'].
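A file-backed implementation of that matching could be as small as the following sketch (PyYAML assumed; sorting by priority is an assumption, since the reference defines the field but not the ordering):

```python
# Sketch: scan rules/*.md, parse YAML frontmatter, return rules whose
# triggers[] contain the context.
from pathlib import Path

import yaml

def get_rules(context: str, rules_dir: Path) -> list[dict]:
    matches = []
    for path in rules_dir.glob("*.md"):
        text = path.read_text()
        if not text.startswith("---"):
            continue  # no frontmatter, nothing to match on
        frontmatter, _, body = text[3:].partition("\n---")
        meta = yaml.safe_load(frontmatter)
        if context in meta.get("triggers", []):
            matches.append({"slug": meta["slug"], "content": body.strip(),
                            "priority": meta.get("priority", 0)})
    # Assumption: higher priority sorts first
    return sorted(matches, key=lambda r: -r["priority"])

rules = get_rules("testing", Path("harness/rules"))
```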
Guardrails still work
The critical difference between Scale 1 and Scale 1 + MCP: the agent never touches files directly. Every read and write goes through MCP tools, which means:
- Session tracking: The interceptor logs every tool call to sessions/log/
- Validation: The server can reject malformed writes (missing required fields, wrong data types)
- Guardrails: Post-tool hooks fire the same way — log_decision auto-creates a decision event
- Instructions: The MCP instructions field tells the agent the session lifecycle — start, work, end
When you outgrow files, swap one environment variable: change HARNESS_PATH=/path/to/harness to DATABASE_URL=postgresql://.... The agent's config doesn't change. The tools don't change. The CLAUDE.md doesn't change. The storage backend is the only thing that moves.
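A sketch of that swap, with illustrative backend classes standing in for the real file and SQL implementations:

```python
# Illustrative only: the tool layer stays the same; the backend is
# chosen by whichever environment variable is set.
import os

class FileBackend:
    """Scale 1 + MCP: reads/writes the harness/ directory."""
    def __init__(self, path: str):
        self.path = path

class PostgresBackend:
    """Scale 2: queries the CNS schema over SQL."""
    def __init__(self, url: str):
        self.url = url

def make_backend():
    if url := os.environ.get("DATABASE_URL"):
        return PostgresBackend(url)
    if path := os.environ.get("HARNESS_PATH"):
        return FileBackend(path)
    raise RuntimeError("Set HARNESS_PATH (files) or DATABASE_URL (database)")
```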
Setup
Option A: Harness inside the project
The harness folder lives in the repo. It gets version-controlled with the project. Good when the harness is project-specific.
```bash
# 1. Create the harness directory in your project
mkdir -p harness/{rules,workflows,knowledge,decisions,learnings,sessions/log,domain}

# 2. Add your first rule
cat > harness/rules/coding-standards.md << 'EOF'
---
slug: coding-standards
name: Coding Standards
triggers: [coding, implementation, refactoring]
priority: 10
---
Use TypeScript strict mode. No `any` types.
Prefer composition over inheritance.
EOF

# 3. Configure MCP (per-project .mcp.json)
cat > .mcp.json << 'EOF'
{
  "mcpServers": {
    "harness": {
      "command": "python",
      "args": ["/path/to/harness-os-mcp/server.py"],
      "env": {
        "HARNESS_PATH": "./harness",
        "PROJECT_SLUG": "my-project"
      }
    }
  }
}
EOF
```
Option B: Harness on the machine (shared across projects)
The harness folder lives outside any project — on the machine itself. Every project on this machine connects to the same harness. Good for shared build rules, personal knowledge, or a team's coding standards.
```bash
# 1. Create a machine-wide harness
mkdir -p ~/.harness/build/{rules,workflows,knowledge,decisions,learnings,sessions/log}

# 2. Add shared rules (apply to all projects on this machine)
cat > ~/.harness/build/rules/testing.md << 'EOF'
---
slug: testing-standards
name: Testing Standards
triggers: [testing, tdd, unit-test, integration-test]
priority: 10
---
Always write the test first. TDD is not optional.
Never mock what you don't own.
EOF
```

~/.claude/settings.json (machine-wide config):

```json
{
  "mcpServers": {
    "build-harness": {
      "command": "python",
      "args": ["/opt/harness-os-mcp/server.py"],
      "env": {
        "HARNESS_PATH": "~/.harness/build",
        "HARNESS_ID": "build-harness"
      }
    }
  }
}
```
Now every Claude Code session on this machine gets the build harness automatically. No per-project config. No .mcp.json in each repo. The agent calls start_session and gets your shared rules. Open a new project, the harness is already there.
Combining both
You can run both. Machine-wide harness for shared rules (build standards, testing patterns) plus a per-project harness for project-specific knowledge (domain data, product decisions):
Machine-wide (~/.claude/settings.json):

```json
// Shared build harness — every project gets this
"build-harness": { "env": { "HARNESS_PATH": "~/.harness/build" } }
```

Per-project (.mcp.json):

```json
// Project-specific product harness — only this project
"product-harness": { "env": { "HARNESS_PATH": "./harness", "PROJECT_SLUG": "way2fly" } }
```
The agent sees both harness instances. It gets build rules from the machine-wide harness and product context from the project harness. This is the file-based equivalent of the mesh — two harness instances, each with their own knowledge, connected through the same agent session.
Move to Scale 2 (database) when you have 3+ projects and need cross-project queries, want semantic search over knowledge, or need the mesh to coordinate multiple harness instances. The upgrade is one environment variable: swap HARNESS_PATH for DATABASE_URL.
Scale 2: Database-Backed MCP Server
At Scale 2, the storage backend is PostgreSQL. Same MCP tools, same guardrails, but with full SQL queries, semantic search via pgvector, and cross-project mesh connectivity.
Scale Comparison
| Concern | Scale 1 (files, no MCP) | Scale 1 + MCP (files) | Scale 2 (database) |
|---|---|---|---|
| Agent interface | Reads files directly | MCP tools | MCP tools (same) |
| Session tracking | Manual changelog | Automatic (file-based log) | Automatic (database) |
| Decision logging | Write to docs/decisions/ | log_decision → YAML file | log_decision → SQL row |
| Rules enforcement | AI reads file, may ignore | Interceptor mediates access | Interceptor mediates access |
| Knowledge search | Manual file browsing | Keyword search over files | Semantic search (pgvector) |
| Cross-project | Copy-paste between repos | Shared harness folder | SQL queries across projects |
| Learning accumulation | Manual notes | log_learning → YAML | log_learning + transferability |
| Mesh connectivity | None | None | Full mesh events + transactions |
| Setup required | None | MCP server + directory | MCP server + PostgreSQL |
Future: Scale 3+
Scale 3 (Remote MCP) and Scale 4 (Federated) are designed but not built yet. They add:
- Remote MCP via Streamable HTTP -- same server, accessible over network
- Authentication and authorization -- JWT + RBAC per harness instance
- Multi-tenant isolation -- per-tenant Neon branches
- Federated learning sync -- high-transferability learnings published across meshes
Architecture
```
MCP Client (inner harness)        MCP Server (outer harness)         Neon PostgreSQL
Claude Code / Copilot / API  -->  server.py (Python, asyncio)   -->  branch per harness
                                      |
                                      +-- tools/state.py         (projects, sessions, roadmap)
                                      +-- tools/spine.py         (rules, workflows, prompts)
                                      +-- tools/cortex.py        (knowledge, search, embeddings)
                                      +-- tools/learnings.py     (accumulated insights)
                                      +-- tools/agents.py        (agent registry)
                                      +-- tools/health.py        (harness health checks)
                                      +-- tools/events.py        (mesh event stream)
                                      +-- tools/transactions.py  (cross-harness operations)
                                      +-- tools/mesh.py          (mesh topology, instances)
                                      +-- tools/concerns.py      (cross-cutting concern queries)
                                      +-- tools/tracking.py      (session/event queries)
                                      +-- tools/logging.py       (auto-tracking interceptor)
                                      +-- tools/guardrails.py    (post-tool event hooks)
```
The server is a thin Python layer over standard PostgreSQL. If MCP evolves, the adapter changes -- the schema, data, and knowledge do not.
Enforcement Mechanisms
Four mechanisms make "connecting IS participating" work:
1. Server-side session IDs
The server generates a UUID per MCP connection. The client never provides a session ID -- the server creates it. Every tool call within that connection shares the same session ID.
```python
# server.py -- generated once per connection (one process = one MCP connection in stdio)
import uuid

_connection_session_id: str = str(uuid.uuid4())
```
This means tracking works identically for Claude Code, Copilot, or any MCP client. The client does not need to know about sessions.
2. Tool call interceptor (tools/logging.py)
Every tool handler is wrapped with log_tool_call(). This is the single choke point -- all tool calls flow through it. No tool can bypass it.
The interceptor does three things on every call:
- Auto-creates a session on the first tool call (upserts a claude_sessions row)
- Records the tool event (inserts into claude_session_events with params, duration, status)
- Increments the tool call counter on the session
```python
# server.py -- every handler gets wrapped
HANDLERS = {
    **{t.name: log_tool_call(state.handle, get_session_id=get_session_id) for t in state.TOOLS},
    **{t.name: log_tool_call(spine.handle, get_session_id=get_session_id) for t in spine.TOOLS},
    # ... every module gets the same wrapping
}
```
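The wrapper itself is not shown in this reference; a minimal sketch of the shape it could take, with record_event standing in for the claude_session_events insert:

```python
# Sketch only -- not the actual tools/logging.py.
import functools
import time

async def record_event(session_id, tool_name, params, duration_ms, status):
    ...  # INSERT INTO claude_session_events in the real server

def log_tool_call(handler, *, get_session_id):
    @functools.wraps(handler)
    async def wrapped(tool_name: str, params: dict):
        session_id = get_session_id()  # server-side UUID, never client-supplied
        started = time.monotonic()
        status = "ok"
        try:
            return await handler(tool_name, params)
        except Exception:
            status = "error"
            raise
        finally:
            duration_ms = int((time.monotonic() - started) * 1000)
            await record_event(session_id, tool_name, params, duration_ms, status)
    return wrapped
```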
3. Connection lifecycle (session end)
When the MCP connection closes (stdin EOF), server.py calls end_tracking_session() in a finally block. The session is marked as completed with ended_at timestamp. The client does not need to call anything.
```python
async def main():
    await get_pool()
    try:
        async with stdio_server() as (read_stream, write_stream):
            await server.run(read_stream, write_stream, server.create_initialization_options())
    finally:
        await end_tracking_session(_connection_session_id)
        await close_pool()
```
4. Post-tool guardrails (tools/guardrails.py)
Specific tool calls trigger additional automatic events. For example, calling log_decision automatically emits a decision event. Calling end_session automatically emits a session_end event. These run non-blocking after the tool call succeeds.
```python
_POST_HOOKS: dict[str, str] = {
    "log_decision": "decision",
    "end_session": "session_end",
    "start_session": "session_start",
}
```
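Dispatch over that table could look like the following sketch; it uses the _POST_HOOKS dict above, emit_event stands in for the mesh event insert, and the fire-and-forget task mirrors the non-blocking behavior described above.

```python
# Sketch only -- not the actual tools/guardrails.py.
import asyncio

async def emit_event(event_type: str, payload: dict):
    ...  # INSERT INTO mesh_events in the real server

def run_post_hooks(tool_name: str, params: dict) -> None:
    event_type = _POST_HOOKS.get(tool_name)
    if event_type is None:
        return  # most tools have no post-hook
    # Non-blocking: the event fires after the tool call succeeds,
    # without delaying the tool response
    asyncio.create_task(emit_event(event_type, {"tool": tool_name, "params": params}))
```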
What MCP enables vs what it does not
| Server can enforce | Server cannot enforce |
|---|---|
| Auto-session creation/teardown (no client cooperation needed) | Forcing the client to call specific tools |
| Universal pre/post interception of every tool call | Preventing the client from ignoring tool responses |
| Rejecting tool calls that violate preconditions | Guaranteeing the client reads instructions |
| Identifying the client (name/version from init handshake) | Cross-server coordination |
| Injecting context into every response | -- |
Tool Categories
State Tools (tools/state.py)
Project state management, roadmap tracking, session lifecycle.
| Tool | Description |
|---|---|
| list_projects | List registered projects, optionally filtered by mode (work/life) |
| get_project_state | Current status: phase, summary, in-flight work, blockers |
| update_project_state | Update state; creates if none exists |
| get_roadmap | Ordered roadmap items, optionally filtered by status |
| add_roadmap_item | Add an item to a project's roadmap |
| update_roadmap_item | Update status/notes on a roadmap item |
| start_session | Begin a harness session -- loads last handoff + rules |
| end_session | End session -- persists decisions and handoff summary |
| get_session | Retrieve a specific session by ID |
| list_sessions | List recent sessions for a project |
| get_decisions | Get recent decisions for a project |
| log_decision | Record a decision with rationale |
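From the client side, the session lifecycle is just three of these tools in order. A sketch using the official MCP Python SDK (pip install mcp); paths, slugs, and argument values are placeholders:

```python
# Sketch: drive the session lifecycle from any MCP client.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    server = StdioServerParameters(
        command="python",
        args=["/path/to/harness-os-mcp/server.py"],
        env={"DATABASE_URL": "postgresql://...", "PROJECT_SLUG": "my-project"},
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Begin: loads the last handoff + matching rules
            await session.call_tool("start_session", {"project": "my-project"})
            # Work: record a decision with its rationale
            await session.call_tool("log_decision", {
                "title": "Use Riverpod",
                "rationale": "testability",
            })
            # End: persists the handoff for the next session
            await session.call_tool("end_session", {"summary": "wired up session tracking"})

asyncio.run(main())
```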
Spine Tools (tools/spine.py)
Rules engine, workflow management, prompt library.
| Tool | Description |
|---|---|
| get_rules | Get rules matching a trigger context (e.g., "testing", "deployment"); supports concern filtering |
| get_workflow | Get a workflow by slug or by matching trigger context; supports concern filtering |
| get_prompt | Get a system prompt by slug or purpose |
| add_rule | Create or update a rule with triggers[] and conditions |
| add_workflow | Create or update a workflow with steps JSONB |
| add_prompt | Create or update a system prompt |
Cortex Tools (tools/cortex.py)
Knowledge storage, semantic search, domain discovery.
| Tool | Description |
|---|---|
| list_domains | List all knowledge domains with chunk counts |
| search_knowledge | Semantic search across knowledge chunks (uses pgvector VECTOR(1536)) |
| get_chunk | Retrieve a specific knowledge chunk by ID |
| add_knowledge | Store a knowledge chunk with domain, tags, and optional embedding |
| bulk_insert | Batch insert multiple knowledge chunks |
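Under the hood, search_knowledge pairs an embedding call with a pgvector distance query. A sketch with asyncpg; the embedding function is a placeholder for whichever 1536-dimension model backs the harness:

```python
# Sketch: semantic search over cortex_chunks with pgvector's cosine
# distance operator (<=>).
import asyncpg

async def embed(text: str) -> list[float]:
    raise NotImplementedError  # plug in any 1536-dim embedding model

async def search_knowledge(pool: asyncpg.Pool, query: str, limit: int = 5):
    vector = await embed(query)
    return await pool.fetch(
        """
        SELECT id, domain, content, tags
        FROM cortex_chunks
        ORDER BY embedding <=> $1::vector
        LIMIT $2
        """,
        "[" + ",".join(map(str, vector)) + "]",
        limit,
    )
```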
Learnings Tools (tools/learnings.py)
Accumulated insights with transferability scoring.
| Tool | Description |
|---|---|
| log_learning | Record a learning with category, insight, context, and transferability_score |
| search_learnings | Search learnings by category, domain, or keyword |
| get_transferable_learnings | Get learnings above a transferability threshold (for cross-mesh flow) |
Agent Tools (tools/agents.py)
Agent registry and capability management.
| Tool | Description |
|---|---|
| list_agents | List registered agents with capabilities and status |
| get_agent | Get agent details including implementations and knowledge |
| register_agent | Register a new agent with type, capabilities, and model preference |
Health Tools (tools/health.py)
Harness diagnostics and status.
| Tool | Description |
|---|---|
| harness_health | Overall health check -- table counts, latest activity timestamps |
| schema_info | List all tables and their row counts in the current harness |
Event Tools (tools/events.py)
Mesh event stream for observability.
| Tool | Description |
|---|---|
| emit_event | Emit a mesh event (event_type, payload JSONB) |
| list_events | Query recent mesh events, optionally filtered by type |
Transaction Tools (tools/transactions.py)
Cross-harness operation tracking.
| Tool | Description |
|---|---|
| start_transaction | Begin a multi-step cross-harness transaction |
| add_transaction_step | Record a step in a running transaction |
| complete_transaction | Finalize a transaction with duration and status |
| get_transaction | Retrieve a transaction by ID |
| list_transactions | List recent transactions |
Mesh Tools (tools/mesh.py)
Mesh topology and instance management.
| Tool | Description |
|---|---|
| list_harness_instances | List registered harness instances with types and connection info |
| get_mesh_topology | Get the full mesh topology -- instances, connections, health |
| register_instance | Register a new harness instance in the mesh |
Concern Tools (tools/concerns.py)
Cross-cutting concern queries.
| Tool | Description |
|---|---|
| query_by_concern | Retrieve knowledge, rules, and workflows tagged with a specific concern |
| tag_concern | Add concern tags to existing knowledge items |
Tracking Tools (tools/tracking.py)
Session and event observability -- query what happened across Claude Code and agent sessions.
| Tool | Description |
|---|---|
| get_claude_sessions | List sessions, optionally filtered by project slug or status |
| get_claude_session_detail | Full detail for a specific session including event count |
| get_claude_session_events | Events for a session (tool calls, decisions, file writes) |
| get_claude_activity_summary | Aggregate stats: sessions, events, tool usage over a time range |
| track_artifact | Record an artifact produced during a session |
Logging Interceptor (tools/logging.py)
Not a tool category -- this is the wrapper that makes auto-tracking work. Every tool handler is wrapped with log_tool_call() which provides:
- Auto-session creation on first tool call per connection
- Event recording for every tool invocation (params, duration, status, result count)
- Tool call counting on the session row
- Structured JSON logging to stderr for external observability
Guardrails (tools/guardrails.py)
Post-tool hooks that fire after specific tool calls. These automatically emit semantic events (e.g., log_decision triggers a decision event) without the agent needing to do anything extra.
Database Schema
The CNS schema is the core data model. Every harness instance uses the same tables.
Core Knowledge Tables
```sql
-- Knowledge store (the cortex)
CREATE TABLE cortex_chunks (
  id           UUID PRIMARY KEY,
  domain       TEXT,
  content      TEXT,
  embedding    VECTOR(1536),
  tags         TEXT[],
  project_slug TEXT,
  chunk_type   TEXT,
  concerns     TEXT[] DEFAULT '{}',
  created_at   TIMESTAMPTZ
);

-- Rules engine (the spine)
CREATE TABLE spine_rules (
  id           UUID PRIMARY KEY,
  slug         TEXT UNIQUE,
  content      TEXT,
  triggers     TEXT[],
  project_slug TEXT,
  conditions   JSONB,
  concerns     TEXT[] DEFAULT '{}',
  created_at   TIMESTAMPTZ
);

-- Process workflows (nervous system)
CREATE TABLE spine_workflows (
  id           UUID PRIMARY KEY,
  slug         TEXT UNIQUE,
  steps        JSONB,
  triggers     TEXT[],
  project_slug TEXT,
  status       TEXT,
  concerns     TEXT[] DEFAULT '{}',
  created_at   TIMESTAMPTZ
);

-- Accumulated insights (memory)
CREATE TABLE learnings (
  id                    UUID PRIMARY KEY,
  category              TEXT,
  insight               TEXT,
  context               JSONB,
  domain                TEXT,
  project_slug          TEXT,
  transferability_score NUMERIC(3,2),
  created_at            TIMESTAMPTZ
);
```
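The trigger-matching lookup quoted earlier in this reference runs against spine_rules. As an asyncpg sketch, with connection handling omitted:

```python
# Sketch: the array-overlap rule lookup (triggers && ARRAY['testing'])
# as an asyncpg query.
import asyncpg

async def get_rules(pool: asyncpg.Pool, context: str):
    return await pool.fetch(
        "SELECT slug, content FROM spine_rules WHERE triggers && ARRAY[$1]",
        context,
    )
```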
Mesh Observability Tables
```sql
-- Event stream
CREATE TABLE mesh_events (
  id         UUID PRIMARY KEY,
  event_type TEXT,
  harness_id TEXT,
  payload    JSONB,
  created_at TIMESTAMPTZ
);

-- Cross-harness operations
CREATE TABLE mesh_transactions (
  id                UUID PRIMARY KEY,
  steps             JSONB,
  total_duration_ms INTEGER,
  harness_ids       TEXT[],
  status            TEXT,
  created_at        TIMESTAMPTZ
);
```
Project and Session Tables
```sql
-- Project registry
projects (id, slug, name, mode, status, ...)

-- Project state snapshots
project_states (id, project_slug, summary, phase, in_flight JSONB, blockers JSONB, ...)

-- Session lifecycle
sessions (id, project_slug, phase_id, input_tokens, output_tokens, cost, duration, output_lines JSONB, ...)

-- Decision log
decisions (id, project_slug, decision, rationale, context, ...)

-- Roadmap items
roadmap_items (id, project_slug, title, status, priority, ...)
```
Running the Server
Prerequisites
- Python 3.11+
- PostgreSQL with pgvector extension (Neon recommended)
- A .env file with DATABASE_URL
Setup
```bash
# Install dependencies
pip install -e .

# Create .env from example
cp .env.example .env
# Edit .env with your DATABASE_URL

# Run migrations (if starting fresh)
# The CNS schema tables are created via Neon branch-from-parent
```
Running
```bash
# Start the MCP server (stdio transport)
python server.py
```
The server uses stdio transport -- it reads MCP messages from stdin and writes responses to stdout. It is designed to be spawned by an MCP client (Claude Code, a mesh manager, etc.), not run standalone.
Environment Variables
| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | PostgreSQL connection string (Neon branch URL) |
| PROJECT_SLUG | No | When set, scopes reads to this project (used by product harness instances) |
| HARNESS_ID | No | Identifier for this harness instance (used in mesh events) |
Connecting an Agent to the Harness
How you connect depends on whether your harness is file-based (Scale 1) or MCP-based (Scale 2).
Scale 1: File-based harness
Point the agent's instruction file at the harness folder. The agent reads these files at session start and follows the rules inside them.
CLAUDE.md:

```markdown
# Point to harness rules
@.claude/rules/coding-standards.md
@.claude/rules/testing.md
@.claude/rules/architecture.md

# Point to domain data
Domain data lives in domain/ — read before making changes.
Decisions are logged in docs/decisions/ — check before proposing alternatives.
```
For Cursor, use .cursorrules or .cursor/rules/. For Copilot, use .github/copilot-instructions.md. The pattern is the same — a file the agent reads automatically.
Scale 2: MCP-based harness — per-project
Add MCP server config to your project. Claude Code reads .mcp.json from the project root:
.mcp.json (in project root):

```json
{
  "mcpServers": {
    "harness": {
      "command": "python",
      "args": ["/path/to/harness-os-mcp/server.py"],
      "env": {
        "DATABASE_URL": "env:HARNESS_DB_URL",
        "PROJECT_SLUG": "my-project"
      }
    }
  }
}
```
Each project gets its own .mcp.json with the PROJECT_SLUG that scopes rules, knowledge, and sessions to that project.
Scale 2: MCP-based harness — machine-wide
Install the harness once on the machine and every project gets it automatically. No per-project config needed.
~/.claude/settings.json (machine-level):

```json
{
  "mcpServers": {
    "build-harness": {
      "command": "python",
      "args": ["/opt/harness-os-mcp/server.py"],
      "env": {
        "DATABASE_URL": "env:BUILD_HARNESS_DB",
        "HARNESS_ID": "build-harness"
      }
    }
  }
}
```
Machine-wide config means any Claude Code session on this machine connects to the build harness — no setup per project. This is the recommended approach when one harness serves all your projects (e.g., a build harness with shared coding standards).
Use per-project config when each project has its own harness instance (e.g., a product harness with a project-specific roadmap), and machine-wide config when the harness is shared (e.g., a build harness with coding standards, or a personal harness that spans all work). You can combine both: machine-wide for shared harnesses, per-project for project-specific ones.
Environment variables
Use env: references instead of plaintext credentials. The MCP client reads the actual value from the user's shell environment:
```json
// env: prefix → resolved from shell environment at spawn time
"DATABASE_URL": "env:BUILD_HARNESS_DB"

// Set in ~/.zshrc or ~/.bashrc
// export BUILD_HARNESS_DB="postgresql://..."
```
Connection Management
In a mesh with multiple harness instances, a mesh manager (e.g., harness-mesh.ts) spawns Python MCP server processes on demand:
- Lazy connect: Instance process spawned on first access
- 30s timeout: Connection timeout for unresponsive instances
- 10min idle eviction: Unused connections cleaned up
- Stale retry: Dead client evicted, reconnect attempted once
- Graceful shutdown: Clean shutdown on SIGTERM/SIGINT
These are config choices -- a different config might use persistent connections or different timeouts.
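The actual manager is TypeScript (harness-mesh.ts), but the policy itself is small enough to sketch in Python; spawn_mcp_server is a hypothetical stand-in for spawning a server process and completing the init handshake:

```python
# Sketch of the lazy-connect + idle-eviction policy; timeouts mirror
# the config listed above.
import time

CONNECT_TIMEOUT_S = 30  # connection timeout for unresponsive instances
IDLE_EVICT_S = 600      # 10 min idle eviction

async def spawn_mcp_server(harness_id: str, timeout: float):
    ...  # hypothetical: spawn the Python MCP process, handshake

class InstancePool:
    def __init__(self):
        self._clients: dict[str, tuple[object, float]] = {}  # id -> (client, last_used)

    async def get(self, harness_id: str):
        client, last_used = self._clients.get(harness_id, (None, 0.0))
        if client is None or time.monotonic() - last_used > IDLE_EVICT_S:
            # Lazy connect: spawn on first access (or after eviction)
            client = await spawn_mcp_server(harness_id, timeout=CONNECT_TIMEOUT_S)
        self._clients[harness_id] = (client, time.monotonic())
        return client
```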
Testing
```bash
# Run tests
pytest tests/

# Tests use a dedicated Neon 'test' branch
```