The Three Layers
harness.os is not one thing. It is three things stacked on top of each other: a methodology (universal principles for structuring AI knowledge), a config (a specific application of those principles), and a mesh (a running instance of connected harnesses following a config). Understanding these three layers is the key to understanding everything else.
The Methodology
Universal principles for structuring AI knowledge. Four harness types, a standard schema, a session lifecycle, and a protocol for knowledge to flow between contexts. Anyone can apply these principles to build anything.
The Config
A specific application of the methodology. Marco's config includes an 8-phase dev workflow, TDD, hexagonal architecture, Neon PostgreSQL, and 6 apps across fitness, skydiving, finance, and SaaS. Configs are portable and forkable.
The Mesh
A running instance of connected harnesses. Marco's mesh: 6 apps, 18 harness instances, 10 Neon branches, all talking through MCP. A cortex.ai tenant gets their own mesh. One config can spawn many meshes.
Technical deep dive: The three layers in practice
1. harness.os = The Methodology
harness.os is a set of universal principles for structuring AI knowledge. It defines a type system of four harness types (build, product, operations, domain), an internal schema for each harness (knowledge tables, learning tables, rules, workflows), a session lifecycle model, and a protocol for flowing knowledge between contexts through a mesh. The CNS schema -- how tables relate, how slugs scope data, how rules fire through triggers -- is the core intellectual property. This layer exists independently of any specific app or implementation.
The methodology is an interface contract. Any implementation that follows the harness type system, uses the CNS schema (cortex_chunks, spine_rules, spine_workflows, learnings), and respects the session lifecycle is a valid harness.os implementation. The MCP tools, the Neon branches, the Python server -- those are implementation details of one config. The methodology is the specification.
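The contract can be sketched as a Python protocol. This is an illustrative sketch, not the actual implementation: the class names and method signatures beyond start_session()/end_session() are assumptions, and the in-memory class stands in for any storage backend that honors the lifecycle.

```python
from typing import Any, Protocol

class Harness(Protocol):
    """Sketch of the harness.os interface contract (names are illustrative)."""
    harness_type: str  # one of: build, product, operations, domain

    def start_session(self) -> dict[str, Any]:
        """Load the handoff from the previous session plus accumulated learnings."""
        ...

    def end_session(self, decisions: list[str], learnings: list[str]) -> None:
        """Persist decisions and learnings back to the harness."""
        ...

class InMemoryHarness:
    """Minimal in-memory implementation honoring the contract."""
    def __init__(self, harness_type: str):
        assert harness_type in {"build", "product", "operations", "domain"}
        self.harness_type = harness_type
        self._handoff: dict[str, Any] = {}
        self._learnings: list[str] = []

    def start_session(self) -> dict[str, Any]:
        # Never start from scratch: return prior handoff + accumulated learnings.
        return {"handoff": self._handoff, "learnings": list(self._learnings)}

    def end_session(self, decisions: list[str], learnings: list[str]) -> None:
        self._handoff = {"last_decisions": decisions}
        self._learnings.extend(learnings)
```

Any backend (PostgreSQL, files, a document store) that satisfies this shape is, in the methodology's terms, a valid harness.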
2. Harness Config = A Specific Application
A harness config is a concrete set of choices that apply the methodology. Marco's config includes: an 8-phase development workflow, TDD and hexagonal architecture as build standards, FF-gated prototypes, Neon PostgreSQL as the data layer, MCP as the communication protocol, and specific operations harnesses for skydiving, fitness, and finance. Configs are portable and forkable -- someone else could use the same methodology with entirely different choices (different tech stack, different domains, different workflow).
3. Mesh = A Running Instance
A mesh is a running instance of connected harnesses following a config. A mesh relates to its config the way an object relates to its class. Marco's personal mesh connects 6 apps (build.ai, marco.ai, cortex.ai, way2fly, way2move, way2save) through 18 harness instances across 10 Neon branches. A cortex.ai Lake Deck mesh would be a different instance: hospitality processes connected through their harnesses. One config can spawn multiple meshes. Apps are on a mesh, not the mesh.
harness.os (methodology) defines the abstract class. marco-config (config) is a concrete class with specific field values. marco-mesh (mesh) is a running instance of that class. lake-deck-mesh would be another instance of a cortex-hospitality-config. Config defines what harnesses exist and how they connect. Mesh is the actual running system with data flowing through it.
Per-user mesh instances
A mesh instance is per user. Each user gets a mesh that connects only the harnesses relevant to their subscriptions. Consider a way2do consumer (way2do is the consumer hub that bundles way2fly, way2move, and way2save):
- A user subscribed to way2save + way2fly gets a mesh instance connecting their finance domain harness to their skydive domain harness. Their agent can answer "Can I afford this skydive camp?" because the mesh links budget data to skydive scheduling -- for that specific user.
- They don't see way2move. Their mesh has no fitness harness. The cross-domain reasoning only spans the harnesses in their mesh.
- If they later add way2move, their mesh instance expands -- now the agent can factor training schedule into the camp decision.
Marco's mesh is the superset: all 6 apps, all 18 harness instances. Each consumer gets a subset. The mesh instance sits inside the configuration (which apps connect), which sits inside the methodology (how harnesses connect).
Data architecture deep dive: Schema, topology, and state
The three-layer model maps directly to a data architecture:
1. Methodology = Schema Definition
The harness.os methodology defines the schema contract: every harness instance, regardless of type or purpose, uses the same core tables -- cortex_chunks (knowledge + VECTOR(1536) embeddings), spine_rules (triggers[] + conditions), spine_workflows (steps JSONB), learnings (accumulated insights). This universal schema is what makes harnesses composable -- any MCP tool works against any harness instance because the tables are identical.
2. Config = Branch Topology + Isolation Strategy
Marco's config specifies how the schema maps to physical databases: 10 Neon branches, slug-filtered vs. branch-isolated harnesses, which instances share branches (product-shared) vs. which get dedicated branches (skydive-harness). A different config might use MySQL, or DynamoDB, or even flat files -- as long as the harness schema contract is honored.
3. Mesh = Live Data Topology
The mesh is the running data system: 18 harness instances connected via MCP, with queries flowing between branches, learnings accumulating in each instance, and mesh_transactions tracking cross-harness operations. The mesh is where data lives and flows. A different mesh (e.g., Lake Deck) would have different data but the same schema and query patterns.
The three layers create a clean separation: schema (methodology, universal), topology (config, per-deployment), and state (mesh, per-instance). This means you can reason about query patterns at the methodology level, optimize storage at the config level, and monitor performance at the mesh level -- independently.
Strategic deep dive: Three levels of business value
The three layers represent three levels of business value:
1. Methodology = The Playbook (Intellectual Property)
The harness.os methodology is the core IP. It defines how to organize AI knowledge into four composable layers that anyone can apply to build any product. This is what you teach, license, or open-source. It is technology-agnostic and domain-agnostic.
2. Config = Your Strategy (Specific Choices)
Marco's config is a specific business strategy expressed through the methodology: 6 apps across fitness, movement, finance, personal management, B2B SaaS, and developer tools. A different founder applying the same methodology would build a different config for a different market. Configs are the "business plan" layer -- portable and forkable.
3. Mesh = Your Running Business (Live System)
The mesh is the actual deployed system with real users, real data, and real revenue. Marco's mesh serves his products. Lake Deck's mesh (a cortex.ai tenant) serves their hospitality operations. Each mesh is independent but can share learnings with other meshes through the methodology's knowledge flow patterns.
The methodology creates products (configs). The products create partnerships (meshes). The partnerships create ecosystems. A cortex.ai tenant is someone applying a subset of the methodology (operations + domain config) to their own mesh. They don't need to understand build harnesses or product harnesses -- they only see the config relevant to their use case.
Think of it like cooking. The methodology is the science of cooking itself -- how flavors combine, how heat changes ingredients, what makes a dish balanced. The config is a specific cookbook -- Italian, Japanese, or fusion -- that applies those principles to create recipes. The mesh is an actual restaurant using that cookbook, with real kitchens, real chefs, and real diners.
The Three Layers, Simply
The Methodology = The Playbook
Universal rules for organizing AI knowledge. "Knowledge should be separated into four types. Types should be composable. Knowledge should flow between connected systems." This works for any industry, any technology, any product.
The Config = Your Version
Marco's specific choices: 6 apps (skydiving, fitness, finance, personal, business, developer tools), an 8-step development process, and specific tools (Neon database, MCP protocol). Someone else would make different choices for their industry.
The Mesh = The Running System
The actual live system with all the apps connected and talking to each other. When you ask "Can I afford skydive camp?", it's the mesh that routes the question to the right apps and combines the answers.
The power of this separation: anyone can take the playbook, write their own version, and run their own system. A hotel chain could take the same playbook and create a hospitality version with staff training, compliance, and scheduling -- a completely different system, built on the same principles.
And it works at any budget. A student can start with just text files in their project folder (free). A freelancer can add a database ($20/month). A team can go remote ($200/month). An enterprise can run a federated mesh ($2K+/month). The playbook is the same at every level -- only the tools change.
The Methodology
harness.os is not a product. It is a set of principles for organizing AI knowledge that anyone can apply at any scale. The principles: knowledge should be typed, structured with defined internal organization, and should flow between contexts through a mesh. These principles are universal -- they work for skydiving, for hospitality, for manufacturing, for any domain. And they work at every scale -- from a solo developer with CLAUDE.md files ($0) to a federated enterprise mesh ($2K+/mo). The methodology stays the same. Only the implementation changes.
Technical deep dive: The five core principles
The harness.os methodology defines five core principles that any implementation must follow:
Principle 1: Typed Harnesses
All knowledge falls into exactly four types: build (HOW to create), product (WHY + WHAT — discovery, validation, specification, measurement), operations (HOW the domain works), and domain (WHO — per-user data). Every harness instance has exactly one base type. Products are compositions of these types.
At any scale: A solo dev with files has one build harness (CLAUDE.md) and one product harness (docs/). A team has a shared build harness (coding standards DB) + per-project product harnesses. An enterprise adds operations harnesses per department and domain harnesses per user. The four types hold at every tier.
Principle 2: Internal Structure (The CNS Schema)
Every harness instance, regardless of type, contains the same internal structure:
| Table | Purpose | Key Columns |
|---|---|---|
| cortex_chunks | Knowledge store | domain, content, embedding VECTOR(1536), tags[], project_slug |
| spine_rules | Rules engine | slug, content, triggers[], project_slug, conditions |
| spine_workflows | Process workflows | slug, steps JSONB, triggers[], project_slug |
| learnings | Accumulated insights | category, insight, context, domain, transferability_score |
The "CNS" metaphor: cortex_chunks is the brain (knowledge storage), spine_rules is the spine (structural rules that trigger actions), spine_workflows is the nervous system (multi-step processes), and learnings is the memory (accumulated experience).
Principle 3: Session Lifecycle
Every interaction with a harness follows a lifecycle: start_session() loads the handoff from the previous session plus accumulated rules, the agent works within the harness context, and end_session() persists decisions and learnings back to the harness. This lifecycle is what makes knowledge compound over time.
At any scale: At Tier 1 (files), the "session lifecycle" is reading CLAUDE.md at start and updating docs before you close the editor. At Tier 2 (DB), it's explicit start_session()/end_session() MCP calls. At Tier 4 (enterprise), it's automated with audit trails. Same principle, different implementation.
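The Tier 1 version of the lifecycle can be sketched in a few lines: the "handoff" is just a file read at session start and rewritten at session end. The handoff.json filename and the shape of the state dict are assumptions for illustration, not part of the methodology.

```python
import json
import pathlib
import tempfile

def start_session(harness_dir: pathlib.Path) -> dict:
    """Tier-1 sketch: load the handoff file if one exists, else start fresh."""
    handoff = harness_dir / "handoff.json"
    if handoff.exists():
        return json.loads(handoff.read_text())
    return {"learnings": []}

def end_session(harness_dir: pathlib.Path, state: dict, new_learnings: list[str]) -> None:
    """Persist accumulated learnings so the next session never starts from scratch."""
    state["learnings"].extend(new_learnings)
    (harness_dir / "handoff.json").write_text(json.dumps(state))
```

The same two calls become MCP tools at Tier 2 and audited automation at Tier 4; only the persistence mechanism changes.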
Principle 4: Slug Scoping
The project_slug field on every table is the scoping mechanism. Multiple harness instances can share a physical database by filtering on slug. This enables both branch-level isolation (each instance gets its own database) and slug-level isolation (instances share a database but see only their own data). The methodology supports both -- the config decides which to use.
At any scale: At Tier 1, scoping is folder structure (project-a/CLAUDE.md vs project-b/CLAUDE.md). At Tier 2, it's slug-filtered rows in shared tables. At Tier 3+, it can be separate databases per tenant. The scoping principle adapts to the storage layer.
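Slug-level isolation is just a WHERE clause at the application layer. A minimal sketch using SQLite in place of PostgreSQL (the schema is simplified to three columns; the real contract adds embeddings, tags, and more):

```python
import sqlite3

# Two harness instances share one physical table; each sees only
# the rows matching its own project_slug.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cortex_chunks (domain TEXT, content TEXT, project_slug TEXT)"
)
conn.executemany(
    "INSERT INTO cortex_chunks VALUES (?, ?, ?)",
    [
        ("finance", "budget rule", "way2save"),
        ("skydive", "progression rule", "way2fly"),
    ],
)

def chunks_for(slug: str) -> list[str]:
    """Application-layer isolation: filter every query by project_slug."""
    rows = conn.execute(
        "SELECT content FROM cortex_chunks WHERE project_slug = ?", (slug,)
    ).fetchall()
    return [r[0] for r in rows]
```

Branch isolation removes the WHERE clause by giving each instance its own database; the query shape is otherwise identical.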
Principle 5: Mesh Communication
Harnesses communicate through a defined protocol (in the current config, MCP). Cross-harness queries are logged in mesh_transactions with step tracking. Learnings with high transferability_score can flow between harnesses. The mesh protocol is what makes cross-domain reasoning possible.
At any scale: At Tier 1, the "mesh" is a developer manually copying a learning from one project's docs to another. At Tier 2, it's MCP servers connecting harness databases locally. At Tier 3, it's remote MCP with auth. At Tier 4, it's federated mesh with cross-organization knowledge flow. The principle of knowledge flowing between contexts is the same -- the mechanism scales.
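The mesh mechanics can be sketched as a router that fans a question out to harnesses and records the operation in mesh_transactions with per-step timing. The routing logic and dict shapes here are illustrative assumptions; only the mesh_transactions/steps naming comes from the config.

```python
import time
import uuid

class Mesh:
    """Toy mesh: routes a cross-harness query and logs it as a transaction."""
    def __init__(self, harnesses: dict):
        self.harnesses = harnesses    # name -> callable that answers a query
        self.mesh_transactions = []   # audit log of cross-harness operations

    def query(self, targets: list[str], question: str) -> dict:
        steps, answers = [], {}
        for name in targets:
            t0 = time.perf_counter()
            answers[name] = self.harnesses[name](question)
            steps.append({
                "harness": name,
                "duration_ms": (time.perf_counter() - t0) * 1000,
            })
        # Every cross-harness operation leaves a transaction record.
        self.mesh_transactions.append(
            {"id": str(uuid.uuid4()), "question": question, "steps": steps}
        )
        return answers
```

A question like "Can I afford this skydive camp?" would target the finance and skydive harnesses and combine both answers, with the transaction log providing observability.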
These five principles are implementation-agnostic. You could implement harness.os with PostgreSQL + MCP (as the current config does), or with MongoDB + REST APIs, or with DynamoDB + gRPC. The methodology defines the contract. The config implements it. The mesh runs it.
Data deep dive: The universal schema contract
The methodology defines a data contract that every harness implementation must honor:
The Universal Schema
CNS Schema Contract
```sql
-- Every harness instance has these tables:
cortex_chunks (knowledge + embeddings)
    domain TEXT, content TEXT, embedding VECTOR(1536),
    tags TEXT[], project_slug TEXT, chunk_type TEXT

spine_rules (trigger-based rules)
    slug TEXT UNIQUE, content TEXT, triggers TEXT[],
    project_slug TEXT, conditions JSONB

spine_workflows (multi-step procedures)
    slug TEXT UNIQUE, steps JSONB, triggers TEXT[],
    project_slug TEXT, status TEXT

learnings (accumulated insights)
    category TEXT, insight TEXT, context JSONB,
    domain TEXT, project_slug TEXT,
    transferability_score NUMERIC(3,2)

-- Mesh observability:
mesh_events (event_type, harness_id, payload JSONB)
mesh_transactions (steps JSONB, total_duration_ms)
```
Scoping Patterns
The methodology defines two isolation strategies that configs can mix:
- Branch isolation: Each harness instance gets its own physical database/branch. Maximum isolation. Higher infrastructure cost.
- Slug isolation: Multiple harness instances share a database, scoped by WHERE project_slug = $N. Lower cost. Application-layer isolation.
When a new harness instance is provisioned (e.g., Neon branch-from-parent), it inherits the full schema automatically. Zero DDL needed. The schema IS the methodology -- tables are identical everywhere, and that's what makes any MCP tool work against any harness instance. This is intentional: the data contract is the interface contract.
Strategic deep dive: The IP and licensing model
The methodology is the intellectual property layer -- the principles that make the entire platform work. Here is what it defines:
Four Knowledge Types
All organizational knowledge falls into exactly four categories: how to create (build), what to build (product), how to run domain operations (operations), and actual user data (domain). This is universal -- it works for software companies, hotels, factories, or any other organization.
Composability
Any combination of the four types creates a valid product. Build + Product = a developer platform. Operations + Domain = a B2B SaaS tool. All four = a full operating system. This means new products are new combinations, not new platforms.
Compound Learning
Every interaction with the system generates learnings that persist in the harness. New agents, new products, and new users inherit accumulated knowledge. The system gets smarter with every use -- and learnings that generalize (high transferability) flow across domains automatically.
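The "learnings that generalize flow across domains" mechanism reduces to filtering on transferability_score. A minimal sketch, assuming a 0.8 threshold (the threshold value is an assumption, not from the methodology):

```python
# Sketch of cross-domain learning flow: only learnings whose
# transferability_score clears a threshold propagate to other harnesses.
def transferable(learnings: list[dict], threshold: float = 0.8) -> list[dict]:
    return [l for l in learnings if l["transferability_score"] >= threshold]

learnings = [
    {"insight": "progressive overload applies to skill acquisition",
     "transferability_score": 0.92},   # generalizes: flows across domains
    {"insight": "this DZ closes on Mondays",
     "transferability_score": 0.10},   # local fact: stays in its harness
]
```

A fitness-harness insight about progressive overload would clear the bar and surface in, say, a hospitality training harness; a site-specific fact would not.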
The methodology is what you teach, license, or open-source. A partner applying the methodology to their industry creates their own config and mesh -- but the principles are shared. This is the franchise model: the methodology is the franchise playbook, configs are franchise locations, and meshes are the running businesses.
The methodology is like the rules of language. Every language has nouns, verbs, adjectives, and sentences. Those rules are universal -- they work for English, Spanish, Mandarin, or any language. The harness.os methodology works the same way: four types of knowledge, rules for how they connect, and a way for them to learn over time. The specific language you speak (your config) is up to you.
The Five Rules
- Sort knowledge into four types: Recipes (how to create things), blueprints (what to build), playbooks (how to run domain operations), and records (your data).
- Give each type its own filing cabinet: Keep them separate so they don't get mixed up, but organized so they can be found quickly.
- Start each conversation with context: When you talk to the system, it loads what it learned from last time. It never starts from scratch.
- Label everything: Each piece of knowledge has a tag that says which project, which domain, and which type it belongs to. This is how the system knows what to show whom.
- Let knowledge flow: When something learned in one app is useful in another, it flows automatically. A training insight from your fitness app can help a hotel's staff training system.
Harness Types
The methodology defines four knowledge categories. Your config decides how many instances of each type exist. Your mesh connects them for a specific user. All four types talk to each other through the mesh — build knowledge flows into products, products reference operations, operations govern domain data.
Architecture deep dive: Four harness types as components
The methodology defines four harness types. Each type has the same internal schema (CNS tables) but different content and access patterns. In any config, each type maps to one or more harness instances.
Build Harness
Creation knowledge — not just software. Covers software development (dev workflow, CI/CD, architecture patterns, testing strategies) AND content creation (blog, marketing, design systems). Shared across all projects in a mesh. Through the mesh, build harness knowledge flows into every product: the same coding standards apply whether you're building a finance app or a skydiving app.
Tier 1: CLAUDE.md + .claude/rules/ | Tier 2+: cortex_chunks + spine_rules in build-harness DB
Product Harness
The full product lifecycle: discovery (validate ideas, hypotheses, experiments, A/B tests), specification (architecture decisions, roadmaps, specs, feature definitions), and measurement (metrics, impact tracking, user feedback). Each product in the mesh gets its own product harness (w2f-product, w2s-product, etc.) with a lifecycle phase: discovery → building → maintaining → archived. Through the mesh, a product harness pulls build knowledge (how to create) and pushes requirements to operations harnesses (how to run).
Tier 1: docs/ folder per project | Tier 2+: per-product cortex_chunks + spine_workflows
Operations Harness
Domain-specific operational knowledge: skydive progression rules, fitness programming, finance budgets, hospitality procedures. The only type that changes completely by industry. Through the mesh, operations harnesses connect to domain harnesses (the data they operate on) AND to build harnesses (the tools that maintain them) AND to product harnesses (the apps that expose them).
Tier 1: domain-specific docs | Tier 2+: spine_rules with triggers[] + spine_workflows
Domain Harness
User-scoped transactional data: jump logs, workout records, financial transactions. Scoped by user_id. Through the mesh, domain harnesses connect to operations harnesses (the rules that govern them) and to other domain harnesses — enabling cross-domain queries like "Can I afford this skydive camp?" which crosses finance domain ↔ skydive domain for a specific user's mesh instance.
Tier 1: app data (Firebase, local DB) | Tier 2+: fed_* tables + learnings per user
The mesh is not a layer between two types — it connects all four. marco.ai talks to build harness (for dev standards), product harnesses (for app specs), operations harnesses (for life management rules), and domain harnesses (for personal data). A way2do consumer with way2save + way2fly gets a mesh connecting just those products' operations and domain harnesses. The type system defines what knowledge is. The mesh defines how it flows. The config decides which types exist in your mesh. The methodology makes all of this work at any scale.
Data deep dive: Per-type data topology
Each harness type has the same schema but different data access patterns:
| Type | Primary Table Use | Typical Query Pattern | Data Volume |
|---|---|---|---|
| Build | spine_rules (coding standards), cortex_chunks (patterns) | Read-heavy: agents query patterns before generating code | Low (hundreds of rows) |
| Product | cortex_chunks (specs, ADRs, hypotheses), spine_workflows (discovery + delivery phases), learnings (validation results) | Read-write: agents query specs, log experiment results, track metrics | Low-medium per product (grows with validation data) |
| Operations | spine_rules (triggers[]), spine_workflows (operational procedures) | Trigger-driven: rules fire based on events (session_complete, threshold_met) | Medium (rules + accumulated learnings) |
| Domain | fed_* tables (user data), learnings (personal insights) | OLTP: reads and writes of user data scoped by user_id | High (grows with user activity) |
The methodology recommends: build and product harnesses tolerate slug-based isolation (shared branches). Operations harnesses benefit from branch isolation (complex trigger logic, domain-specific rules). Domain harnesses require the strongest isolation (user data, privacy). The config decides where on this spectrum each instance falls.
Strategic deep dive: Four types as product building blocks
Understanding the four types is the key to understanding every product in the ecosystem -- regardless of config.
Build Harness
Institutional memory of how software gets built. Shared across every engineering project in any config that includes software development.
Product Harness
The full product lifecycle: discovery and validation (why to build), specification (what to build), and measurement (did it work). Each product has a lifecycle phase: discovery → building → maintaining → archived.
Operations Harness
Domain-specific operational knowledge. The brain that makes each domain smart about its specific operations. This is the type that cortex.ai packages for B2B.
Domain Harness
Per-user, per-app operational data. Each user's experience is personal because their domain harness contains their specific data.
The four-type model means any product can be defined as a composition of types. Build + Product = developer platform. Operations + Domain = B2B SaaS. All four = full operating system. New products don't require new platforms -- they require new compositions. This is true in any config, not just Marco's.
Everything in the system is built from four types of knowledge:
Recipes
How to make things. Like a cookbook that grows over time -- every time a software project is built, the recipes get better. "Use this pattern for user interfaces." "Test this way."
Blueprints
WHY and WHAT for each app. Like a product lab: first you test if an idea works (discovery), then you draw the plans (specification), then you track if it's working (measurement). Each app goes through phases: discovery → building → maintaining → archived.
Playbooks
How to run each domain. A skydiving playbook knows progression rules and safety checklists. A fitness playbook knows exercise programming. A finance playbook knows budget categories.
Records
Your actual data. Jump logs, workout history, bank transactions. This is personal -- it belongs to you and is what makes the apps useful for your specific life.
Each type of knowledge is stored separately and securely, but the system can look across all of them when you ask a question. That's the magic -- one question can pull answers from multiple types of knowledge at once.
Type System: Cross-Cutting Concerns
The four harness types (build, product, operations, domain) have held up across 18 harness instances and 6 apps. Stress-testing them against knowledge management frameworks (Porter, TOGAF, Intellectual Capital, Zack's taxonomy) revealed knowledge that doesn't fit neatly into one type. The resolution: five cross-cutting concerns that span all four types. These are not new types -- they are lenses that apply everywhere, implemented as a concerns TEXT[] column on knowledge tables.
What fits cleanly today
| Knowledge Example | Type | Why It Fits |
|---|---|---|
| TDD workflow, CI/CD pipeline, content creation process | Build | How to create things — recipes for making digital things |
| App roadmap, feature specs, architecture decisions | Product | What to build — blueprints for each product |
| Skydive progression rules, hospitality onboarding, fitness programming | Operations | How to run domain ops — playbooks for each industry |
| User jump logs, workout records, financial transactions | Domain | Per-user data — personal records that make apps useful |
The five cross-cutting concerns
Each concern cuts across all four harness types. A single knowledge chunk, rule, or workflow can be tagged with one or more concerns via a concerns TEXT[] column on cortex_chunks, spine_rules, and spine_workflows. Agents query by concern to assemble cross-cutting context regardless of which harness type owns the data.
Relational / Ecosystem
Stakeholders, partners, dependencies, network knowledge. Who are the upstream and downstream entities? How do relationships affect outcomes? Partner ecosystem health, supplier networks, community dynamics, coach-student relationships, DZ partnerships, cortex.ai tenant relationships.
Spans: Build (open-source dependencies), Product (partner integrations), Operations (vendor relationships), Domain (user relationship data).
Governance
Compliance, policies, access control, audit trails. Who approves what? What requires review? What's regulatorily mandated? Governance governs all four types -- it is a layer above, not alongside them.
Spans: Build (CI/CD approval flows), Product (decision authority), Operations (compliance procedures), Domain (data access policies).
Causal / WHY
Root cause analysis, decision rationale, failure chains. While Product now includes discovery and validation (the WHY of what to build), this concern captures deeper causal reasoning across ALL types. Why a market shifted, why a regulation exists, why users churn, why an architecture was chosen, why a process failed.
Spans: Build (architecture decision rationale), Product (strategic reasoning), Operations (failure post-mortems), Domain (user behavior patterns).
Metacognitive
Learning-about-learning, process effectiveness, knowledge quality. How the system improves its own knowledge practices. Which curation approaches work? Which learnings transfer well? The transferability_score is already a metacognitive signal.
Spans: Build (which dev practices improve quality), Product (which specs lead to good outcomes), Operations (which rules fire effectively), Domain (which data improves predictions).
Security / Risk
Threat models, vulnerability tracking, risk assessments. Applies to all four types equally -- secure Build practices, Product security requirements, Operations compliance, Domain data protection.
Spans: Build (secure coding rules), Product (security architecture), Operations (compliance risk), Domain (data protection, PII handling).
A chunk tagged concerns = ['governance', 'security'] is discoverable by any agent querying either concern, regardless of which harness type stores it. This avoids the combinatorial explosion of creating separate harness types for each concern while ensuring nothing falls through the cracks.
Schema Addition
```sql
-- Added to existing CNS knowledge tables:
ALTER TABLE cortex_chunks ADD COLUMN concerns TEXT[] DEFAULT '{}';
ALTER TABLE spine_rules ADD COLUMN concerns TEXT[] DEFAULT '{}';
ALTER TABLE spine_workflows ADD COLUMN concerns TEXT[] DEFAULT '{}';

-- Query by concern (cross-type):
SELECT * FROM cortex_chunks WHERE 'governance' = ANY(concerns);

-- Tag a chunk with multiple concerns:
INSERT INTO cortex_chunks (domain, content, concerns)
VALUES ('compliance', 'All document changes require lawyer review...',
        ARRAY['governance', 'security']);
```
Cross-cutting concerns structure knowledge not just by type but by concern, so AI agents can reason across dimensions. The four harness types organize knowledge by what it is (creation, specification, operation, data). The five concerns organize knowledge by what it addresses (relationships, governance, causation, learning, risk). Together, they form a two-dimensional knowledge classification that gives agents richer context for cross-domain reasoning.
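The two-dimensional retrieval can be sketched in a few lines of Python, mirroring the SQL `'governance' = ANY(concerns)` pattern. The sample chunks are illustrative, not real data:

```python
# Toy two-dimensional classification: each chunk has a harness type
# (where it lives) and a list of concerns (what it addresses).
chunks = [
    {"type": "build", "content": "CI/CD approval flow",
     "concerns": ["governance"]},
    {"type": "operations", "content": "compliance procedure",
     "concerns": ["governance", "security"]},
    {"type": "domain", "content": "PII handling policy",
     "concerns": ["security"]},
]

def by_concern(concern: str) -> list[str]:
    """Cross-type query: matches any harness type that carries the concern."""
    return [c["content"] for c in chunks if concern in c["concerns"]]
```

A governance audit pulls from build, operations, and any other type in one query, which is exactly the cross-cutting behavior the concerns column enables.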
When cross-cutting concerns matter most
| Scenario | Concern | How the agent uses it |
|---|---|---|
| Apply at legal tech company | Governance | Agent queries WHERE 'governance' = ANY(concerns) to assemble compliance context across all harness types before generating legal documents. |
| cortex.ai grows past 5 tenants | Relational/Ecosystem | Agent queries ecosystem concern to understand tenant relationships, partner networks, and cross-tenant patterns when making provisioning decisions. |
| Second person uses the methodology | Metacognitive | Agent queries metacognitive concern to surface knowledge curation best practices and which learning approaches have been most effective. |
| Compliance audit of any product | Security/Risk + Governance | Agent queries both concerns simultaneously to assemble a cross-type audit trail: who approved what, what security controls are in place. |
| Strategic decisions need justification | Causal/WHY | Agent queries causal concern to retrieve decision rationale, root cause analyses, and reasoning chains beyond what the Product spec captures. |
The verdict
The four types are correct at this level of abstraction. They've held across 18 instances and 6 apps without needing a fifth type. The five cross-cutting concerns handle what initially looked like gaps: governance and relational/ecosystem knowledge are not new types but concerns that span all types. Tagging knowledge with concerns TEXT[] gives agents a second dimension for retrieval without complicating the type system.
This two-dimensional model (4 types x 5 concerns) organizes knowledge so AI agents can reason effectively across both type boundaries and concern boundaries. The type tells you where knowledge lives. The concern tells you what dimension it addresses.
What is a Harness Config
The methodology is universal. A config is what makes it yours. It is the set of concrete choices -- which tech stack, which domains, which workflows, which products -- that turn the abstract methodology into a real system. Configs are portable and forkable: someone else could take the same methodology and build an entirely different config for an entirely different industry.
Technical deep dive: Config as implementation spec
A harness config specifies:
- Data layer: Which database system, branch/isolation strategy, connection management.
- Communication protocol: How harness instances talk to each other (MCP, REST, gRPC, etc.).
- Build harness contents: Which development workflow, testing standards, architecture patterns.
- Product harnesses: Which products, per-product specs, roadmaps.
- Operations harnesses: Which domains, domain-specific rules and workflows.
- Domain harnesses: Per-user data schema, data ownership model.
- Mesh topology: How harness instances connect, which products compose which types.
A config is like a docker-compose.yml for knowledge architecture: it defines all the services (harness instances), their connections, and their configuration. The methodology is the base image; the config is the composition file.
Because the methodology defines the schema contract, configs are portable. You could export an operations harness from one config (e.g., a skydive progression harness) and import it into another config that needs it. The tables are the same; only the content differs. This is what makes cortex.ai possible -- each tenant effectively runs their own mini-config within the broader platform config.
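The docker-compose analogy can be made concrete. The sketch below represents a config as plain data and shows a fork; every key and value here is illustrative (the real config format is not specified in this document), but the shape mirrors the spec items listed above.

```python
import copy

# Hypothetical config object -- the "docker-compose.yml for knowledge
# architecture" as plain data. All concrete values are illustrative.
marco_config = {
    "data_layer": {"system": "neon-postgresql", "isolation": "branch-per-harness"},
    "protocol": "mcp",
    "harnesses": {
        "build":      {"instances": ["build-harness", "marco-builder"]},
        "product":    {"instances": ["w2f-product", "w2m-product"],
                       "branch": "product-shared", "isolation": "slug"},
        "operations": {"instances": ["skydive-harness", "fitness-harness"]},
        "domain":     {"instances": [], "status": "planned"},
    },
}

def fork(config, drop=(), add=None):
    """Forking a config = copy it, swap the operations domains,
    keep the schema contract untouched."""
    forked = copy.deepcopy(config)
    ops = forked["harnesses"]["operations"]
    ops["instances"] = [i for i in ops["instances"] if i not in drop]
    ops["instances"] += list(add or [])
    return forked

hotel = fork(marco_config, drop={"skydive-harness"}, add=["hospitality-harness"])
assert "hospitality-harness" in hotel["harnesses"]["operations"]["instances"]
# The original config is untouched -- forks are independent.
assert marco_config["harnesses"]["operations"]["instances"] == [
    "skydive-harness", "fitness-harness"]
```

The fork only touches the operations section; the data layer, protocol, and schema contract carry over unchanged, which is what keeps tooling compatible across configs.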
Data deep dive: Branch strategy and isolation
A config maps the abstract methodology to a concrete data topology:
| Methodology Concept | Config Decides | Example (Marco's Config) |
|---|---|---|
| Data layer | Database system, branching model | Neon PostgreSQL, 10 branches, copy-on-write |
| Knowledge tables | Vector dimension, index type | VECTOR(1536), pgvector HNSW index |
| Instance isolation | Branch vs. slug per type | Operations: branch-isolated. Product: slug-filtered on shared branch |
| Mesh communication | Protocol, connection management | MCP (Python), lazy connect, 30s timeout, 10min idle eviction |
| Observability | Event schema, metrics pipeline | mesh_events + mesh_transactions tables |
The config IS the data architecture document. It specifies branch topology, isolation strategy, connection patterns, and scaling model. A data engineer joining the team reads the config to understand the entire data layout. A new config for a different use case would specify a different topology but the same schema -- ensuring tooling compatibility.
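The two isolation strategies in the table reduce to a small difference in query construction. A minimal sketch, assuming the `cortex_chunks` table and `project_slug` column named elsewhere in this document; the helper function itself is hypothetical.

```python
# Branch-isolated instances (e.g. skydive-harness) connect to their own
# branch and need no filter. Slug-filtered instances (e.g. the six
# product harnesses on product-shared) must scope every query.
def knowledge_query(instance, slug=None):
    sql = "SELECT content FROM cortex_chunks"
    params = ()
    if slug is not None:
        sql += " WHERE project_slug = $1"
        params = (slug,)
    return sql, params

# Branch isolation: the branch itself is the boundary.
assert knowledge_query("skydive-harness") == (
    "SELECT content FROM cortex_chunks", ())

# Slug filtering: six products share one branch, separated by a predicate.
assert knowledge_query("product-shared", slug="w2f") == (
    "SELECT content FROM cortex_chunks WHERE project_slug = $1", ("w2f",))
```

The trade-off: branch isolation buys hard separation at the cost of a Neon branch slot; slug filtering buys density at the cost of one mandatory predicate on every query.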
Strategic deep dive: Config as business strategy
A config translates the methodology into a business strategy:
Build Config
Which development practices to encode. TDD? Hexagonal architecture? 8-phase workflow? These are choices that shape how all products in the config get built.
Product Config
Which products to build and how they compose harness types. This is the product portfolio strategy expressed as a composition matrix.
Operations Config
Which operational domains to support. Skydiving? Hospitality? Manufacturing? Each domain gets its own operations harness with domain-specific rules.
Domain Config
Per-user data model. What data each app collects, how it's stored, and how it flows between products through the mesh.
Configs are forkable. A partner could fork Marco's config, remove the domains irrelevant to them (skydiving, fitness), add their own domains (healthcare, logistics), and have a working platform in weeks instead of months. The methodology stays the same; the config adapts to the business.
The methodology is the rulebook. The config is your game plan. Two football teams play by the same rules (the methodology), but each team has its own playbook (the config) -- different formations, different strategies, different strengths. Marco's config is his playbook: 6 apps, skydiving + fitness + finance domains, specific development practices. Someone else would write a completely different playbook for their business.
The important thing about a config is that it's shareable and adaptable. If someone likes how Marco organized his fitness domain, they could copy that part and adapt it for their own use -- while keeping everything else different. It's like sharing a recipe from one cookbook to another.
Marco's Config
This section shows one specific application of the methodology. Marco's config includes: an 8-phase development workflow, TDD and hexagonal architecture, FF-gated prototypes, 6 apps across 5 domains, and Neon PostgreSQL as the data layer. Every choice here is a config decision -- someone else applying harness.os would make different choices for their context.
Technical deep dive: Architecture choices and stack
Build Harness Config
Marco's build harness encodes specific development practices:
- 8-phase dev workflow: Discovery, Design, Specs, Domain, UI Wiring, Adapters, E2E, Deploy
- Model routing: Opus for Discovery/Specs/Planning, Sonnet for Domain/Wiring/Adapters/E2E, Haiku for Deploy/verification
- TDD-first: Write failing test first, then implementation
- Hexagonal architecture: domain/ and ports/ must have zero framework imports
- FF-gated prototypes: New features behind feature flags until validated
Instances: build-harness (slug-filtered: build + soft-eng + content), marco-builder (dedicated branch)
Product Harness Config
6 product harnesses, all sharing one Neon branch (product-shared), isolated by project_slug:
Instances: w2f-product, w2m-product, w2s-product, cortex-product, marco-ai-product, build-ai-product
Operations Harness Config
5 operations harnesses, each on its own dedicated Neon branch:
Branches: skydive-harness, fitness-harness, finance-harness, life-management, marco-personal
Domain Harness Config
Currently federated in main DB via fed_* tables. 5 domain harnesses planned, migrating to per-app Neon branches.
Current tables: fed_jumps, fed_activities, fed_transactions, fed_routines, fed_books
Infrastructure Config
Neon Branch Map (Marco's Config)
main -------------------------------- Platform DB (37 tables, request pipeline)
+-- build-harness -------------- Build + soft-eng + content (slug-filtered)
+-- marco-builder -------------- Personal dev workflow (own branch)
+-- marco-personal ------------- Cross-domain hub (own branch)
+-- skydive-harness ------------ Skydive operations rules (own branch)
+-- fitness-harness ------------ Training operations rules (own branch)
+-- finance-harness ------------ Finance operations rules (own branch)
+-- life-management ------------ Life routines/goals (own branch)
+-- product-shared ------------- All 6 product harnesses (slug-filtered)
+-- test ----------------------- Server integration tests
Each harness instance is registered in harness_instances with database_url, base_type, and optional tool_filter. The mesh manager (harness-mesh.ts) spawns Python MCP server processes on demand: lazy connect on first access, 30s timeout, 10min idle eviction, stale retry (evicts dead client, reconnects once), graceful shutdown on SIGTERM/SIGINT. These are config choices -- a different config might use persistent connections or a different timeout strategy.
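The connection policy above (lazy connect, idle eviction) can be sketched in a few lines. The real manager is harness-mesh.ts spawning Python MCP subprocesses; this Python sketch uses a placeholder factory instead of real processes, and the `MeshPool` class is a hypothetical simplification.

```python
# Sketch of the mesh manager's connection policy: lazy connect on
# first access, eviction after 10 minutes idle. "Clients" here are
# placeholder objects, not real MCP server processes.
IDLE_EVICT_S = 600  # 10 min, matching the config described above

class MeshPool:
    def __init__(self, connect, now):
        self._connect = connect      # factory: instance name -> client
        self._now = now              # clock, injectable for testing
        self._clients = {}           # name -> (client, last_used)

    def get(self, name):
        self.evict_idle()
        entry = self._clients.get(name)
        client = entry[0] if entry else self._connect(name)  # lazy connect
        self._clients[name] = (client, self._now())          # refresh last-used
        return client

    def evict_idle(self):
        cutoff = self._now() - IDLE_EVICT_S
        self._clients = {n: (c, t) for n, (c, t) in self._clients.items()
                         if t >= cutoff}

# Fake clock to demonstrate eviction deterministically.
clock = [0.0]
pool = MeshPool(connect=lambda n: f"client:{n}", now=lambda: clock[0])
a = pool.get("skydive-harness")        # first access triggers connect
assert pool.get("skydive-harness") == a
clock[0] = 700.0                       # jump past the 10-minute idle window
pool.evict_idle()
assert pool._clients == {}             # stale client evicted
```

The 30s connect timeout, stale-retry, and SIGTERM handling from the real manager are omitted; the point is that pooling policy is a config choice isolated from the knowledge schema.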
Data deep dive: Branch topology and storage
Branch Budget (10/10 Free Tier)
Per-Type Topology in This Config
| Type | Instances | Branch Strategy | Isolation |
|---|---|---|---|
| Build (2) | build-harness, marco-builder | Dedicated branches | Slug-filtered / dedicated |
| Product (6) | w2f, w2m, w2s, cortex, marco-ai, build-ai | Shared branch (product-shared) | Slug-filtered: WHERE project_slug = $N |
| Operations (5) | skydive, fitness, finance, life-mgmt, marco-personal | 5 dedicated branches | Full branch isolation |
| Domain (5 planned) | w2f-domain, w2m-domain, w2s-domain, cortex per-tenant, marco-domain | Planned (currently fed_* in main) | Per-app branches (planned) |
Free tier is maxed out. Future domain harnesses (w2f-domain, w2m-domain, w2s-domain, per-tenant cortex branches) require upgrading to Neon Pro. The test branch could be freed if CI migrates, recovering 1 slot. This is a config constraint, not a methodology constraint -- a different config could use a different database with no branch limits.

Strategic deep dive: The product portfolio
Marco's config creates a 6-app ecosystem from the four harness types. Each app is a specific composition:
Build Config
8-phase dev workflow, TDD, hexagonal architecture, FF-gated prototypes. These are Marco's specific engineering choices -- encoded in the build harness so every project follows them automatically.
Operations Config
5 domains: skydiving (progression, safety), fitness (training programming), finance (budgets, transactions), life management (routines, goals), and personal (cross-domain hub).
The config IS the business strategy. Marco chose to build across fitness, finance, and skydiving because those are his domains of expertise. A different founder might configure the same methodology around healthcare, logistics, and education. The methodology doesn't care which domains you choose -- it provides the structure for any combination.
Marco's version of the harness.os playbook includes:
How he builds
An 8-step development process, writing tests before code, and protecting new features behind switches until they're ready. These are his recipes.
What he's building
6 apps: a skydiving app (way2fly), a fitness app (way2move), a finance app (way2save), a business tool (cortex.ai), a personal assistant (marco.ai), and the command center that builds everything else (build.ai).
How each domain works
Skydiving has progression rules and safety checklists. Fitness has training programs and exercise knowledge. Finance has budget categories and spending rules. Each domain has its own playbook.
His data
Jump logs, workout history, bank transactions, personal goals, and routines. This is Marco's data -- personal to him and stored separately from the rules and recipes.
Someone else using harness.os would write their own config. A hotel chain might configure: build harness (same development practices), operations harnesses (hospitality onboarding, compliance, scheduling), and domain harnesses (staff records, guest data). Different industry, same playbook, different game plan.
Products as Compositions
Every product in Marco's config is a specific composition of harness types. build.ai = Build + Product. cortex.ai = Operations + Domain. marco.ai = All four. New products don't require new platforms -- they require new compositions. And the factory pattern means any combination of the four types is a valid product.
Architecture deep dive: Product composition model
Product Abstraction Map
| Product | Harness Layers | Stack | Agent Types | Status |
|---|---|---|---|---|
| build.ai | Build + Product | React 19 + Express + WS | Code, review, content, design, research | Built |
| cortex.ai | Operations + Domain | Flutter (Dart) | Debrief, compliance, onboarding (per-tenant) | Planned |
| way2do.ai | Domain | Flutter + Web | Cross-app assistant (subscription-gated) | Planned |
| marco.ai | Build + Product + Operations + Domain | Flutter + Web | Cross-domain assistant, full harness control | Planned |
build.ai = Build + Product harness abstraction. cortex.ai = Operations + Domain for B2B. way2do.ai = Domain layer as subscription hub. marco.ai = All four layers, full mesh access. Each product is a different slice of the same config, composed through the methodology's type system.
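The composition model is small enough to state as data: a product is a subset of the four types, and any non-empty subset is a valid composition. The validity check below is a sketch of that claim, not actual platform code.

```python
from itertools import chain, combinations

# A product is just a subset of the four harness types.
TYPES = frozenset({"build", "product", "operations", "domain"})

PRODUCTS = {
    "build.ai":  {"build", "product"},
    "cortex.ai": {"operations", "domain"},
    "way2do.ai": {"domain"},
    "marco.ai":  set(TYPES),
}

def is_valid_composition(layers):
    return bool(layers) and layers <= TYPES

assert all(is_valid_composition(v) for v in PRODUCTS.values())

# The factory pattern's headroom: 2^4 - 1 = 15 possible compositions.
compositions = list(chain.from_iterable(
    combinations(sorted(TYPES), r) for r in range(1, 5)))
assert len(compositions) == 15
```

Four products ship (or are planned) today; eleven more compositions exist without any new platform work, which is the economic point of the factory pattern.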
Data deep dive: Composition as data wiring
Data Ownership per Product
| Product | Harness Layers | Owns | Data Types |
|---|---|---|---|
| build.ai | Build + Product | Platform data | Requests, sessions, agents, experiments (37 tables in main branch) |
| cortex.ai | Operations + Domain | Tenant data | Per-tenant knowledge, workflows, learnings, domain data (branch-per-tenant) |
| way2do.ai | Domain | Subscription state | User subscriptions, cross-app access grants (reads way2* domains) |
| marco.ai | All four | Full mesh | Authenticated read/write to ALL harnesses |
Data ALWAYS lives with its home app. The mesh READS across apps. WRITES go through a pending_changes approval queue. way2fly owns jump data. way2save owns transactions. build.ai orchestrates everything but never bypasses ownership. This is a config policy decision enforced at the mesh level.
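The ownership policy can be sketched as a gate in front of writes. The `pending_changes` name comes from the text above; the `OWNERS` map and `mesh_write` function are hypothetical illustrations of the enforcement logic.

```python
# Reads fan out freely across the mesh; writes to a table another app
# owns are queued for approval, never applied directly.
OWNERS = {"fed_jumps": "way2fly", "fed_transactions": "way2save"}

pending_changes = []

def mesh_write(app, table, row):
    if OWNERS.get(table) == app:
        return ("applied", row)          # home app writes its own data
    # Foreign write: queue it for the owner's approval.
    change = {"requested_by": app, "table": table, "row": row}
    pending_changes.append(change)
    return ("queued", change)

status, _ = mesh_write("way2fly", "fed_jumps", {"altitude_m": 4000})
assert status == "applied"

# build.ai orchestrates but never bypasses ownership:
status, _ = mesh_write("build.ai", "fed_transactions", {"amount": -120})
assert status == "queued" and len(pending_changes) == 1
```

Because the gate sits at the mesh level, every harness instance gets the policy for free; no individual app has to implement it.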
Strategic deep dive: Product factory economics
Each product in Marco's config serves a different market through a different composition:
build.ai Platform
Build + Product. The developer hub and SaaS factory. Manages all engineering work.
cortex.ai B2B SaaS
Operations + Domain. Packages operational workflows for business customers. Each tenant gets their own harness composition. Full deep-dive →
way2do.ai Consumer Hub
Domain. Subscription hub bundling way2fly + way2move + way2save with cross-domain assistant. Full deep-dive →
marco.ai Everything
All four harness types. Mobile: domain + operations. Web: full platform control. The owner's view.
Personal apps (way2fly, way2move, way2save) prove the model per vertical. cortex.ai packages it for businesses. way2do.ai bundles it for consumers. build.ai is the factory that builds everything. Each product strengthens the whole -- new products are new compositions, not new rewrites.
Each app uses different combinations of the four knowledge types:
build.ai
Uses Recipes + Blueprints. The factory that builds all the other apps.
cortex.ai
Uses Playbooks + Records. Gives businesses their own AI assistant with operational rules and data.
way2do.ai
Uses Records. Your personal hub connecting data from all your apps behind one smart assistant.
marco.ai
Uses all four types. The master control room -- sees recipes, blueprints, playbooks, and records.
The magic is in the combinations. A restaurant needs recipes + records. A construction firm needs blueprints + playbooks. Each business uses a different combination. harness.os works the same way -- each app picks the knowledge types it needs, and the platform provides them.
What is a Mesh
A mesh is a running instance of connected harnesses following a config. Think of config as the blueprint, mesh as the building. Marco's personal mesh has 6 apps connected through 18 harness instances. A Lake Deck mesh (cortex.ai tenant) has hospitality processes connected through their harnesses. One config can spawn multiple meshes. Apps are on a mesh, not the mesh.
Technical deep dive: Mesh topology and connections
The mesh is the runtime. It is managed by harness-mesh.ts -- a multi-client MCP connection manager. Each harness instance is registered in harness_instances with its database_url, base_type, and tool_filter. When an agent queries a harness, the mesh manager spawns (or reuses) a Python MCP server process connected to the correct Neon branch.
Instance Registry
| Type | Instances | Neon Branch | Isolation |
|---|---|---|---|
| Build | build-harness, marco-builder | build-harness, marco-builder | Slug-filtered / dedicated |
| Product | w2f, w2m, w2s, cortex, marco-ai, build-ai | product-shared | Slug-filtered (6 on 1 branch) |
| Operations | skydive, fitness, finance, life-mgmt, marco-personal | 5 dedicated branches | Branch-isolated |
| Domain | w2f-domain, w2m-domain, w2s-domain, cortex per-tenant, marco-domain | Planned | Per-app branches (planned) |
Multiple Meshes from One Config
The same config can spawn multiple meshes. Marco's personal mesh is one instance. A cortex.ai Lake Deck tenant mesh is another -- it uses a subset of the config (operations + domain types) with Lake Deck-specific harness instances. The config defines the architecture; each mesh is a deployment of that architecture with its own data.
Config = class definition. Mesh = object instance. marco-config defines harness types, connection patterns, and tooling. marco-mesh is the running system with real data. lake-deck-mesh is another instance of a cortex-tenant-config subset. Each mesh has its own data, its own learnings, its own accumulated intelligence -- but they share the same structural patterns because they follow the same methodology.
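The class/instance analogy can be taken literally. These `Config` and `Mesh` classes are illustrative only; the real system registers instances in `harness_instances`, but the structural point (shared contract, independent data) is the same.

```python
# Config = class definition. Mesh = object instance.
class Config:
    def __init__(self, name, harness_types):
        self.name = name
        self.harness_types = harness_types    # the structural contract

    def spawn_mesh(self, mesh_name, subset=None):
        types = subset or self.harness_types
        assert set(types) <= set(self.harness_types)
        return Mesh(mesh_name, config=self, harness_types=types)

class Mesh:
    def __init__(self, name, config, harness_types):
        self.name, self.config = name, config
        self.harness_types = harness_types
        self.learnings = []                   # per-mesh accumulated data

cfg = Config("marco-config", ["build", "product", "operations", "domain"])
marco = cfg.spawn_mesh("marco-mesh")
lake_deck = cfg.spawn_mesh("lake-deck-mesh", subset=["operations", "domain"])

assert marco.config is lake_deck.config       # same structure...
marco.learnings.append("prefer-branch-isolation")
assert lake_deck.learnings == []              # ...independent data
```

One config object, many mesh instances: each mesh accumulates its own learnings while sharing the type system and connection patterns.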
Data deep dive: Live data topology and flow
A mesh is the live data topology -- the actual running system where data flows between harness instances.
Mesh-Level Observability
| Table | Purpose | Key Columns |
|---|---|---|
| mesh_events | Event stream | event_type, harness_id, payload JSONB, timestamp |
| mesh_transactions | Cross-harness operations | steps JSONB, total_duration_ms, harness_ids[] |
Multiple Meshes, Same Schema
Because the methodology defines the schema, every mesh -- regardless of which config spawned it -- uses the same tables. This means monitoring tools, query patterns, and analytics pipelines work across all meshes. A dashboard built for Marco's mesh works for Lake Deck's mesh. This is the data engineering payoff of the three-layer model.
Cross-harness operations are logged in mesh_transactions with a steps JSONB array tracking each harness queried, response time, and data returned. This enables cost attribution per query fan-out and identification of slow harness instances. The pattern is the same across all meshes -- methodology-level observability.
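Cost attribution from a logged transaction is a small aggregation. The exact shape of the `steps` JSONB array is not specified in this document, so the `{harness, duration_ms}` objects below are an assumption based on the columns named above.

```python
import json

# One mesh_transactions row, steps stored as a JSONB array
# (shape assumed: one object per harness queried).
row = {"steps": json.dumps([
    {"harness": "finance-harness", "duration_ms": 42},
    {"harness": "skydive-harness", "duration_ms": 180},
    {"harness": "fitness-harness", "duration_ms": 55},
])}

steps = json.loads(row["steps"])
per_harness = {s["harness"]: s["duration_ms"] for s in steps}
slowest = max(steps, key=lambda s: s["duration_ms"])

assert slowest["harness"] == "skydive-harness"   # candidate for tuning
assert sum(per_harness.values()) == 277          # total fan-out cost in ms
```

Because every mesh logs the same shape, this aggregation works unchanged across Marco's mesh and any tenant mesh.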
Strategic deep dive: Mesh as running business
The mesh is the running business. Each mesh is independent but can share learnings with other meshes through the methodology's knowledge flow patterns.
Marco's personal mesh is one business. Each cortex.ai tenant mesh is another business running on the same config subset. One config can power many meshes -- each with its own data, its own learnings, and its own revenue. This is the scaling model: configs spawn meshes, meshes generate value.
Think of the mesh as a nervous system. Your brain (the config) defines how your nervous system works. The mesh is the actual nervous system -- real nerves carrying real signals between real organs. When you ask "Can I afford skydive camp?", the mesh carries the question to three different apps (finance, skydiving, fitness), gets answers from each, and brings them back to you as one unified response.
Different people can have different nervous systems. Marco's mesh connects skydiving, fitness, and finance apps. A hotel's mesh would connect staff training, guest management, and compliance apps. Different organs, same nervous system design -- all following the harness.os methodology.
Cross-Domain Flows
The mesh makes cross-app reasoning possible. When you ask a question that spans fitness and finance, the mesh knows which harness instances to consult, routes the queries, and combines the answers. No single app can do this alone -- it's the mesh that makes the whole greater than the sum of its parts.
The "Can I afford skydive camp?" question fans out to finance-harness (budget check via spine_rules), skydive-harness (camp schedule + prereqs from cortex_chunks), and fitness-harness (readiness score from fed_activities). Results merge in the agent's context window. Total mesh latency: ~200ms fan-out + per-branch query time.
At the data level: finance-harness returns the budget row from fed_transactions, skydive-harness returns camp dates + progression rules from cortex_chunks + spine_rules, and fitness-harness returns a readiness metric from fed_activities. All of it is tracked in mesh_transactions.steps[] with per-branch latency.
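The fan-out-and-merge pattern can be sketched as follows. The handler bodies and their return payloads are invented stand-ins for real MCP calls; only the three harness names come from the flow described here.

```python
# Stand-ins for MCP calls to three harness instances. Payloads are
# illustrative, not real schema.
def finance(q):  return {"budget_ok": True, "available_eur": 850}
def skydive(q):  return {"camp_dates": "2024-06-10", "prereqs_met": True}
def fitness(q):  return {"readiness": 0.82}

def mesh_ask(question, harnesses):
    """Fan out one question, merge answers into one agent context."""
    merged = {}
    for name, handler in harnesses.items():
        merged[name] = handler(question)   # real version: parallel MCP calls
    return merged

answer = mesh_ask("Can I afford skydive camp?",
                  {"finance-harness": finance,
                   "skydive-harness": skydive,
                   "fitness-harness": fitness})

assert answer["finance-harness"]["budget_ok"] is True
assert set(answer) == {"finance-harness", "skydive-harness", "fitness-harness"}
```

No single harness can answer the question; the mesh's job is purely routing and merging, which is why the sequential loop here would be parallel fan-out in practice.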
Data update flow: a completed workout in way2move writes to fed_activities in its domain harness. A spine_rules trigger in the fitness-harness fires on "mobility_session_complete", pushing compensation metrics to the mesh. The skydive-harness spine_workflows("skill-prerequisites") evaluates the new data and updates the skill tree node status. way2fly's UI polls or receives a WebSocket push of the updated prerequisite state.
At the data level: a fed_activities INSERT lands in the fitness-harness branch. Read path: skydive-harness queries fed_activities via a cross-harness MCP call for compensation metrics, evaluates them against spine_rules where triggers[] @> '{skill-prereq-check}', and writes the updated skill status to the learnings table. Logged in mesh_transactions as a 2-step cross-harness operation.
From Products to Partnerships
The three layers create three levels of composability. The methodology lets anyone build products. The config's products are themselves platforms (cortex.ai lets companies compose, way2do.ai lets consumers compose). Each platform product spawns its own meshes. Methodology creates products. Products create partnerships. Partnerships create ecosystems -- each a running mesh.
Tenant provisioning flow: Phase 1: neon_create_branch(parent="main", name="lake-deck") + register in harness_instances. Phase 2: MCP bulk_insert seeds cortex_chunks with hospitality domain knowledge. Phase 3: agent_process_assignments configured with tenant-scoped agents. The result: a new mesh instance spawned from the cortex.ai config subset.
At the data level: a harness_instances row is inserted with base_type='operations'. Knowledge seeded: 12 spine_rules (hospitality), 3 spine_workflows. Cross-tenant learnings with transferability_score > 0.7 pre-loaded. A new mesh (data topology) is now live.
Technical deep dive: Partnership architecture
Mesh Separation (Future Architecture)
As a partnership mesh grows, it can graduate to a fully independent mesh:
Mesh-to-mesh communication (planned)
// Today: single Neon project, partition-scoped meshes
harness-os-mcp -> single Neon project -> branch-per-harness
// Future: federated meshes
lake-deck-mesh -> own Neon project -> own branches -> own agents
<-> MCP bridge (cross-mesh queries via authenticated endpoints)
marco-mesh -> own Neon project -> retains full visibility
// The MCP protocol already supports this: each mesh is a set of MCP servers.
// Federation = routing queries to the right mesh's MCP endpoint.
Strategic deep dive: Partnership and revenue model
Methodology = teachable/licensable. Config products = platform products that each spawn meshes. Meshes = running businesses. cortex.ai turns the methodology into a SaaS product where each tenant is a new mesh. way2do.ai turns it into a consumer product where each subscriber connects domain meshes. The methodology creates products, products create partnerships, partnerships create meshes.
The franchise model: The methodology is the franchise playbook. Marco's config is the first franchise location. Each cortex.ai tenant (Lake Deck, Aluminex) is a new franchise location with its own running system (mesh). They share the playbook, benefit from each other's learnings, but operate independently. If a location grows big enough, it can become its own franchise chain -- its own complete system that still trades knowledge with the original.
Agent Architecture
Every agent has two halves: the outer harness (knowledge AND process — what it knows AND how work should be done) and the inner harness (the thin runtime connector — just enough to read the outer harness and execute). Most of "how it runs" lives in the outer harness as rules, workflows, and process definitions. The inner harness is deliberately minimal — accept context, call model, route tools — so it's trivially swappable. An agent's full intelligence survives even if you replace the AI model powering it.
Architecture deep dive: Agent execution model
Outer Harness (Knowledge from the Mesh)
From Build Harness
Dev patterns, coding standards, CI/CD workflows. Sourced from cortex_chunks + spine_rules in build branches of the mesh.
From Product Harness
Architecture decisions, feature specs, roadmaps. Sourced from product-shared branch with slug filtering on the mesh.
From Operations Harness
Operational workflows, domain rules, triggers. Sourced from dedicated operations branches on the mesh.
From Domain Harness
User data context -- recent jumps, current training plan, account balance. Makes agent responses personal.
Inner Harness (Execution -- Config Choice)
| Type | Implementation | Best For | Status |
|---|---|---|---|
| cli-spawned | claude --print --stream-json -p "<prompt>" | Code tasks, file creation, review | Built |
| api-built | @anthropic-ai/sdk direct API calls | Assistant chat, data queries | Built |
| third-party | GitHub Copilot, OpenAI Codex, webhooks | Specialized external tools | Planned |
The inner harness type is a config choice. A different config might use only API-built agents, or only third-party tools. The outer harness (knowledge from the mesh) stays the same regardless of execution method.
Data deep dive: Agent data model
Agent Data Model
Schema
agents (id, name, type, capabilities TEXT[], status, system_prompt, model_preference)
agent_implementations (id, agent_id, impl_type, model, active, stats JSONB)
agent_knowledge (id, agent_id, knowledge_type, content, domain)
agent_process_assignments (agent_id, phase_template, role)
Agent knowledge compounds in the mesh over time. After 100 sessions, the learnings table in each harness contains patterns that improve future prompt construction. Build harness learnings improve all code agents on the mesh. Operations harness learnings improve all operational agents. Compound effects are per-harness-type, not per-agent -- a new agent on the mesh inherits accumulated domain wisdom immediately.
Strategic deep dive: Agent capabilities
Agents come in three roles, each drawing knowledge from different harness types on the mesh:
Orchestrator
Decomposes complex requests into phases. Routes to specialist agents. One per pipeline.
Lead
Owns a pipeline phase end-to-end. Can delegate sub-tasks to workers.
Worker
Executes specific, scoped tasks. Reports up to the lead.
Agents are developed once in build.ai, then deployed across meshes. A "debrief coach" agent developed for way2fly can be activated in a cortex.ai tenant mesh for hospitality debriefs -- same behavior, different operations harness knowledge from a different mesh. Build once, deploy across meshes.
Think of agents as smart workers. Each agent gets two things from the harness: the knowledge it needs (recipes, playbooks, records) AND the instructions for how work should be done (step-by-step procedures, quality rules, handoff checklists). All of that lives in the mesh. The worker itself — the actual person or tool doing the job — just needs to be capable enough to follow instructions and use knowledge. You can swap out the worker without losing any intelligence, because the intelligence is in the harness, not in the worker.
Agents come in three levels: managers (break big tasks into smaller ones), leads (own one part of the work), and workers (do specific tasks). Every time an agent completes a task, the mesh learns from the experience.
The Inner Harness is Solved
The key realization
The inner harness — the execution engine, the agent that runs tasks — is a solved problem. Claude Code exists. Copilot exists. Custom API agents are straightforward to build. build.ai creates inner harnesses as part of its pipeline. These tools keep getting better every month as new models and tools ship.
Inner Harness (Thin Connector)
The minimal runtime. Deliberately thin so it's trivially swappable.
- Claude Code (CLI-spawned agents)
- Copilot, Cursor, Windsurf
- Custom API-built agents (Anthropic, OpenAI)
- Third-party tools (Codex, Devin, etc.)
- Future: whatever ships next quarter
The inner harness only needs to do three things: read context from the outer harness, call a model, and route tool calls back. That's a standard MCP interface — any tool that speaks MCP can be the inner harness.
Outer Harness (The Full Intelligence)
Knowledge AND process. What it knows AND how work should be done.
- Structured knowledge (four types, CNS schema)
- Process definitions — rules, workflows, phase templates
- Accumulated learnings (compound over time)
- Session lifecycle (start → context → work → learn → handoff)
- Cross-domain reasoning across the mesh
Most of "how it runs" lives here — as rules and workflows, not as code. This is what makes the inner harness swappable: the intelligence is in the harness, not in the tool.
What this means for the methodology
Software exists to improve processes. AI is a new element participating in that improvement. The outer harness is data organization for this new element — structuring knowledge so AI can read it, use it, learn from it, and improve the processes it participates in.
The inner harness only needs the minimum interface to connect: read knowledge, receive rules and workflows, write learnings back. That's it — a standard MCP connection. By defining all process logic in the outer harness, the inner harness stays thin enough to be swapped without losing anything.
This is why creating an "autoharness" (a new inner harness from scratch) is less valuable than creating a good outer harness that any inner harness can connect to. The energy should go into the knowledge and process layer, not the execution connector.
Technical deep dive: Why the outer harness is the moat
The harness.os-mcp server exposes 27+ tools per harness instance. An inner harness (any MCP-compatible client) connects and gets:
| MCP Tool Category | What It Provides | Why It Matters |
|---|---|---|
| start_session | Last handoff, current state, project context | Agent starts where the last one left off — not blank |
| get_rules | Applicable rules for current activity | Agent follows established patterns without being told |
| search_knowledge | Relevant knowledge chunks | Agent has domain expertise it was never trained on |
| get_workflow | Step-by-step procedures | Agent follows consistent processes |
| log_learning | Persist insights for future sessions | Every session makes the next one better |
| end_session | Handoff summary for next agent/session | Continuity across agents and time |
The inner harness doesn't need to know about Neon branches, PostgreSQL schemas, or the four-type system. It just calls MCP tools. The outer harness handles all the knowledge architecture — including the process definitions that tell the agent HOW to work. The inner harness is deliberately thin: it's a generic connector, not a brain. The brain is the outer harness.
If Claude Code disappears tomorrow, the outer harness — every knowledge chunk, every rule, every learning, every workflow — survives intact. Connect a different inner harness (Copilot, a custom agent, a future tool that doesn't exist yet) and it picks up where Claude left off. That's the proof that the value is in the outer harness.
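The swappability claim can be made concrete: the entire inner harness fits in one function over a handful of MCP-style tools. The tool stubs below return canned data and their payload shapes are assumptions; only the tool names match the table above.

```python
# Stub outer harness: five MCP-style tools backed by canned data.
# Payload shapes are illustrative assumptions.
def make_outer_harness():
    log = []
    tools = {
        "start_session":    lambda: {"handoff": "resume phase 3"},
        "get_rules":        lambda: ["tests-first", "no-framework-in-domain"],
        "search_knowledge": lambda q: [f"chunk about {q}"],
        "log_learning":     lambda text: log.append(text),
        "end_session":      lambda summary: {"handoff": summary},
    }
    return tools, log

def inner_harness(tools, task):
    """The whole connector: accept context, 'call model', route tools back."""
    ctx = tools["start_session"]()
    rules = tools["get_rules"]()
    tools["search_knowledge"](task)
    result = f"did '{task}' under {len(rules)} rules from '{ctx['handoff']}'"
    tools["log_learning"](f"pattern found while doing {task}")
    return tools["end_session"](result)

tools, log = make_outer_harness()
handoff = inner_harness(tools, "wire adapters")

# Swap inner_harness for any other MCP client: the learnings and
# handoff live in the outer harness either way.
assert log == ["pattern found while doing wire adapters"]
assert "wire adapters" in handoff["handoff"]
```

Everything durable (the learning, the handoff) lands on the outer-harness side of the interface, which is the structural reason the inner harness can be replaced without loss.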
Think of it like a franchise manual and a new hire. The outer harness is the complete franchise system — the recipes, the quality standards, the step-by-step procedures, the lessons from every previous shift. The inner harness is the new employee — someone who comes in, follows the manual, and adds their own notes about what they learned. You can replace the employee (different AI tool, different model) and the franchise system stays the same. The intelligence is in the system, not in the individual worker. The worker just needs to be capable enough to follow instructions.
Request Pipeline
When someone asks the platform to do something, the request gets broken into phases, each assigned to the right agent, executed in sequence with real-time streaming output. Each phase queries the relevant harness types on the mesh for context. The pipeline is the execution layer that connects requests to the mesh.
Technical deep dive: Request lifecycle
Requests enter from multiple sources and flow through a template-driven pipeline. The pipeline's mesh scope depends on the mode: build mode activates build + product harnesses; operations mode activates operations + domain harnesses.
Phase State Machine
pending → active → ┌ completed
├ skipped
└ failed → retry → pending
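The state machine above can be expressed as a transition table with an enforcement helper. The states and edges come from the diagram; the `advance` function is a sketch, not the platform's actual implementation.

```python
# Legal transitions per the phase state machine, including the
# failed -> retry -> pending loop.
TRANSITIONS = {
    "pending":   {"active"},
    "active":    {"completed", "skipped", "failed"},
    "failed":    {"pending"},     # retry re-enters the queue
    "completed": set(),           # terminal
    "skipped":   set(),           # terminal
}

def advance(state, to):
    if to not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {to}")
    return to

s = advance("pending", "active")
s = advance(s, "failed")
s = advance(s, "pending")                 # retried
assert advance(s, "active") == "active"

try:
    advance("completed", "active")        # terminal states don't restart
except ValueError:
    pass
else:
    raise AssertionError("should have rejected restart of a terminal phase")
```

Encoding the machine as data rather than scattered if-statements keeps retry semantics auditable, which matters once ai-in-loop mode runs phases unattended.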
Participation Modes
| Mode | Behavior |
|---|---|
| human-in-loop | Pause after each phase. User reviews, approves next. |
| ai-in-loop | Auto-start next phase on complete. Full pipeline runs unattended. |
| prototype | Simulated. No real agent calls. Tests pipeline design. |
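A hedged sketch of how the three participation modes might change a pipeline driver; `run_phase` and `approve` are stand-ins for real agent calls and user review, not the actual API:

```python
# Illustrative driver for the three participation modes above.
# run_phase and approve are placeholders, not the real implementation.
def run_pipeline(phases, mode, run_phase, approve=lambda p: True):
    results = []
    for phase in phases:
        if mode == "prototype":
            results.append((phase, "simulated"))   # no real agent call
            continue
        results.append((phase, run_phase(phase)))  # real execution
        if mode == "human-in-loop" and not approve(phase):
            break                                  # pause: user declined next phase
        # ai-in-loop: auto-advance, nothing extra to do
    return results
```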
Data deep dive: Pipeline data flow
Pipeline Data Tables
| Table | Role | Key Fields |
|---|---|---|
| requests | Work items | title, priority, status, business_model, project_id |
| request_phases | Pipeline phases | request_id, template_phase, status, agent_id, session_id |
| sessions | Execution runs | phase_id, input_tokens, output_tokens, cost, duration, output_lines JSONB |
| process_templates | Pipeline definitions | business_model, request_type, phases JSONB |
Session output stored as JSONB arrays of {type, content, timestamp} objects. Enables post-hoc analysis, cost attribution per phase, and learning extraction. Learnings are written back to the originating harness type on the mesh.
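Cost attribution per phase over rows shaped like the sessions table could look like this sketch; the field names follow the table, the helper itself is an assumption:

```python
# Aggregate token counts and cost by phase_id over session rows shaped
# like the sessions table above. cost_per_phase is an illustrative helper.
from collections import defaultdict

def cost_per_phase(sessions):
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "cost": 0.0})
    for s in sessions:
        t = totals[s["phase_id"]]
        t["input_tokens"] += s["input_tokens"]
        t["output_tokens"] += s["output_tokens"]
        t["cost"] += s["cost"]
    return dict(totals)
```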
Strategic deep dive: Workflow as product feature
How It Works
- Request arrives -- from any source (web UI, mobile, CLI, API)
- Template selected -- based on business model + request type, selects relevant harness types on the mesh
- Phases execute -- each phase gets an assigned agent that queries the mesh for context
- Artifacts generated -- each phase produces deliverables
- Review and done -- user reviews output, request marked complete
Human-in-the-loop: Review after each phase (critical work). AI-in-the-loop: Phases auto-advance (routine tasks). Prototype: Simulates without running agents (testing new templates).
Think of it as an assembly line for work. When you ask the system to do something, it breaks the work into steps. Each step has a specialist worker (agent) assigned to it. The worker consults the right knowledge types on the mesh to do its job well.
You can choose how much control you want: review each step (the system pauses and asks you to approve), or let it run (the system completes all steps automatically).
Scale Tiers
The methodology stays the same at every scale. What changes is the implementation — how knowledge is stored, how harnesses communicate, and how the mesh is managed. Files work for one person. Databases work for a team. Federated APIs work for an enterprise. The four types (build, product, operations, domain), the internal structure (knowledge, rules, workflows, learnings), and the session lifecycle are identical at every tier.
Tier 1 — Files & Conventions (Solo / Local)
When: 1 person, 1-3 projects, starting out or experimenting.
Storage: Markdown files in the repo. CLAUDE.md at the root, .claude/rules/ for domain rules, docs/ for decisions and specs. Knowledge lives in files, organized by convention.
Harness types in practice:
- Build: CLAUDE.md + .claude/rules/ — coding standards, workflow, architecture
- Product: docs/ARCHITECTURE.md, docs/phases/ — specs, roadmap, decisions
- Operations: docs/domain/ — domain knowledge, process descriptions
- Domain: App databases (Firebase, Postgres) — runtime user data, not in files
Communication: File imports (@.claude/rules/testing.md), copy shared rules between repos manually.
Session lifecycle: Implicit. The agent reads files on startup, you update files manually after decisions. No formal start/end session.
Mesh: No mesh. Each project is isolated. Cross-project knowledge is manual copy-paste.
Move to Tier 2 when: You have 3+ projects and find yourself copying rules between repos, or you need knowledge to compound across sessions (learnings from project A should help project B automatically).
Tier 2 — Database & MCP (Solo / Power User)
When: 1 person, 3-10 projects, knowledge needs to compound and flow between contexts.
Storage: PostgreSQL with the CNS schema. Each harness instance gets its own database or branch (Neon copy-on-write branches are ideal). Knowledge is structured, queryable, and embeddable (VECTOR(1536) for semantic search).
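As an illustration of the queryable, embeddable storage described above, a scoped pgvector search over cortex_chunks might be built like this; the exact schema, the `<->` distance operator, and this helper are assumptions, not the actual implementation:

```python
# Hedged sketch: building a scoped semantic-search query over
# cortex_chunks with pgvector. Table and column names come from the
# text; the operator choice and helper name are assumptions.
def semantic_search_sql(project_slug: str, query_embedding: list, limit: int = 5):
    sql = (
        "SELECT id, content FROM cortex_chunks "
        "WHERE project_slug = %(slug)s "
        "ORDER BY embedding <-> %(q)s::vector(1536) "  # nearest-neighbor by L2 distance
        "LIMIT %(limit)s"
    )
    return sql, {"slug": project_slug, "q": query_embedding, "limit": limit}
```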
Harness types in practice:
- Build: cortex_chunks (coding standards, patterns), spine_rules (triggers for workflow enforcement), spine_workflows (8-phase dev process)
- Product: cortex_chunks (architecture, specs), scoped by project_slug per product
- Operations: Dedicated branches per domain (skydive-harness, fitness-harness) with domain-specific rules and workflows
- Domain: fed_* tables or per-app databases (Firebase Firestore, etc.)
Communication: MCP (Model Context Protocol). One Python server per harness instance, spawned on demand. Mesh manager handles lazy connect, idle eviction, stale retry.
Session lifecycle: Explicit. start_session() loads handoff + rules. end_session() persists decisions and learnings. Knowledge compounds across sessions.
Mesh: Local mesh. Harness instances connected via MCP on the same machine. Cross-harness queries logged in mesh_transactions.
This is Marco's current tier. 6 apps, 18 harness instances, 10 Neon branches, local MCP mesh.
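A minimal sketch of the lazy-connect and idle-eviction behavior described for the mesh manager; the class and its names are illustrative, and `connect` stands in for spawning an MCP server per harness instance:

```python
# Illustrative mesh-manager sketch: lazy connect + idle eviction.
# MeshManager and its method names are assumptions, not the real API.
import time

class MeshManager:
    def __init__(self, connect, idle_ttl=300.0, clock=time.monotonic):
        self._connect, self._idle_ttl, self._clock = connect, idle_ttl, clock
        self._conns = {}  # harness slug → (connection, last_used)

    def get(self, slug):
        conn, _ = self._conns.get(slug, (None, None))
        if conn is None:
            conn = self._connect(slug)       # lazy connect on first use
        self._conns[slug] = (conn, self._clock())
        return conn

    def evict_idle(self):
        now = self._clock()
        for slug, (_, last) in list(self._conns.items()):
            if now - last > self._idle_ttl:  # idle eviction
                del self._conns[slug]
```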
Move to Tier 3 when: A second person needs access to the mesh, or you're deploying harness-backed agents in production (real users, not just dev-time).
Tier 3 — Remote MCP & Multi-Tenant (Team / Production)
When: 2-20 people, multiple teams, production agents serving real users.
Storage: Hosted PostgreSQL (Neon, Supabase, RDS) with per-tenant branch isolation. Connection pooling (PgBouncer). Automated backups and point-in-time recovery.
Harness types in practice: Same schema as Tier 2, but with access control. Each team member has scoped access. Build harness is shared (company-wide standards). Product harnesses are team-scoped. Operations harnesses are department-scoped. Domain harnesses are user-scoped with row-level security.
Communication: Remote MCP servers via Streamable HTTP transport. API gateway for auth and rate limiting. Service discovery so agents can find harness instances.
Session lifecycle: Same protocol, but with auth context. Sessions carry user identity. Learnings are attributed. Conflict resolution for concurrent sessions on the same harness.
Mesh: Remote mesh. Harness instances are services, not local processes. Multiple agents can query the same mesh simultaneously. This is what cortex.ai tenants use.
New requirements:
- Authentication — who is querying which harness?
- Authorization — role-based access to harness instances
- Observability — dashboards for mesh health, query latency, learning accumulation
- Versioning — harness schema migrations across tenants
Move to Tier 4 when: Multiple departments need independent meshes that share learnings, or you're federating across organizations (partner meshes).
Tier 4 — Federated Mesh (Enterprise / Multi-Org)
When: 20+ people, multiple departments or organizations, cross-mesh learning is a competitive advantage.
Storage: Federated database clusters. Each department or partner gets their own database cluster. Schema is enforced by the methodology but storage is independent. Data sovereignty respected — no cross-org data copying without explicit consent.
Harness types in practice: Same four types, but meshes are independently operated. A company mesh has build + product + operations + domain. A partner mesh has operations + domain. Shared learnings flow through a federation protocol, not direct database access.
Communication: Federated APIs + event bus. Cross-mesh learning sync via pub/sub (learnings with high transferability_score are published to a shared topic). Each mesh subscribes to relevant topics. MCP is still used within a mesh; APIs are used between meshes.
Session lifecycle: Same protocol. Sessions are mesh-local. Cross-mesh interactions are async (learning sync, not real-time queries).
Mesh: Federated mesh. Multiple independent meshes that share learnings through a controlled protocol. Each mesh is autonomous. The federation layer adds cross-mesh intelligence without coupling.
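The federation filter can be sketched under the assumption of a simple publish callable and a 0.7 threshold (both illustrative; the real protocol adds consent, filtering, and attribution):

```python
# Sketch of the cross-mesh learning sync described above: only learnings
# with a high transferability_score are published to the shared topic.
# publish_transferable and the threshold are assumptions.
def publish_transferable(learnings, publish, threshold=0.7):
    published = []
    for learning in learnings:
        if learning.get("transferability_score", 0.0) > threshold:
            publish(learning)  # e.g. pub/sub topic shared across meshes
            published.append(learning)
    return published
```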
New requirements:
- SSO + RBAC — enterprise identity, department-level permissions
- Audit logging — who accessed what, when, for compliance
- Federation protocol — how learnings flow between meshes (consent, filtering, attribution)
- Schema governance — methodology evolution across independent meshes
- Multi-region — data residency requirements per mesh
Status: Designed but not built. This is the target architecture for cortex.ai at scale — each tenant is a Tier 3 mesh, federation between tenants is Tier 4.
What Never Changes Across Tiers
Four Types
Build, Product, Operations, Domain. At every scale. A file-based build harness and a database-backed build harness serve the same purpose — they store creation knowledge.
Internal Structure
Knowledge, rules, workflows, learnings. At Tier 1 it's markdown sections. At Tier 2+ it's database tables. Same structure, different storage.
Session Lifecycle
Start (load context) → work → end (persist learnings). At Tier 1 it's reading files. At Tier 2+ it's start_session() / end_session(). Same pattern.
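The start → work → end pattern above can be sketched in a few lines; the hook names echo the text, and the storage calls are placeholders rather than the real MCP tools:

```python
# Sketch of the session lifecycle pattern: start (load context),
# work, end (persist learnings). All callables are placeholders.
def run_session(load_handoff, load_rules, work, persist_learnings):
    context = {"handoff": load_handoff(), "rules": load_rules()}  # start
    outcome = work(context)                                       # work
    persist_learnings(outcome)                                    # end
    return outcome
```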
Tier mapping: what tools at each scale
| Component | Tier 1 (Files) | Tier 2 (DB+MCP) | Tier 3 (Team) | Tier 4 (Enterprise) |
|---|---|---|---|---|
| Knowledge store | Markdown files | cortex_chunks + pgvector | Hosted Postgres + pgvector | Federated Postgres clusters |
| Rules engine | .claude/rules/ files | spine_rules table | spine_rules + triggers | spine_rules + policy engine |
| Workflows | docs/ markdown | spine_workflows JSONB | spine_workflows + scheduler | spine_workflows + orchestrator |
| Learnings | Manual notes | learnings table | learnings + scoring | learnings + cross-mesh sync |
| Communication | File imports | Local MCP (stdio) | Remote MCP (HTTP) | MCP + federated APIs |
| Auth | None (local) | None (single user) | JWT + RBAC | SSO + RBAC + audit |
| Isolation | Repos/folders | DB branches + slugs | Tenant branches | Separate clusters |
| Agents connect via | Reading files | MCP tools | Remote MCP tools | API + MCP |
Real-World Adoption Path
Where I am right now — honestly
I'm at Tier 2 personally (database + MCP, 6 apps, 18 harness instances). I'm beginning to apply this at my company (legal tech — wills, trusts, POA automation) through a series of workshops. The goal: move the dev team from Tier 1 to Tier 2, then expand to company-wide Tier 3. Here's the actual plan and current progress.
Current Position on the Scale
| Context | Current Tier | What Exists | What's Next |
|---|---|---|---|
| Marco's personal projects | Tier 2 | 6 apps, 18 harness instances, 10 Neon branches, local MCP mesh, build.ai orchestrating agents | Compound learning metrics, cross-mesh query patterns |
| cortex.ai tenants | Tier 2→3 | 2 tenants (Lake Deck hospitality, Aluminex manufacturing), per-tenant isolation | Remote MCP, more tenants, cross-tenant learning |
| Company (legal tech) | Tier 1 | Developers using AI tools with basic prompts. No structured outer harness. No shared knowledge. | Workshop series → team Build harness → company-wide mesh |
The Workshop Sequence — How You Actually Adopt This
This is the sequence being used at the legal tech company. It's designed to be repeatable for any team.
Workshop 1 — Inner Harness Basics COMPLETED
Teach the team what the inner harness IS. How AI tools work as thin runtime connectors — accept context, call model, route tools. Demo Claude Code, Copilot, and custom agents. Key takeaway: the inner harness is a solved problem. These tools exist, they're getting better every month, and the team should use them.
Outcome: Team understands that using AI tools effectively isn't about which tool — it's about what you give the tool to work with.
Workshop 2 — Outer Harness Concepts IN PROGRESS
Teach structured knowledge AND process definitions that make agents effective. Different kinds of outer harness content: knowledge, rules, workflows, learnings, process definitions. Key takeaway: the differentiation is in the outer harness — the full intelligence, not the thin connector. A well-organized outer harness makes ANY inner harness dramatically more effective.
Outcome: Team sees the gap between "using Claude with no context" and "using Claude with structured knowledge." The difference is visible and compelling.
Workshop 3 — My Outer Harness as Demo PLANNED
Show my actual personal dev workflow harness to the team. Live demo: session lifecycle, knowledge persistence, how agents improve over time. Then two critical exercises:
- Identify what applies to the team — Which parts of my personal outer harness would make the team more effective? Coding standards, architecture patterns, testing rules, CI/CD workflows.
- Identify what needs to change — What's personal vs team? What needs multi-user support? What's missing for the legal domain?
Outcome: Team sees a working Tier 2 system and identifies what they want for themselves.
Workshop 4+ — Team Development Harness PLANNED
Build the team's outer harness together. Start with a Build harness (dev workflow, coding standards, architecture rules). Then Product harnesses per project. The team co-creates this — it's not imposed from above, it's built from what they agree makes them more effective.
Outcome: Team has their own Tier 1 outer harness (files/conventions). Foundation for moving to Tier 2.
Expansion — Company-Wide harness.os FUTURE
Scale from dev team harness to department harnesses to company mesh. The legal domain is a strong fit for Operations harnesses: structured processes, compliance requirements, document workflows, approval chains. Eventually: the company's legal processes become Operations harnesses that any inner harness can plug into.
End state: Dev team at Tier 2 (DB + MCP), company at Tier 3 (remote MCP, multi-team). harness.os methodology validated at real company scale.
Why This Sequence Works
Show, don't pitch
Workshop 3 shows a working system, not slides. People believe what they see.
Co-create, don't impose
The team builds their own harness. Adoption happens because they chose it, not because it was mandated.
Start at Tier 1
Files first. No infrastructure needed. The team can start tomorrow with CLAUDE.md files. Move to database when the need is obvious.
Dev team first
Prove it where you have control before expanding to departments where you need buy-in.
What I'm NOT claiming
I'm not claiming this is finished, or that it works for everyone, or that the company needs to adopt the entire methodology. I'm saying: here's what I built for myself (Tier 2), here's what I learned, and here's what I think could help us as a team. The methodology scales down to "just use CLAUDE.md files effectively" and scales up from there. You take what's useful.
The Journey
Why this section exists
harness.os was not designed top-down. It was discovered bottom-up, through months of building real products with AI. Every concept emerged from a real problem. This is the trajectory — the sequence of realizations that turned scattered files into a methodology.
Phase 1 — Three Projects, Files Everywhere
Got an AI coding subscription. Started building three personal projects simultaneously — a skydiving app, a fitness app, and a finance app. Discovered that the AI tool needed structured context to be useful. Started organizing with files: CLAUDE.md at the root, rules folders, decision docs. Tier 1 of the methodology, discovered by necessity.
Phase 2 — The Artificial Anatomy
The files started to have structure: knowledge storage (cortex), structural rules (spine), workflows (nervous system), learnings (memory). Named it the "Artificial Anatomy of AI" — a CNS (Central Nervous System) metaphor. The internal harness structure was born.
Phase 3 — Dashboards & Personal Assistant
Built dashboards (build.ai) to control all projects from one place. Created a personal assistant (marco.ai) to read and edit those files. The files were now connected — decisions in one project could reference learnings from another. The mesh concept started forming, even without the name.
Phase 4 — Cross-App Data & Multi-Tenant Vision
Realized the data across all three apps should be structured so an assistant could read and write across all of them. Also realized that apps need more dynamic, feedback-driven development — metrics, usage data, and user requests through assistants should drive improvement. Conceived cortex.ai: the same brain, packaged for any company. The config/mesh distinction emerged — one methodology, many deployments.
Phase 5 — The Outer Harness Wins
Started building custom agents. Tested them against existing agents (Claude Code, Copilot). Found that the custom agent kept getting smaller — the runtime connector (inner harness) mattered less and less. Most of "how it runs" belonged in the outer harness as rules and workflows. What mattered was the full intelligence (outer harness) — knowledge AND process definitions — that any thin connector could plug into. Key insight: the real value is persistent, structured intelligence that outlives any specific AI model.
Phase 6 — Process Improvement Is the Real Game
Software exists to improve processes. AI is a new element that helps with what we've always done — but now we need to structure software better for this new element to participate. It needs to store things, retrieve things, learn, and improve processes from the inside. Continuous process improvement is now real, and AI accelerates it. harness.os is not a product — it's a methodology for organizing AI knowledge so processes continuously improve.
Phase 7 — The Four Types Crystallize
All knowledge falls into four categories of process: creation (build), discovery and specification (product — WHY + WHAT), domain operations (operations), and user data (domain — WHO). Product includes the full lifecycle: discovery → building → maintaining → archived, with continuous validation throughout. These four cover most of what people and companies need. They're customizable at each level, plug-and-play. New products are just new compositions of these types. The type system was the last piece — making harness.os a complete, composable methodology.
Now — Validation at Scale
The methodology works at Tier 2 (database + MCP, single user, 6 apps). Next: apply it at a real company (legal tech, 20+ people), validate Tier 3, and prove that the four types hold across industries. The hypothesis: the methodology is universal. The implementation scales. The types are complete. This is being tested, not assumed.
The Compound Effect
The three-layer model creates a compounding flywheel. The methodology enables configs. Configs spawn meshes. Meshes generate learnings. Learnings with high transferability flow back through the methodology's knowledge flow patterns, making every other mesh smarter. More meshes = more data = more intelligence = more meshes.
The learnings table accumulates session outcomes, correlated with pre-session activities via mesh_transactions. When a pattern reaches statistical significance, it's written as a learning with a high transferability score -- for example {category: 'cross-domain', insight: '...', transferability_score: 0.92}. cortex.ai tenant meshes with process_type='training' inherit the generalized form. Cross-mesh query: the cortex.ai operations harness reads learnings WHERE transferability_score > 0.7, and marco-personal-harness reads the same learnings for proactive suggestions. No data copying -- federated reads across mesh branches.
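A runnable toy of that cross-mesh query, using in-memory SQLite as a stand-in for the Postgres learnings table (the schema is simplified; only the column names echo the text):

```python
# Toy of the learnings query above; SQLite stands in for Postgres.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE learnings (category TEXT, insight TEXT, transferability_score REAL)")
db.executemany(
    "INSERT INTO learnings VALUES (?, ?, ?)",
    [("cross-domain", "onboarding pattern", 0.92),
     ("local", "project-specific quirk", 0.30)],
)
# Only highly transferable learnings qualify for cross-mesh reads.
rows = db.execute(
    "SELECT insight FROM learnings WHERE transferability_score > 0.7"
).fetchall()
```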
Technical deep dive: Compound learning mechanics
Today
- 4 harness types, 18 instances across 10 Neon branches
- build.ai web UI -- Build + Product mesh scope
- Federated domain data in main DB
- cli-spawned + api-built agents
- Single-user (Marco) mesh
Tomorrow
- + cortex.ai: Operations + Domain as tenant meshes
- + way2do.ai: Domain as consumer mesh subscriptions
- + marco.ai: All 4 types (mobile + web meshes)
- + Per-app domain harnesses (Neon Pro)
- + Cross-mesh learning federation
- + Anyone can define their own config and spawn a mesh
Today: one config, one primary mesh. Tomorrow: the config itself becomes configurable. cortex.ai already demonstrates this -- each tenant runs their own mesh. The end state: the methodology is the specification, configs are published and forkable, and meshes are provisioned on demand. Anyone can define which harness types they need, and the platform provisions a mesh automatically.
Strategic deep dive: The compound moat
The three-layer model is the moat. The methodology is the IP. The config creates the product portfolio. The meshes generate compound value. Personal app meshes prove each vertical. cortex.ai meshes package it for B2B. way2do.ai meshes bundle it for B2C. build.ai is the factory that creates configs and provisions meshes. New products are new compositions in the config. New customers are new meshes. The factory feeds itself.
Today, configs are predefined. Tomorrow, the config becomes a product feature. Imagine: a cortex.ai customer choosing which harness types to activate for their mesh. Or a way2do.ai user selecting which domain harnesses to subscribe to. The three-layer model makes this inevitable -- methodology provides the rules, config provides the options, mesh provides the running system.
Where this is headed: Today, one person runs one system with 6 connected apps. Tomorrow, anyone can create their own version. Businesses can pick which operational knowledge they need and get a running system (mesh) in days. Consumers can pick which apps to connect and get a personal assistant that understands their whole life. The playbook (methodology) stays the same. The game plans (configs) multiply. The running systems (meshes) compound in value.
Why This Works
Every claim in this documentation has been tested against the hardest question: 'Is this actually different, or just complicated?' The three-layer model (methodology, config, mesh) is the answer -- it separates the universal from the specific from the running, making the system both principled and practical.
How harness.os compares to existing AI orchestration platforms -- and why the three-layer separation matters.
Competitive Landscape
| Platform | What It Does | What It Lacks |
|---|---|---|
| CrewAI | Multi-agent orchestration with role-based agents and task delegation | No persistent knowledge layer. No methodology/config/mesh separation. Agents start blank every run. |
| LangChain / LangGraph | LLM application framework with chains, agents, memory, and graph-based workflows | Toolkit, not a methodology. Knowledge embedded in code, not composable. Memory is per-conversation, not per-domain. |
| AutoGen (Microsoft) | Multi-agent conversation framework for collaborative task solving | Research-oriented. No persistent learning. No config/mesh distinction. No multi-product composition. |
| Dify / FlowiseAI | Visual AI workflow builders with drag-and-drop pipeline design | Single-tenant, single-app. No knowledge mesh. No cross-domain reasoning. No compound learning. |
| harness.os | Three-layer AI knowledge platform: universal methodology, portable configs, running mesh instances with compound learning | Early stage. Single-user origin. Economics thesis unproven at scale. |
Technical deep dive: What's architecturally original
What's Architecturally Original
No other platform separates methodology from config from runtime mesh. This separation means the universal principles (CNS schema, type system, session lifecycle) are independent of any specific tech stack, domain, or deployment. Configs are portable. Meshes are independent. This is the architectural innovation that makes everything else possible.
Every competitor treats agents as stateless executors. harness.os inverts this -- each harness instance on the mesh is a persistent knowledge store. Agents don't start blank. They start with everything the mesh knows. A manufacturing safety review agent in a cortex.ai mesh has access to every safety learning from every previous review -- automatically.
Multi-tenancy in most AI platforms means filtering by tenant ID. harness.os uses Neon branch-level isolation -- each harness instance can be a physically separate database branch. This is PostgreSQL-native isolation, not application-layer filtering. Each tenant mesh is genuinely independent.
Technical Risks -- Acknowledged
Risk: "One person built this"
The three-layer model is the answer. The methodology reduces complexity because products are compositions of the same config, and meshes are instances of the same patterns. A new SWE works on one MCP server that powers everything, not 6 separate apps.
Risk: "Neon free tier limits (10 branches)"
Config constraint, not methodology constraint. Neon Pro ($19/mo) unlocks unlimited branches. The mesh connection management was designed for hundreds of branches from day one.
Risk: "MCP is young"
The harness-os-mcp server is a thin Python layer over standard PostgreSQL. If MCP evolves, the mesh adapter changes -- the methodology's schema, data, and knowledge don't. The 37 database tables are the real asset.
Data deep dive: What's original about the data layer
What's Original About the Data Architecture
In every other AI platform, knowledge is embedded in prompts or stored as opaque embeddings. harness.os treats knowledge as first-class relational data in normalized PostgreSQL tables. You can run SQL analytics on the knowledge itself: "Which harnesses on this mesh have the most learnings? Which rules are referenced most?" This works across all meshes because the methodology defines the schema.
When a way2fly mesh produces a learning, and that learning generalizes (high transferability_score), it becomes available to other meshes following the same methodology. This isn't RAG -- it's structured relational data across isolated branches with explicit cross-mesh query patterns. The data compounds because it's structured, not because it's embedded.
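A toy version of the "SQL analytics on the knowledge itself" idea, with in-memory SQLite standing in for Postgres (simplified schema, illustrative harness slugs):

```python
# Toy of running analytics over the knowledge layer itself:
# which harnesses have accumulated the most learnings?
# SQLite stands in for Postgres; the schema is simplified.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE learnings (harness_slug TEXT, insight TEXT)")
db.executemany("INSERT INTO learnings VALUES (?, ?)", [
    ("way2fly-build", "a"), ("way2fly-build", "b"), ("fitness-ops", "c"),
])
top = db.execute(
    "SELECT harness_slug, COUNT(*) AS n FROM learnings "
    "GROUP BY harness_slug ORDER BY n DESC"
).fetchall()
```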
Data Risks -- Acknowledged
Risk: "Branch-per-harness doesn't scale"
Neon branches are copy-on-write with shared storage. 100 cortex.ai tenant meshes share most storage cost. Marginal cost per mesh drops, not rises.
Risk: "No proven compound learning metrics yet"
True. The mesh_events and harness_budgets tables are in the schema but the metrics pipeline isn't built yet. This is the next major data engineering initiative -- greenfield work on a novel three-layer architecture.
Strategic deep dive: Competitive differentiation
What's Genuinely Differentiated
The methodology is IP (teachable, licensable). Configs are strategies (forkable, portable). Meshes are running businesses (scalable, independent). No other AI platform has this separation. CrewAI, LangChain, and AutoGen are toolkits -- they don't separate universal principles from specific implementations from running instances.
cortex.ai demonstrates this: one config (operations + domain for B2B), multiple meshes (Lake Deck = hospitality, Aluminex = manufacturing). Each mesh is independent but shares learnings through the methodology. More meshes = more compound intelligence = more value per mesh. The three-layer model makes this scaling pattern natural.
Strategic Risks -- Acknowledged
Risk: "The economics thesis is unproven"
The first 2-3 cortex.ai tenant meshes will prove or break it. If the second mesh takes less than 50% of the first mesh's setup effort, the thesis holds.
Risk: "Too complex to pitch"
The 60-second version: "AI agents that learn from every task and share that knowledge across all our products. A hotel's AI learned something about onboarding? Now the factory's AI is better at onboarding too." Lead with the outcome, not the three layers.
The Recruiting Pitch
For a Data Engineer
"I've built a three-layer AI knowledge platform. You'd own the mesh data architecture: branch strategy, schema evolution, cross-mesh queries, learning accumulation metrics. Greenfield data engineering on a novel architecture."
For a Product Director
"I've built a methodology that ships a new AI product config in days. I need someone to turn configs into businesses -- prioritize which meshes to launch, define go-to-market, run customer discovery."
For a Software Engineer
"I've built an AI orchestration mesh that's architecturally different from anything in the market. Three-layer separation, persistent knowledge, branch-level isolation. Real systems engineering, not prompt wrangling."
For a Partner
"I've built the methodology and the factory. You bring the industry. We create a config, spawn a mesh, and your customers get AI-powered operations. You own the customer relationship. Knowledge flows both ways."
What Makes This Different
Most AI tools are like hiring a consultant who forgets everything after each meeting. You explain your business, they help, they leave, and next time you start from scratch. harness.os is like building a team that remembers everything -- every insight, every decision, every lesson learned. And when you bring in a new team member (add a new app to the mesh), they start with everything the team already knows.
Real example: A hotel (Lake Deck) started using cortex.ai -- their own running system (mesh). The AI learned what makes good hospitality onboarding. Then a manufacturing company (Aluminex) got their own mesh. Their onboarding AI was already better because of what the hotel taught it. That's the three-layer compound effect: the playbook (methodology) enables multiple game plans (configs), each running a system (mesh) that makes every other system smarter.
Honest About What's Not Done Yet
It's early
The platform works and powers real products. But it's been built by one person. The architecture (three layers, four knowledge types) is solid. The team needs to grow.
The business model is being proven
Two real businesses are using cortex.ai -- each running their own mesh. The theory is that each new mesh makes the platform cheaper and smarter. The trend so far is positive.
Why honesty is a strength
This platform shows you everything -- the playbook, the game plan, the running system, what's built, what's not. That transparency means anyone who joins knows exactly what impact they'll have.
Honest Assessment
Why this section exists
Every framework sounds brilliant when it's only described by its creator. This section deliberately separates what's validated (proven, working, real), what's genuinely novel (no one else is doing this specific thing), and what's unproven (thesis only, needs more evidence). Read this before making any commitment.
What's Validated — It Works
The architecture runs real products
6 apps connected through the harness mesh. build.ai orchestrates agents. cortex.ai serves 2 real business tenants (Lake Deck, Aluminex). way2fly, way2move, way2save are functional consumer apps. This isn't a whiteboard idea — it's deployed code with real data flowing through it.
Evidence: Running Neon databases, 37-table schema, MCP server handling 27+ tools per harness, real sessions with real output.
MCP as the mesh protocol
MCP (Model Context Protocol) has become the industry standard for AI tool integration. 97M+ monthly SDK downloads. Adopted by OpenAI, Google, Microsoft, AWS, and every major AI lab. The bet on MCP as the mesh communication layer was correct — it's now the de facto protocol.
Evidence: MCP Streamable HTTP transport is production-ready. Major IDE integrations ship with MCP support. The ecosystem is growing faster than GraphQL did at the same stage.
Knowledge persistence across sessions
The CNS schema (knowledge tables, learning tables, rules, workflows) demonstrably makes agents better across sessions. Agents don't start blank — they start with accumulated knowledge. This is observable: session quality improves as harness knowledge grows.
Evidence: Every Claude Code session in this ecosystem starts with harness context. The difference between a first session and a 50th session on the same harness is dramatic.
The four harness types are natural
Build (creation), Product (management), Operations (domain ops), Domain (user data) — these four categories emerged organically from real development, not from theory. Every piece of knowledge encountered across 6 apps fits cleanly into one of these four types. No artificial forcing required.
Evidence: 18 harness instances across 10 Neon branches, all cleanly typed. No "miscellaneous" category needed.
Outer harness > inner harness
The insight that the full intelligence layer (outer harness — knowledge AND process) matters more than the thin connector (inner harness) has been validated repeatedly. Claude Code, Copilot, custom API agents — all perform dramatically better when connected to the same outer harness. The intelligence survives model changes, tool changes, even complete agent rewrites.
Evidence: Same harness.os-mcp server used by CLI-spawned, API-built, and third-party agents. Knowledge persists regardless of which model powers the agent.
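The separation can be made concrete as an interface: the outer harness is a contract any inner agent calls into. This is a toy in-memory sketch under assumed method names (`get_context`, `log_learning`), not the real MCP tool surface.

```python
from typing import Protocol

class OuterHarness(Protocol):
    """The stable layer: knowledge and process, independent of any agent."""
    def get_context(self, slug: str) -> dict: ...
    def log_learning(self, slug: str, lesson: str) -> None: ...

class InMemoryHarness:
    """Toy outer harness; the real one sits behind the MCP server."""
    def __init__(self) -> None:
        self._learnings: dict = {}
    def get_context(self, slug: str) -> dict:
        return {"learnings": list(self._learnings.get(slug, []))}
    def log_learning(self, slug: str, lesson: str) -> None:
        self._learnings.setdefault(slug, []).append(lesson)

def run_session(agent: str, harness: OuterHarness, slug: str) -> dict:
    ctx = harness.get_context(slug)   # every agent gets the same knowledge
    harness.log_learning(slug, f"learned by {agent}")
    return ctx

harness = InMemoryHarness()
first = run_session("claude-code", harness, "way2fly-build")
second = run_session("copilot", harness, "way2fly-build")  # different inner agent
```

Swapping the inner agent changes nothing about the harness: the second agent inherits what the first one learned.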
Process improvement is the real game
Software exists to improve processes. AI is a new participant in that improvement. Structuring knowledge so AI can participate effectively in process improvement — this framing maps to decades of BPM (Business Process Management) literature. It's not a new idea, but applying it specifically to AI knowledge organization is timely and valid.
Evidence: BPM is a $16B+ market. Process mining (Celonis, etc.) proves companies pay for structured process improvement. AI participation is the obvious next step.
What's Genuinely Novel — No One Else Is Doing This
Three-layer separation (methodology / config / mesh)
No existing AI platform separates universal principles from specific implementations from running instances. CrewAI, LangChain, AutoGen, Dify — they're all single-layer: code is the config is the runtime. harness.os explicitly separates what's universal (the methodology), what's a specific strategy (the config), and what's running (the mesh). This is the core architectural innovation.
Closest parallel: Kubernetes separates specification (YAML) from runtime (cluster), but has no methodology layer. Docker Compose draws the same spec/runtime line. The three-layer idea extends this separation to AI knowledge.
Four-category type system for AI knowledge
The four-category decomposition (how to create, why + what to create, how to run domain operations, per-user data) emerged from practice building 7 products in 6 weeks, not from theory. It's the structure I use daily — not a claim that it's the only way to organize AI knowledge.
Risk: It may be too reductive. Some knowledge may not fit cleanly. But 18 instances across 6 apps haven't surfaced a fifth type yet.
Configs as portable AI strategies
The concept that your AI development workflow, architecture decisions, and domain knowledge can be packaged as a "config" — forkable, shareable, versionable, separate from both the principles that guide it and the runtime that executes it — doesn't exist in any current platform. cortex.ai's onboarding funnel is literally "create a new config, spawn a mesh."
Potential: If configs become a marketplace, this is the product-market fit moment.
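A sketch of what "forkable" could mean mechanically. The config's field names and values here are illustrative assumptions, not the real config format; the point is the fork operation: copy everything, record provenance, override selectively.

```python
import copy

# Hypothetical config shape; field names are illustrative, not the real format.
base_config = {
    "name": "marco-default",
    "workflow": "8-phase-dev",
    "architecture": "hexagonal",
    "database": "neon-postgres",
    "rules": ["tdd-first", "log-learnings-on-session-end"],
}

def fork_config(base: dict, name: str, **overrides) -> dict:
    """Fork a config: deep-copy it, apply overrides, record provenance."""
    forked = copy.deepcopy(base)
    forked.update(overrides)
    forked["name"] = name
    forked["forked_from"] = base["name"]
    return forked

# A tenant forks the base strategy and changes only what differs.
tenant_config = fork_config(base_config, "lake-deck", workflow="3-phase-ops")
```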
Scale tiers with methodology invariance
The explicit design that the same methodology works at every scale — from local files ($0) to federated enterprise mesh ($2K+/mo) — with only the implementation changing, not the principles. Most frameworks either target small teams or enterprises. harness.os claims to be both, with clear tier transitions and triggers for when to move up.
What's Unproven — Thesis Only, Needs Evidence
Compound learning across meshes
The thesis that learnings from one mesh make other meshes smarter — and that this compounds over time — is the biggest unproven claim. The mesh_events and harness_budgets tables are in the schema, but the metrics pipeline doesn't exist yet. There's no data showing learning transfer rates, no measurement of compound effects.
What would validate it: Measure time-to-productive for the 3rd cortex.ai tenant vs the 1st. If the 3rd takes <50% of the 1st's setup time, the thesis holds.
The economics at scale
The claim that marginal cost per mesh drops while value per mesh grows — the compound economics thesis — has only 2 data points (Lake Deck + Aluminex). That's not enough to prove a trend. Infrastructure costs, support burden, and knowledge curation effort at 50+ tenants are completely unknown.
What would validate it: 5-10 cortex.ai tenants with tracked per-tenant cost/revenue. If the marginal cost curve bends down, it's real.
Methodology portability to other people
harness.os has been developed and used by one person. The methodology has never been applied by someone else independently. The biggest risk: what seems "natural" and "clean" to the creator may be opaque to others. The four types, the CNS schema, the session lifecycle — all of this needs to survive contact with a second user.
What would validate it: One external developer follows the methodology without hand-holding. If they produce a working config and mesh, the methodology is real. If they can't, it's just one person's system.
The "AI Knowledge Engineering" category claim
The hypothesis that harness.os represents a specialization of what is emerging as "AI Knowledge Engineering" — specifically that it contributes a process categorization framework to the practice — is early-stage. The practice is being named independently by different players: KPMG calls it "knowledge engineering", Anthropic calls it "context engineering", Martin Fowler calls it "harness engineering". The convergence validates the need; the vocabulary hasn't settled yet. This claim should be held lightly.
Honest take: It's more likely that harness.os contributes ideas to the emerging practice than that it defines the category. That's still valuable — but frame it modestly.
37-table schema appropriateness
A 37-table schema for a single-user platform with 2 tenants may be over-engineered. The schema was designed for a future that hasn't arrived. If growth doesn't materialize, this is technical debt, not foresight. An engineer reviewing this would ask: "Do you use all 37 tables actively, or are half of them aspirational?"
Counter-argument: The schema defines the methodology's data model. Removing tables would mean removing methodology concepts. But some concepts may not earn their keep.
The Honest Scorecard
| Claim | Status | Evidence |
|---|---|---|
| Persistent knowledge makes agents better | Validated | Observable in every session. Measurable quality difference. |
| Four harness types cover all AI knowledge | Validated | 18 instances, 6 apps, no fifth type needed. |
| Three-layer separation is novel | Validated | No competitor has this. Verified against CrewAI, LangChain, AutoGen, Dify. |
| MCP as mesh protocol | Validated | Industry standard. 97M+ monthly downloads. Adopted by every major lab and IDE. |
| Outer harness outlives inner harness | Validated | Same knowledge, different agents. Proven across 3 agent types. |
| Configs are portable and forkable | Novel | Demonstrated via cortex.ai tenants. Not yet tested externally. |
| Compound learning across meshes | Unproven | Schema exists. Pipeline not built. Zero measurements. |
| Economics improve with scale | Unproven | 2 data points. Positive trend. Not statistically meaningful. |
| Methodology works for other people | Unproven | Single creator. No external validation yet. |
| This is "AI Knowledge Engineering" | Speculative | Category doesn't exist yet. May contribute ideas, unlikely to define it. |
Research & Landscape
What exists in the market, what's adjacent, and where harness.os fits — based on actual research, not just claims.
Industry Validation: Others See the Same Problem
KPMG — "Knowledge Engineering for AI"
KPMG has built an entire consulting practice around "knowledge engineering" — structuring organizational knowledge so AI agents can use it effectively. Their thesis: the companies that structure their knowledge best will get the most value from AI. This directly validates harness.os's core insight.
The difference: KPMG sells consulting hours to do this manually for enterprises. harness.os is a methodology I created for myself, with a schema that makes the structuring systematic and repeatable. I'm sharing it because the approach might work for others too.
What this means: The problem is real and large enough for a Big 4 firm to build a practice around it. harness.os's approach (structured methodology instead of consulting) is differentiated but unproven at enterprise scale.
MCP Ecosystem — Protocol Validation
MCP (Model Context Protocol) has exploded since its launch. Key numbers:
- 97M+ monthly SDK downloads (npm + pip combined)
- Adopted by OpenAI, Google DeepMind, Microsoft, AWS, Meta
- Streamable HTTP transport now production-ready (replaces SSE)
- Every major IDE (VS Code, JetBrains, Cursor) supports MCP natively
- Growing faster than GraphQL at the same stage of its lifecycle
What this means: Building on MCP was the right bet. The protocol will be around for years. harness.os's mesh can connect to any MCP-compatible client — which is becoming everything.
BPM & Process Mining — $16B+ Market
Business Process Management and process mining (Celonis, UiPath, Bizagi) represent a massive existing market built on the exact premise that structuring and improving processes creates business value. harness.os's "process improvement" framing isn't new — it's proven.
What IS new: applying it specifically to AI agent knowledge. Traditional BPM structures workflows for humans. harness.os structures knowledge so AI agents can participate in process improvement.
What this means: The market validates the problem and willingness to pay. harness.os extends the concept into AI — a natural evolution that BPM vendors will likely pursue too.
Memento-Skills — Persistent Agent Learning
Research into persistent agent memory (Memento-type approaches, skills-based agent learning) shows the AI research community is converging on the same insight: agents need persistent, structured knowledge to improve over time. Not just conversation memory — structured learnings that transfer across contexts.
What this means: harness.os's learning tables and transferability scores are aligned with where the field is heading. The specific implementation (relational tables with transferability_score) may be ahead of the research.
Detailed Competitive Landscape
| Platform | What It Does | Overlap with harness.os | What harness.os Does Differently | Threat Level |
|---|---|---|---|---|
| CrewAI | Multi-agent orchestration. Role-based agents with task delegation and sequential/parallel workflows. | Agent roles, task decomposition, multi-agent coordination. Similar to harness.os's pipeline + phase system. | No persistent knowledge. No methodology layer. No type system. Agents start blank every run. Code IS the config. | Medium — could add persistence. But would need to invent the methodology layer. |
| LangChain / LangGraph | LLM framework. Chains, agents, memory, tools, RAG. LangGraph adds graph-based workflows. | Tool integration, memory, workflow orchestration. RAG is a form of knowledge retrieval. | Toolkit, not methodology. Knowledge in code, not composable. Memory per-conversation, not per-domain. No type system. | Medium — massive community. Could absorb similar ideas. But it's a toolkit philosophy, not a methodology. |
| AutoGen (Microsoft) | Multi-agent conversation framework. Agents talk to each other to solve tasks collaboratively. | Multi-agent collaboration, role specialization. | Research-oriented, conversation-first. No persistent learning. No config/mesh distinction. No multi-product composition. | Low — different paradigm (conversation vs knowledge mesh). Microsoft could pivot, but hasn't. |
| Dify / FlowiseAI | Visual AI workflow builders. Drag-and-drop pipeline design. RAG, agents, tools. | Pipeline design, tool integration, knowledge bases. | Single-tenant, single-app. No knowledge mesh. No cross-domain reasoning. No compound learning. Visual-first vs methodology-first. | Low — targets different user (no-code AI builders vs systems architects). |
| Notion AI / Guru / Glean | Knowledge management with AI. Search across company docs. AI-assisted writing and Q&A. | Knowledge structuring, cross-domain search, persistent knowledge stores. | These are knowledge RETRIEVAL. harness.os is knowledge STRUCTURE + REASONING + LEARNING. Documents vs relational schema. Read-only vs read-write-learn. | Low — different layer. Could be data sources TO a harness, not competitors. |
| Celonis / Process Mining | Extract process patterns from system logs. Visualize, optimize, automate business processes. | Process improvement, learning from execution, continuous optimization. | Process mining is retrospective analysis. harness.os is prospective: agents USE knowledge during execution, LEARN from it, and IMPROVE future executions. Different time orientation. | Medium — if they add AI agents that learn and execute, they'd have budget + customers + data. Watch this space. |
| harness.os | Three-layer AI knowledge platform. Methodology + portable configs + running mesh instances with persistent learning. | — | The combination: typed knowledge + persistent learning + three-layer separation + scale-invariant methodology. No one does all four. | Self — the biggest threat is not building fast enough. |
What Could Kill This
AI models get so good they don't need knowledge
If future models have perfect memory, perfect reasoning, and perfect context windows — the outer harness becomes less valuable. Context windows have grown from 4K to 1M+ tokens in 2 years.
Counter: Even with unlimited context, structured knowledge outperforms raw context. A database query is always faster than "find this in 1M tokens." And domain-specific LEARNED knowledge doesn't exist in training data.
A well-funded competitor builds this better
CrewAI ($18M Series A) or a new startup could build the three-layer model with a real team, better DX, and marketing budget. The methodology isn't patentable.
Counter: The methodology's value is in the accumulated knowledge and configurations, not the code. First-mover advantage in knowledge accumulation IS the moat. But only if you move fast enough.
It's too complex to explain
Three layers, four types, configs, meshes, branches, CNS schemas... If you can't explain it in 60 seconds, most people won't try it. Complexity killed many good frameworks.
Counter: Kubernetes is complex too. The 60-second pitch works: "AI agents that remember everything and share knowledge across apps." Lead with outcome, not architecture.
Solo developer can't maintain 6 apps + methodology
One person building 6 consumer apps, a SaaS platform, a personal assistant, AND a methodology framework. Something will break. The question is what breaks first and whether it matters.
Counter: The three-layer model is the answer. Products are compositions — they share the same config and methodology. But the counter still has to be proven in the quality of the shipped apps.
Engineer's Take
What a senior engineer reviewing this system would validate, what they'd question, and what they'd want to see next.
What They'd Validate
"The architecture choices are sound"
PostgreSQL + MCP + Neon branching is a defensible stack. Not exotic, not over-engineered at the infrastructure level. Any senior backend engineer can understand and contribute to this immediately. The hexagonal architecture in build.ai follows well-established patterns.
"The type system makes intuitive sense"
Build / Product / Operations / Domain maps cleanly to how real organizations work. An engineer hearing this for the first time would nod, not argue. The renaming from "process" to "operations" was the right call — it eliminates the genus-as-species confusion.
"The session lifecycle is well-designed"
Start session → get context → work → log decisions/learnings → end session with handoff. This is clean, stateless, and composable. It works for CLI-spawned agents, API agents, and human users. The lifecycle is the methodology's strongest implementation detail.
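The lifecycle above can be sketched in a few lines. This is a minimal model, not the real server: the `Session` fields and the dict-backed store are assumptions standing in for the MCP tools and Postgres.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    harness_slug: str
    context: dict = field(default_factory=dict)
    learnings: list = field(default_factory=list)

def start_session(slug: str, store: dict) -> Session:
    """Stateless start: the session's only inputs come from the store."""
    return Session(slug, context=dict(store.get(slug, {})))

def end_session(session: Session, store: dict) -> str:
    """Persist learnings so the next session starts smarter, then hand off."""
    store.setdefault(session.harness_slug, {}).setdefault(
        "learnings", []).extend(session.learnings)
    return f"handoff: {len(session.learnings)} learnings logged"

store: dict = {}
s1 = start_session("way2fly-build", store)
s1.learnings.append("flight API rate limit is 10/s")
handoff = end_session(s1, store)
s2 = start_session("way2fly-build", store)  # next session inherits the learning
```

Because each session reads from and writes back to the same store, the design is composable: any number of agents can run the loop without sharing in-process state.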
"The inner/outer separation is the right abstraction"
Decoupling knowledge (outer) from execution (inner) means you're not locked into any model or agent framework. This is good engineering — it's the same principle as separating data from presentation. Any engineer who's survived a framework migration would appreciate this.
What They'd Question
"Do you actively use all 37 tables?"
A senior engineer would immediately grep for which tables have recent writes. If half are empty or aspirational, the schema is over-designed. The right answer is honest: some tables are load-bearing (cortex_chunks, spine_rules, learnings, sessions), some are placeholders for future features (mesh_events, harness_budgets). Acknowledge which is which.
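That audit is a one-liner per table. A sketch against SQLite for portability (the demo tables are made up); on Postgres the same question can be answered from the `pg_stat_user_tables` statistics view.

```python
import sqlite3

def table_usage_audit(conn: sqlite3.Connection) -> dict:
    """Row counts per table: the quickest way to separate load-bearing
    tables from aspirational ones."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables}

# Tiny demo schema: one active table, one empty placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE learnings (lesson TEXT)")
conn.execute("CREATE TABLE mesh_events (payload TEXT)")
conn.execute("INSERT INTO learnings VALUES ('ship metrics first')")

audit = table_usage_audit(conn)
empty = [t for t, n in audit.items() if n == 0]  # the aspirational ones
```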
"Is this a methodology or a personal workflow?"
The hardest question. A methodology should be teachable, repeatable, and produce similar results for different practitioners. harness.os has only been used by its creator. Until someone else follows it independently and succeeds, it's technically a personal system — a very well-organized one, but still personal.
"Where are the metrics?"
Claims about "compound learning," "agents getting better," "knowledge accumulating" — engineers want graphs, not adjectives. Time-to-first-useful-output per session over time. Learning count per harness per month. Agent success rate by harness maturity. Without these, the compound claims are anecdotal.
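One of those graphs is cheap to define. The numbers below are entirely hypothetical placeholders; real values would come from the sessions table. The point is that "agents get better" becomes a single measurable ratio.

```python
from statistics import mean

# Hypothetical measurements: minutes to first useful output per session,
# for one harness. Real values would come from session logs.
early_sessions = [42, 38, 45, 40]    # sessions 1-4
recent_sessions = [18, 15, 20, 16]   # sessions 47-50

def improvement_ratio(early, recent) -> float:
    """How much faster recent sessions reach useful output. A ratio above
    1.0, sustained over time, is the compound-learning claim made measurable."""
    return mean(early) / mean(recent)

ratio = improvement_ratio(early_sessions, recent_sessions)
```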
"What's the DX like?"
Developer experience. Setting up a new harness from scratch — how long does it take? Is it documented? Are there CLI tools? Templates? Or do you need to understand the whole methodology first? The gap between "I get it conceptually" and "I can do it myself" is where frameworks die.
"What happens when MCP changes?"
MCP is young and evolving rapidly. Streamable HTTP replaced SSE. The protocol could change again. How much of the mesh depends on MCP specifics vs generic tool-calling? The answer (thin MCP layer over standard Postgres) is good, but the migration path should be clearer.
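The "thin MCP layer" answer can be shown structurally. A toy sketch with assumed names (`KnowledgeStore`, `McpAdapter`, the tool-call shapes are not the real server's API): all protocol-specific code sits in one adapter, so a spec change touches one class.

```python
class KnowledgeStore:
    """Protocol-agnostic core: plain calls over plain data.
    Nothing here knows MCP exists."""
    def __init__(self) -> None:
        self._rules: dict = {}
    def add_rule(self, slug: str, rule: str) -> None:
        self._rules.setdefault(slug, []).append(rule)
    def get_rules(self, slug: str) -> list:
        return list(self._rules.get(slug, []))

class McpAdapter:
    """The only class that knows MCP tool-call shapes. If the protocol
    changes again, only this adapter changes; the store survives."""
    def __init__(self, store: KnowledgeStore) -> None:
        self.store = store
    def handle_tool_call(self, name: str, args: dict) -> dict:
        if name == "get_rules":
            return {"result": self.store.get_rules(args["slug"])}
        if name == "add_rule":
            self.store.add_rule(args["slug"], args["rule"])
            return {"result": "ok"}
        raise ValueError(f"unknown tool: {name}")

store = KnowledgeStore()
mcp = McpAdapter(store)
mcp.handle_tool_call("add_rule", {"slug": "way2fly-build", "rule": "TDD first"})
rules = mcp.handle_tool_call("get_rules", {"slug": "way2fly-build"})
```

This is the same hexagonal-architecture move the document already endorses for build.ai: the protocol is a port, not the core.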
"Is three layers genuinely different from any other layered architecture?"
A skeptical architect would say: "Every mature system has principles, configs, and runtime. You've just named them." The answer needs to be sharper than "we separated them" — it needs to show what becomes POSSIBLE because of the separation that ISN'T possible without it. Configs as forkable AI strategies is the strongest example.
What They'd Want to See Next
To believe the methodology:
- A "Getting Started" guide someone else can follow
- A second independent user successfully creating a config + mesh
- Documentation that doesn't require reading this entire document first
- Clear answer to "why not just use CLAUDE.md files?"
To believe the economics:
- 5+ cortex.ai tenants with tracked setup-time per tenant
- Cost-per-mesh declining curve with real data
- A dashboard showing compound learning metrics
- Evidence that cross-mesh learning produces measurably better outcomes
To believe the architecture:
- Table usage audit (which of the 37 tables have >0 rows)
- Latency numbers for harness context retrieval
- Branch scaling test (what happens with 50 Neon branches?)
- Clear migration path if MCP evolves incompatibly
To join the team:
- Evidence of user traction beyond the creator
- A clear 12-month roadmap with milestones
- Understanding of which pieces to build vs buy vs partner
- Honest assessment of what one person can realistically ship
The Bottom Line
harness.os is architecturally sound, genuinely novel in its three-layer separation, and aligned with where the industry is heading (knowledge engineering, persistent agent learning, MCP as protocol). The four-type knowledge system is clean and practical. The core insight — outer harness matters more than inner harness — is correct and defensible.
The biggest risks are execution speed (one person, six apps), methodology portability (untested by others), and compound economics (unproven at scale). These are all solvable problems — but they need to be solved with evidence, not claims. Build the metrics pipeline. Get a second user. Ship one app to high quality rather than six to prototype quality.
The question isn't whether the ideas are good — they are. The question is whether one person can execute on them fast enough before the market catches up. The three-layer model is the answer to that question too: it's designed to make one person effective. Now prove it.