Inside the memi harness.
There is no service
Mémoire is a harness, not a service. The CLI (memi), the macOS Studio app, and the MCP server all call the same TypeScript internals, every project gets a .memoire/ workspace as the single source of truth, and every run appends a JSONL trace to .memoire/.agent-bus/runs/<run-id>.jsonl that you can replay. That architecture has been stable since v0.17.
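To make the replay idea concrete, here is a minimal sketch of reading a JSONL trace back in order. The event shape (`ts`, `type`) and the sample events are assumptions for illustration; the post does not describe the real trace schema, and this is not the engine's own replay code.

```typescript
// Hypothetical event shape: the real trace schema is not described in this post.
interface TraceEvent {
  ts: string;
  type: string;
  [key: string]: unknown;
}

// JSONL means one JSON object per line; blank lines are skipped.
function parseTrace(jsonl: string): TraceEvent[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as TraceEvent);
}

// Replay hands each event to a callback in the order it was appended.
function replay(events: TraceEvent[], onEvent: (event: TraceEvent) => void): void {
  for (const event of events) {
    onEvent(event);
  }
}

// Made-up sample standing in for the contents of a run's .jsonl file.
const sampleTrace = [
  '{"ts":"2026-05-09T10:00:00Z","type":"run.start"}',
  '{"ts":"2026-05-09T10:00:01Z","type":"tool.call","tool":"read_file"}',
  '{"ts":"2026-05-09T10:00:02Z","type":"run.end"}',
].join("\n");

const seenTypes: string[] = [];
replay(parseTrace(sampleTrace), (event) => {
  seenTypes.push(event.type);
});
```

Because the trace is append-only and line-delimited, a replay is just an ordered read; nothing has to be reconstructed from application state.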
What is worth writing down is the part that was not obvious at the start: how the harness itself ended up being defined. Not as code. As data.
The temptation: eight piles of glue
Studio runs eight different agent harnesses: Memoire Native, Claude Code, Codex, OpenCode, Gemini CLI, Ollama, Hermes, and a plain shell. The obvious way to support eight agents is to write eight integrations. Spawn this binary with those flags. Scrape that output format. Special-case that login error. Each one a private pile of glue inside the app, each one slightly different, each one rotting at its own speed.
That approach also makes a quiet promise you cannot keep: that the app code knows, forever, how every third-party CLI behaves. It does not. Those CLIs change under you.
One manifest instead
What ships is a single file: harness-manifest.json, schemaVersion 2. All eight harnesses are entries with the same fields.
Each harness declares its commandTemplates (how to invoke it), an install probe and setup steps with copyable commands (npm install -g @anthropic-ai/claude-code, codex login), and an authProbe: a real command like claude auth status or codex login status. Readiness in the UI is probed, never assumed. The status dot is honest because a process actually ran.
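A probed status dot can be sketched roughly like this. The field shapes, the `Readiness` states, and the injected `Runner` are assumptions made for the example; only the probe idea and the `claude auth status` command come from the description above. `command -v claude` is a generic shell check used as a stand-in install probe, not necessarily what the manifest uses.

```typescript
type Readiness = "ready" | "needs-auth" | "not-installed";

// Hypothetical subset of a manifest entry; the real field shapes may differ.
interface HarnessProbes {
  id: string;
  installProbe: string;
  authProbe: string;
}

// The runner is injected so the logic can be exercised without spawning processes.
type Runner = (command: string) => { exitCode: number };

// Readiness is derived only from commands that actually ran.
function probeReadiness(entry: HarnessProbes, run: Runner): Readiness {
  if (run(entry.installProbe).exitCode !== 0) return "not-installed";
  if (run(entry.authProbe).exitCode !== 0) return "needs-auth";
  return "ready";
}

const claudeCode: HarnessProbes = {
  id: "claude-code",
  installProbe: "command -v claude", // assumption: stand-in install check
  authProbe: "claude auth status",
};

// Fake runner simulating "installed but not logged in".
const fakeExitCodes = new Map<string, number>([
  ["command -v claude", 0],
  ["claude auth status", 1],
]);
const fakeRunner: Runner = (command) => ({
  exitCode: fakeExitCodes.get(command) ?? 127,
});

const status = probeReadiness(claudeCode, fakeRunner);
```

In a real app the runner would wrap a process spawn; keeping it as a parameter means the readiness logic stays testable without any CLI installed.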
Each harness also declares knownFailurePatterns: regexes like “not logged in|login|unauthorized” that map a raw failure to the specific setup step that fixes it. A blocked harness comes with a next action, not just an error string.
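The mapping from raw failure to next action could look something like the sketch below. The `FailurePattern` and `SetupStep` shapes, the step ids, and the case-insensitive matching are assumptions; the regex string and the `codex login` command are taken from the text above.

```typescript
// Hypothetical shapes; the manifest's actual field names may differ.
interface FailurePattern {
  pattern: string; // regex source, stored as data
  setupStepId: string;
}

interface SetupStep {
  id: string;
  command: string; // copyable command shown to the user
}

// Return the setup step that fixes the first matching failure, if any.
function nextAction(
  rawFailure: string,
  patterns: FailurePattern[],
  steps: SetupStep[],
): SetupStep | undefined {
  for (const { pattern, setupStepId } of patterns) {
    // Assumption: patterns are matched case-insensitively.
    if (new RegExp(pattern, "i").test(rawFailure)) {
      return steps.find((step) => step.id === setupStepId);
    }
  }
  return undefined;
}

const codexSteps: SetupStep[] = [{ id: "login", command: "codex login" }];
const codexPatterns: FailurePattern[] = [
  { pattern: "not logged in|login|unauthorized", setupStepId: "login" },
];

const fix = nextAction("Error: Not logged in", codexPatterns, codexSteps);
const unknownFailure = nextAction("segmentation fault", codexPatterns, codexSteps);
```

An unmatched failure returns `undefined`, which is the cue to fall back to showing the raw error string.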
And each one names its outputParser. This is where the real integration cost lives: codex-jsonl, claude-stream-json, hermes-text, ollama, shell. Five stream dialects across eight agents, because nobody’s output format agrees with anybody else’s. Naming the parser in data forced us to treat each dialect as a contract instead of a regex sprinkled through UI code.
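One way to picture "parser as a named contract" is a registry keyed by the outputParser string. This is a sketch only: the two placeholder parsers collapse the real dialects, the `text` field is hypothetical, and `ollama` is deliberately left unregistered because its name does not hint at a format. The real parsers are certainly more involved.

```typescript
type OutputParser = (chunk: string) => string[];

const nonEmptyLines = (chunk: string): string[] =>
  chunk
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

// Placeholder: assumes each line is a JSON object with a "text" field (hypothetical).
const jsonLines: OutputParser = (chunk) =>
  nonEmptyLines(chunk)
    .map((line) => {
      const event = JSON.parse(line) as { text?: unknown };
      return typeof event.text === "string" ? event.text : "";
    })
    .filter((text) => text.length > 0);

// Placeholder: passes non-empty lines through unchanged.
const plainText: OutputParser = (chunk) => nonEmptyLines(chunk);

// The registry keys are the parser names from the manifest.
const parsers = new Map<string, OutputParser>([
  ["codex-jsonl", jsonLines],
  ["claude-stream-json", jsonLines],
  ["hermes-text", plainText],
  ["shell", plainText],
]);

// Unknown names fail loudly instead of silently falling back to a guess.
function parserFor(name: string): OutputParser {
  const parser = parsers.get(name);
  if (parser === undefined) {
    throw new Error(`No parser registered for "${name}"`);
  }
  return parser;
}

const messages = parserFor("codex-jsonl")('{"text":"hi"}\n{"other":1}\n');
```

The point of the lookup is that UI code never sees a dialect directly; it asks for a parser by the name the manifest declared.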
Finally, envPolicy and workspacePolicy say how each harness handles keys and file access. A provider-backed harness, a local model, and a raw shell should not share one environment story.
The split proved the shape
The data-not-code bet paid off the day Studio was carved out of the engine monorepo. On 2026-05-09, day one of the Studio repo, harness-manifest.json had to be vendored in (commit cd3dfda) the way an icon or a font gets vendored in. The harness definition is a shippable artifact that travels with the product.
That is also why the file carries a schemaVersion. The moment two repos read the same manifest, it stops being an implementation detail and becomes an interface. Version it like one.
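Treating the manifest as an interface suggests a version gate at load time. The sketch below assumes a top-level `harnesses` array, which the post does not confirm; only `schemaVersion` with the value 2 comes from the text.

```typescript
const SUPPORTED_SCHEMA_VERSION = 2;

// Hypothetical top-level shape; only schemaVersion is confirmed by the post.
interface Manifest {
  schemaVersion: number;
  harnesses: unknown[];
}

// Refuse to read a manifest whose schema this reader was not written for.
function parseManifest(raw: string): Manifest {
  const data = JSON.parse(raw) as Partial<Manifest>;
  if (data.schemaVersion !== SUPPORTED_SCHEMA_VERSION) {
    throw new Error(
      `Unsupported schemaVersion ${String(data.schemaVersion)}; expected ${SUPPORTED_SCHEMA_VERSION}`,
    );
  }
  if (!Array.isArray(data.harnesses)) {
    throw new Error("Manifest is missing its harness entries");
  }
  return { schemaVersion: SUPPORTED_SCHEMA_VERSION, harnesses: data.harnesses };
}

const manifest = parseManifest('{"schemaVersion":2,"harnesses":[{"id":"shell"}]}');
```

With two repos reading the same file, a hard failure on an unexpected version is cheaper than a reader that half-understands a newer manifest.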
Safety is a list, not an adjective
Every agent tool says “sandboxed” somewhere on its site. The word costs nothing. The useful question is: what, exactly, is refused?
The manifest answers with exactly four hardlineBlockedPatterns: recursive deletes of root or home, mkfs, dd onto raw block devices, and shutdown or reboot. Those four are blocked for every harness with no approval path, because they are the ways an agent can total a machine faster than a human can intervene.
Everything else is recoverable, so everything else is approvable: mutating Figma, running git, and writing outside the workspace all gate on an explicit yes. The harness reads freely and writes after approval.
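The two-tier policy can be sketched as a small decision function. The regexes here are illustrative stand-ins written for this example, not the manifest's actual patterns, and they are intentionally crude (for instance, any command mentioning reboot is refused). The `mutates` flag is also an assumption about how a caller would classify an action.

```typescript
type Decision = "blocked" | "needs-approval" | "allowed";

// Illustrative patterns only; the manifest's real regexes are not shown in the post.
const hardlineBlocked: RegExp[] = [
  /\brm\s+-[a-zA-Z]*[rR][a-zA-Z]*\s+(\/|~|\$HOME)\/?\s*$/, // recursive delete of root or home
  /\bmkfs(\.\w+)?\b/, // formatting a filesystem
  /\bdd\b.*\bof=\/dev\/(sd|nvme|disk|hd)\w*/, // dd onto a raw block device
  /\b(shutdown|reboot)\b/, // powering the machine down
];

// Hardline patterns are checked first and have no approval path.
function decide(command: string, mutates: boolean): Decision {
  if (hardlineBlocked.some((pattern) => pattern.test(command))) {
    return "blocked";
  }
  // Everything recoverable: reads pass, writes wait for an explicit yes.
  return mutates ? "needs-approval" : "allowed";
}

const wipeRoot = decide("rm -rf /", true);
const commit = decide("git commit -m wip", true);
const readFile = decide("cat README.md", false);
```

The ordering matters: the blocked check runs before the approval check, so no approval can ever unlock a hardline pattern.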
The learning: a short absolute list you can enforce and point to beats a broad claim you cannot. Four patterns sounds thin until you notice it is four more enforceable guarantees than “sandboxed” gives you.
Cost is a harness internal too
Users bring their own model plans, which means the harness spends their money. So Studio’s cost HUD landed in the repo’s first day of commits (commit 25753e9, 2026-05-09): it polls /api/usage every 3 seconds and shows tokens, dollars, and prompt-cache hit rate, color-coded. A cache-hit rate of 50% or more is green, 20% or more is amber, and anything below that is red.
Cache-hit rate gets first-class treatment because in agent workloads it is the difference between a cheap run and an expensive one. If the harness manages context well, the HUD shows it. If it thrashes the cache, the HUD shows that too, in red, every 3 seconds. Putting the number in the chrome keeps us honest about a thing users would otherwise only see on an invoice.
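The color thresholds reduce to a few lines. In this sketch the thresholds come from the text above, while the `UsageSnapshot` field names and the hit-rate formula (cached input tokens over all input tokens) are assumptions; the real /api/usage payload is not described in the post.

```typescript
type HudColor = "green" | "amber" | "red";

// Hypothetical payload fields; the real /api/usage response shape may differ.
interface UsageSnapshot {
  cachedInputTokens: number;
  uncachedInputTokens: number;
}

// Assumed definition: share of input tokens served from the prompt cache.
function cacheHitRate(usage: UsageSnapshot): number {
  const total = usage.cachedInputTokens + usage.uncachedInputTokens;
  return total === 0 ? 0 : usage.cachedInputTokens / total;
}

// Thresholds from the post: 50% and up green, 20% and up amber, otherwise red.
function cacheHitColor(hitRate: number): HudColor {
  if (hitRate >= 0.5) return "green";
  if (hitRate >= 0.2) return "amber";
  return "red";
}

const snapshot: UsageSnapshot = { cachedInputTokens: 300, uncachedInputTokens: 700 };
const color = cacheHitColor(cacheHitRate(snapshot));
```

A HUD would call these two functions on each poll, for example from a `setInterval` callback firing every 3000 milliseconds.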
Get graded by someone who is not you
The MCP server was added on 2026-03-29 (commit fea3b141), six days after the modern engine’s first commit, exposing the workspace to Claude Code, Cursor, and anything else that speaks the protocol. Today that surface is 14 tools and 3 resources, and memi mcp config prints the snippet you paste into your client.
Two weeks later we added glama.json (commit 85a4a019, 2026-04-13) to enroll the server in an external MCP quality score, with the badge in the README. Submitting your tool to a third-party grader early is uncomfortable and useful for the same reason: it audits the parts you stopped seeing. If your harness is going to be infrastructure for other AI tools, let someone else measure it before your users do.
What to take from this
If you are building your own agent harness, the shape that survived here:
- Define harnesses as manifest entries with probes, parsers, and policies. Glue code hides differences; data makes you declare them.
- Probe readiness with real commands. Never render a green dot you did not earn.
- Make the hard safety floor a short, absolute, enforceable list. Route everything recoverable through approvals.
- Ship the manifest with the product and version its schema.
- Put cost and cache behavior in the UI, on a timer.
All of it is open. Read the manifest, the engine, and the traces yourself: github.com/sarveshsea/memi, github.com/sarveshsea/memoire-notes. If something in this post is wrong, the source is the source of truth. Tell me where; I will fix it.