The Studio run workbench is the product.

June 6, 2026 Sarvesh Chidambaram

Abstract spine rows motif in coral on dark graphite.

The shape we deleted our way to

The default Studio surface today is one composer, one run trace, one inspector. That sentence sounds like a design principle we started from. It was not. It was a deletion schedule, executed commit by commit over two weeks in May, and the most useful thing I can do here is show the receipts.

The first Studio builds tried to show everything, because the runtime genuinely does a lot: eight harnesses, Figma, Mermaid boards, research, logs, approvals, artifacts, automations, scenario simulation. Every subsystem argued for a panel, and most of them won. The result read like an admin dashboard for a system you did not yet trust, which is exactly wrong for the question a designer actually brings to the screen: what is the agent doing to my product right now, and can I trust it?

May 12: the spine lands

The skeleton arrived in one day. We simplified the status strip and sidebar (commit 47ab766, 2026-05-12) and introduced the compact run spine in a 300-line change to App.tsx (commit beae75c, same day). The spine is one vertical trace per run: prompt, plan, tool calls, files changed, artifacts. You read it top to bottom and you know what happened.
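To make the spine concrete, here is a minimal sketch of how one run's trace could be modeled as an ordered list of typed rows. The type names, field names, and the `renderSpine` helper are all assumptions for illustration; none of them are taken from App.tsx.

```typescript
// Hypothetical model of a run spine. Names are illustrative, not from the Studio codebase.
type SpineRow =
  | { kind: "prompt"; text: string }
  | { kind: "plan"; steps: string[] }
  | { kind: "toolCall"; command: string; exitCode: number }
  | { kind: "filesChanged"; paths: string[] }
  | { kind: "artifact"; name: string };

interface RunSpine {
  runId: string;
  rows: SpineRow[]; // kept in the order the events occurred
}

// Produce a one-line summary for a single row.
function summarizeRow(row: SpineRow): string {
  switch (row.kind) {
    case "prompt":
      return `prompt: ${row.text}`;
    case "plan":
      return `plan: ${row.steps.length} step(s)`;
    case "toolCall":
      return `tool: ${row.command} (exit ${row.exitCode})`;
    case "filesChanged":
      return `files: ${row.paths.join(", ")}`;
    case "artifact":
      return `artifact: ${row.name}`;
  }
}

// The spine reads top to bottom, so rendering is a plain in-order map.
function renderSpine(spine: RunSpine): string[] {
  return spine.rows.map(summarizeRow);
}
```

A discriminated union keeps each row kind explicit, so adding a new kind of row forces every renderer to handle it.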

The same day, the right pane stopped being a fixed stack of panels and became a contextual inspector (commit e5faae2). It renders detail for whatever you select in the spine: a tool call’s full output, a diff, an artifact. When nothing is selected, it shows nothing. We also aligned the workbench brand with memoire.cv that day, so the app and the site stopped drifting apart visually.
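The inspector's rule, detail for the selection and nothing otherwise, can be sketched as a pure function from selection to view. The `Selection` and `InspectorView` types and the `inspectorFor` name are hypothetical; the real component may be structured quite differently.

```typescript
// Hypothetical selection and view types. Not taken from the Studio source.
type Selection =
  | { kind: "toolCall"; stdout: string; stderr: string }
  | { kind: "diff"; patch: string }
  | { kind: "artifact"; name: string; preview: string };

interface InspectorView {
  title: string;
  body: string;
}

// Returns null when nothing is selected, so the pane renders nothing at all
// instead of a placeholder panel.
function inspectorFor(selection: Selection | null): InspectorView | null {
  if (selection === null) {
    return null;
  }
  switch (selection.kind) {
    case "toolCall":
      return { title: "Tool output", body: selection.stdout + selection.stderr };
    case "diff":
      return { title: "Diff", body: selection.patch };
    case "artifact":
      return { title: selection.name, body: selection.preview };
  }
}
```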

What a receipt actually contains

Tool calls render as receipts, not raw log noise. A receipt is compact on purpose: command, working directory, duration, exit code, files touched, and token and cost data when the harness reports it. Full stdout and stderr live behind a details expander in the inspector.
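As an illustration of that shape, the sketch below models a receipt with the fields listed above and formats it as a single compact line. The interface, the field names, and the separator are assumptions; the only thing drawn from the text is which pieces of evidence a receipt carries, with usage data optional because not every harness reports it.

```typescript
// Hypothetical receipt shape. Field names are illustrative.
interface Receipt {
  command: string;
  cwd: string;
  durationMs: number;
  exitCode: number;
  filesTouched: string[];
  // Present only when the harness reports token and cost data.
  usage?: { tokens: number; costUsd: number };
}

// Format the compact line shown in the spine. Full stdout and stderr are
// deliberately not part of this line; they belong in the inspector.
function formatReceipt(r: Receipt): string {
  const parts: string[] = [
    r.command,
    `cwd ${r.cwd}`,
    `${(r.durationMs / 1000).toFixed(1)}s`,
    `exit ${r.exitCode}`,
    `${r.filesTouched.length} file(s)`,
  ];
  if (r.usage !== undefined) {
    parts.push(`${r.usage.tokens} tokens`);
    parts.push(`$${r.usage.costUsd.toFixed(4)}`);
  }
  return parts.join(" | ");
}
```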

This is the part people misread as hiding information. It is the opposite. A wall of interleaved log text is technically transparent and practically opaque. A receipt gives the evidence a shape, and the raw stream is one click away when the shape is not enough.

May 26: the honesty pass

Two weeks of living with the spine exposed the embarrassing part: some of what the workbench displayed was fake. Demo-era mock actions were still rendering in the trace, polished enough to pass a glance.

On 2026-05-26, the single biggest day in the repo at 48 commits, we deleted the mock workbench actions outright, 97 lines removed (commit 05741d3), and wrote scripts/assert-no-mock-ui.mjs in the same change. The gate is a CI script that enumerates forbidden strings per file. It bans the placeholder copy “No tool calls yet.” and the invented model id “codex-gpt-5-5” as exact strings. If anyone reintroduces a fake receipt, even as a well-meaning empty state, the build fails. Honesty stopped being a value statement and became a lint rule. Idle states were rewritten the same day to say what is actually true when nothing is running (commit e9e3199).
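A gate of this kind can be sketched as a pure function that checks each file against its own list of forbidden strings. This is not the contents of scripts/assert-no-mock-ui.mjs: the function name, the rule format, and the file path below are hypothetical, and only the two banned strings come from the text above.

```typescript
// Hypothetical rule table: file path -> exact strings that must not appear in that file.
type ForbiddenRules = Record<string, string[]>;

interface Violation {
  file: string;
  needle: string;
  line: number; // 1-based line number of the match
}

// Example rules. The path is invented; the two strings are the ones named in the post.
const exampleRules: ForbiddenRules = {
  "src/App.tsx": ["No tool calls yet.", "codex-gpt-5-5"],
};

// Scan in-memory file contents and report every exact-string match.
// Files that have rules but no supplied contents are skipped.
function findMockStrings(
  files: Record<string, string>,
  rules: ForbiddenRules
): Violation[] {
  const violations: Violation[] = [];
  for (const file of Object.keys(rules)) {
    const source = files[file];
    if (source === undefined) {
      continue;
    }
    const lines = source.split("\n");
    for (const needle of rules[file]) {
      lines.forEach((text, index) => {
        if (text.indexOf(needle) !== -1) {
          violations.push({ file, needle, line: index + 1 });
        }
      });
    }
  }
  return violations;
}
```

In a CI wrapper, one would read the listed files from disk, call a function like this, print the violations, and exit non-zero if the list is not empty. Keeping the check pure makes the gate itself easy to unit test.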

The same day fixed the staleness problem. Workbench context got scoped to the active session (commit c65fd72), stale runs were kept out of the active workbench (d3c6de0), and composer preset state started surviving across runs (54192d7). A trace from yesterday’s session quietly sitting under today’s prompt is its own kind of lie, and it took dedicated commits to stop telling it.
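The scoping rule can be illustrated with a small filter: only runs that belong to the active session are eligible for the workbench. The `Run` shape and the `activeWorkbenchRuns` name are assumptions, and the actual commits may enforce this at a different layer.

```typescript
// Hypothetical run record. Field names are illustrative.
interface Run {
  id: string;
  sessionId: string;
  startedAt: number; // epoch milliseconds
}

// Keep only runs from the active session, newest first.
// With no active session, the workbench shows no runs at all.
function activeWorkbenchRuns(runs: Run[], activeSessionId: string | null): Run[] {
  if (activeSessionId === null) {
    return [];
  }
  return runs
    .filter((run) => run.sessionId === activeSessionId)
    .sort((a, b) => b.startedAt - a.startedAt);
}
```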

Quieting verification, on purpose

The least intuitive move of the May 26 hardening day: we reduced how much verification the workbench shows. Verification runs and checks came out of the default sidebar (commits 14071a1 and 3aa52e7), and verification history counts were quieted (e58f1f6).

We were proud of the verification machinery, so the instinct was to display it. But badge counts and check rows in the sidebar read as system noise to someone who came to review a design change. The verification still runs. It surfaces when a run needs review, not as permanent chrome. Trust came from the receipts being real, not from the quantity of meters on screen.

Primary versus advanced is data, not opinion

Codex and Claude Code are the primary harnesses: first-class setup, auth status, model controls, trace hydration. OpenCode, Gemini, Ollama, Hermes, and shell stay available but do not crowd the first screen. That split is encoded in the harness manifest as visibility metadata, primary versus advanced, rather than hardcoded into components. Changing the product stance later is a data edit, not a UI rewrite.
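A sketch of what visibility-as-data could look like follows. The manifest format, the ids, and the `harnessesFor` helper are hypothetical, and the list includes only the harnesses named in this section.

```typescript
type Visibility = "primary" | "advanced";

// Hypothetical manifest entry. The real manifest schema is not shown in this post.
interface HarnessEntry {
  id: string;
  label: string;
  visibility: Visibility;
}

const manifest: HarnessEntry[] = [
  { id: "codex", label: "Codex", visibility: "primary" },
  { id: "claude-code", label: "Claude Code", visibility: "primary" },
  { id: "opencode", label: "OpenCode", visibility: "advanced" },
  { id: "gemini", label: "Gemini", visibility: "advanced" },
  { id: "ollama", label: "Ollama", visibility: "advanced" },
  { id: "hermes", label: "Hermes", visibility: "advanced" },
  { id: "shell", label: "shell", visibility: "advanced" },
];

// Components ask the manifest what to show; they never name a harness directly.
function harnessesFor(visibility: Visibility, entries: HarnessEntry[]): HarnessEntry[] {
  return entries.filter((entry) => entry.visibility === visibility);
}
```

Under this arrangement, promoting a harness to the first screen means changing one `visibility` value, and no component needs to change.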

The flow is test-backed

One more thing kept the deletions safe. The designer run flow was streamlined on 2026-05-13 (commit 0614015) and the e2e workbench flows were repaired the same day (3cd11a6), so every compaction pass since has run against tests that walk the actual flow: open project, prompt, watch the spine, inspect. When the May 26 pass removed 97 lines of mock UI and rescoped session context, the e2e suite was what confirmed the real path still worked.

The workbench is not less capable than the dashboard it replaced. Everything still exists. But the default screen now answers one question, the receipts on it are guaranteed real by CI, and every piece of chrome that survives has earned the space by being needed mid-run. The assert-no-mock-ui gate has grown in at least seven commits since it was created. The deletion schedule, it turns out, never really ends.