Agent UI for product designers is a workbench, not a chat box.
Why chat is the wrong default
Every agent product ships a chat box first because chat is the cheapest interface to build and it demos well. For product design work it fails in a specific way: design work produces objects. Screens, tokens, components, research notes, review decisions. A chat thread buries objects in prose, and three days later the only way to find a decision is to scroll.
The opposite failure is the dashboard that shows every subsystem at once, and Studio flirted with that too. What follows is the middle we actually built, commit by commit, and what each piece replaced.
The triad landed in one day
The structural answer arrived on 2026-05-12 in three commits: the compact run spine (commit beae75c), agent action receipts (3be0924), and the right pane becoming a contextual inspector (e5faae2). Composer, spine, inspector. That triad is still the whole default screen.
The composer holds intent and agent choice. The spine is the trace: prompt, plan, tool calls, files changed, and result, all on one vertical line you can scan. The inspector renders whatever you select in the spine, and nothing when you select nothing. Each tool call is a receipt that opens into full detail, so the designer scans the run first and inspects the exact command or output only when it matters. Text volume is not transparency. A receipt with an exit code beats a paragraph explaining one.
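A minimal sketch of how a spine entry and its receipt could be modeled, to make the scan-first, inspect-later split concrete. Every name here (SpineEntry, Receipt, receiptLine, inspect) is hypothetical and chosen for illustration; none of it is taken from the Studio codebase.

```typescript
// Hypothetical shapes for illustration only.
interface Receipt {
  id: string;
  command: string;
  exitCode: number;
  output: string; // full detail, surfaced only in the inspector
}

type SpineEntry =
  | { kind: "prompt"; text: string }
  | { kind: "plan"; steps: string[] }
  | { kind: "tool_call"; receipt: Receipt }
  | { kind: "files_changed"; paths: string[] }
  | { kind: "result"; summary: string };

// One scannable line per receipt: the command and its exit status, no output.
function receiptLine(r: Receipt): string {
  const status = r.exitCode === 0 ? "ok" : `exit ${r.exitCode}`;
  return `${r.command} [${status}]`;
}

// The inspector renders the selected entry, and nothing when nothing is selected.
function inspect(spine: SpineEntry[], selected: number | null): string | null {
  if (selected === null) return null;
  const entry = spine[selected];
  if (entry === undefined) return null;
  switch (entry.kind) {
    case "prompt":
      return entry.text;
    case "plan":
      return entry.steps.join("\n");
    case "tool_call":
      return [
        `$ ${entry.receipt.command}`,
        entry.receipt.output,
        `exit code: ${entry.receipt.exitCode}`,
      ].join("\n");
    case "files_changed":
      return entry.paths.join("\n");
    case "result":
      return entry.summary;
  }
}
```

The point of the split is that receiptLine never touches the output field: the spine stays short however verbose a tool was, and the detail costs one click.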
Design work gets a lane, not a thread
The day after the triad, design work stopped living inside the conversation. Commit 7af42a7 (2026-05-13) added a contextual design lane in a 1070-line change across App.tsx and workbench-components.tsx. Plans, design output, and review surfaces appear in a lane with a fixed place on screen instead of interleaved between chat turns.
The reason is mundane and decisive: designers revisit. A decision made on Tuesday gets questioned on Thursday, and in a thread that means archaeology. In a lane it means looking left. The chat metaphor treats everything as ephemeral conversation. Design work is mostly not ephemeral.
Critique starts from a screenshot
“Critique my screen” typed into a chat box gets you plausible generic feedback, because the model is critiquing its idea of your screen. On the 2026-05-26 hardening day we made the critique starter screenshot-backed (commit 3a0cd1d): starting a critique captures what is actually rendered and hands the agent that. A companion change made Codex screen audits context first (2adf312), so the audit reads the real surface before it opines.
This was the cheapest credibility win in the whole app. The same prompt, grounded in an actual artifact, stops hedging and starts pointing at coordinates.
Product work has more artifacts than code
A coding assistant can treat the diff as the main artifact. A product-design assistant accumulates more kinds: design-system notes, research summaries, component decisions, comments, review state, PM board source, Figma context.
Two properties matter. First, these have to be first-class objects the inspector can open, not prose in a transcript. Second, they have to be editable, because product judgment changes. The artifact review surface ships with literal OK and Fix buttons per section, a review-state select (Unreviewed, Looks good, Needs work), and editable section summaries. Memory the designer cannot correct is not memory; it is a liability that compounds.
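As a rough sketch, the review surface described above could sit on a data shape like the following. The three review states come from the text; the type and function names are hypothetical, and the mapping from the OK and Fix buttons to a state is an assumption made for illustration.

```typescript
// The three states are the ones named in the article; everything else is hypothetical.
type ReviewState = "Unreviewed" | "Looks good" | "Needs work";

interface ArtifactSection {
  title: string;
  summary: string; // editable by the designer
  review: ReviewState;
}

// Assumption: "OK" marks the section as looking good, "Fix" as needing work.
function pressButton(section: ArtifactSection, button: "OK" | "Fix"): ArtifactSection {
  return { ...section, review: button === "OK" ? "Looks good" : "Needs work" };
}

// Editing a summary returns a new section and leaves the review state untouched.
function editSummary(section: ArtifactSection, summary: string): ArtifactSection {
  return { ...section, summary };
}
```

Both functions return a new object rather than mutating in place, which keeps the earlier version of a section available if the surface ever needs to show what changed.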
Contextual beats permanent
The temptation with every capable subsystem is a permanent panel. We went the other way on 2026-05-26: the research lab harness became contextual rather than a fixture (commit d9ad4cf), and checks and verification runs were removed from the default workbench entirely (3aa52e7, 14071a1).
The visibility rule we converged on: show a lane when an artifact exists, when a bridge is connected, when a slash command asks for it, or when the selected run produced that kind of output. Everything else stays a command away. Figma, FigJam, research, and the advanced harnesses all matter, and none of them belong on first launch.
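The rule reads naturally as a single predicate with four conditions. The sketch below is one way to write it, assuming a hypothetical WorkbenchContext shape and hypothetical lane names; it is not the app's actual code.

```typescript
// Hypothetical lane names and context shape, for illustration only.
type LaneKind = "design" | "research" | "figma" | "checks";

interface WorkbenchContext {
  artifactKinds: readonly LaneKind[];      // an artifact of this kind exists
  connectedBridges: readonly LaneKind[];   // a bridge for this lane is connected
  slashRequested: readonly LaneKind[];     // a slash command asked for this lane
  selectedRunOutputs: readonly LaneKind[]; // the selected run produced this kind of output
}

function has(list: readonly LaneKind[], lane: LaneKind): boolean {
  return list.indexOf(lane) !== -1;
}

// A lane is shown when any one of the four conditions holds; otherwise it stays hidden.
function isLaneVisible(lane: LaneKind, ctx: WorkbenchContext): boolean {
  return (
    has(ctx.artifactKinds, lane) ||
    has(ctx.connectedBridges, lane) ||
    has(ctx.slashRequested, lane) ||
    has(ctx.selectedRunOutputs, lane)
  );
}
```

With an empty context, as on first launch, every lane evaluates to hidden, which matches the claim that none of the advanced surfaces appear by default.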
The quiet parts had to be honest too
The least glamorous work was the idle state. Early builds filled empty panes with placeholder copy that implied activity, and the inspector chattered when nothing was selected. On the hardening day the idle inspector copy was quieted (commit b07d18a) and idle states were rewritten to say what is actually true (e9e3199).
This is enforced, not aspirational: the assert-no-mock-ui CI gate bans placeholder strings by exact match, including “No tool calls yet.” A pane that has nothing real to say now says nothing, and the build fails if someone re-adds comforting filler. The same day also made conversation goals accessible (40bb1c1), because a designer-grade bar includes the people who use the app with a screen reader.
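A gate like this can be very small. The sketch below is not the actual assert-no-mock-ui script; it assumes that "exact match" means a case-sensitive literal search of source text, and the function and type names are hypothetical. The only banned string listed is the one the article names.

```typescript
// The only banned string known from the article; a real list would be longer.
const BANNED_STRINGS: readonly string[] = ["No tool calls yet."];

interface Violation {
  file: string;
  line: number; // 1-based line number
  banned: string;
}

// Scans file contents (path -> source text) for exact, case-sensitive occurrences.
function findBannedStrings(
  files: Record<string, string>,
  banned: readonly string[] = BANNED_STRINGS
): Violation[] {
  const violations: Violation[] = [];
  for (const file of Object.keys(files)) {
    const lines = files[file].split("\n");
    lines.forEach((text, index) => {
      for (const phrase of banned) {
        if (text.indexOf(phrase) !== -1) {
          violations.push({ file, line: index + 1, banned: phrase });
        }
      }
    });
  }
  return violations;
}
```

A CI wrapper would read the UI sources into the files map, print each violation, and exit non-zero when the returned list is not empty, which is what makes the build fail.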
The target feeling
What all of this adds up to: choose the worker, give intent, watch receipts, inspect the object that changed, decide what memory survives. Five verbs, none of them “scroll.”
The chat box optimizes for the first message ever sent. The workbench optimizes for the four-hundredth run, where what you need is not a friendlier text field but a bench where every object the agent touched is still sitting where you left it.