Innovating how real work gets done with agentic AI.

Led the design of how work gets defined and run against real tools in an AI system for small to mid-sized businesses, building user trust and control through A2UI and human-in-the-loop approvals.

Specific product surfaces are abstracted under NDA. What follows is the thinking and the patterns rather than the product itself.
Assignments · Supplier Invoice Reconciliation · Run #42 · Held · step 3 of 5
Finance Agent · 9:00
  1. Pulled supplier invoices from Xero · 42 invoices · 14s
  2. Grouped by furniture type and matched to the ledger · 38 matched · 4 held · 38s
  3. Resolving mismatches · 3 of 4 resolved
Three of the four held invoices were zero-rated assembly parts, which your VAT rule already covers, so I cleared those. The last one is a sofa order from Oakline Furniture where the VAT billed is higher than the purchase order allows, so I read the order to check before doing anything with it.
read · INV-4821 · Oakline Furniture
compute · expected VAT vs. invoice VAT
INV-4821 needs your call before I post the batch
Oakline Furniture billed VAT at 20% (£206.67), but PO-441 caps the line at 16.6% (£170.67). Pick one and I'll finish posting the remaining 41 entries.
  4. Post journal entries to NetSuite
  5. Notify the team in #supplier-ops
A supplier invoice reconciliation run for a home-furniture retailer, abstracted. The plan advances as the agent works and holds when one invoice needs the person's decision. (Note: This is an animated mock of the real product styling.)
01 · Overview

How do you give an AI real work to do, and still keep the person in control?

At Brim I design how people hand work to an AI and stay in control of it once it is running. Brim is an AI system that helps businesses own their specific intelligence: the agent continuously learns how a company works and runs real tasks according to it.

This case study is about the assignment, which is the unit of work a person and an agent share on Brim, and the way a piece of work gets defined, run, and held accountable between them. A lot of my job was designing for that while keeping the person in a position to trust and oversee what the agent actually does.

Role
AI Product Designer
Timeline
February to May 2026
Team
Founder, three engineers, one product designer (me)
Skills
Interaction design, system design, product thinking, user research, frontend, prototyping, Figma, Claude Code

Outcome

Defined how assignments are run on Brim, so a person can see what an agent is doing with their real tools and data and step in at the moments that matter.
02 · TL;DR

I designed the way work gets defined, run, and handed back to the person.

The challenge

People don't easily trust AI, and that becomes very real the moment the work involves their actual tools, because the agent is pulling in their data, holding permissions to their accounts, and acting on their behalf. An assignment had to make that feel safe rather than opaque.

What I did

I worked closely with the CEO and the engineering team to define how an assignment gets run, so the person keeps control and oversight the whole way through. What the agent is doing and what it produces are shown clearly in real time, rather than handed over as a finished result the person has to take on faith. The person can also make a decision at every checkpoint, after each step in the plan and before the agent carries on.

The outcome

In more than 20 feedback sessions with our early users and customers, what they responded to most was how much shorter the work became: something that used to take real effort now mostly happened on its own, while they could still see and check it.

03 · Problem

How do you trust a workflow you didn't build, that runs a little differently each time?

People don't easily trust AI to do their actual work, especially once it is using their real tools and data.

As AI has become more capable and more accessible, the user describing the work is often not technical and doesn't want to map out steps; they just want to say what they need and see it happen. Underneath, the system plans and runs the work itself, so it can approach the same task a little differently each time.

That left me with a specific problem to solve. When the user isn't the one building the workflow, there is no fixed set of steps for them to rely on, and the work involves their real tools, their data, and permission to act on their behalf, how do you give them enough visibility and control to actually trust it? I started treating the assignment, the shared unit of work between a person and an agent, as the place this had to be answered, because the assignment is where work gets defined, where it runs, and where it comes back to you.

04 · Research

Looking into new ways to visualise workflow automation.

Before designing anything, I wanted to see how this kind of work was handled before AI. The honest answer was that you built it yourself, wiring each trigger and action together by hand in tools like Zapier and Bardeen. The more I sat with those examples, the clearer it became that I had to come up with something quite different, because I was designing for two shifts at once. The first is agentic AI, where the system plans and runs the work itself rather than following a fixed script. The second is a new kind of dynamic experience, where the interface is composed in response to what is needed rather than drawn ahead of time. That meant the patterns I was used to, the wiring diagrams and the static screens, could only take me so far.

The old way · wire the graph yourself, in a tool like Zapier
Trigger · Xero · New supplier invoice
Find data · NetSuite · Look up the PO
Paths (conditional) · Split on VAT vs PO cap
Path A · VAT ≤ cap
Action · Create journal entry
Action · Post to Slack
Path B · VAT > cap
Action · Hold the invoice
Action · Notify for approval
The new way · describe it, the agent plans it
Reconcile my supplier invoices each morning, then group them by furniture type.
Brim: Okay, here's the plan I came up with. It uses your connected tools, like Xero, to reconcile your invoices.
Plan
Every morning · 9:00am · Grouped by furniture type
  1. Pull this week's supplier invoices from Xero
  2. Group them by furniture type, then match to the ledger
  3. Hold anything billed above its PO cap for you
  4. Post the cleared entries and summarise in Slack
A design exploration of the shift. On the left, the same work built by hand in Zapier, where adding one condition already splits it into branching paths. On the right, the person describes the outcome and the agent replies with a plan it has worked out from their connected tools, which they can approve or edit. (Mock; Zapier interface abstracted.)
An early wireframe of a node-based workflow builder, an abandoned direction for assignments.
Early FigJam ideations of the node-based workflow builder, from before AI was part of how I designed.

I also looked at how today's chat tools handle this. Even when a tool like Claude can run something on a schedule, the result just lands back in the chat, with no clear way to see what it actually did. That is fine for one person experimenting on their own, but it doesn't hold up inside a business, where the same piece of work has to be checked, shared, and answered for. What I kept coming back to was keeping the thread simple and giving every run a side panel that logs it, an audit trail that changes from run to run and that a person, or their manager, can open and verify.

In a chat tool, the run lands back in the chat
Claude Scheduled
Every Monday, find new leads from Apollo and Sales Navigator and add them to HubSpot.
Here are this week's leads. I found 18 contacts across Apollo and LinkedIn Sales Navigator that match your ICP, and a few look especially promising. Want me to add them to HubSpot?
Reply to Claude…
Claude can make mistakes. Please double-check responses.
In Brim, the thread stays simple and the panel logs everything
Brim
Sales Agent
Done. I pulled 18 leads from Apollo and Sales Navigator, matched them against HubSpot, and added the 15 that were new. 3 need your check.
Run log
Find leads · 18 found · Apollo & Sales Navigator
Maya Chen · VP Ops · Northgate · 92% ICP match
Tom Reyes · Head of Sales · Harbour · 88% match
+16 more leads matched to your ICP
Add 15 new leads to HubSpot · 3 already there
A scheduled assignment in Claude versus the same work in Brim, this time researching sales leads. In Claude the result lands back in the chat, with no record of what it touched. In Brim the thread stays short while the side panel logs each step and keeps what it pulled, here the leads it found across Apollo and Sales Navigator before adding them to HubSpot, a record that changes every run and that a manager can open and verify. (Mock; Claude interface abstracted.)

Brim had been in development for around a year by this point, but assignments were still a work in progress, taking shape alongside several other features the team was building. Many of the customer conversations happening around the same time weren't specifically about assignments, though the patterns I kept hearing about how people want to work with AI carried over into designing them.

05 · Principles

So I set five rules for how an assignment should behave.

These shaped how every part of an assignment is designed, with the first run especially in mind.

01

Every assignment has a clear goal.

An assignment is defined by the outcome it is trying to reach, which the person describes as intent, so before anything runs both the person and the agent know what finished is meant to look like.

02

On the first run, the person approves at every checkpoint.

The first time an assignment runs, the agent pauses at each step in the plan and waits for the person to approve before it carries on, so nothing important happens without them. Over time this can relax as the agent earns the right to run more of the work on its own.
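
One way to picture this rule is as a small gating function. The sketch below is illustrative only and is not how Brim implements it: `Step`, `trusted`, `needs_approval`, and `run_plan` are hypothetical names, and tracking trust per step is an assumption about how "relaxing over time" might work.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    name: str
    # Hypothetical flag: the person has allowed this step to run unattended.
    trusted: bool = False


def needs_approval(step: Step, run_number: int) -> bool:
    """On the first run every checkpoint pauses; later, only untrusted steps do."""
    if run_number == 1:
        return True
    return not step.trusted


def run_plan(
    steps: List[Step],
    run_number: int,
    approve: Callable[[Step], bool],
) -> Tuple[List[str], Optional[Step]]:
    """Run steps in order, pausing at the checkpoint after each one.

    `approve` stands in for the person's decision at a checkpoint.
    Returns the names of the steps that ran and the step the run is held at, if any.
    """
    completed: List[str] = []
    for step in steps:
        completed.append(step.name)  # the step's real work would happen here
        if needs_approval(step, run_number) and not approve(step):
            return completed, step  # held until the person decides
    return completed, None


plan = [Step("Pull invoices"), Step("Group and match", trusted=True), Step("Post entries")]

first_run_asks: List[str] = []
run_plan(plan, 1, lambda s: first_run_asks.append(s.name) or True)

later_run_asks: List[str] = []
run_plan(plan, 2, lambda s: later_run_asks.append(s.name) or True)

print(first_run_asks)  # every step asks on the first run
print(later_run_asks)  # only the untrusted steps ask on later runs
```

Keeping the policy in one function means the first-run rule cannot be bypassed by an individual step, which is the property the principle depends on.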

03

Trust comes from seeing the real work, as it happens.

While an assignment runs, the person can watch it work in real time through a live step rail that shows each step the agent is taking, the tools it is reaching into, and the data it is pulling in.

04

The work stays in a record you can go back to.

Every run keeps a full log of what the agent did, where the person can open any step and see the real files it retrieved and produced. The record stays there after the run finishes, so the person can look back over what happened and adjust the plan for next time.
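
As a rough illustration of what such a record could hold, here is a minimal sketch. The shape is an assumption, not Brim's data model: `ToolCall`, `StepRecord`, `RunLog`, and their fields are hypothetical names chosen for the example, and the sample values are taken from the reconciliation run shown on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolCall:
    verb: str    # e.g. "connect", "query", "read"
    target: str  # what the call touched


@dataclass
class StepRecord:
    title: str
    summary: str
    tool_calls: List[ToolCall] = field(default_factory=list)
    files: List[str] = field(default_factory=list)  # files retrieved or produced


@dataclass
class RunLog:
    assignment: str
    run_number: int
    steps: List[StepRecord] = field(default_factory=list)

    def record(self, step: StepRecord) -> None:
        """Append a step as it finishes, so the log grows while the run is live."""
        self.steps.append(step)

    def open_step(self, position: int) -> StepRecord:
        """Open a step by its 1-based position in the plan."""
        if not 1 <= position <= len(self.steps):
            raise IndexError(f"no step {position} in this run")
        return self.steps[position - 1]

    def all_files(self) -> List[str]:
        """Every file touched in this run, in the order the steps ran."""
        return [name for step in self.steps for name in step.files]


log = RunLog(assignment="Supplier Invoice Reconciliation", run_number=42)
log.record(StepRecord(
    title="Pulled the week's supplier invoices",
    summary="42 invoices raised since the last run",
    tool_calls=[ToolCall("query", "Xero invoices")],
))
log.record(StepRecord(
    title="Grouped by furniture type and matched to the ledger",
    summary="38 cleared, 4 set aside",
    tool_calls=[ToolCall("read", "ledger-mapping.csv")],
    files=["ledger-mapping.csv"],
))

print(log.open_step(2).title)
print(log.all_files())
```

Because the log is plain data that outlives the run, the same record can back both the live step rail and the later review by a manager.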

05

The work comes to the person.

Because people won't always log in to check, an assignment reaches them where they already are, through notifications, email, and schedules. The agent can also start a run on its own when it notices new activity in a connected tool.

06 · Solution

One assignment, from the sentence that starts it to the moment it needs you.

The five principles come together in how a single assignment actually runs. To make that concrete, I'll walk through one, a supplier invoice reconciliation that an agent runs for a home-furniture retailer, from the moment it is defined to the moment the work reaches the person. The run at the top of this page is that same assignment in motion.

01 · Defining the work and connecting its tools

An assignment starts as intent. The person describes the outcome they want, and the agent works out the setup before proposing anything, asking where the work comes from and where it should go. I wanted defining work to feel like briefing a colleague who asks a couple of sensible questions, and for connecting the tools to happen naturally inside that conversation rather than as a hidden setup step.

New assignment
Reconcile my supplier invoices each morning and group them by furniture type.
Brim: Sure. First, where do your invoices come in? Is that Xero?
Yes, Xero.
Brim: And once they're reconciled, where should I post the summary?
Our #supplier-ops channel in Slack.
Brim: Got it. Here's the plan I'd run. Tweak anything before I save it.
Proposed plan
  1. Pull this week's supplier invoices from Xero
  2. Group them by furniture type: sofas, beds, dining, storage
  3. Match each against the ledger and the purchase orders
  4. Hold anything billed above its PO cap for you
  5. Post the cleared entries and summarise in #supplier-ops
Xero · NetSuite · Slack
Defining the work as a conversation: the person says what they want, the agent asks where the invoices come from and where the summary should go, and only then proposes a plan and the tools it needs, which streams into the panel for them to approve or adjust. (Mock of the real product styling, abstracted.)

02 · The run, made visible

Once it runs, the assignment becomes a live thread. The agent streams in what it is doing as it does it, in plain language, with the real tool calls shown underneath each step.

Finance Agent · 9:00
Pulled the week's supplier invoices
Pulled the 42 invoices raised since the last run on Mon 11 May, ready to sort and reconcile.
connect · Xero (ops@companyx.io)
query · Xero · invoices since 11 May 09:00 (42)
Grouped by furniture type and matched to the ledger
Sorted the 42 into sofas (18), beds (9), dining (8) and storage (7), then matched each against the ledger by vendor, amount and VAT. 38 cleared, 4 set aside.
read · ledger-mapping.csv (312 rows)
group · by furniture type, then match
The agent streams in each step as it works, in plain language, with the real tool calls shown underneath. (Animated mock of the real product styling, abstracted.)
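
The second step in the run above, grouping and then matching by vendor, amount, and VAT, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the agent's actual logic: the function names, the dictionary fields, and every amount in the sample data are hypothetical, and only the vendor names and furniture types come from the mock.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Invoice = Dict[str, str]


def group_by_type(invoices: List[Invoice]) -> Dict[str, List[Invoice]]:
    """Bucket invoices by furniture type, keeping their original order."""
    groups: Dict[str, List[Invoice]] = defaultdict(list)
    for invoice in invoices:
        groups[invoice["type"]].append(invoice)
    return dict(groups)


def match_to_ledger(
    invoices: List[Invoice], ledger: List[Dict[str, str]]
) -> Tuple[List[Invoice], List[Invoice]]:
    """Clear invoices whose vendor, amount and VAT all appear together in the ledger."""
    ledger_keys = {(row["vendor"], row["amount"], row["vat"]) for row in ledger}
    cleared: List[Invoice] = []
    held: List[Invoice] = []
    for invoice in invoices:
        key = (invoice["vendor"], invoice["amount"], invoice["vat"])
        (cleared if key in ledger_keys else held).append(invoice)
    return cleared, held


# Hypothetical sample data; amounts are kept as strings to avoid float rounding.
invoices = [
    {"id": "A-1", "vendor": "Oakline Furniture", "type": "sofas", "amount": "900.00", "vat": "180.00"},
    {"id": "A-2", "vendor": "Harbour Home", "type": "beds", "amount": "450.00", "vat": "90.00"},
    {"id": "A-3", "vendor": "Velvet & Vine", "type": "sofas", "amount": "600.00", "vat": "120.00"},
]
ledger = [
    {"vendor": "Oakline Furniture", "amount": "900.00", "vat": "180.00"},
    {"vendor": "Harbour Home", "amount": "450.00", "vat": "90.00"},
    {"vendor": "Velvet & Vine", "amount": "600.00", "vat": "100.00"},  # VAT differs
]

groups = group_by_type(invoices)
cleared, held = match_to_ledger(invoices, ledger)
print({kind: len(items) for kind, items in groups.items()})
print([inv["id"] for inv in cleared], [inv["id"] for inv in held])
```

Anything that fails the exact match is set aside rather than guessed at, which mirrors the "38 cleared, 4 set aside" split in the run.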

03 · The work comes to the person, when they need it

The finding that people often weren't logging in shaped this part the most. When a run reaches something that needs the person, the same moment goes to where they already are, in the same words wherever it shows up, so a decision the agent can't make on its own never sits waiting unseen.

Assignment run thread
Finance Agent · 9:01
Resolving mismatches · 3 of 4 resolved
One sofa order from Oakline has VAT above its purchase order, so I've held the batch for your call.
INV-4821 needs your call before I post the batch
INV-4821 · Oakline Furniture
3-seat sofa order · PO-441
£206.67 · 20% · PO cap £170.67 · 16.6%
AI feed
Awaiting review · 9:01
3 invoices need your review
VAT above the PO cap on INV-4821 (Oakline), INV-4830 (Harbour Home) and INV-4844 (Velvet & Vine), held across this week's runs.
Supplier Invoice Reconciliation · Finance Agent
Desktop notification
Brim · now
INV-4821 needs your call
Oakline's VAT is above the PO cap, so I've held the batch for you.
Email
Brim · notify@brim.ai
to you · 9:01 AM
INV-4821 needs your call: VAT above the PO cap
Oakline Furniture billed VAT at 20% (£206.67), but PO-441 caps it at 16.6% (£170.67). I've held the remaining 41 entries until you decide.
The held invoice, the same moment, across four surfaces: the assignment run thread where the person acts on it, the AI feed where it sits alongside everything the agents have done, a desktop notification, and an email. The title and the source stay the same wherever it appears, which is what keeps the notifications from becoming noise. (Mock of the real product styling, abstracted.)
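
The idea of one moment rendered consistently across surfaces can be sketched as a single source object that every surface reads from. The code below is an assumption-laden illustration, not the product's notification system: `Moment`, `render`, `fan_out`, and the surface identifiers are hypothetical, as is the choice of which surfaces carry the full detail.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four surfaces named in the caption; the identifiers are hypothetical.
SURFACES = ("run_thread", "ai_feed", "desktop_notification", "email")


@dataclass(frozen=True)
class Moment:
    title: str
    source: str
    detail: str


def render(moment: Moment, surface: str) -> Dict[str, str]:
    """Build the payload for one surface; title and source are never rewritten."""
    payload = {"surface": surface, "title": moment.title, "source": moment.source}
    # Assumption: only the roomier surfaces carry the full explanation.
    if surface in ("run_thread", "email"):
        payload["body"] = moment.detail
    return payload


def fan_out(moment: Moment) -> List[Dict[str, str]]:
    """Produce one payload per surface from the same underlying moment."""
    return [render(moment, surface) for surface in SURFACES]


held = Moment(
    title="INV-4821 needs your call",
    source="Supplier Invoice Reconciliation",
    detail="Oakline Furniture billed VAT at 20% (£206.67), but PO-441 caps it at 16.6% (£170.67).",
)
payloads = fan_out(held)
for payload in payloads:
    print(payload["surface"], "|", payload["title"])
```

Making the moment immutable and deriving every payload from it is one simple way to guarantee the wording cannot drift between surfaces.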

04 · When the agent needs you, it shows its working

When the agent finishes its part, the items that need a person's decision sit right inside the conversation. Each one expands to show the specific invoice, the rule it broke, and a suggested action, so the person can read the reasoning in context and act without having to dig for the original document.

Invoice Agent · 9:14
  1. Extracted 5 attachments from approved emails
  2. Applying Invoice Processing Skill to 5 invoices
  3. CIS check complete, 2 flagged, 3 clean
  4. Approver routing complete

Done processing. I found 2 invoices that need your attention, one has a CIS/VAT error and one is missing a job number. The other 3 look correct. Review and action each one below.

2 flagged invoices · Pending Approval
Marlow & Finch · Invoice 3351
CIS error detected
Invoice amounts
£2,000.00 + VAT £400.00 · Total £2,400.00

VAT of £400 has been added to the total on a CIS subcontractor invoice. CIS invoices must not include VAT in the total.

Suggested action: Request corrected invoice from Marlow & Finch with VAT removed from total.

Northgate Supplies · Invoice NS-9921
No job number found
Brookfield Trade · Invoice BT-8812
No flagged items
When the agent finishes processing the day's invoices, the items that need a person's decision sit inside the same conversation. The side panel keeps the assignment's progress and outputs in view, so the person can act on what is flagged while still seeing the full plan. (Mock of the real product styling, abstracted.)
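
A flagged item that carries its own reasoning can be sketched as a check that returns the invoice, the rule it broke, and a suggested action together. This is an illustrative sketch, not the Invoice Processing Skill itself: `Invoice`, `Flag`, `check_cis_vat`, and the `is_cis` field are hypothetical, the Marlow & Finch figures come from the mock, and the Brookfield Trade amounts are made up for the clean case.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Invoice:
    supplier: str
    number: str
    net: Decimal
    vat: Decimal
    total: Decimal
    is_cis: bool  # hypothetical field marking a CIS subcontractor invoice


@dataclass
class Flag:
    invoice: str
    rule: str
    suggested_action: str


def check_cis_vat(invoice: Invoice) -> Optional[Flag]:
    """Flag a CIS invoice whose total includes VAT; return None if it is clean."""
    vat_in_total = invoice.vat > 0 and invoice.total == invoice.net + invoice.vat
    if invoice.is_cis and vat_in_total:
        return Flag(
            invoice=f"{invoice.supplier} · Invoice {invoice.number}",
            rule="CIS invoices must not include VAT in the total.",
            suggested_action=(
                f"Request corrected invoice from {invoice.supplier} "
                "with VAT removed from total."
            ),
        )
    return None


flagged = check_cis_vat(Invoice(
    "Marlow & Finch", "3351",
    net=Decimal("2000.00"), vat=Decimal("400.00"), total=Decimal("2400.00"),
    is_cis=True,
))
clean = check_cis_vat(Invoice(
    "Brookfield Trade", "BT-8812",  # amounts below are hypothetical
    net=Decimal("2000.00"), vat=Decimal("0.00"), total=Decimal("2000.00"),
    is_cis=True,
))

print(flagged.rule if flagged else "clean")
print(clean)
```

Returning the rule and the suggested action alongside the invoice is what lets the interface show the reasoning in place, instead of sending the person to find the original document.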
07 · Feedback

Taking the prototype back to the people who would use it.

Once the prototype was in shape, I went back to a handful of our early customers and showed it to them against the work they actually wanted to use it for. I wanted to see whether the way assignments behaved made sense when the agent was running against their tools, their data, and their own kind of work. A few things came up again and again.

Synthesis · what kept coming up
Insight · People often weren't logging in to check on their assignments at all.
Trust · Trust came from seeing the work as it happened, not from being told it was done.
Audit · The record of what the agent did mattered most, somewhere to go back and verify.
Governance · A chat history isn't something a team or a manager can share, review, or answer for.
Synthesising the feedback sessions into the things that came up again and again. (Anonymised.)

Those four pulled in the same direction, towards work the person can see, a record they and their managers can trust and share, and an assistant that reaches them rather than waiting to be checked on. They confirmed the direction the assignment was heading and shaped the parts I went back to refine.

Beyond Brim

None of this is really specific to Brim. Any AI tool that acts on someone's behalf has to solve the same thing, which is how a person stays in control of work they cannot fully predict, and that is going to be true of most of the tools we use over the next few years.

08 · Learnings

People will lean on AI for real decisions, so it has to earn their trust.

What worked

The thing I keep coming back to is that as AI gets better, people are going to hand more of their decisions over to it, so the work worth doing is making that feel trustworthy. What I noticed across the design is that trust gets earned, and the shape of how it gets earned turned out to be specific. It is earned when the person can see what is running as it happens, when the agent shows its reasoning for what it has done or is about to do, and when the actual files it is reading and producing stay visible rather than getting tucked behind a polished result. Holding the first run to an approval at every checkpoint turned out to be the thing that let people relax into it later, and the through-line, that the agent has to make itself legible to the person, is the same thread I keep pulling on in the design system case study next door.

What I'd do differently

I'd get into user testing earlier and more often, since a lot of what I learned about how people want to watch an AI work came from feedback I could only have gathered by putting real runs in front of them sooner. The other piece we haven't fully worked out is how separate assignments tie together when they are really part of one larger piece of work. Chaining them is something we are still developing, and it isn't entirely clear yet, which is honestly one of the more interesting problems left to solve.
