Design Systems · Context Engineering

A design system AI agents can actually read

I evolved Brighte's design system into a machine-readable source of truth, so AI coding agents and engineers can build consistent, on-brand UI from a simple brief instead of guessing at it. It was my idea, and I led it.

Role: Product Designer (design-system custodian)
Company: Brighte
Timeframe: Late 2025 to ongoing

Context

Brighte's component library already worked for the humans using it. But the way the team ships changed fast: engineers working with AI coding tools, PMs prototyping, and me shipping my own front-end pull requests. All of that put a new kind of user in front of the design system, a very literal one. An AI agent doesn't absorb conventions by osmosis the way a new designer does. It reads what's there, and if what's there is ambiguous, it fills the gaps by guessing.

As design-system custodian, I saw that as a design problem, not a tooling problem. A design system is a product judged by its users, and we had a user it was failing.

The problem

Point an AI coding agent at the system as it stood and the output was inconsistent and barely usable: wrong tokens, invented variants, structure that drifted from one attempt to the next. Essentially throwaway code that needed correcting everywhere. Most people run into that, conclude AI is overhyped, and stop. I read it differently: the model wasn't the problem, the missing context was. There was no reliable source of truth for an agent to work from, so it approximated.

My role

The idea was mine, and I drove the work as Product Designer and custodian of the design system, working hand in hand with engineering. I'll make the same point here that I make about any shipped outcome: the working system belongs to the team that built it and uses it. What I'd claim personally is narrower: the reframe, the design of the semantic layer, and the validation approach.

Approach

I started the way most people do, by stuffing all the instructions into one long prompt. That did produce consistent output, but a prompt that long is unrealistic, because no one writes a brief like that in practice. The real move was taking the instructions out of the prompt and building them into the system itself, so an agent routes itself to the right context without being told to.

Concretely, that meant adding a context and semantic layer on top of the existing system. Every component gets a specification documenting which tokens to use, how it composes with other components, and, critically, what an agent must never generate. Above the components, pattern rules describe how they assemble into whole layouts for a given surface. An agent reads the layers in a fixed order (surface rules, then patterns, then the specific components) and arrives at the same grounded output every time instead of inventing one.
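One way to picture the layering is as data an agent reads in a fixed order. The sketch below is a hypothetical TypeScript model, not Brighte's actual format: the type names (`ComponentSpec`, `PatternRule`, `SurfaceRules`), the `buildContext` function, and the example surface, components and tokens are all illustrative assumptions. It shows the two properties described above: each component spec carries its tokens, composition and prohibitions, and context is assembled surface first, then pattern, then components.

```typescript
// Hypothetical data shapes for the three layers; field names are illustrative.
interface ComponentSpec {
  name: string;
  tokens: string[]; // design tokens the component is allowed to use
  composesWith: string[]; // components it may be combined with
  neverGenerate: string[]; // explicit prohibitions for an agent
}

interface PatternRule {
  name: string;
  surface: string; // the surface this layout belongs to
  components: string[]; // components the pattern assembles, in reading order
  guidance: string;
}

interface SurfaceRules {
  surface: string;
  rules: string[];
}

interface SemanticLayer {
  surfaces: SurfaceRules[];
  patterns: PatternRule[];
  components: ComponentSpec[];
}

// Assembles the context for one pattern in a fixed order:
// surface rules, then the pattern, then each component spec.
// A missing reference throws, so a gap in the layer fails loudly
// instead of leaving the agent to guess.
function buildContext(layer: SemanticLayer, patternName: string): string[] {
  const pattern = layer.patterns.find((p) => p.name === patternName);
  if (!pattern) throw new Error(`Unknown pattern: ${patternName}`);

  const surfaceName = pattern.surface;
  const surface = layer.surfaces.find((s) => s.surface === surfaceName);
  if (!surface) throw new Error(`No rules for surface: ${surfaceName}`);

  const sections: string[] = [`# Surface: ${surface.surface}`];
  surface.rules.forEach((rule) => sections.push(`- ${rule}`));
  sections.push(`# Pattern: ${pattern.name}`, pattern.guidance);

  pattern.components.forEach((componentName) => {
    const spec = layer.components.find((c) => c.name === componentName);
    if (!spec) throw new Error(`No spec for component: ${componentName}`);
    sections.push(
      `# Component: ${spec.name}`,
      `Tokens: ${spec.tokens.join(", ")}`,
      `Composes with: ${spec.composesWith.join(", ")}`,
      `Never generate: ${spec.neverGenerate.join("; ")}`,
    );
  });
  return sections;
}

// Usage with a tiny, made-up layer.
const layer: SemanticLayer = {
  surfaces: [{ surface: "customer-portal", rules: ["Use the standard page shell"] }],
  patterns: [
    {
      name: "settings-form",
      surface: "customer-portal",
      components: ["TextField", "Button"],
      guidance: "Single column; primary action last",
    },
  ],
  components: [
    { name: "TextField", tokens: ["space-200"], composesWith: ["Button"], neverGenerate: ["raw hex colours"] },
    { name: "Button", tokens: ["color-brand-primary"], composesWith: [], neverGenerate: ["new variants"] },
  ],
};

const context = buildContext(layer, "settings-form");
console.log(context[0]); // → "# Surface: customer-portal"
```

Throwing on a missing spec is the design choice worth noting in this sketch: a gap in the layer shows up as an error while the layer is being written, rather than as an agent quietly improvising.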

That shift, from prompting the model to engineering the context it reads, is the whole difference between using AI and building with it.

What shipped

The semantic layer itself, live over the design system the team builds with. Code Connect in Figma, which I set up so that the design system in code and the one in Figma actually line up. And the working habits around it: I've helped engineers build their skills with tools like Claude Code, helped PMs get set up to prototype effectively, and run much of the experimentation on when it's best to work from Figma, directly in code, or in a prototyping tool. The system only matters if the team actually works this way, so the enablement was part of the work, not an afterthought.

Outcome

I validated it properly rather than eyeballing it. The test was consistency: run the same brief through several independent AI agents in fresh sessions, no shared context, and measure how alike the outputs are. Structure, component usage, and token accuracy held across all of them. Set against the old system's throwaway output, the same prompts now produce something close to draft feature code, on-brand and working the first time, with far less rework.
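A minimal sketch of how a consistency score like this could be computed, written in TypeScript. It is not the project's actual benchmark, and everything in it is an assumption for illustration: the names (`extractFeatures`, `scoreConsistency`), the regex-based extraction (capitalised JSX tags treated as components, CSS custom properties treated as tokens), mean pairwise Jaccard similarity as the measure of "how alike", and the 0.9 threshold. It covers component usage and token accuracy only; comparing layout structure would need a parse of the component tree, which is left out.

```typescript
// Hypothetical summary of one agent run: the components and tokens it used.
interface RunFeatures {
  components: Set<string>;
  tokens: Set<string>;
}

// Collects the first capture group of every match into a set.
function collect(code: string, pattern: RegExp): Set<string> {
  const found = new Set<string>();
  const re = new RegExp(pattern.source, "g"); // fresh global regex, lastIndex starts at 0
  let match: RegExpExecArray | null;
  while ((match = re.exec(code)) !== null) found.add(match[1]);
  return found;
}

// Assumption: components appear as capitalised JSX tags and tokens as CSS
// custom properties such as var(--color-brand-primary).
function extractFeatures(code: string): RunFeatures {
  return {
    components: collect(code, /<([A-Z][A-Za-z0-9]*)/),
    tokens: collect(code, /var\(--([a-z0-9-]+)\)/),
  };
}

// Jaccard similarity: shared items divided by all distinct items.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty sets agree fully
  let shared = 0;
  a.forEach((item) => {
    if (b.has(item)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Mean Jaccard similarity over every pair of runs.
function meanPairwise(sets: Set<string>[]): number {
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < sets.length; i++) {
    for (let j = i + 1; j < sets.length; j++) {
      total += jaccard(sets[i], sets[j]);
      pairs++;
    }
  }
  return pairs === 0 ? 1 : total / pairs;
}

// Share of the distinct tokens each run used that exist in the system, pooled across runs.
function tokenAccuracy(runs: RunFeatures[], allowed: Set<string>): number {
  let used = 0;
  let valid = 0;
  runs.forEach((run) =>
    run.tokens.forEach((token) => {
      used++;
      if (allowed.has(token)) valid++;
    }),
  );
  return used === 0 ? 1 : valid / used;
}

interface ConsistencyReport {
  componentAgreement: number;
  tokenAgreement: number;
  tokenAccuracy: number;
  passes: boolean;
}

// threshold = 0.9 is an illustrative value, not the one used in the project.
function scoreConsistency(outputs: string[], allowedTokens: Set<string>, threshold = 0.9): ConsistencyReport {
  const runs = outputs.map((code) => extractFeatures(code));
  const componentAgreement = meanPairwise(runs.map((r) => r.components));
  const tokenAgreement = meanPairwise(runs.map((r) => r.tokens));
  const accuracy = tokenAccuracy(runs, allowedTokens);
  return {
    componentAgreement,
    tokenAgreement,
    tokenAccuracy: accuracy,
    passes: Math.min(componentAgreement, tokenAgreement, accuracy) >= threshold,
  };
}

// Usage: three made-up runs of the same brief; the third invents a token.
const outputs = [
  '<Card><Button style={{ color: "var(--color-brand-primary)" }} /></Card>',
  '<Card><Button style={{ color: "var(--color-brand-primary)" }} /></Card>',
  '<Card><Button style={{ color: "var(--color-brand-blue)" }} /></Card>',
];
const report = scoreConsistency(outputs, new Set(["color-brand-primary"]));
console.log(report.passes); // → false
```

Jaccard similarity is a deliberately simple choice here: it ignores order and repetition, so two runs that use the same components in a different arrangement still score as identical. That blind spot is why a structural comparison would be the natural next measure to add.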

I want to be honest about the shape of that evidence: it's a consistency benchmark plus daily use in a shipping team, not a formal time-and-motion study. I cared about measuring it at all because "it felt better" isn't evidence, and a consistency threshold is.

Reflection

The scar I carry from this work is a migration where we leaned too hard on AI to do the heavy lifting and under-invested in investigating the existing backend behaviour up front. The UI and the migration itself went fine, but a few connecting behaviours weren't well documented, so we shipped with a list of fast-follow fixes and some stakeholder concern. The lesson was blunt, and it's the belief behind the semantic layer, stated harder: front-load the planning, investigation and documentation, because the problems you hit downstream eat the productivity the AI gave you.

The seam that still frustrates me: the design-side AI and the code-side AI don't yet share a source of truth, so a genuinely good prototype still can't be promoted straight through to production. I expect the industry to close that gap eventually, but until it does, it's the edge I keep pushing on.

What this proves

The pattern generalises well beyond components: find where the AI is guessing, work out what context would stop it guessing, and build that context as a product. Design systems are just the clearest instance, because they're already supposed to be a source of truth. This is the work that sits closest to how I think: systems over one-off fixes, measurement over vibes, and work built so the whole team moves faster, not just me.

Want to go deeper?

Ask the site about this project. It answers from the real work: what shipped, what didn't, and how I'd do it again.