AI Works Better with State Machines

By Javier López

[Figure: abstract state machine diagram for AI chat workflows]

Today is my birthday and I’m out of office, so why not write a bit? (I ended up finishing this 10 days later. Adult life :) )

What is a state machine?

A state machine is a way to model behavior as a set of states, the events that can happen, and the transitions between those states. Instead of scattering logic across conditionals and callbacks, you make the allowed behavior explicit in one place.
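As a minimal sketch (plain TypeScript, no library; the state and event names are illustrative), a machine is just a transition table plus a pure function:

```typescript
// A toy chat machine: three states, four events, one transition table.
type State = 'idle' | 'streaming' | 'error';
type Event = 'SEND' | 'DONE' | 'FAIL' | 'RETRY';

const transitions: Record<State, Partial<Record<Event, State>>> = {
  idle: { SEND: 'streaming' },
  streaming: { DONE: 'idle', FAIL: 'error' },
  error: { RETRY: 'streaming' },
};

function transition(state: State, event: Event): State {
  // Events with no entry for the current state are ignored.
  return transitions[state][event] ?? state;
}

console.log(transition('idle', 'SEND'));      // 'streaming'
console.log(transition('streaming', 'SEND')); // 'streaming' (ignored)
```

Libraries like XState build on this core idea with hierarchy, guards, and actions, but the essence is the same: every allowed behavior is written down in one place.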

If the idea is new to you, this short introduction to state machines is a good place to start.

State machines have caught my attention for years, especially since I started following David Khourshid on Twitter. His work on XState is impressive. I always felt this elegant, sometimes verbose way of modeling complex interactions was powerful, but I never made enough time to go deep on it.

But the value proposition has changed. In the age of AI, I do not need to write every line myself. I can focus on understanding states and transitions, then prompt the AI with clearer intent. That is what makes state machines especially interesting to me now: they make behavior legible, which makes complex systems easier for both humans and AI to reason about.

Recently at work, we ran into exactly this problem. An internal chat UI kept growing in features, states, and logic. It was getting harder to follow and test, edge cases kept creeping in, and new features forced large rewrites of supposedly stable paths, which was a clear sign of technical debt. Refactoring it into a chat state machine worked well enough that it made me want to write this article.

When a chat UI needs a state machine

If your chat is simple, you probably do not need one yet. But once the UI has to support several overlapping interaction patterns at the same time, complexity grows fast. None of these behaviors is unusual on its own. The challenge is making them all work together predictably in one system.

Human-in-the-loop tools, approvals, and blocking

When a tool call needs user approval, your UI is no longer just “streaming” or “not streaming.”

  • You need explicit “waiting for approval” states.
  • Input may need to be blocked until the user approves or rejects the tool call.
  • Approval/rejection must feed back into the same stream safely.
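One way to model this (a hypothetical sketch; these state and event names are mine, not from our actual machine) is to make "awaiting approval" its own state and derive input blocking from the state, never from ad-hoc flags:

```typescript
// Hypothetical human-in-the-loop approval states for a chat UI.
type State = 'streaming' | 'awaitingApproval' | 'idle';

type Event =
  | { type: 'TOOL_CALL_REQUESTED'; toolId: string }
  | { type: 'APPROVED'; toolId: string }
  | { type: 'REJECTED'; toolId: string }
  | { type: 'STREAM_DONE' };

// Input availability is derived from the state, not tracked separately.
const inputEnabled: Record<State, boolean> = {
  streaming: false,
  awaitingApproval: false, // blocked until the user approves or rejects
  idle: true,
};

function transition(state: State, event: Event): State {
  switch (state) {
    case 'streaming':
      if (event.type === 'TOOL_CALL_REQUESTED') return 'awaitingApproval';
      if (event.type === 'STREAM_DONE') return 'idle';
      return state;
    case 'awaitingApproval':
      // Either decision feeds back into the same stream.
      if (event.type === 'APPROVED' || event.type === 'REJECTED') return 'streaming';
      return state;
    case 'idle':
      return state;
  }
}
```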

Message interactions while streaming

Actions like edit, delete, or regenerate become tricky during active output.

  • You may need to stop the current stream first.
  • Then perform the action.
  • Then return to a known state without losing chat context or pending actions.
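The "stop first, then act, then return" sequence can be sketched like this (illustrative names; the key is that the pending action survives the stop, so nothing gets lost):

```typescript
// Sketch: a stop-act-return pipeline for message actions during streaming.
type Phase = 'streaming' | 'stoppingStream' | 'performingAction' | 'idle';
type Action = { kind: 'edit' | 'delete' | 'regenerate'; messageId: string };

interface Ctx {
  phase: Phase;
  pendingAction?: Action; // remembered across the stop
}

function requestAction(ctx: Ctx, action: Action): Ctx {
  if (ctx.phase === 'streaming') {
    // Stop the stream first; remember what the user asked for.
    return { phase: 'stoppingStream', pendingAction: action };
  }
  return { phase: 'performingAction', pendingAction: action };
}

function onStreamStopped(ctx: Ctx): Ctx {
  // The stream is closed; now run the pending action, if any.
  return ctx.pendingAction
    ? { phase: 'performingAction', pendingAction: ctx.pendingAction }
    : { phase: 'idle' };
}

function onActionDone(ctx: Ctx): Ctx {
  // Return to a known state with no dangling work.
  return { phase: 'idle' };
}
```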

Paused tools, polling, and resume

Some tools pause and require server-side progress polling before resuming. In our case, we have a deep research tool that can take up to 20 minutes. We do not want an HTTP SSE connection open that long, and the server could disappear during a new deploy.

  • The UI needs a distinct paused state.
  • Polling progress updates should not break normal chat flow.
  • Resume should be a first-class transition, not an ad-hoc callback.
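A rough sketch of that shape (hypothetical names; the polling interval and job-status API are assumptions, not our real implementation): the SSE connection is closed on pause, a server-side job id is kept in context, and resume is just another transition.

```typescript
// Sketch: a paused long-running tool with polling instead of a long SSE.
type State = 'streaming' | 'pausedTool' | 'idle';

interface Ctx {
  state: State;
  jobId?: string; // server-side job to poll while paused
}

function onToolPaused(ctx: Ctx, jobId: string): Ctx {
  // Close the SSE connection; remember the server-side job.
  return { state: 'pausedTool', jobId };
}

async function pollUntilDone(
  jobId: string,
  fetchStatus: (id: string) => Promise<'running' | 'done'>,
  intervalMs = 5000, // assumption: tune per deployment
): Promise<void> {
  // Polling runs alongside normal chat; the UI stays in 'pausedTool'.
  while ((await fetchStatus(jobId)) === 'running') {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

function onToolResumed(ctx: Ctx): Ctx {
  // Resume is a first-class transition back into streaming.
  return { state: 'streaming', jobId: undefined };
}
```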

Timeouts, retries, and state recovery

Chat systems fail in many ways: slow network, dropped streams, errors.

  • Define timeout behavior explicitly.
  • Model retry paths from error states.
  • Keep recovery predictable instead of sprinkling try/catch blocks across UI code.
  • Recover to a known good state, or reconcile client and server state, from one well-defined place.
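Modeled as transitions, a retry path might look like this sketch (the retry cap and event names are illustrative assumptions):

```typescript
// Sketch: explicit timeout and retry transitions instead of scattered try/catch.
type State = 'streaming' | 'error' | 'idle';
type Event =
  | { type: 'TIMEOUT' }
  | { type: 'RETRY' }
  | { type: 'DONE' }
  | { type: 'GIVE_UP' };

interface Ctx {
  state: State;
  retries: number;
}

const MAX_RETRIES = 3; // assumption: tune per deployment

function transition(ctx: Ctx, event: Event): Ctx {
  switch (ctx.state) {
    case 'streaming':
      if (event.type === 'TIMEOUT') return { state: 'error', retries: ctx.retries };
      if (event.type === 'DONE') return { state: 'idle', retries: 0 };
      return ctx;
    case 'error':
      // Retry is a modeled path from the error state, with a bound.
      if (event.type === 'RETRY' && ctx.retries < MAX_RETRIES)
        return { state: 'streaming', retries: ctx.retries + 1 };
      if (event.type === 'GIVE_UP') return { state: 'idle', retries: 0 };
      return ctx;
    case 'idle':
      return ctx;
  }
}
```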

Idempotency and URL-driven behavior

Deep links, auto-submit prompts, and React StrictMode re-running effects can all trigger duplicate actions.

  • Guard one-time effects with idempotency keys.
  • Treat URL prompt parsing and cleanup as explicit transitions.
  • Keep these flows deterministic and testable.
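A small sketch of both guards (hypothetical helpers, not our real code): a consumed-keys set makes one-time effects idempotent, and URL prompt handling is an explicit step that parses and cleans up in one place.

```typescript
// Sketch: guarding one-time, URL-driven effects with idempotency keys.
// React StrictMode double-invokes effects in development, so guard by key.
const consumed = new Set<string>();

function runOnce(key: string, effect: () => void): boolean {
  if (consumed.has(key)) return false; // duplicate invocation: ignore
  consumed.add(key);
  effect();
  return true;
}

// Treat URL prompt parsing and cleanup as one explicit, testable transition.
function consumeUrlPrompt(url: URL): string | undefined {
  const prompt = url.searchParams.get('prompt');
  if (!prompt) return undefined;
  url.searchParams.delete('prompt'); // cleanup is part of the transition
  return prompt;
}
```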

Why state machines?

For me, state machines are about making behavior explicit before implementation. You write down states, events, and transitions first. That forces clarity about happy paths, edge cases, and failure modes.

In our chat UI refactor, that clarity mattered a lot. We ended up with 8 parent states (init, idle, blocking, dispatching, streaming, error, performingAction, leaving), 8 child states, around 26 event types, and roughly 50 transitions. It sounds like a lot, but the model made complexity visible instead of hiding it.
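The parent-state skeleton might look roughly like this (the parent-state names come from our machine; the event names here are illustrative, and the real machine also has child states, guards, and actions):

```typescript
// Skeleton of the top-level chat states; a partial transition table only.
// The full machine has 8 child states, ~26 event types, and ~50 transitions.
type ParentState =
  | 'init'
  | 'idle'
  | 'blocking'
  | 'dispatching'
  | 'streaming'
  | 'error'
  | 'performingAction'
  | 'leaving';

const skeleton: Partial<Record<ParentState, Record<string, ParentState>>> = {
  init: { READY: 'idle' },
  idle: { SUBMIT: 'dispatching', MESSAGE_ACTION: 'performingAction', NAVIGATE_AWAY: 'leaving' },
  dispatching: { STREAM_STARTED: 'streaming', DISPATCH_FAILED: 'error' },
  streaming: { APPROVAL_NEEDED: 'blocking', STREAM_DONE: 'idle', STREAM_ERROR: 'error' },
  blocking: { APPROVED: 'streaming', REJECTED: 'streaming' },
  error: { RETRY: 'dispatching', DISMISS: 'idle' },
  performingAction: { ACTION_DONE: 'idle' },
};
```

Even this partial table already answers questions the old code buried in conditionals: what can happen while streaming, and where each failure lands.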

Rough edges

It was not entirely smooth. The first AI implementation worked, but the state model was messy and hard to maintain. At that point, I was just getting started with XState, so my prompts were vague and the AI defaulted to “technically works” instead of “clean model.”

The turning point was improving my own understanding of state and event modeling. That took reading a lot of XState docs and doing additional research. Once I could clearly describe states, legal transitions, and side effects, the prompts improved dramatically.

The result was a simpler and more observable system: fewer bugs, clearer blocking behavior (streaming, approvals, paused tools, edits), and faster iteration when we added features.

Conclusion

State machines are a strong way to model complex interactions, and they fit especially well with AI-assisted development.

My main point is simple: a machine definition can carry a lot of useful information even before implementation. It gives you a compact way to reason about behavior, forces you to think through edge cases, fits well in both human and AI context windows, and makes testing much more straightforward. LLMs can be very effective here, but only when you have a clear model of your states and transitions. State machines make behavior legible, and legible systems are easier to build.