Architecture explainer

How the runtime moves from model intent to visible action.

The key idea is simple: agent intent should travel through a structured runtime that keeps actions routable, observable, and explainable to a human operator.

System view

Agent surfaces feed a shared runtime core, which then routes intent into the browser overlay, browser extension, or desktop accessibility layer.

Figure: system flow diagram for the runtime architecture.

Layers

Each layer exists to solve a specific trust or integration problem.

1. Agent surfaces

SDKs, MCP clients, CLI, and custom WebSocket clients

This is where models or orchestrators live. They need a stable way to request highlights, annotations, actions, state reads, and playbook runs.
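One way to picture that stable request surface is a small, validated message envelope. The sketch below is illustrative only: the `AgentRequest` class, its field names, and the five kind strings are assumptions derived from the list above, not the runtime's actual wire format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any

# Hypothetical request kinds, mirroring the list in the text:
# highlights, annotations, actions, state reads, and playbook runs.
VALID_KINDS = {"highlight", "annotate", "action", "read_state", "run_playbook"}


@dataclass
class AgentRequest:
    kind: str                      # one of VALID_KINDS
    session_id: str                # assumed: requests are scoped to a session
    payload: dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the request, rejecting unknown kinds before they leave the client."""
        if self.kind not in VALID_KINDS:
            raise ValueError(f"unknown request kind: {self.kind!r}")
        return json.dumps(asdict(self), sort_keys=True)


def parse_request(raw: str) -> AgentRequest:
    """Parse and validate a request on the receiving side."""
    data = json.loads(raw)
    kind = data["kind"]
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown request kind: {kind!r}")
    return AgentRequest(kind=kind, session_id=data["session_id"],
                        payload=data.get("payload", {}))
```

Validating on both sides means an SDK, a CLI, and a hand-written WebSocket client would all fail in the same way when they send something the runtime does not understand.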

2. Runtime core

Relay, tool dispatch, verification, and playbook orchestration

The runtime core handles sessions, role-aware routing, backpressure, tool execution, structured failures, and the verify-after-action loop.
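To make role-aware routing, backpressure, and structured failures concrete, here is a minimal in-memory sketch. The `Relay` and `Failure` names, the role strings, and the failure codes are hypothetical; the real relay presumably works over network connections rather than in-process queues.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Failure:
    code: str    # machine-readable, e.g. "no_route" or "backpressure" (assumed codes)
    detail: str  # human-readable explanation for the operator


class Relay:
    """Routes messages to per-role queues and refuses new work when a queue is full."""

    def __init__(self, max_pending: int = 2) -> None:
        self.max_pending = max_pending
        self.queues: dict[str, deque] = {}  # role -> pending messages

    def register(self, role: str) -> None:
        self.queues.setdefault(role, deque())

    def route(self, role: str, message: dict) -> Optional[Failure]:
        queue = self.queues.get(role)
        if queue is None:
            return Failure("no_route", f"no client registered for role {role!r}")
        if len(queue) >= self.max_pending:
            # Backpressure: report a structured failure instead of buffering without limit.
            return Failure("backpressure", f"{role!r} has {len(queue)} pending messages")
        queue.append(message)
        return None

    def drain(self, role: str) -> list:
        """Simulate the consumer catching up on its pending messages."""
        queue = self.queues[role]
        items = list(queue)
        queue.clear()
        return items
```

Returning a `Failure` value rather than raising keeps the error inspectable by the caller, which is the property the text calls structured failures.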

3. Execution surfaces

Overlay, extension replay, and accessibility backends

This is where intent becomes visible or actionable: browser overlays for narration, extension state replay to cope with tab churn, and macOS accessibility for desktop control.
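The text does not spell out how state replay works, so the following is only one plausible reading: keep the latest visual command per target and re-send the set when a tab reloads or reconnects. `ReplayStore`, the `op` and `target` keys, and the `clear` operation are all assumptions made for illustration.

```python
class ReplayStore:
    """Remembers the latest visual command per target so it can be re-sent to a fresh tab."""

    def __init__(self) -> None:
        self._state: dict[str, dict] = {}  # target -> most recent command

    def apply(self, command: dict) -> None:
        if command["op"] == "clear":
            # A cleared target should not reappear after a reload.
            self._state.pop(command["target"], None)
        else:
            # Simplification: a newer command for the same target replaces the older one.
            self._state[command["target"]] = command

    def replay(self) -> list[dict]:
        """Commands to send, in first-seen order, when a tab comes back."""
        return list(self._state.values())
```

Storing state rather than a full command log keeps the replay short no matter how long the session has been running.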

Adoption paths

Teams do not need to adopt the whole stack at once.

Overlay-first

Add the overlay package to a web app and let agents visibly guide users without changing the entire architecture.

MCP-first

Run the combined MCP server and relay as a single runtime so model hosts get a structured tool surface immediately.

Desktop-capable

Extend beyond browser-only workflows by pairing the browser surfaces with the native accessibility backend on macOS.

Trust boundaries

The architecture's main value is not just that agents can act. It is that their actions stay legible to the operator.

Narrate before acting

Highlights, annotations, and cursor movement give the operator a readable model of intent before a step executes.
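The ordering guarantee can be sketched as a small wrapper that always emits the narration before it runs the step. The function name, the command shapes, and the `show` and `act` callbacks are hypothetical stand-ins for whatever the overlay and the action backend actually expose.

```python
from typing import Callable


def narrate_then_act(
    target: str,
    note: str,
    show: Callable[[dict], None],  # assumed: sends a visual command to the overlay
    act: Callable[[str], None],    # assumed: performs the real step on the target
) -> None:
    """Make intent visible first, then execute the step."""
    show({"op": "highlight", "target": target})
    show({"op": "annotate", "target": target, "text": note})
    act(target)
```

Because narration and action go through one function, a caller cannot run the step without the operator having seen the highlight and annotation first.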

Verify after acting

The runtime can classify failures, re-read state, and recover instead of assuming a click or action worked.
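A minimal sketch of such a verify-after-action loop follows. The failure labels (`target_missing`, `no_effect`), the retry count, and the use of `LookupError` to signal a missing target are assumptions for illustration, not the runtime's real classification.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Outcome:
    ok: bool
    attempts: int
    failure: Optional[str] = None  # assumed labels: "target_missing", "no_effect"


def act_and_verify(
    act: Callable[[], None],
    read_state: Callable[[], Any],
    expected: Callable[[Any], bool],
    max_attempts: int = 3,
) -> Outcome:
    """Run an action, re-read state, and retry until the expected state is observed."""
    for attempt in range(1, max_attempts + 1):
        try:
            act()
        except LookupError:
            # Classified as non-retryable: the target does not exist.
            return Outcome(False, attempt, "target_missing")
        if expected(read_state()):
            return Outcome(True, attempt)
    # The action ran every time but never produced the expected state.
    return Outcome(False, max_attempts, "no_effect")
```

The key point is that success is decided by the re-read state, never by the fact that `act()` returned without an error.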

Keep interfaces swappable

SDKs, overlays, extensions, and accessibility layers stay composable so product teams can adopt the parts they need.
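Swappability of this kind is often expressed as a shared interface that each surface implements. The sketch below uses Python's `typing.Protocol`; the `ExecutionSurface` interface, the two classes, and the strings they return are invented for illustration and do not describe the project's real packages.

```python
from typing import Protocol


class ExecutionSurface(Protocol):
    """Hypothetical contract that every execution surface would satisfy."""

    name: str

    def highlight(self, target: str) -> str: ...


class BrowserOverlay:
    name = "overlay"

    def highlight(self, target: str) -> str:
        return f"overlay: outline {target}"


class MacAccessibility:
    name = "macos-accessibility"

    def highlight(self, target: str) -> str:
        return f"accessibility: focus ring on {target}"


def guide(surface: ExecutionSurface, target: str) -> str:
    """Caller code depends only on the interface, so surfaces can be swapped freely."""
    return surface.highlight(target)
```

A team that adopts only the overlay would pass `BrowserOverlay()` to `guide`; adding desktop support later would mean passing a different surface, with no change to the calling code.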

Bridge to docs

Go from the marketing overview to the implementation details.

These are the best technical references to read once you understand the top-level architecture.