AI agents (Cursor, Claude computer-use, Copilot agent mode, etc.) have gotten stupidly good at spitting out code. Prompt → boom, clean code. The marketing says "it just works."
It fucking doesn't.
You run it in a real app and immediately hit the same bullshit wall every time:
- Hallucinated logic only reveals itself under real data or edge cases
- UI updates magically forget to sync across devices (mobile → web = sad trombone)
- API calls quietly return 401s or other crap that gets swallowed in some lazy try-catch
- Vision-based agents crawl like molasses (2–10s per action) and torch tokens like it's free
- Background pings and unrelated fetches make it impossible to tell what actually caused what
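To make the swallowed-401 bullet concrete, here's a minimal sketch of the pattern and one way to keep the failure visible. Everything in it is made up for illustration (`fakeFetch`, the `/api/profile` path, the `LoadResult` type), and it fakes the network so it runs anywhere; it is not how any particular agent or library does it.

```typescript
// Minimal stand-in for an HTTP response so the sketch runs without a network.
type FakeResponse = { ok: boolean; status: number; json: () => Promise<unknown> };

// Hypothetical endpoint that always rejects the request with a 401.
async function fakeFetch(_url: string): Promise<FakeResponse> {
  return { ok: false, status: 401, json: async () => ({ error: "unauthorized" }) };
}

// The "lazy" version: never checks the status, and the catch hides everything else.
async function loadProfileLazy(): Promise<{ name: string } | null> {
  try {
    const res = await fakeFetch("/api/profile");
    const body = (await res.json()) as { name?: string };
    return body.name ? { name: body.name } : null; // 401 body has no name -> silent null
  } catch {
    return null; // any thrown error disappears here too
  }
}

// A structured result that keeps the status visible to whoever (or whatever agent) reads it.
type LoadResult =
  | { kind: "ok"; name: string }
  | { kind: "http_error"; status: number }
  | { kind: "exception"; message: string };

async function loadProfileVisible(): Promise<LoadResult> {
  try {
    const res = await fakeFetch("/api/profile");
    if (!res.ok) return { kind: "http_error", status: res.status };
    const body = (await res.json()) as { name: string };
    return { kind: "ok", name: body.name };
  } catch (err) {
    return { kind: "exception", message: err instanceof Error ? err.message : String(err) };
  }
}

async function demo(): Promise<void> {
  console.log(JSON.stringify(await loadProfileLazy()));    // null
  console.log(JSON.stringify(await loadProfileVisible())); // {"kind":"http_error","status":401}
}
void demo();
```

The lazy version hands the UI a `null` that looks exactly like "no profile yet", which is why the bug only shows up when you go digging through the network tab.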
I tried pretty much everything out there and none of it quite scratched the itch I had: fast, structured, cross-platform runtime visibility without vision bloat or having to wire up a ton of hooks.
Quick rundown of the usual suspects:
- Pure vision/computer-use (Claude 3.5/4, Adept-style): zero setup, works on anything, but the latency is from hell and the token burn is brutal for anything longer than a demo
- Playwright / browser MCP servers: fast and structured for web, but web-only, selectors shatter like glass, no native mobile
- Appium + vision hybrids: cross-platform on paper, but still vision-dependent and setup is a pain
- Sandboxed agents (OpenHands, SWE-agent): decent for repo tasks and shell stuff, not so much for live app UI/network state
- Explicit hooks/bridges: precise when you bother adding them, but requires code changes, which sucks
Couldn't find anything that gave me low-latency structured JSON state (UI elements, network, errors, logs) across platforms, local-first, without the usual trade-offs. So yeah, I got fed up and built a small local MCP server to solve it for myself.
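To show what I mean by "structured JSON state", here's a rough sketch of the kind of snapshot I was after. The field names and the `findProblems` helper are hypothetical, made up for this post; this is not Autonomo's actual schema or API.

```typescript
// Hypothetical snapshot shape, for illustration only.
interface UiElement { id: string; role: string; text?: string; visible: boolean }
interface NetworkEvent { method: string; url: string; status: number; durationMs: number }
interface LogEntry { level: "debug" | "info" | "warn" | "error"; message: string }

interface AppSnapshot {
  platform: "web" | "ios" | "android";
  capturedAt: string; // ISO 8601 timestamp
  ui: UiElement[];
  network: NetworkEvent[];
  errors: string[];
  logs: LogEntry[];
}

// Pull out the things an agent most likely needs to react to:
// failed requests, uncaught errors, and error-level logs.
function findProblems(s: AppSnapshot): string[] {
  const failed = s.network
    .filter((n) => n.status >= 400)
    .map((n) => `${n.method} ${n.url} -> ${n.status}`);
  const errorLogs = s.logs.filter((l) => l.level === "error").map((l) => l.message);
  return [...failed, ...s.errors, ...errorLogs];
}

// Made-up example data.
const example: AppSnapshot = {
  platform: "ios",
  capturedAt: "2025-01-01T12:00:00Z",
  ui: [{ id: "profile-name", role: "text", text: "", visible: true }],
  network: [
    { method: "GET", url: "/api/profile", status: 401, durationMs: 42 },
    { method: "GET", url: "/api/ping", status: 200, durationMs: 8 },
  ],
  errors: ["TypeError: profile is undefined"],
  logs: [
    { level: "info", message: "app started" },
    { level: "error", message: "profile load failed" },
  ],
};

console.log(JSON.stringify(findProblems(example)));
// ["GET /api/profile -> 401","TypeError: profile is undefined","profile load failed"]
```

The point is that an agent reading this gets the 401, the exception, and the empty label in one small payload instead of squinting at a screenshot.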
Full disclosure: it's called Autonomo MCP https://github.com/sebringj/autonomo — very early, just launched.
I don't usually do this "I built a thing" thing — my open-source contributions are mostly small fixes and PRs — but I honestly couldn't see a better way in the current landscape.
I'm hoping Anthropic (or someone) will eventually ship a clean native solution for this. They already shipped BM25-based tool search, which shrinks context like crazy because tool definitions no longer all have to sit in the prompt; I'd love to see them (or the industry) make runtime validation "just work" out of the box too.
Sometimes when you code in a vacuum you think your shit smells good. Lmk if I'm off base here; I grew up with a mean grandpa, so I'm cool with it.