Runtime validation is still fucked in AI coding agents
1 point by sebringj 22 days ago | 2 comments
AI agents (Cursor, Claude computer-use, Copilot agent mode, etc.) have gotten stupidly good at spitting out code. Prompt → boom, clean code. The marketing says "it just works."

It fucking doesn't.

You run it in a real app and immediately hit the same bullshit wall every time:

- Hallucinated logic only reveals itself under real data or edge cases
- UI updates magically forget to sync across devices (mobile → web = sad trombone)
- API calls quietly return 401s or other crap that gets swallowed in some lazy try-catch (sketch below)
- Vision-based agents crawl like molasses (2–10s per action) and torch tokens like it's free
- Background pings and unrelated fetches make it impossible to tell what actually caused what
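The swallowed-401 one deserves a concrete picture, because agent-generated code loves this pattern. A minimal sketch, with the function name and endpoint made up for illustration:

    // Agent-generated code tends to look like this; the 401 never surfaces.
    async function fetchProfile(userId: string) {
      try {
        const res = await fetch(`/api/users/${userId}`);
        return await res.json(); // res.ok never checked: a 401 error body parses as "data"
      } catch {
        return null; // network and parse errors vanish here too
      }
    }

    // What you actually want: fail loudly on non-2xx so the agent (and you) can see it.
    async function fetchProfileChecked(userId: string) {
      const res = await fetch(`/api/users/${userId}`);
      if (!res.ok) {
        throw new Error(`GET /api/users/${userId} failed: ${res.status} ${res.statusText}`);
      }
      return res.json();
    }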

I tried pretty much everything out there and none of it quite scratched the itch I had: fast, structured, cross-platform runtime visibility without vision bloat or having to wire up a ton of hooks.

Quick rundown of the usual suspects:

- Pure vision/computer-use (Claude 3.5/4, ADEPT-style): zero setup, works on anything — but latency from hell, and token burn is brutal for anything longer than a demo
- Playwright / browser MCP servers: fast and structured for web — but web-only, selectors shatter like glass (quick illustration after this list), no native mobile
- Appium + vision hybrids: cross-platform on paper — but still vision-dependent, and setup is a pain
- Sandboxed agents (OpenHands, SWE-agent): decent for repo tasks and shell stuff — not so much for live app UI/network state
- Explicit hooks/bridges: precise when you bother adding them — but requires code changes, which sucks
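On "selectors shatter like glass", a quick illustration (Playwright; the page and selectors are made up):

    import { test, expect } from '@playwright/test';

    test('submit order', async ({ page }) => {
      await page.goto('https://example.com/checkout');

      // Brittle: dies the moment the build hash or DOM nesting changes.
      // await page.click('#root > div.css-1x2y3z > button:nth-child(4)');

      // Sturdier: role + accessible name survives most markup churn.
      await page.getByRole('button', { name: 'Place order' }).click();
      await expect(page.getByText('Order confirmed')).toBeVisible();
    });

Role-based locators help a lot, but they still only cover web, which is the bigger limitation here.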

Couldn't find anything that gave me low-latency structured JSON state (UI elements, network, errors, logs) across platforms, local-first, without the usual trade-offs. So yeah, I got fed up and built a small local MCP server to solve it for myself.
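To make "structured JSON state" concrete, here's roughly the shape I'm talking about. This is an illustrative sketch only; the field names are my assumptions, not a published schema:

    // Illustrative only; these names are assumptions, not Autonomo's actual API.
    type RuntimeSnapshot = {
      platform: 'ios' | 'android' | 'web';
      timestamp: string; // ISO 8601
      ui: { id: string; role: string; text?: string; visible: boolean }[];
      network: { method: string; url: string; status: number; durationMs: number }[];
      errors: { message: string; stack?: string }[];
      logs: { level: 'debug' | 'info' | 'warn' | 'error'; message: string }[];
    };

An agent can diff two of these to answer "did my change actually take effect?" in token-cheap milliseconds instead of a multi-second screenshot loop.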

Full disclosure: it's called Autonomo MCP https://github.com/sebringj/autonomo — very early, just launched.

I don't usually do this "I built a thing" thing — my open-source contributions are mostly small fixes and PRs — but I honestly couldn't see a better way in the current landscape.

It is my hope that Anthropic (or someone) will eventually ship a clean native solution for this. They already shipped BM25-based tool search to shrink context like crazy; I'd love to see them (or the industry) make runtime validation "just work" out of the box too.

Sometimes when you code in a vacuum you think your shit smells good. lmk if I'm off base here; I grew up with a mean grandpa, so I'm cool with it.



You've nailed the real friction point that demos gloss over: agents are great at generation but terrible at verification in production systems. The vision latency tax is brutal once you hit real workflows.


ya, for real. my boss was like "let's do e2e testing with AI, look for solutions out there"... then like 2 days later he's like "wtf is this bill," and I was like "you wanted that, right?" We were using vision calls in Azure AI Foundry and it was over 100 bucks or something in just 2 days of me setting it up and trying it out with all the test cases it had.



