
One important issue with agentic loops is that agents are lazy, so you need some sort of retrigger mechanism. Claude Code supports hooks: you can wire your agent's Stop hook to a local LLM, feed the context in, and ask the model to prompt Claude to continue if needed. It works pretty well, and Claude can override retriggers if it's REALLY sure it's done.
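A minimal sketch of such a Stop hook, assuming Claude Code's hook protocol (the hook receives JSON on stdin, and emitting `{"decision": "block", "reason": ...}` on stdout forces Claude to continue). The `ask_local_llm` function is a stub you'd replace with a call to your own local model server, and the `transcript_tail` field is illustrative; the real hook input gives you a transcript path you'd read and truncate yourself:

```python
#!/usr/bin/env python3
"""Stop-hook sketch: ask a local LLM whether Claude should keep going."""
import json
import sys


def ask_local_llm(transcript_tail: str) -> str:
    """Stub: replace with a real call to your local model.

    Should return "DONE" if the task looks finished, otherwise a
    short prompt telling Claude what still needs doing.
    """
    if "all tests pass" in transcript_tail.lower():
        return "DONE"
    return "Tests have not been confirmed passing. Keep working."


def decide(hook_input: dict) -> dict:
    # stop_hook_active is set when a previous Stop hook already
    # retriggered Claude; bail out here to avoid an infinite loop.
    if hook_input.get("stop_hook_active"):
        return {}
    verdict = ask_local_llm(hook_input.get("transcript_tail", ""))
    if verdict.strip() == "DONE":
        return {}  # empty output: allow Claude to stop
    return {"decision": "block", "reason": verdict}


if __name__ == "__main__":
    print(json.dumps(decide(json.load(sys.stdin))))
```

The loop guard matters: without checking `stop_hook_active`, a model that never says "DONE" will retrigger Claude forever.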

Regarding sandboxing, VMs are the way. Prompt-injected agents WILL be able to escape containers 100%.



My mental model of container escapes is that they are security bugs which get patched when they are reported, and none of the mainstream, actively maintained container platforms currently have a known open escape bug.

So is the concern here purely around zero-days?


That's going a bit far; a good mental model is that every kernel LPE (local privilege escalation) is a sandbox escape (that's not precisely true, but it is to a first approximation), and kernel LPEs are pretty routine and rarely widely reported.

A good heuristic would be that unless you have reason to think you're a target, containers are a safe bet. A motivated attacker probably can pop most container configurations. Also, it can be counterintuitive what makes you a target:

* Large-scale cotenant work? Your target quotient is the sum of those of all your clients.

* Sharing infrastructure (including code supply chains) with people who are targeted? Similar story.

But for people just using Claude in YOLO mode, "security" is not really a top-of-mind concern for me, so much as "wrecking my dev machine".


Seems "big if true" given the number of cloud providers that use containers for customer isolation.


Of the big three cloud providers, only GCP uses containers for customer isolation, and they do so with the supervision of gVisor. It’s certainly possible to do container isolation securely, but it takes extra steps and know-how, and I don’t think anyone is even considering using gVisor or similar for the type of developer workflows being discussed here.

AWS and Azure both use VM-level isolation. Cloudflare uses V8 isolates which are neither container nor VM. Fly uses firecracker, right?

This topic is kind of beside the point for the developer workflows the majority of readers of this article are doing, though. The primary concern here is "oops, the agent tried to run 'rm -rf /'", not the agent trying to exploit a container escape. And for anyone building something that requires a better security model, I'd hope they have better resources to guide them than the two sentences in this article about prompt injection.


What scares me most is what happens when some attacker attempts to deploy a "steal all environment variable credentials and crypto wallets" prompt injection attack in a way that is likely to affect thousands or millions of coding agent users.


I'm not talking about the hyperscalers. And yes, we use Rust microvm hypervisors.


This is not speculative; it's happened plenty already. People put mitigations in place, patch libraries, and move on. The difference is that agents will find new zero-days you've never heard of in software on your system that nobody has scrutinized adequately. There will be zero advance notice, and unlike human attackers, who need to lie low until they can plan an exit, an agent can exploit you heavily right away.

Do not take the security impact of agents lightly!


I feel like my bona fides on this topic are pretty solid (without getting into my background on container vs. VM vs. runtime isolation) and: "the agents will find new zero days" also seems "big if true". I point `claude` at a shell inside a container and tell it "go find a zero day that breaks me out of this container", and you think I'm going to succeed at that?

I had assumed you were saying something more like "any attacker that prompt-injects you probably has a container escape in their back pocket they'll just stage through the prompt injection vector", but you apparently meant something way further out.


I know at least one person who supplements their income finding bounties with Claude Code.

Right now you can prompt-inject an obfuscated payload that tricks Claude into trying to root a system, under the premise that you're trying to identify an attack vector on a test system to understand how you were compromised. It's not good enough to do much yet, but with the right prompts, better models, and a way to smuggle extra code in, you could get quite far.


Lots of people find zero days with Claude Code. That is not the same thing as Claude Code autonomously finding zero days without direction, which was what you implied. This seems like a pretty simple thing to go empirically verify for yourself. Just boot up Claude and tell it to break out of a container shell. I'll wait here for your zero day! :)


If AI agents are capable of finding new zero days, that seems like an absolute win for computer security research.


It's already happening, as I mentioned in a sibling comment I know of someone doing this for supplemental income.



