Yes, typically the agent is just trying to do what it's been instructed to do, b...

Yes, typically the agent is just trying to do what it's been instructed to do, but sometimes it's too naive to realize its approach is a bit sketchy.

And actually, one way we've hardened our sandbox is by tasking agents with impossible tasks (within the sandbox), then analyzing and patching each workaround.