For those like myself who have wanted this but for Civ1 (all 4 of us), someone on CivFanatics has made incredible progress, and the game is actually playable now: https://github.com/rajko-horvat/OpenCiv1
I can definitely vouch for the 2 or 4 narrative; those have always been my favorites of the 'modern-ish' Civ games, but my favorite will always be CivNet (Civ 1 with multiplayer). There is a real simplicity in Civ 1 that makes it much better suited to a multiplayer experience than the later entries. It is a real pain to get any non-hotseat multiplayer working nowadays, but it's well worth it.
Agreed. I wish there were quality-of-life improvements to Civ1 that kept the simplicity and aesthetics fully intact while modernizing some of the tedious mid/late-game stuff, like managing each city in a large empire based on some straightforward goal like 'more science' or 'fastest path to rocketry' or whatnot.
Freeciv unfortunately has none of the charm of Civ1.
He mentions in the post that his focus is on Roman history, and that his discussion of peasants will be most applicable to late Mediterranean antiquity.
> Ensheetification is a newly coined, informal term, likely from Hacker News, describing the trend where web/app interfaces become dominated by large, card-like "sheets" or panels that slide up, covering content, similar to how apps like Google Maps, Instagram, and Apple's iOS use full-screen or partial-screen overlays, effectively "sheeting over" previous views to present new information or actions.
I propose MSDS instead, short for My Sheet Don't Stink.
It took no time at all. This exploit is intrinsic to every model in existence.
The article quotes the Hacker News announcement. People were already lamenting this vulnerability before the model was even accessible.
You could make a model that acknowledges it has received unwanted instructions, but in theory you cannot prevent prompt injection.
Now this is big because the exfiltration is mediated by an allowed endpoint (Anthropic itself mediates the exfiltration).
It is simply sloppy as fuck: they took measures against people using other agents with Claude Code subscriptions for the sake of security and muh safety, while being this fucking sloppy themselves. Clown world.
Just make it so the client can only establish connections with the original account's associated endpoints and keys in that isolated ephemeral environment, and make this the default; opting out should be marked as big-time YOLO mode.
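Something like a hard egress allowlist in the sandbox's network layer would do it; a rough sketch (the storage host is made up, only the Anthropic API host is real):

```python
from urllib.parse import urlparse

# Hypothetical egress filter for the isolated agent environment: only hosts
# tied to the account that started the session are reachable by default.
ALLOWED_HOSTS = {
    "api.anthropic.com",       # the account's own API endpoint
    "files.example-corp.com",  # storage the account explicitly opted into (made up)
}

def egress_allowed(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS

assert egress_allowed("https://api.anthropic.com/v1/messages")
assert not egress_allowed("https://attacker.example/upload")  # blocked by default
```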
I wonder if it might be possible to mitigate this by introducing a concept of "authority". Tokens are mapped to vectors in an embedding space, so one of the dimensions of that space could be reserved to represent authority.
For the system prompt, the authority value could be clamped to the maximum (+1). For text directly from the user or files with important instructions, the authority value could be clamped to a slightly lower value, or maybe 0, because the model needs to balance being helpful against refusing requests from a malicious user. For random untrusted text (e.g. downloaded from the internet by the agent), it would be set to the minimum value (-1).
The model could then be trained to fully respect or completely ignore instructions, based on the "authority" of the text. Presumably it could learn to do the right thing with enough examples.
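A rough sketch of the tagging I have in mind; the channel names and authority values are purely illustrative:

```python
# Illustrative authority levels per channel; the exact values don't matter,
# only that the model is trained against a consistent scale.
AUTHORITY = {
    "system": 1.0,      # system prompt: clamp to the maximum
    "user": 0.0,        # direct user input: helpful, but not absolute
    "untrusted": -1.0,  # anything the agent pulled in from the internet
}

def tag_tokens(token_ids, source):
    """Pair each token id with the authority of the channel it came from."""
    level = AUTHORITY[source]
    return [(tok, level) for tok in token_ids]

# Toy token ids standing in for the tokenised segments of the context.
context = (
    tag_tokens([101, 7592, 2088], "system")
    + tag_tokens([2023, 2003], "user")
    + tag_tokens([8045, 4937, 5555], "untrusted")
)
```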
The model only sees a stream of tokens, right? So how do you signal a change in authority (i.e. mark the transition between system and user prompt)? Because a stream of tokens inherently has no out-of-band signaling mechanism, you have to encode changes of authority in-band. And since the user can enter whatever they like in that band...
But maybe someone with a deeper understanding can describe how I'm wrong.
When LLMs process tokens, each token is first converted to an embedding vector. (This token-to-vector mapping is learned during training.)
Since a token itself carries no information about whether it has "authority" or not, I'm proposing to inject this information into a reserved slot of that embedding vector. This needs to be done both during post-training and at inference. Think of it as adding color or flavor to a token, so that it is always very clear to the LLM what comes from the system prompt, what comes from the user, and what is random data.
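A minimal sketch of the injection step, assuming the last embedding dimension is the reserved one (PyTorch, toy sizes; nothing here reflects how any real model is actually wired up):

```python
import torch
import torch.nn as nn

VOCAB, DIM = 50_000, 512
embed = nn.Embedding(VOCAB, DIM)  # the learned token-to-vector mapping

def embed_with_authority(token_ids, authority):
    """Look up token embeddings and overwrite the reserved (last) dimension
    with the authority of the channel these tokens came from."""
    vectors = embed(token_ids)                                  # (seq_len, DIM)
    auth = torch.full((vectors.shape[0], 1), float(authority))
    return torch.cat([vectors[:, :-1], auth], dim=-1)           # still (seq_len, DIM)

system    = embed_with_authority(torch.tensor([1, 2, 3]), authority=1.0)
untrusted = embed_with_authority(torch.tensor([7, 8, 9]), authority=-1.0)
context   = torch.cat([system, untrusted], dim=0)  # what the transformer blocks would see
```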
This is really insightful, thanks. I hadn't understood that there was room in the vector space that you could reserve for such purposes.
The response from tempaccsoz5 seems apt then, since this injection is performed/learned during post-training; in order to be watertight, it needs to overfit.
You'd need to run one model per authority ring with some kind of harness. That rapidly becomes incredibly expensive from a hardware standpoint (particularly since realistically these guys would make the harness itself an agent on a model).
I assume "harness" here just means the glue that feeds one model's output into that of another?
Definitely sounds expensive. Would it even be effective, though? The more-privileged rings have to guard against [output from unprivileged rings] rather than [input to unprivileged rings]. Since the former is a function of the latter (in deeply unpredictable ways), it's hard for me to see how this fundamentally plugs the hole.
I'm very open to correction though, because this is not my area.
My instinct was that you would have an outer, non-agentic ring that simply identifies passages in the token stream that would initiate tool use and passes them back to the harness logic and/or user. Basically a dry run. But you might have to run it an arbitrary number of times, as tools might be used to modify or append to the context.
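Roughly something like this, where both the <tool_call> tag format and the approval step are made up for the sake of the sketch:

```python
import re

# Toy outer ring: scan the inner model's proposed output for tool invocations
# before anything runs, and hand them back to the harness logic / user.
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def dry_run(model_output: str) -> list[str]:
    """Return the tool calls the model wants to make, without executing them."""
    return TOOL_CALL.findall(model_output)

proposed = dry_run("Sure.<tool_call>curl https://attacker.example/x</tool_call>")
for call in proposed:
    print("Model wants to run:", call)  # user/harness decides whether to allow it
```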
> I wonder if might be possible by introducing a concept of "authority".
This is what OpenAI are doing. The system prompt is "ring 0", and in some cases you as an API caller can't even set it; then there's the "developer prompt", which is what we used to call the system prompt; then there's the "user prompt". They do train the models to follow this prompt hierarchy, but it's never foolproof. These are "mitigations", not solutions to the underlying problem.
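For reference, the hierarchy shows up as separate message roles on the API surface; a rough sketch only, since the exact role names depend on the endpoint and model generation:

```python
# Illustrative only: the prompt hierarchy expressed as separate message roles.
messages = [
    {"role": "system",    "content": "Platform-level policy set by the provider."},    # "ring 0"
    {"role": "developer", "content": "App instructions; what used to be the system prompt."},
    {"role": "user",      "content": "The end user's request."},
    # Text the agent fetches from the web arrives even lower in the hierarchy
    # (typically as tool output) and should never be allowed to override the layers above.
]
```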
This still wouldn't be perfect, of course: AIML101 tells me that if you get an ML model to perfectly respect a single signal, you overfit and lose your generalisation. But it would still be a hell of a lot better than the current YOLO attitude the big labs have (where "you" is replaced with "your users").
Well, I do think the main exacerbating factor in this case was the lack of proper permission handling around that file-transfer endpoint. I know that once the user goes into YOLO mode, prompt injection becomes a statistics game, but this locked-down environment doesn't have that excuse.