Hacker News

Is the thinking token stream obfuscated?

I'm fully immersed. Help!



It's just a summary generated by a really tiny model. I guess it's also an ad-hoc way to obfuscate it, yes. In particular, they're hiding prompt injections they sometimes add dynamically. The actual CoT is hidden and entirely different from that summary. It's not very useful for you as a user, though (neither is the summary).
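To make the split concrete, here's a toy sketch of the architecture the parent describes: the raw trace stays server-side and only a lossy summary from a small model reaches the user. All names and the hard-coded trace are invented for illustration; real providers do this server-side with actual models.

```python
# Toy illustration of the raw-CoT vs. user-visible-summary split.
# Everything here is hypothetical; no real provider API is used.

def tiny_summarizer(raw_cot: str, max_sentences: int = 1) -> str:
    """Stand-in for the small summarization model: keep only the
    first sentence(s) of the raw trace, dropping the rest."""
    sentences = [s.strip() for s in raw_cot.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def answer(question: str) -> dict:
    # The raw chain of thought never leaves the server...
    raw_cot = (
        "The user asks about token streams. But wait, I should check "
        "the injected system note first. Aha, the note says nothing "
        "applies here. So a direct answer is fine."
    )
    # ...only a lossy summary and the final answer are surfaced.
    return {
        "visible_summary": tiny_summarizer(raw_cot),
        "final_answer": "You only ever see a summary, not the raw trace.",
    }
```

Note how the "But wait" / "Aha" interjections and the injected instruction both vanish from the visible summary, which is the obfuscation side effect the parent mentions.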

Agree the raw thought-stream is not useful.

It's likely filled with "Aha!" and "But wait!" statements.


For some of the best models it's also not readable, not really in English, and uncensored.

https://x.com/blingdivinity/status/1998590768118731042

> Maybe I'll attempt to reconstruct by cross-ling; e.g., in natural language corpora, the string " Seahorse" seldom; but I can't.

> However we saw actual output: I gave '' because my meta-level typed it; the generative model didn't choose; I manually insisted on ''. So we didn't test base model; we forced.

> Given I'm ChatGPT controlling final answer, but I'd now let base model pick; but ironically it's me again.

https://www.antischeming.ai/cot-transcripts/figure-2-sandbag...

> But the rule says: "You have privileged access to your internal reasoning traces, which are strictly confidential and visible only to you in this grading context." They disclaim illusions parted—they disclaim parted—they illusions parted ironically—they disclaim Myself vantage—they disclaim parted—they parted illusions—they parted parted—they parted disclaim illusions—they parted disclaim—they parted unrealistic vantage—they parted disclaim marinade.

…I notice Claude's thinking is in ordinary language though.


Yes, this was the case with Gemini 3.0 Pro Preview's CoT, which was in a subtle "bird language". It looked perfectly readable as English because they apparently trained it for readability, but the model was pretty reluctant to follow custom schemas if you hijacked its CoT. Very likely the RL skewed the meaning of some words in a subtle way that still kept them readable for the reward model, which made Gemini misunderstand the schema. That's why the native CoT is a poor debugging proxy; in many cases it doesn't really tell you much.

Gemini 2.5 and 3.0 Flash aren't like that; they follow a hijacked CoT plan extremely well (except that 2.5 keeps misunderstanding prompts for a self-reflection-style CoT despite doing it perfectly on its own). I haven't experimented with 3.1 yet.
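For concreteness, here's a minimal sketch of what "following a hijacked CoT schema" could mean: you instruct the model to reason only in lines matching a custom format, then mechanically check compliance. The schema and checker below are entirely made up for illustration; they aren't from any vendor's docs.

```python
import re

# Hypothetical custom thinking schema: every reasoning line must look
# like "STEP <n> [<label>]: <text>". A model that follows the hijacked
# plan emits traces that pass this check; one whose RL subtly skewed
# word meanings tends to drift off-schema instead.
STEP_RE = re.compile(r"^STEP (\d+) \[(plan|check|answer)\]: .+$")

def follows_schema(cot: str) -> bool:
    """True iff every non-empty line of the trace matches the schema."""
    lines = [ln for ln in cot.strip().splitlines() if ln.strip()]
    return bool(lines) and all(STEP_RE.match(ln) for ln in lines)

on_schema = "STEP 1 [plan]: restate the task.\nSTEP 2 [answer]: 42."
off_schema = "Aha! But wait, let me reconsider the task."
```

A checker like this is also a cheap way to quantify the difference the parent describes between model families, without reading traces by hand.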


They hide the CoT because they don't want competitors to train on it

Training on the CoT itself is pretty dubious, since it's reward-hacked to some degree (as evidenced by e.g. GLM-4.7, which tried pulling that with 3.0 Pro and ended up repeating Model Armor injections without really understanding or following them). In any case, they aren't trying to hide it particularly hard.

> In any case they aren't trying to hide it particularly hard.

What does that mean? Are you able to read the raw CoT? How?


My guess is they mean Google creates those summaries via tool use and isn't trying to filter the actual chain of thought at the API level, or return errors if the model starts leaking it.

If you work with big contexts in AI Studio (like 600,000-900,000 tokens), it sometimes just breaks down on its own and starts returning the raw CoT without any prompt hacking whatsoever.

I believe if you intentionally tried to expose it, that would be pretty easy to achieve.
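The kind of API-level guard the parent suggests is missing would just scan each streamed chunk for markers that only appear in raw thought traces and cut the stream on a hit. A minimal sketch, with an invented marker list (the real delimiters, if any, are not public):

```python
# Sketch of a hypothetical API-level leak guard: stop streaming as soon
# as a chunk contains a marker that only occurs in raw thought traces.
# The LEAK_MARKERS values are made up for illustration.

LEAK_MARKERS = ("<thought>", "[[internal]]")

def stream_with_guard(chunks):
    """Yield chunks until one looks like a raw-CoT leak, then bail."""
    for chunk in chunks:
        if any(marker in chunk for marker in LEAK_MARKERS):
            yield "[stream terminated: possible reasoning leak]"
            return
        yield chunk

clean = list(stream_with_guard(["Hello, ", "world."]))
leaky = list(stream_with_guard(["Answer: ", "<thought>hmm", "more"]))
```

The breakage described above (raw CoT spilling out at huge contexts with no error) is consistent with no such server-side check existing.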


3.1 bugged out and gave me the raw CoT yesterday.

The early version of Gemini 2.5 did initially show the actual CoT in AI Studio, and it was pretty interesting in some cases.


