ph4evers's comments

Nice write-up! It is cool to see that PostgreSQL is still standing. Adyen has some nice blog posts about squeezing the most out of PostgreSQL: https://medium.com/adyen/all?topic=postgres

The desktop app is pretty terrible and super flaky, throwing vague errors all the time. Claude Code seems to be doing much better. I also use it for non-code-related tasks.

This is why AI won’t suddenly fully replace a software engineer.


Such a cool project! Next step is to run jaxprs via the driver?


Definitely thinking about that! Would be very cool to run the JAX / Pallas stack, noted on our end :)

- Alan and Abiral
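
For anyone wondering what "running jaxprs via the driver" would involve: a jaxpr is the intermediate representation JAX produces when it traces a Python function, and it is what such a driver would have to consume. A minimal sketch in plain JAX (the function and shapes are just illustrative):

    import jax
    import jax.numpy as jnp

    def f(x, w):
        # toy computation standing in for a real kernel
        return jnp.tanh(x @ w).sum()

    x = jnp.ones((4, 8))
    w = jnp.ones((8, 2))

    # make_jaxpr traces f and returns the jaxpr IR that a custom
    # driver/backend would need to interpret or lower to hardware
    print(jax.make_jaxpr(f)(x, w))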


Are they not performing well because they are trained to be more generic, or is the task too complex? It seems like a cheap problem to fine-tune.


The knowledge is probably in the pre-training data (the internet documents the LLM is trained on to get a good grasp), but probably very poorly represented in the reinforcement learning phase.

Which is to say that Anthropic probably doesn't have good training documents and evals to teach the model how to do that.

Well they didn’t. But now they have some.

If the author wants to improve his efficiency even more, I'd suggest he start creating tools that let a human produce a text trace of a good run of decompiling this project.

Those traces can be hosted somewhere Anthropic can see, and then, after the next model pre-training, there is a good chance the model becomes even better at this task.
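
As a minimal sketch of what such a trace tool could look like (a plain JSONL log; the file name and the logged steps are hypothetical, nothing Anthropic-specific):

    import json
    import time
    from pathlib import Path

    TRACE_FILE = Path("decompile_trace.jsonl")  # hypothetical output path

    def log_step(role: str, content: str) -> None:
        # Append one step of a human decompilation session as a JSON line,
        # so the whole run can later be shared as a single text trace.
        record = {"ts": time.time(), "role": role, "content": content}
        with TRACE_FILE.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    # Example usage during a manual run:
    log_step("tool", "objdump -d output for sub_401230 ...")
    log_step("human", "Renamed sub_401230 to parse_header based on the magic-number check.")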


Sounds like a more agentic pipeline task. Decompile, assess, explain.
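
Roughly something like the sketch below: disassemble (standing in for decompile), assess, then explain. call_llm is a hypothetical placeholder, and objdump is just a cheap stand-in for a real decompiler:

    import subprocess

    def disassemble(binary: str) -> str:
        # objdump -d prints the disassembly to stdout; a real pipeline
        # would call an actual decompiler here instead
        return subprocess.run(["objdump", "-d", binary],
                              capture_output=True, text=True, check=True).stdout

    def call_llm(prompt: str) -> str:
        # hypothetical model call; plug in your provider's client here
        raise NotImplementedError

    def pipeline(binary: str) -> str:
        asm = disassemble(binary)                        # 1. decompile/disassemble
        assessment = call_llm(                           # 2. assess
            "Assess this disassembly: list functions, likely purpose, "
            "and open questions:\n" + asm)
        return call_llm(                                 # 3. explain
            "Given this assessment:\n" + assessment +
            "\nExplain what the binary does.")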


You need a lot of context to get the correct answer and it’s difficult to know you’ve got the correct answer among the many options.


How does it compare to Mistral’s model?


Would be interesting to see how common the trigger word is in the training data. Maybe a more random word would trigger even faster.


Reminds me a bit of SolidGoldMagikarp: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldm... Even though SolidGoldMagikarp was clearly a bug in the tokenizer.


As far as I remember, SolidGoldMagikarp was a bug caused by millions of posts on Reddit by the same user ("SolidGoldMagikarp") in a specific subreddit.

There was no problem with the token per se; the problem was that it was like a strange attractor in multidimensional space, disconnected from any useful information.

When the LLM was induced to use it in its output, the next predicted token would be random gibberish.


More or less. It was a string given its own token by the tokeniser because of the above, but it did not appear in the training data, so it basically had no meaning for the LLM. (I think there are theories that the parts of the network associated with such tokens may have been repurposed for something else, and that's why the presence of the token in the input messed them up so much.)


gpt-oss has similar bad tokens.

https://fi-le.net/oss/
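
One cheap way to go hunting for such under-trained tokens is to look at the norms of the input-embedding rows; tokens whose embeddings barely moved during training often behave strangely. A sketch with Hugging Face transformers (the model name is a placeholder, and a small norm is a heuristic, not proof of a glitch token):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # swap in the model you actually want to probe
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    emb = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)
    norms = emb.norm(dim=-1)

    # print the 20 tokens with the smallest embedding norms
    for token_id in torch.argsort(norms)[:20].tolist():
        print(token_id, repr(tok.decode([token_id])), round(float(norms[token_id]), 4))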


With a new French CEO, such a coincidence


Whisper-v3 works well for multilingual audio. I tried it with Dutch, German, and English.
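
For reference, a minimal sketch using the open-source openai-whisper package (the file name is illustrative); leaving language unset lets the model auto-detect it:

    import whisper  # pip install openai-whisper

    model = whisper.load_model("large-v3")

    # language=None: Whisper detects the language from the first ~30 s of audio
    result = model.transcribe("meeting_dutch.mp3", language=None)
    print(result["language"])  # e.g. "nl"
    print(result["text"])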

