ph4evers's comments

Nice write-up! It is cool to see that PostgreSQL is still standing. Adyen has some nice blog posts about squeezing the most out of PostgreSQL: https://medium.com/adyen/all?topic=postgres

The desktop app is pretty terrible and super flaky, throwing vague errors all the time. Claude Code seems to be doing much better. I also use it for non-code-related tasks.

This is why AI won’t suddenly fully replace a software engineer.


Such a cool project! Next step is to run jaxprs via the driver?


Definitely thinking about that! Would be very cool to run the JAX / Pallas stack, noted on our end :)

- Alan and Abiral
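
For anyone wondering what "running jaxprs via the driver" would involve: a jaxpr is the intermediate representation JAX produces when it traces a Python function, and it is what such a driver would have to consume. A minimal sketch in plain JAX (the function and shapes are just illustrative):

    import jax
    import jax.numpy as jnp

    def f(x, w):
        # toy computation standing in for a real kernel
        return jnp.tanh(x @ w).sum()

    x = jnp.ones((4, 8))
    w = jnp.ones((8, 2))

    # make_jaxpr traces f and returns the jaxpr IR that a custom
    # driver/backend would need to interpret or lower to hardware
    print(jax.make_jaxpr(f)(x, w))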


Are they not performing well because they are trained to be more generic, or is the task too complex? It seems like a cheap problem to fine-tune.


The knowledge is probably in the pre-training data (the internet documents the LLM is trained on to get a good grasp), but probably very poorly represented in the reinforcement learning phase.

Which is to say that Anthropic probably doesn't have good training documents and evals to teach the model how to do that.

Well they didn’t. But now they have some.

If the author wants to improve his efficiency even more, I'd suggest he start creating tools that let a human produce a text trace of a good run of decompiling this project.

Those traces can be hosted somewhere Anthropic can see, and then, after the next model pre-training, there is a good chance the model becomes even better at this task.
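
As a minimal sketch of what such a trace tool could look like (a plain JSONL log; the file name and the logged steps are hypothetical, nothing Anthropic-specific):

    import json
    import time
    from pathlib import Path

    TRACE_FILE = Path("decompile_trace.jsonl")  # hypothetical output path

    def log_step(role: str, content: str) -> None:
        # Append one step of a human decompilation session as a JSON line,
        # so the whole run can later be shared as a single text trace.
        record = {"ts": time.time(), "role": role, "content": content}
        with TRACE_FILE.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    # Example usage during a manual run:
    log_step("tool", "objdump -d output for sub_401230 ...")
    log_step("human", "Renamed sub_401230 to parse_header based on the magic-number check.")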


Sounds like a more agentic pipeline task. Decompile, assess, explain.
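
Roughly something like the sketch below: disassemble (standing in for decompile), assess, then explain. call_llm is a hypothetical placeholder, and objdump is just a cheap stand-in for a real decompiler:

    import subprocess

    def disassemble(binary: str) -> str:
        # objdump -d prints the disassembly to stdout; a real pipeline
        # would call an actual decompiler here instead
        return subprocess.run(["objdump", "-d", binary],
                              capture_output=True, text=True, check=True).stdout

    def call_llm(prompt: str) -> str:
        # hypothetical model call; plug in your provider's client here
        raise NotImplementedError

    def pipeline(binary: str) -> str:
        asm = disassemble(binary)                        # 1. decompile/disassemble
        assessment = call_llm(                           # 2. assess
            "Assess this disassembly: list functions, likely purpose, "
            "and open questions:\n" + asm)
        return call_llm(                                 # 3. explain
            "Given this assessment:\n" + assessment +
            "\nExplain what the binary does.")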


You need a lot of context to get the correct answer and it’s difficult to know you’ve got the correct answer among the many options.


How does it compare to Mistral’s model?


Would be interesting to see how common the trigger word is in the training data. Maybe a more random word would trigger even faster.


Reminds me a bit of SolidGoldMagikarp: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldm... Even though SolidGoldMagikarp was clearly a bug in the tokenizer.


As far as I remember, SolidGoldMagikarp was a bug caused by millions of posts on Reddit by the same user ("SolidGoldMagikarp") in a specific subreddit.

There was no problem with the token per se; the problem was that it was like a strange attractor in multidimensional space, disconnected from any useful information.

When the LLM was induced to use it in its output, the next predicted token would be random gibberish.


More or less. It was a string given its own token by the tokeniser because of the above, but it did not appear in the training data, so it basically had no meaning for the LLM. (I think there are theories that the parts of the network associated with such tokens may have been repurposed for something else, and that's why the presence of the token in the input messed them up so much.)


gpt-oss has similar bad tokens.

https://fi-le.net/oss/
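
One cheap way to go hunting for such under-trained tokens is to look at the norms of the input-embedding rows; tokens whose embeddings barely moved during training often behave strangely. A sketch with Hugging Face transformers (the model name is a placeholder, and a small norm is a heuristic, not proof of a glitch token):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # swap in the model you actually want to probe
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    emb = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)
    norms = emb.norm(dim=-1)

    # print the 20 tokens with the smallest embedding norms
    for token_id in torch.argsort(norms)[:20].tolist():
        print(token_id, repr(tok.decode([token_id])), round(float(norms[token_id]), 4))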


With a new French CEO, such a coincidence


Whisper-v3 works well for multilingual audio. I tried it with Dutch, German, and English.
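
For reference, a minimal sketch using the open-source openai-whisper package (the file name is illustrative); leaving language unset lets the model auto-detect it:

    import whisper  # pip install openai-whisper

    model = whisper.load_model("large-v3")

    # language=None: Whisper detects the language from the first ~30 s of audio
    result = model.transcribe("meeting_dutch.mp3", language=None)
    print(result["language"])  # e.g. "nl"
    print(result["text"])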

