Hacker News

I'm really bummed out by this release. I expected this to best Sonnet, or at least match it, given all the hype. But it has drastically underperformed on agent-based work for me so far, even underperforming GPT-4.1. It struggles with basic instruction following. Basic things like:

  - "don't nest modules" – nests 4 modules in 1 file
  - "don't write typespecs" – writes typespecs
  - "always give the user design choices" – skips design choices.
GPT-4.1 way outperforms with the same instructions. And Sonnet is in a whole different league (it remains my go-to). GPT-5's Elixir code is syntactically correct, but weird in a lot of ways, junior-esque and inefficient, and just odd: e.g. function arguments that aren't used yet are passed in from callers, duplicated if checks, duplicated queries in the same function. I imagine their chat and multimodal stuff strikes a nice balance with leaps in some areas, but for coding agents this is way behind any other SOTA model I've tried. Seems like this release was more about striking a capability balance between roflscale and costs than a GPT-3-to-4 leap.
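For illustration, the smells described above might look something like this contrived Elixir sketch (hypothetical, not actual model output — `Cart` and its fields are made up):

```elixir
defmodule Cart do
  # Smell 1: `opts` is accepted (and dutifully passed by every caller) but never used.
  def total(items, opts) do
    # Smell 2: the same check duplicated back-to-back.
    if length(items) > 0 do
      if length(items) > 0 do
        # Smell 3: the same lookup recomputed twice in one function
        # (stand-in for issuing the same query twice).
        prices = Enum.map(items, & &1.price)
        prices = Enum.map(items, & &1.price)
        Enum.sum(prices)
      end
    end
  end
end
```

Each line compiles and the result is correct, which is exactly what makes this kind of output easy to miss in review.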


Thankfully OAI will fix this by removing GPT-4.1 soon!


Claude has always been noticeably better for Elixir for me. GPT very frequently outputs pure garbage, and as far as I can tell this release is not much different.


Maybe it's become so intelligent that it now wants to troll people as a way to create factions among the populace.



