
Considering both this blog post and the livestream demos, I am underwhelmed. Having just finished the stream, I had a real "was that all" moment, which on one hand shows how spoiled I've gotten by new models impressing me, but on the other feels like OpenAI really struggles to stay ahead of their competitors.

What has been shown feels like it could be achieved using a custom system prompt on older versions of OpenAI's models, and I struggle to see anything here that truly required ground-up training on such a massive scale. Hearing that they were forced to spread their training across multiple data centers simultaneously, coupled with their recent release of SWE-Lancer [0], which showed Anthropic (Claude 3.5 Sonnet (new), to be exact) handily beating them, I was really expecting something more than "slightly more casual/shorter output", which, again, I fail to see how that wasn't possible by prompting GPT-4o.

Looking at pricing [1], I am frankly astonished.

> Input: $75.00 / 1M tokens
> Cached input: $37.50 / 1M tokens
> Output: $150.00 / 1M tokens

How could they justify that asking price? And, if they have some amazing capabilities that make a 30-fold pricing increase justifiable, why not show it? Like, OpenAI are many things, but I always felt they understood price vs performance incredibly well, from the start with gpt-3.5-turbo up to now with o3-mini, so this really baffles me. If GPT-4.5 can justify such immense cost in certain tasks, why hide that and if not, why release this at all?

[0] https://github.com/openai/SWELancer-Benchmark

[1] https://openai.com/api/pricing/
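
To make the quoted prices concrete, here is a minimal sketch of the arithmetic. The GPT-4.5 figures are the ones quoted above; the baseline prices ($2.50 input, $10.00 output per 1M tokens) are an assumption used only for illustration and are not stated in this thread, so check the pricing page [1] before relying on them. The function names are hypothetical.

```python
# Prices in USD per 1M tokens.
# GPT-4.5 figures are the ones quoted in the comment above.
GPT_4_5 = {"input": 75.00, "cached_input": 37.50, "output": 150.00}

# ASSUMPTION: illustrative baseline prices, not taken from this thread.
BASELINE = {"input": 2.50, "output": 10.00}


def request_cost(prices, input_tokens, output_tokens):
    """Cost in USD of one request, ignoring cached-input discounts."""
    return (
        input_tokens * prices["input"] + output_tokens * prices["output"]
    ) / 1_000_000


def price_ratio(new, old, key):
    """How many times more expensive `new` is than `old` for one token type."""
    return new[key] / old[key]


if __name__ == "__main__":
    # A hypothetical request: 10k tokens in, 1k tokens out.
    print(f"GPT-4.5:  ${request_cost(GPT_4_5, 10_000, 1_000):.4f}")
    print(f"Baseline: ${request_cost(BASELINE, 10_000, 1_000):.4f}")
    print(f"Input ratio:  {price_ratio(GPT_4_5, BASELINE, 'input'):.0f}x")
    print(f"Output ratio: {price_ratio(GPT_4_5, BASELINE, 'output'):.0f}x")
```

Under that assumed baseline, the multiplier differs by token type, so the overall increase for a given workload depends on its mix of input and output tokens.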



> How could they justify that asking price?

They're still selling $1 for <$1. Like personal food delivery before it, consumers will eventually need to wake up to this fact - these things will get expensive, fast.


One difference with food delivery/ride share: those can only have costs reduced so far. You can only pick up groceries and drive from A to B so quickly. And you can only push the wages down so far before you lose your gig workers. Whereas with these models we’ve consistently seen that a model inference that cost $1 several months ago can now be done with much less than $1 today. We don’t have any principled understanding of “we will never be able to make these models more efficient than X”, for any value of X that is in sight. Could the anticipated efficiencies fail to materialize? It’s possible but I personally wouldn’t put money on it.


I read this more as "we are releasing a model checkpoint that we didn't optimize yet because Anthropic cranked up the pressure"


This is often claimed on HN but there is no evidence that it is actually true.

sama has tweeted that they lose money on Pro, but in general, according to leaks, ChatGPT subscriptions are quite profitable. The reason the company isn't profitable overall is that they spend billions on R&D.


I generally question how widespread the willingness to pay for the most expensive product is. And will most of the users who actually want AI go with ad-ridden lesser models...


I can just imagine Kraft having a subsidized AI model for recipe suggestions that adds Velveeta to everything.


I’ll probably stick to open models at that point.


Let a thousand providers bloom.


Rethinking your comment "was that all": I am listening to the stream now and had a thought. Most of the new models that have come out in the past few weeks have been great at coding and logical reasoning. But 4o has been better at creative writing. I am wondering if 4.5 is going to be even better at creative writing than 4o.


If you generate "creative" writing, please tell your audience that it is generated before asking them to read it.

I do not understand what possible motivation there could be for generating "creative writing" unless you enjoy reading meaningless stories yourself, in which case, be my guest.


I still find all of them lacking on creative writing. The models are severely crippled by tokenization and a complete lack of understanding of language rhythm.

They can't generate a simple haiku consistently, so anything longer is even further out of reach.

For example, give it a piece of poetry and ask for new verses, and it just sucks at replicating the language structure and rhythm of the original verses.


I might sound crazy but honestly fine-tuned GPT-3 absolutely blows all of these modern models out of the water when it comes to creative writing.

Maybe it was less lobotomized, or less covered in the prompt equivalent of red tape. Or maybe you just need to have a little bit of lunacy for fun creative writing. The new models are so much more useful, but IMO they don't even come close to GPT-3.


Do you have an example prompt? I've been trying to get ChatGPT to tell a customized children's story similar to what you would see in a commercial story book but it just keeps giving me what's basically a summary of what you might read about in the book.


> But 4o has been better at creative writing

In what way? I find the opposite: 4o's output has a very strong AI vibe, much more so than competitors like Claude and Gemini. You can immediately tell, and instructing it to write differently (except for obvious caricatures like "Write like Gen Z") doesn't seem to help.


> but on another feels like OpenAI really struggles to stay ahead of their competitors

On one hand, yes. On the other hand, you can have 4o-mini and o3-mini back when you can pry them out of my cold dead hands. They're _fast_, they're _cheap_, and in 90% of cases where you're automating anything, they're all you need. Also, they can handle significant volume.

I'm not sure that's going to save OpenAI, but their -mini models really are something special for the price/performance/accuracy.


Funny you should suggest that it seems like a revised system prompt: https://chatgpt.com/share/67c0fda8-a940-800f-bbdc-6674a8375f...


In case there was any confusion, the referenced link shows 4.5 claiming to be “ChatGPT 4.0 Turbo”. I have tried multiple times and various approaches. This model is aware of 4.5 via search, but insists that it is 4 or 4 turbo. Something doesn’t add up. This cannot be part of the response to R1, Grok 3, and Claude 3.7. Satya’s decision to limit capex seems prescient.


My first thought seeing this and looking at benchmarks was that if it wasn’t for reasoning, then either pundits would be saying we’ve hit a plateau, or at the very least OpenAI is clearly in 2nd place to Anthropic in model performance.

Of course we don’t live in such a world, but I thought of this nonetheless because for all the connotations that come with a 4.5 moniker this is kind of underwhelming.


Pundits were saying that deep learning has hit a plateau even before the LLM boom.


I suspect they may launch a GPT-4.5 Turbo with a price cut. GPT-4, GPT-4-32k, etc. were all pricier than the GPT-4 Turbo models, which also came with added context length. But with this huge jump in price, even a 4.5 Turbo, if it does come out, would be pricier.


The niche of GPT-4.5 is fewer hallucinations than any existing model. Whether that niche justifies the price tag for a subset of use cases remains to be seen.


Actually, this comment of mine was incorrect, or at least we don't have enough information to conclude this. The metric OpenAI are reporting is the total number of incorrect responses on SimpleQA (and they're being beaten by Claude Haiku on this metric...), which is a deceptive metric because it doesn't account for non-responses. A better metric would be the ratio of Incorrects to the total number of attempts.
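
A minimal sketch of why the two metrics can disagree. The counts below are made up for illustration and are not SimpleQA results; the function names are hypothetical.

```python
def incorrect_share_of_all(correct, incorrect, not_attempted):
    """Incorrect answers as a share of all questions asked.

    A model can lower this number simply by declining to answer more often.
    """
    total = correct + incorrect + not_attempted
    if total == 0:
        raise ValueError("no questions were asked")
    return incorrect / total


def incorrect_share_of_attempts(correct, incorrect):
    """Incorrect answers as a share of the questions actually attempted."""
    attempted = correct + incorrect
    if attempted == 0:
        raise ValueError("no questions were attempted")
    return incorrect / attempted


if __name__ == "__main__":
    # Made-up counts out of 100 questions each: (correct, incorrect, not attempted).
    models = {
        "model_a": (60, 30, 10),  # answers most questions
        "model_b": (20, 20, 60),  # declines most questions
    }
    for name, (c, i, n) in models.items():
        print(
            f"{name}: incorrect/all = {incorrect_share_of_all(c, i, n):.2f}, "
            f"incorrect/attempted = {incorrect_share_of_attempts(c, i):.2f}"
        )
```

With these made-up counts, model_b has fewer incorrect answers overall yet is wrong more often on the questions it actually attempts, which is the gap the raw count hides.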


I have no idea how they justify $200/month for pro


I would rather pay for 4.5 by the query.



