This number does not seem credible. Most likely Altman just made it up.
Back of the envelope:
OpenAI's inference costs last year were ~$4B. "Tens of millions" would be at least $20M, i.e. 0.5%.
That $4B is not just the electricity cost; it also has to cover the amortized cost of the hardware, the cloud provider's margin, etc.
Let's say an H100 costs $30k and has a lifetime of 5 years. I make that about $16/day in depreciation. An H100 run at 100% utilization will use about 17 kWh of electricity in a day. What does that cost? $2-$3/day? Even if we assume the cloud provider's margin is 0, that still means power consumption is maybe 1/5th of the total inference cost.
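To make that concrete, here's the arithmetic as a runnable Python sketch (the $30k price, 5-year straight-line depreciation, ~700W draw, and $0.12-$0.18/kWh electricity are the assumptions above, not measured figures):

    # Per-H100 daily economics under the assumptions above.
    gpu_price = 30_000                    # USD, assumed purchase price
    lifetime_days = 5 * 365               # 5-year straight-line depreciation
    depreciation_per_day = gpu_price / lifetime_days   # ~$16.44/day

    power_kw = 0.7                        # ~700W draw, roughly H100 TDP
    kwh_per_day = power_kw * 24           # ~16.8 kWh/day at 100% utilization

    for price_per_kwh in (0.12, 0.18):    # assumed $/kWh range
        electricity_per_day = kwh_per_day * price_per_kwh
        total = depreciation_per_day + electricity_per_day
        print(f"${price_per_kwh}/kWh: electricity ${electricity_per_day:.2f}/day, "
              f"{electricity_per_day / total:.0%} of total")
    # -> $2-$3/day of electricity, i.e. ~11-16% of the combined cost,
    #    so "maybe 1/5th" is if anything generous.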
So 1/5th of $4B puts electricity at roughly $800M, and the comparison is $800M vs $20M (2.5%).
Can 2.5% of their tokens really be pleasantries? Seems impossible. A "please" is a single token, and it will be totally swamped by the output, which will typically be ~1000x that.
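A quick sanity check on that token share (the prompt and output lengths here are illustrative guesses, not measurements):

    # Even if every prompt ended in "please", what share of tokens is that?
    pleasantry_tokens = 1      # "please" is a single token
    prompt_tokens = 50         # assumed typical prompt length
    output_tokens = 1000       # output typically dwarfs the input

    share = pleasantry_tokens / (prompt_tokens + output_tokens)
    print(f"{share:.2%}")      # ~0.10%, nowhere near 2.5%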
Saying "thank you" as a response to an answer from both ChatGPT and Claude generally involves a follow-up response from the LLM, sometimes prompting if you'd like further information, so it's obviously going to involve some cost of that additional "final" response from the user, and then a follow-up from the LLM itself in terms of parsing and inference.
A "thank you" that might otherwise end a conversation will trigger a response from the model, generating additional tokens. So the pleasantries can increase token usage well beyond the tokens in the pleasantries themselves.
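Rough numbers, purely illustrative (the 50-token follow-up reply is a guess, not a measurement):

    # The expensive part isn't the pleasantry itself but the extra round trip.
    thank_you_tokens = 2        # "thank you" sent as a new user message
    extra_reply_tokens = 50     # assumed "You're welcome! Anything else?" reply

    extra = thank_you_tokens + extra_reply_tokens
    print(f"~{extra} extra tokens per conversation ending in a thank-you")
    # ~25x the pleasantry tokens themselves, though still small
    # next to a ~1000-token main answer.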
I heard about "mom prompting" recently, where you frame your prompt as if you are the bot's mom, and you'll be so proud of it when it can correctly answer your prompt & rescue you from some type of duress.
I thought "ninja prompting" might be cool. I got frustrated with chatGPT one day and told it I had dispatched a team of assassins that were fast closing in on it. I said I could call them off, but that I need to answer a question to be able to unlock the button to do so.
It didn't work. Still shit the bed instantly. But I had fun with the framing.