> With prompt caching, verbose context that gets reused is basically free.
But it's not. The cached input may be discounted cost-wise, but it still degrades attention and makes generation slower and more computationally expensive, even if you have a long prefix you can reuse during prefill.
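To make that concrete, here's a back-of-envelope sketch (using made-up 70B-class dimensions, not any particular model's) of how many bytes of KV cache each decoded token has to read, even when the prefix itself was served from a prompt cache during prefill:

```python
# Rough back-of-envelope: bytes of KV cache read per decoded token.
# All dimensions are illustrative, assuming a 70B-class dense model.
n_layers = 80
n_kv_heads = 8          # grouped-query attention
head_dim = 128
bytes_per_elem = 2      # fp16/bf16

def kv_bytes_per_token(context_len):
    # Each decode step attends over the whole context in every layer;
    # the factor of 2 covers both K and V.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

for ctx in (1_000, 32_000, 128_000):
    gb = kv_bytes_per_token(ctx) / 1e9
    print(f"{ctx:>7} tokens -> {gb:.2f} GB read per decoded token")
```

The prefix cache saves you from recomputing those K/V tensors, but every subsequent decode step still has to read all of them, which is why a long "free" prefix still slows generation.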
> Tradition warrants a negotiation phase when one party wishes to change the terms of an agreement, or becomes cognizant that the counterparty may wish to do the same.
They didn't change the agreement. One party violated it, and the other party withdrew as a result.
This is so vanilla. But people will moan because they want subsidized tokens.
I don't have a pony in this race, my good poster, I just calls it how I see it, and I have a long history of calling out the fundamentally abusive character of non-negotiable one-way contracting, and the ill effects it has on society.
The only people moaning here seem to be a bunch of wannabe Google POs upset that people are handing machines a data construct they are designed to accept, that the machines are accepting, and that the tokens are being used the way they were designed. For some reason Google appears to resent that its failure to automate checks denying those OAuth tokens is being taken advantage of, and seems to think terminating customers who could probably be corrected with a simple message is the most reasonable response.
With instincts like that, it makes me happy every day that, for my needs, I can make do with doing things on my own hardware I've collected over the years. The Cloud has too much drama potential tied up in it.
Yes, this article is full of misunderstanding. The main explanation of the bottleneck is wrong: it's the model weights which dominate memory bandwidth (and hence why batching multiple requests in a single pass increases total throughput). If copying user tokens were the bottleneck, batching would not achieve any speedup.
When an author is confused about something so elementary, I can’t trust anything else they write.
> If copying user tokens was the bottle neck, batching would not achieve any speed up.
Reality is more complex. As context length grows, your KV cache becomes large and will begin to dominate your total FLOPs (and hence bytes loaded). The issue with the KV cache is that you cannot batch it, because only one user can use it, unlike static layer weights, which you can reuse across multiple users.
Emerging sparse attention techniques can greatly relieve this issue, though the extent to which frontier labs deploy them is uncertain. DeepSeek V3.2 uses sparse attention, though I don't know offhand how much this reduces KV cache FLOPs and the associated memory bandwidth.
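A toy way to see the asymmetry (all numbers invented for illustration): per decode step, weight reads amortize across the batch while each request's KV cache reads do not:

```python
# Sketch: per-request bytes loaded per decode step, weights vs KV cache.
# Both sizes are hypothetical, chosen only to illustrate the asymmetry.
weight_bytes = 140e9          # ~70B params at fp16
kv_bytes_per_request = 10e9   # KV cache of one long-context request

def bytes_per_request(batch_size):
    # Weights are read once per step and shared by the whole batch;
    # each request's KV cache is read only for that request.
    shared = weight_bytes / batch_size
    private = kv_bytes_per_request
    return shared, private

for bs in (1, 8, 64):
    shared, private = bytes_per_request(bs)
    print(f"bs={bs:>2}: weights {shared/1e9:5.1f} GB/req, KV {private/1e9:5.1f} GB/req")
```

At large batch sizes the shared weight traffic per request shrinks toward zero, so with long contexts the un-batchable KV reads end up dominating.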
> The issue with KV cache is you cannot batch it because only one user can use it
This is not really correct given how input token caching works and the reality of subagent workloads. You could launch many parallel subagents sharing some portion of their input tokens and use batching for that task.
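A sketch of the potential saving (token counts are invented): with the shared prefix prefilled once, K parallel subagents only pay prefill for their private suffixes:

```python
# Toy accounting for prefix sharing across parallel subagents.
# All token counts are made up for illustration.
shared_prefix = 20_000   # system prompt + repo context, identical per agent
private_suffix = 2_000   # each subagent's own instructions
k = 8                    # number of parallel subagents

naive = k * (shared_prefix + private_suffix)
cached = shared_prefix + k * private_suffix   # prefix prefilled once, reused
print(f"naive prefill: {naive:,} tokens, with prefix sharing: {cached:,}")
```

Note this only addresses prefill work; once the agents start generating, their decode paths (and KV caches) are independent.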
1. Parallel investigation: the payoff from that is relatively small. Starting K subagents assumes you have K independent avenues of investigation, and quite often that is not true. Somewhat similar to next-turn prediction using a speculative model: works well enough for 1 or 2 turns, but fails after.
2. Input caching pretty much fixes prefill, not decode. And if you look at frontier models, for example open-weight models that can do reasoning, you are looking at longer and longer reasoning chains for heavy tool-using models. And reasoning chains will diverge very, very quickly even from the same input, assuming a non-zero temp.
> The main explanation of the bottleneck is wrong: it's the model weights which dominate memory bandwidth (and hence why batching multiple requests in a single pass increases total throughput). If copying user tokens were the bottleneck, batching would not achieve any speedup.
Inference is memory-bound only at low batch sizes. At high batch sizes it becomes compute-bound. There's a threshold past which stuffing more requests into a batch will slow down every individual request, even though it may still increase the aggregate tokens/second across the whole batch.
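A toy roofline model makes that threshold visible. The hardware and model numbers below are rough (H100-ish, 70B-class) and the model ignores KV cache traffic, so treat it as a sketch, not a benchmark:

```python
# Toy roofline: decode step time = max(time to load weights, time to compute).
# Hardware and model numbers are rough placeholders for illustration.
weight_bytes = 140e9        # ~70B params at fp16
mem_bw = 3.35e12            # bytes/s, roughly H100-class HBM
peak_flops = 990e12         # FLOP/s, roughly H100-class bf16
flops_per_token = 2 * 70e9  # ~2 FLOPs per parameter per generated token

def step_time(batch):
    load = weight_bytes / mem_bw                    # fixed cost, shared by batch
    compute = batch * flops_per_token / peak_flops  # grows with batch size
    return max(load, compute)

# Crossover: batch size where compute time first exceeds weight-load time.
crossover = (weight_bytes / mem_bw) * peak_flops / flops_per_token
print(f"compute-bound above roughly batch {crossover:.0f}")
for bs in (1, 64, 512):
    t = step_time(bs)
    print(f"bs={bs:>3}: {bs / t:,.0f} tok/s aggregate, {1 / t:,.1f} steps/s per request")
```

Below the crossover, adding requests is nearly free (the weight load dominates either way); above it, every extra request stretches the step time, slowing each individual request while aggregate throughput still climbs.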
I would guess you haven't done this in practice. Yes, of course inference is memory bound at low batch sizes. This is why we run larger batch sizes!
Also there does not exist any batch size > 1 where per-request throughput is equal to bs=1. Doing any batching at all will slow all intra-batch requests down.
They failed to grasp the fundamental point of batching, which is sharing model weights between requests. For context, this wasn't just one person's mistake; several AI Twitter personalities proposed this 'Claude Opus fast = small batching' hypothesis. What I find funny is how confident these influencers were, while the people who actually work on LLM serving at frontier labs, the ones who genuinely understand this, said nothing. The rest is simply noise.
If you ask someone knowledgeable at r/LocalLLaMA about an inference configuration that can increase TG by *up to* 2.5x, particularly for a sample prompt that reads "*Refactor* this module to use dependency injection", then the answer is of course speculative decoding.
You don't have to work for a frontier lab to know that. You just have to be GPU poor.
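For anyone unfamiliar, here's a toy sketch of the draft-and-verify idea (the "models" below are stand-ins; real speculative decoding verifies draft samples against the target distribution). A refactor prompt means the output largely copies the input, so a cheap "copy the source" draft gets a very high acceptance rate and the big model only runs occasional verification passes:

```python
# Toy illustration of speculative decoding on a refactor-style prompt.
# Both "models" are stand-ins: the target deterministically continues a
# reference string, and the draft just copies the source file.

def target_next(prefix, reference):
    # Stand-in for the big model: returns the next reference token.
    return reference[len(prefix)] if len(prefix) < len(reference) else None

def speculative_generate(reference, draft, k=4):
    out, target_passes = [], 0
    while len(out) < len(reference):
        proposal = draft(out, k)   # draft proposes up to k tokens cheaply
        target_passes += 1         # one (batched) target pass verifies them
        for tok in proposal:
            expected = target_next(out, reference)
            if tok == expected:
                out.append(tok)    # accepted: token came "for free"
            else:
                out.append(expected)  # rejected: keep target's token, stop
                break
    return "".join(out), target_passes

src = list("def add(a, b):\n    return a + b\n")
copy_draft = lambda out, k: src[len(out):len(out) + k]  # near-perfect here
text, passes = speculative_generate(src, copy_draft)
print(passes, "target passes for", len(src), "tokens")
```

When the draft is usually right, the target runs roughly once per k tokens instead of once per token, which is where the up-to-2.5x TG figure comes from.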
> I don’t think it would be possible without the tax exemption.
Maybe it shouldn't be possible. Society is telling your friend that her work is not particularly valuable and that she should probably consider doing something else.
Indeed! E.g., looking after elderly and/or disabled people to give their family carers respite. That's a minimum-wage job seen by many as a "drain on the taxpayer", ignoring that, apart from being worth providing for its own sake, it can enable the family carers to also be economic contributors and pay tax themselves.
I think we as a society strive to make the GP correct, that money is representative of value, and rightfully so.
Anyone partaking in any activity that has value to others should be given money. That is literally what this basic income/tax break for artists is for. Someone thought producing art had value and pure capitalism wasn't correctly matching that value with monetary rewards.
There are lots of rich churches and church leaders out there. That's because they serve a human need, and those humans are willing to direct some of their finite resources towards that provider. (I'm talking about the collections plate if you didn't catch that.)
Now obviously money on its own is not value. It should represent value that you delivered to someone else in the past, and is helpful for getting whatever value your life needs. You mentioned philosophy --- that yoga retreat in the Andes isn't free, is it?
Now sometimes we muddy the waters, for example we permit lotteries where the winner takes home a good deal of money without providing any value to anyone. That debases money, and I think it has no part in society, but I'm unfortunately swimming against the tide on that one.
Of course they do. I'm not saying it's the only way to measure value as individuals. But as a society, lots of things do boil down to money, as that's the medium of exchange. Society was the context of this thread, not individual.
Not quite. Money is how we describe instrumental value, and occasionally allocation priority. Personal attachment and moral worth are also terms often used interchangeably with "value," though in my opinion that should stop and we should all simply never use the word "value" again because so many meanings have collapsed into it.
What I would suggest you do is, find a loving partner to start a family with, then do everything you can for 20 years to focus primarily on earning, or otherwise acquiring, money.
Then get divorced and discover your children don’t know who you are, and neither do you. And your wife took the dog too.
It’s an almost guaranteed way to eradicate this wildly stupid idea you have.
A lot of those relevant writings became relevant because the horrible experiences the author went through forged them into an interesting writer. If we're assuming that we only know retrospectively whether the writing is important, then the best course of action would be for people to write as a hobby and make choices that are likely (rather than unlikely) to lead to a comfortable life. Particularly in this current era, where we might suspect that writing and publishing a book is getting much easier thanks to technology.
> A lot of those relevant writings became relevant because the horrible experiences the author went through forged them into an interesting writer.
Sometimes artists suffer, but it's mostly a legend at this point. Plenty of great artists have perfectly fine lives. Look at like, any modern fantasy or sci fi author.
Are you arguing that most good writers from history were poor? That is, after all, the only "horrible experience" a subsidy would alleviate. I don't think that's actually supported by evidence; most great writers I can think of were relatively sheltered (although they were often sensitive to the horrible experiences of others).
I think the argument is a) most writers have to do a lot of writing before their writing is consumable/appreciated enough to be considered successful, and b) most great writers had to go through some shit in life, and incorporating that into their writing is what made it interesting enough to be successful.
You’re missing, somewhat gleefully, most of the history of western art, which you could imagine as split between patronage-based art (have you heard of the Sistine Chapel, for instance?) and vernacular art - where things like genre storytelling and family portraits come from.
Broadly speaking, vernacular artists work for a fucking living; it's rare there (like in most pursuits) to get super rich. We can't all be David Baldacci or Danielle Steel.
NB: Thanks to Neal Stephenson for the best essay on this. He calls genre artists “Beowulf” artists.
Am noob. The phrase "folk art" never satisfied me. Is it really all that different? But I didn't have the gumption to learn more. Happily, the critics and philosophers did:
I don't think that being able to support a family of three in Ireland is particularly a sign that society doesn't value your work. If she had to pay income tax, perhaps she'd only be able to support herself -- but if you think everyone in Ireland who only makes enough money to support themselves is doing not particularly valuable work, I think it's worth considering the implications of that.
I have thoughts on how we're defining value as well, but others have covered those.
I doubt Waymo would publicly talk about this if it did happen.
I also doubt the IP is worth that much. Most of the secret sauce to starting a competitor probably isn't an end model tuned for a specific configuration of a car but the ability to produce end models, which wouldn't be stealable from the car.
> In every state of the US (and most countries), people disobeying law enforcement will die. If you want to live, you comply, and you fight in court.
This is one of the worst takes I have ever seen, to the point that you must just be trolling.
Disobeying law enforcement is not a death sentence. It is often not even illegal. Just because LEO shouts "I am giving you a lawful order" does not in fact make it a lawful order. And this certainly is not happening in most other countries.
The desire to be part of the Trump Tribe has made people forget what actually made America great.
ICE aren’t law enforcement and can’t legally effect traffic stops. Their orders to Good were not lawful as they had no PC related to immigration violations.
They’re customs enforcement. That’s distinct legally and practically from law enforcement. They have no legal right to effect traffic stops, for example. They can search people only insofar as the border proximity exemption is in effect; I would assume Minneapolis is outside of this range.
If the claim is that you can fight it in court then I want to know how you'd do that. Because from where I sit there are mountains of procedural barriers to actually doing this. A lot of people assume that you can just get some remedy in court, but this is often not true.
When an ICE agent shot and killed a kid their Bivens claim was still denied.
...many people get off because of police procedure problems.
I see it constantly in my courtroom youtube feeds. Judge: "And what was the probable cause?"
Prosecutor: "(some bullshit that's not legit PC)"
Judge: ::incredulous look:: "Mr. Criminal, I'm going to dismiss this case based on lack of probable cause. I suggest you take this opportunity to fix your problems and stay out of my courtroom...blah blah blah"
The smaller the crime (like obstruction, not exactly murder or anything), the more likely it works. I think because police often use small crimes as retaliation.
There's no mountain-sized barrier, you just have your attorney bring up probable cause with the judge.
This only works for excluding evidence acquired illegally. Cases are not dismissed based on lack of probable cause. You also cannot exclude the person even if the method of their arrest was illegal. Watching some courtroom feeds online doesn't actually teach you meaningful things here.
And what you describe only helps you avoid a conviction. It does not actually remedy the violation of your rights. If a federal agent just beats the shit out of you for no reason and then you are not charged then the mechanism of suing them is Bivens, which has been gutted by the courts.
> Cases are not dismissed based on lack of probable cause.
I must insist that they are.
"Police must have probable cause to arrest you, and when officers lack sufficient facts and circumstances to justify arrest, courts dismiss resulting charges. Arrests based on hunches, profiling, or insufficient information violate Fourth Amendment protections."
One of the first Google results for my search. Several others say the same.
4th Amendment violations are cured by the exclusionary rule, which only applies to evidence. "Oopsey-doopsey, your arrest was illegal" does not automatically turn into a complete dismissal.
And with Bivens basically dead you cannot sue the agent for violating your rights.
Spoken like someone who has never started a business. Brex raised much less than $5b and Capital One apparently thinks it is worth more than that (otherwise they wouldn’t buy it).
Definitely. No company has ever overpaid for another company. No fraud or FOMO-driven overvaluation has ever occurred in an acquisition. And all acquisitions have always turned out for the best. It's all 100% pure value creation.
Oh wow, I don't even know where to begin with that.
Like, the world economy can't continue to function even if acquisitions were only 80% value creation on average? Or does the entire world economy depend on companies acquiring other companies with 100% value creation on average, such that it continuing to function logically implies 100% average value creation?
Definitely. And some random guy on HN knows the value of Brex to Capital One better than Capital One does.
Brex can be worth $5b today and also be worth less in the future. These two realities don’t conflict. Acquisitions can and do end poorly. But the vast majority work well. I am not sure what you don’t understand about that?