Hacker News | qeternity's comments

> With prompt caching, verbose context that gets reused is basically free.

But it's not. It might be discounted cost-wise, but it will still degrade attention and make generation slower and more computationally expensive, even if you have a long prefix you can reuse during prefill.


> Tradition warrants a negotiation phase when one party wishes to change the terms of an agreement, or becomes cognizant that the counterparty may wish to do the same.

They didn't change the agreement. One party violated it, and the other party withdrew as a result.

This is so vanilla. But people will moan because they want subsidized tokens.


I don't have a pony in this race, my good poster. I just calls it how I see it, and I have a long history of calling out the fundamentally abusive character of non-negotiable one-way contracting, and the ill effects it has on society.

Only people moaning here seem to be a bunch of wannabe Google POs upset that people are handing machines a data construct they are designed to accept, and the machine is accepting, and using the token the way they were designed. It looks like Google, for some reason, resents that its lack of automated checks to deny those OAuth tokens is being utilized, and seems to think terminating customers who could probably be corrected with a simple message is the most reasonable response.

With instincts like that, it makes me happy every day that for my needs, I can make do with doing things on my own hardware I've collected over the years. The Cloud has too much drama potential tied up in it.


Number of parameters is at least a proxy for model capability.

You can achieve incredible tok/dollar or tok/sec with Qwen3 0.6b.

It just won't be very good for most use cases.


Model capability is the other axis on their chart. So they could have put Qwen 0.6b there, it would be in the bottom right corner.

I know what they are trying to do. They are attempting to show a kind of Pareto frontier but it’s a little awkward.


Yes, this article is full of misunderstanding. The main explanation of the bottleneck is wrong: it’s the model weights which dominate memory bandwidth (and hence why batching multiple requests in a single pass increases total throughput). If copying user tokens was the bottleneck, batching would not achieve any speed up.

When an author is confused about something so elementary, I can’t trust anything else they write.


> If copying user tokens was the bottleneck, batching would not achieve any speed up.

Reality is more complex. As context length grows your KV cache becomes large and will begin to dominate your total FLOPs (and hence bytes loaded). The issue with KV cache is you cannot batch it because only one user can use it, unlike static layer weights where you can reuse them across multiple users.

Emerging sparse attention techniques can greatly relieve this issue, though the extent to which frontier labs deploy them is uncertain. DeepSeek V3.2 uses sparse attention, though I don't know offhand how much this reduces KV cache FLOPs and associated memory bandwidth.
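
To put rough numbers on the weights-versus-KV-cache point, here is a minimal back-of-the-envelope sketch. The model dimensions (70B parameters, 80 layers, 8 KV heads, head dimension 128, 16-bit weights and cache) are hypothetical and chosen only for illustration, and the helper names are made up for this example. It also assumes dense attention and no prefix sharing between requests.

```python
# Back-of-the-envelope bytes read from memory for ONE decode step.
# All model dimensions below are hypothetical, chosen only for illustration.


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache stored per token (one K and one V vector per layer)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def decode_step_bytes(batch_size: int, context_len: int, n_params: int,
                      kv_per_token: int, bytes_per_param: int = 2) -> tuple:
    """Return (weight_bytes, kv_bytes) read in one decode step.

    Weights are read once and shared by every request in the batch;
    each request reads its own KV cache (no prefix sharing assumed).
    """
    weight_bytes = n_params * bytes_per_param
    kv_bytes = batch_size * context_len * kv_per_token
    return weight_bytes, kv_bytes


def kv_share(batch_size: int, context_len: int, n_params: int,
             kv_per_token: int) -> float:
    """Fraction of the bytes read per decode step that belong to the KV cache."""
    weights, kv = decode_step_bytes(batch_size, context_len, n_params, kv_per_token)
    return kv / (weights + kv)


# Hypothetical dense model with grouped-query attention.
N_PARAMS = 70_000_000_000
KV_PER_TOKEN = kv_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for batch, ctx in [(1, 1024), (1, 131072), (32, 32768)]:
        share = kv_share(batch, ctx, N_PARAMS, KV_PER_TOKEN)
        print(f"batch={batch:>3} context={ctx:>7} KV share of bytes={share:.1%}")
```

Under these assumptions the weights dominate at short context and small batch, while the KV cache dominates once batch size times context length gets large. Sparse attention or shared prefixes would shrink the KV term.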


> The issue with KV cache is you cannot batch it because only one user can use it

This is not really correct given how input token caching works and the reality of subagent workloads. You could launch many parallel subagents sharing some portion of their input tokens and use batching for that task.


2 things:

1. Parallel investigation: the payoff from that is relatively small. Starting K subagents assumes you have K independent avenues of investigation, and quite often that is not true. It is somewhat similar to next-turn prediction using a speculative model: it works well enough for 1 or 2 turns, but fails after.

2. Input caching pretty much fixes prefill, not decode. And if you look at frontier models (for example, open-weight models that can do reasoning), you are looking at longer and longer reasoning chains for heavy tool-using models. And reasoning chains will diverge very, very quickly even from the same input, assuming a nonzero temperature.


> The main explanation of bottleneck is wrong: it’s the model weights which dominate memory bandwidth (and hence why batching multiple requests in a single pass increases total throughput). If copying user tokens was the bottleneck, batching would not achieve any speed up.

Inference is memory-bound only at low batch sizes. At high batch sizes it becomes compute-bound. There's a certain threshold where stuffing more requests in a batch will slow down every request in isolation even though it may still increase the number of tokens/second across the whole batch for all requests in aggregate.


I would guess you haven't done this in practice. Yes, of course inference is memory bound at low batch sizes. This is why we run larger batch sizes!

Also there does not exist any batch size > 1 where per-request throughput is equal to bs=1. Doing any batching at all will slow all intra-batch requests down.
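
A toy roofline-style model shows both effects at once: aggregate throughput rising with batch size, and per-request throughput falling from the very first extra request. This is only a sketch. The hardware and model numbers are hypothetical, the function names are made up, and the model assumes perfect overlap of memory and compute.

```python
# Toy roofline model of one decode step. All constants are hypothetical.
WEIGHT_BYTES = 140e9          # model weights, read once per step for the whole batch
KV_BYTES_PER_REQUEST = 0.1e9  # each request's own KV cache (short context assumed)
FLOPS_PER_TOKEN = 140e9       # roughly 2 FLOPs per parameter per generated token
MEM_BANDWIDTH = 3e12          # bytes/second
PEAK_FLOPS = 1e15             # FLOP/second


def memory_seconds(batch_size: int) -> float:
    """Time to stream the weights plus every request's KV cache."""
    return (WEIGHT_BYTES + batch_size * KV_BYTES_PER_REQUEST) / MEM_BANDWIDTH


def compute_seconds(batch_size: int) -> float:
    """Time to do the arithmetic for one new token per request."""
    return batch_size * FLOPS_PER_TOKEN / PEAK_FLOPS


def step_seconds(batch_size: int) -> float:
    """A step takes as long as the slower of the two resources."""
    return max(memory_seconds(batch_size), compute_seconds(batch_size))


def is_compute_bound(batch_size: int) -> bool:
    return compute_seconds(batch_size) >= memory_seconds(batch_size)


def throughput(batch_size: int) -> tuple:
    """Return (tokens/s per request, tokens/s across the whole batch)."""
    t = step_seconds(batch_size)
    return 1.0 / t, batch_size / t


if __name__ == "__main__":
    for bs in [1, 8, 64, 256, 512, 1024]:
        per_request, aggregate = throughput(bs)
        bound = "compute" if is_compute_bound(bs) else "memory"
        print(f"bs={bs:>4} per-request={per_request:7.2f} tok/s "
              f"aggregate={aggregate:9.1f} tok/s ({bound}-bound)")
```

In this model the per-request rate drops as soon as bs > 1 because every added request brings its own KV cache reads, and the aggregate rate flattens once the step becomes compute-bound.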


They failed to grasp the very fundamental point of batching, which is sharing model weights between requests. For more context, this wasn't just one person's mistake: several AI Twitter personalities proposed this 'Claude Opus fast = small batching' hypothesis. What I find funny is how confident these AI influencers were, while the people who genuinely understand this and actually work on LLM serving at frontier labs stayed quiet. The rest is simply noise.


If you ask someone knowledgeable at r/LocalLLaMA about an inference configuration that can increase TG by *up to* 2.5x, in particular for a sample prompt that reads "*Refactor* this module to use dependency injection", then the answer is of course speculative decoding.

You don't have to work for a frontier lab to know that. You just have to be GPU poor.
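
For anyone who hasn't seen it, here is a minimal sketch of greedy speculative decoding with toy stand-in models. Nothing here is a real library API: `target` and `draft` are hypothetical functions mapping a token list to the next token, and the "one target pass" is simulated by calling `target` per position, whereas a real system scores all drafted positions in a single batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # hypothetical: greedy next-token function


def greedy_decode(model: Model, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain autoregressive decoding: one model pass per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (tokens, number of target passes)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target verifies the proposal (counted as a single pass).
        target_passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            accepted.append(expected)  # always keep the target's own token
            if expected != proposal[i]:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every drafted token matched, so the same pass yields a bonus token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[:len(prompt) + max_new], target_passes


# Toy models: the target counts upward mod 10; the draft agrees except after a 4.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Token:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


if __name__ == "__main__":
    tokens, passes = speculative_decode(toy_target, toy_draft, [1], max_new=12)
    print(tokens, "target passes:", passes)
```

The output is identical to plain greedy decoding with the target model; only the number of target passes drops. A refactoring prompt is a plausible best case because much of the output repeats the input, so drafts tend to be accepted.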


> I don’t think it would be possible without the tax exemption.

Maybe it shouldn't be possible. Society is telling your friend that her work is not particularly valuable and that she should probably consider doing something else.


> Society is telling your friend that her work is not particularly valuable and that she should probably consider doing something else.

Challenge

> I don’t think it would be possible without the tax exemption.

^ That tax exemption _is_ from society. You may not agree with it, but clearly (at least some part of) "society" does.


There’s plenty of things that are valuable for society while still not having significant financial value.


Indeed! E.g. looking after elderly and/or disabled people, to give their family carers respite. This is a minimum wage job seen by many as a "drain on the taxpayer", ignoring that, apart from being worth providing for its own sake, it can enable the family carers to also be economic contributors and pay tax themselves.


Money is generally how we describe value.


Almost all religions, a good chunk of philosophy and even a good bit of economics would differ with you.

I hope you find out before it's too late.


Pretty patronizing, but I'll bite.

I think we as a society strive to make gp correct that money is representative of value, and rightfully so.

Anyone partaking in any activity that has value to others should be given money. That is literally what this basic income/tax break for artists is for. Someone thought producing art had value and pure capitalism wasn't correctly matching that value with monetary rewards.

There are lots of rich churches and church leaders out there. That's because they serve a human need, and those humans are willing to direct some of their finite resources towards that provider. (I'm talking about the collections plate if you didn't catch that.)

Now obviously money on its own is not value. It should represent value that you delivered to someone else in the past, and is helpful for getting whatever value your life needs. You mentioned philosophy --- that yoga retreat in the Andes isn't free, is it?

Now sometimes we muddy the waters, for example we permit lotteries where the winner takes home a good deal of money without providing any value to anyone. That debases money, and I think it has no part in society, but I'm unfortunately swimming against the tide on that one.


Love, honesty, kindness, ..., none of these have value?


Working a 9-5 to support one's loved ones; an honest day's work; generosity. It's quite easy to connect each of these values to money.


Yeah ok now what's the value of verisimilitude? /s


So... Money is generally how we describe value for those things which can be traded for


Of course they do. I'm not saying it's the only way to measure value as individuals. But as a society, lots of things do boil down to money, as that's the medium of exchange. Society was the context of this thread, not individual.


Not quite. Money is how we describe instrumental value, and occasionally allocation priority. Personal attachment and moral worth are also terms often used interchangeably with "value," though in my opinion that should stop and we should all simply never use the word "value" again because so many meanings have collapsed into it.


Money describes a price, not a value. Two different concepts.


Money describes prices, not value.


The most expensive vacations I took were not the most valuable ones to me


What I would suggest you do is, find a loving partner to start a family with, then do everything you can for 20 years to focus primarily on earning, or otherwise acquiring, money.

Then get divorced and discover your children don’t know who you are, and neither do you. And your wife took the dog too.

It’s an almost guaranteed way to eradicate this wildly stupid idea you have.


One of the really cool things about capitalism is that you can, directly or indirectly, put financial value on pretty much anything.


One of the uncool things about capitalism is that it, directly or indirectly, monetizes everything.


Society told Van Gogh that nobody wants or will ever want his work. He (probably) killed himself out of depression, feeling unwanted and miserable.


Yes, this was empirically true at the time. Things change. And that does not invalidate my comment in the least.


This is a false assumption. We will only know retrospectively whether it was valuable or not.

1. She gets better all the time, and might be super popular in the future.

2. Many writings became relevant only long after the death of the author.


A lot of those relevant writings became relevant because the horrible experiences the author went through forged them into an interesting writer. If we're assuming that we only know retrospectively whether the writing is important then the best course of action would be for people to write as a hobby and make choices that are likely (rather than unlikely) to lead to a comfortable life. Particularly in this current era where we might suspect that writing and publishing a book is getting much easier thanks to technology.


> A lot of those relevant writings became relevant because the horrible experiences the author went through forged them into an interesting writer.

Sometimes artists suffer, but it's mostly a legend at this point. Plenty of great artists have perfectly fine lives. Look at like, any modern fantasy or sci fi author.


Are you arguing that most good writers from history were poor? This is after all the only "horrible experience" a subsidy would alleviate. I don't think that's actually supported by evidence: most great writers I can think of were quite sheltered (although they were often sensitive to the horrible experiences of others).


I think the argument is a) most writers have to do a lot of writing before their work is read and appreciated enough to be considered successful, and b) most great writers had to go through some shit in life and incorporate that in their writing to make it interesting enough to be successful.


> Are you arguing that most good writers from history were poor?

No. If I was arguing that I'd have said that.

I'm observing that a lot of great writers had pretty miserable lives and I'm arguing that people should aim to live comfortably.


Sorry, I must have misunderstood, I thought you were still on the topic of the subsidy.


You’re missing, somewhat gleefully, most of the history of western art, which you could imagine as split between patronage-based art (have you heard of the Sistine Chapel, for instance?) and vernacular art - where things like genre storytelling and family portraits come from.

Broadly speaking, vernacular artists work for a fucking living; it’s rare there (like in most pursuits) to get super rich. We can’t all be David Baldacci or Danielle Steel.

NB: Thanks to Neal Stephenson for the best essay on this. He calls genre artists “Beowulf” artists.


TIL "vernacular art". I like it.

Am noob. The phrase "folk art" never satisfied me. Is it really all that different? But I didn't have the gumption to learn more. Happily, the critics and philosophers did:

https://en.wikipedia.org/wiki/Naïve_art

Thanks.


I don't think that being able to support a family of three in Ireland is particularly a sign that society doesn't value your work. If she had to pay income tax, perhaps she'd only be able to support herself -- but if you think everyone in Ireland who only makes enough money to support themselves is doing not particularly valuable work, I think it's worth considering the implications of that.

I have thoughts on how we're defining value as well, but others have covered those.


It's naive to treat income as a clear signal of what society needs.


If you have any understanding of history, it's naive not to.


As demonstrated, crisps are more valuable to the society than art.


Her work can be valuable, in money terms, even if the value of her work is less than the money needed to support her family.


Sure, and again, she should do something else then.

She isn't entitled to have a large family and work whatever job she finds fulfilling.


Society is not telling her that - the labour market is. I guess she should get off her lazy ass and learn how to become a high frequency trader.


Have there been any reported instances of Waymo cars being stolen?

Disabled and then loaded into a lead-lined trailer or something.

I imagine the IP running locally on the cars is worth billions.


I doubt Waymo would publicly talk about this if it did happen.

I also doubt the IP is worth that much. Most of the secret sauce to starting a competitor probably isn't an end model tuned for a specific configuration of a car but the ability to produce end models, which wouldn't be stealable from the car.


Does not instill confidence when the queries they provide don't work.

For anyone curious, the corrected query:

SELECT sum(blks_hit)::numeric / nullif(sum(blks_hit + blks_read), 0) AS cache_hit_ratio FROM pg_stat_database;


I just ran their query, and it works


It does not in PG17.


Works on this Postgres 17.7:

postgres=# show server_version;
        server_version
-------------------------------
 17.7 (Debian 17.7-3.pgdg13+1)
(1 row)

postgres=# SELECT sum(blks_hit)/nullif(sum(blks_hit+blks_read),0) AS cache_hit_ratio FROM Pg_stat_database;
    cache_hit_ratio
------------------------
 0.99448341937558994728
(1 row)


> In every state of the US (and most countries), people disobeying law enforcement will die. If you want to live, you comply, and you fight in court.

This is one of the worst takes I have ever seen, to the point that you must just be trolling.

Disobeying law enforcement is not a death sentence. It is often not even illegal. Just because LEO shouts "I am giving you a lawful order" does not in fact make it a lawful order. And this certainly is not happening in most other countries.

The desire to be part of the Trump Tribe has made people forget what actually made America great.


If it's not a lawful order, you fight that in court. It's almost a free pass to get out of whatever you did.

But what she was given was a lawful order. That's the one I'm talking about.

I'm not a trump voter.


How did you determine "what she was given was a lawful order" without a trial?


Because I have at least a bare minimum understanding of what a lawful command is.

Law enforcement can order you out of your vehicle, and you must comply.


ICE aren’t law enforcement and can’t legally effect traffic stops. Their orders to Good were not lawful as they had no PC related to immigration violations.


ICE aren't law enforcement? What do you think they are? What do you think the E stands for?


They’re customs enforcement. That’s distinct legally and practically from law enforcement. They have no legal right to effect traffic stops, for example. They can search people only insofar as the border proximity exemption is in effect; I would assume Minneapolis is outside of this range.


Can you show me how specifically you fight it in court when the person abusing you is a federal officer? Bivens is basically dead.


Well, you can see the alternative. Get shot in the street and get a lot of twitter posts.


If the claim is that you can fight it in court then I want to know how you'd do that. Because from where I sit there are mountains of procedural barriers to actually doing this. A lot of people assume that you can just get some remedy in court, but this is often not true.

When an ICE agent shot and killed a kid their Bivens claim was still denied.

"Just go to court to solve it" is not serious.


...many people get off because of police procedure problems.

I see it constantly in my courtroom youtube feeds. Judge: "And what was the probable cause?"

Prosecutor: "(some bullshit that's not legit PC)"

Judge: ::incredulous look:: "Mr. Criminal, I'm going to dismiss this case based on lack of probable cause. I suggest you take this opportunity to fix your problems and stay out of my courtroom...blah blah blah"

The smaller the crime (like obstruction, not exactly murder or anything), the more likely it works. I think because police often use small crimes as retaliation.

There's no mountain-sized barrier, you just have your attorney bring up probable cause with the judge.


This only works for excluding evidence acquired illegally. Cases are not dismissed based on lack of probable cause. You also cannot exclude the person even if the method of their arrest was illegal. Watching some courtroom feeds online doesn't actually teach you meaningful things here.

And what you describe only helps you avoid a conviction. It does not actually remedy the violation of your rights. If a federal agent just beats the shit out of you for no reason and then you are not charged then the mechanism of suing them is Bivens, which has been gutted by the courts.


> Cases are not dismissed based on lack of probable cause.

I must insist that they are.

"Police must have probable cause to arrest you, and when officers lack sufficient facts and circumstances to justify arrest, courts dismiss resulting charges. Arrests based on hunches, profiling, or insufficient information violate Fourth Amendment protections."

One of the first Google results for my search. Several others say the same.

https://collincountylaw.com/blog/top-signs-your-case-might-g...


4th Amendment violations are cured by the exclusionary rule, which only applies to evidence. "Oopsey-doopsey your arrest was illegal" does not actually turn into a complete dismissal automatically.

And with Bivens basically dead you cannot sue the agent for violating your rights.


> courts dismiss resulting charges


Because of the exclusionary rule for evidence collected during an illegal arrest.

You are free to keep insisting that these phantom resolutions exist.


Enslavement, genocide, domination, and extraction made it great. For those who forgot.

What we're watching is the collapse of such an unsustainable approach.


What is the point of this?


Spoken like someone who has never started a business. Brex raised much less than $5b and Capital One apparently thinks it is worth more than that (otherwise they wouldn’t buy it).

This is called value creation.


I think the investors who put $300m in at a $12b valuation would disagree


I don’t think you understand how liquidation preferences work.

They will get $300m back.

Opportunity cost sure. But zero nominal loss.
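
To spell out the arithmetic, here is a minimal sketch of a 1x non-participating liquidation preference. The actual deal terms are not in this thread, so everything here is an assumption: the $12b is treated as a post-money valuation, the exit is taken as a round $5b, the preference is 1x and non-participating, and senior claims and other share classes are ignored. The function name is made up for this example.

```python
def non_participating_payout(invested_m: float, post_money_m: float,
                             exit_value_m: float, pref_multiple: float = 1.0) -> float:
    """Payout (in $ millions) to an investor holding non-participating preferred.

    The investor takes whichever is larger:
      - the liquidation preference (capped at the total exit value), or
      - their pro-rata share after converting to common stock.
    Assumes a single preferred class with no senior claims (hypothetical).
    """
    ownership = invested_m / post_money_m
    preference = min(invested_m * pref_multiple, exit_value_m)
    as_converted = ownership * exit_value_m
    return max(preference, as_converted)


if __name__ == "__main__":
    # $300m invested at an assumed $12b post-money valuation, assumed $5b exit.
    print(non_participating_payout(300, 12_000, 5_000))   # preference wins
    # Same stake at a hypothetical $20b exit: converting to common wins.
    print(non_participating_payout(300, 12_000, 20_000))
```

Under those assumptions a 2.5% stake is worth $125m as common at a $5b exit, so the investor takes the $300m preference instead. That is a zero nominal loss, with the opportunity cost still there.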


Definitely. No company has ever overpaid for another company. No fraud or FOMO-driven overvaluation has ever occurred in an acquisition. And all acquisitions have always turned out for the best. It's all 100% pure value creation.


Your statement is true on average because the world’s economy is continuing to function.


> Your statement is true on average because the world’s economy is continuing to function.

The entire field of economics depends on ex post facto statements like this.


"functioning" is doing a lot of heavy lifting here


Oh wow, I don't even know where to begin with that.

Like, the world economy can't continue to function even if acquisitions were only 80% value creation on average? Or does the entire world economy depend on companies acquiring other companies with 100% value creation on average, such that it continuing to function logically implies 100% average value creation?


> can't continue to function even if acquisitions were only 80% value creation on average

The number is much much lower than that. Most acquisitions fail or don't have much impact.


Definitely. And some random guy on HN knows the value of Brex to Capital One better than Capital One does.

Brex can be worth $5b today and also be worth less in the future. These two realities don’t conflict. Acquisitions can and do end poorly. But the vast majority work well. I am not sure what you don’t understand about that?

