Hacker News | mercutio2's comments

It hadn’t occurred to me this was a billing bug.

That would be heartening, if it meant I wasn't really consuming tokens 10x as fast as expected and they just had attribution bugs.

Do you have references to this being documented as the actual issue, or is this just speculation?

I want to support Anthropic, but with the Codex desktop app being *so much better* than Anthropic's, combined with the old “5 back and forths with Opus and your quota is gone” problem, it's hard to see going back.


Yeah, I think it's either a billing bug or some sort of inbuilt background sub-agent loop gone wild inside Claude Code. If you have a look at recent GitHub issues relating to 'limits', 'usage', and 'tokens', you'll see a lot of discussion about it: https://github.com/anthropics/claude-code/issues?q=sort%3Aup...

Yeah, Anthropic's quotas are vastly less generous than OpenAI's, which are themselves much less generous than Gemini's (I've never paid Google a dime, and I get hours of use out of gemini-cli every day). I run out of my weekly quota in 2-3 days and my 5-hour quota in ~1 hour. And that's with 1-2 tasks at a time, using Sonnet (Opus gets like 3 queries before my quota is gone).

Right now OpenAI is giving away fairly generous free credits to get people to try the macOS Codex client. And... it's quite good! Especially for free.

I've cancelled my Anthropic subscription...


How recent is your information?

Google significantly reduced the free quota and removed the pro models from gemini-cli some 2-3 months ago.

Also, Gemini models eat tokens like crazy. Something Codex or Claude Code would do with 2K tokens takes Gemini 100K. Not sure why.


I guess I’ve never tried the pro models, because I’ve used gemini-cli free every day for the last three months or so.

It does eventually finish its quota, but then I just switch to a different Google account (which, amusingly, is what Gemini told me to do).

Happy to consume Google’s free tokens! The free model is a distant third for coding, but it’s fine for leaf node work in a larger project.


Hmm, I might have to try Gemini. OpenAI, Claude, and Gemini are all explicitly approved by my employer, and we use GSuite anyway.

Wealth taxes are very, very different from higher income taxes.

People are mad about buy-borrow-die, so they’re proposing extraordinary new measures.

Personally, I'd just make capital gains taxes apply at the “borrow” stage to actually fix the problem. That would have a host of compliance issues, but they'd be localized in the finance industry, which already has an army of people figuring out compliance.
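To make the idea concrete, here's a toy model of taxing at the “borrow” stage; the rate and the basis step-up mechanics are my own illustrative assumptions, not an actual proposal:

    # Toy model: treat a loan against appreciated assets as a realization
    # event, up to the borrowed amount. Rates/mechanics are assumptions.
    def tax_on_borrow(loan, basis, market_value, cgt_rate=0.20):
        unrealized_gain = max(market_value - basis, 0.0)
        realized = min(loan, unrealized_gain)   # gains deemed realized
        new_basis = basis + realized            # step up to avoid double tax
        return realized * cgt_rate, new_basis

    tax, new_basis = tax_on_borrow(loan=50e6, basis=1e6, market_value=100e6)
    print(f"Tax due at borrow time: ${tax:,.0f}")   # $10,000,000
    print(f"Stepped-up basis: ${new_basis:,.0f}")   # $51,000,000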


But they made exactly the same arguments against it, and it was bullshit.

What toolchain are you going to use with the local model? I agree it's a strong model, but it's so slow for me with large contexts that I've stopped using it for coding.

I have my own agent harness, and the inference backend is vLLM.

Can you tell me more about your agent harness? If it’s open source, I’d love to take it for a spin.

I would happily use local models if I could get them to perform, but they’re super slow if I bump their context window high, and I haven’t seen good orchestrators that keep context limited enough.
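For context, here's roughly the shape such a harness could take, as a minimal sketch assuming vLLM's OpenAI-compatible server on localhost; the model name, port, and the crude oldest-turns trimming are illustrative assumptions, not the parent's actual setup:

    # Minimal agent-harness sketch against a local vLLM server
    # (`vllm serve openai/gpt-oss-120b` exposes an OpenAI-compatible
    # API on :8000). Context is kept small by dropping old turns.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    MAX_TURNS = 16  # hypothetical cap to keep the prompt (and KV cache) small

    def chat(messages):
        # Keep the system prompt, drop the oldest user/assistant turns.
        if len(messages) > MAX_TURNS:
            messages = [messages[0]] + messages[-(MAX_TURNS - 1):]
        resp = client.chat.completions.create(
            model="openai/gpt-oss-120b", messages=messages)
        return resp.choices[0].message.content

    history = [{"role": "system", "content": "You are a coding agent."}]
    history.append({"role": "user", "content": "Summarize this repo's layout."})
    print(chat(history))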


Curious how you handle sharding and KV cache pressure for a 120b model. I guess you are doing tensor parallelism across consumer cards, or is it a unified memory setup?

I don't; it fits on my card with the full context. I think the native MXFP4 weights take ~70GB of VRAM (out of 96GB available on an RTX Pro 6000), so I still have room to spare to run GPT-OSS-20B alongside for smaller tasks, plus Wayland+Gnome :)

I thought the RTX 6000 Ada was 48GB? If you have 96GB available that implies a dual setup, so you must be relying on tensor parallelism to shard the model weights across the pair.

RTX Pro 6000: 96GB of VRAM, single card.
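For anyone wanting to reproduce that single-card setup, a minimal sketch with vLLM's offline API; the memory fraction and context length are my guesses at sensible values, not the parent's actual config:

    # Sketch: run gpt-oss-120b on one 96GB card with vLLM. No tensor
    # parallelism needed; the MXFP4 checkpoint leaves room for KV cache.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="openai/gpt-oss-120b",
        max_model_len=131072,         # full context; shrink to free VRAM
        gpu_memory_utilization=0.70,  # leave headroom for a second model
    )
    params = SamplingParams(max_tokens=512, temperature=0.2)
    out = llm.generate(["Explain KV cache pressure in one paragraph."], params)
    print(out[0].outputs[0].text)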

This is one of the very few non-money-laundering use cases for crypto.

I would support a “5 cents per unsolicited email” system, in a similar way. If you make it a mildly enjoyable $5/hour task to read the first sentence or two of each message in your spam folder, the internet overall would be better.
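Back-of-the-envelope on the incentives, using the comment's numbers plus some illustrative assumptions:

    # Rough economics of pay-per-unsolicited-email. All numbers are
    # illustrative assumptions, not from any real proposal.
    bounty = 0.05          # dollars a sender stakes per unsolicited mail
    emails_per_hour = 100  # skimming a sentence or two of each message

    print(f"Reader earns ~${bounty * emails_per_hour:.2f}/hour")  # ~$5.00
    # A spammer's cost per million messages jumps from ~nothing to $50k,
    # which breaks the spray-and-pray business model.
    print(f"Cost per million spams: ${bounty * 1_000_000:,.0f}")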


Maybe it's because I loved the books, but I loathed the Netflix adaptation. Possibly the worst sci-fi adaptation I've ever seen.

The casting was OK, but they mangled the plot and motivations of every character nearly beyond recognition!


The Apple TV (the device) has a “stuff this user watches” app (called Apple TV) which has a tiny subset of its features dedicated to AppleTV+ (the service).

Netflix refuses to participate in “stuff this user watches”. It would be trivial to do, but Netflix jealously guards its viewership numbers, and I expect that's the main reason they don't. That, and… they'd rather you just browse Netflix and not watch other services.

The “stuff this user watches” app is very useful! I like it a lot, when I’m not watching Netflix stuff! It works with every service except Netflix!

But the moment the family shifts over to watching some Netflix show, it forces us out of the habit of using the TV app, and then we go back to the annoying “spend 90 seconds trying to find what we were watching on Hulu” experience, which is worse in every way.


Why are you assuming active heat transfer? Passive is the way to go.


Yes, so?

Everyone keeps talking past each other on this, it seems.

“Generating power in space is easy, but ejecting heat is hard!”

Yes.

“That means you’d need huge radiators!”

Yes.

OK, we’re back to “how expensive/reliable is your giant radiator with a data center attached?”

We don’t know yet, but with low launch costs, it isn’t obviously crazy.
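To put a number on “huge”: a quick Stefan-Boltzmann estimate, where the temperature, emissivity, and zero-background simplification are all assumptions on my part:

    # Radiator area needed to reject a thermal load in space, ignoring
    # solar input, view factors, and plumbing. Values are illustrative.
    SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W / m^2 / K^4

    def radiator_area_m2(power_w, temp_k=300.0, emissivity=0.9):
        # Double-sided flat panel radiating to a ~0 K background.
        flux = 2 * emissivity * SIGMA * temp_k**4  # W per m^2 of panel
        return power_w / flux

    for mw in (1, 10, 100):
        print(f"{mw:>3} MW -> ~{radiator_area_m2(mw * 1e6):,.0f} m^2")
    # ~1,200 m^2 per MW at 300 K; hotter radiators shrink as T^-4,
    # which is the whole design trade.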


Yeah. A City on Mars made me want to throw the book out the window so many times. It built and tore down straw men right and left, and almost every legitimate note of caution suffered from the nirvana fallacy.

