That would be heartening, if I weren’t consuming tokens 10x as fast as expected and they just had attribution bugs.
Do you have references to this being documented as the actual issue, or is this just speculation?
I want to support Anthropic, but with the Codex desktop app being *so much better* than Anthropic’s, combined with the old “5 back-and-forths with Opus and your quota is gone”, it’s hard to see myself going back.
Yeah, I think it's either a billing bug or some sort of built-in background sub-agent loop gone wild inside Claude Code. If you have a look at recent GitHub issues relating to 'limits', 'usage', and 'tokens', you'll see a lot of discussion about it: https://github.com/anthropics/claude-code/issues?q=sort%3Aup...
Yeah, Anthropic's generosity is vastly less than OpenAI's, which is itself much less than Gemini's (I've never paid Google a dime, and I get hours of use out of gemini-cli every day). I run out of my weekly quota in 2-3 days, and my 5-hour quota in ~1 hour. And this is with 1-2 tasks at a time, using Sonnet (Opus gets maybe 3 queries before my quota is gone).
Right now OpenAI is giving away fairly generous free credits to get people to try the macOS Codex client. And... it's quite good! Especially for free.
Wealth taxes are very, very different from higher income taxes.
People are mad about buy-borrow-die, so they’re proposing extraordinary new measures.
Personally, I’d just make capital gains taxes apply at the “borrow” stage; that would actually fix the problem. It would create a host of compliance issues, but they’d be localized to the finance industry, which already has an army of people figuring out compliance.
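To make the “tax at the borrow stage” idea concrete, here’s a toy calculation with made-up numbers and one plausible rule (deemed realization on the borrowed amount, capped at the unrealized gain); everything here is illustrative, not a real proposal’s mechanics:

```python
# Hypothetical numbers: borrowing against appreciated stock triggers
# a deemed realization on the borrowed amount, capped at the unrealized gain.
position_value = 10_000_000   # current value of the pledged shares
cost_basis     =  2_000_000   # what the shares were bought for
loan_amount    =  5_000_000   # cash borrowed against the position

unrealized_gain = position_value - cost_basis          # 8,000,000
deemed_realized = min(loan_amount, unrealized_gain)    # 5,000,000
ltcg_rate = 0.238  # 20% top LTCG + 3.8% NIIT, purely for illustration

tax_due_at_borrow = deemed_realized * ltcg_rate        # 1,190,000
# The basis would then step up by the deemed-realized amount,
# so the same gain isn't taxed again at an eventual sale.
print(f"Tax due when the loan closes: ${tax_due_at_borrow:,.0f}")
```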
What toolchain are you going to use with the local model? I agree that’s a strong model, but it’s so slow for me with large contexts that I’ve stopped using it for coding.
Can you tell me more about your agent harness? If it’s open source, I’d love to take it for a spin.
I would happily use local models if I could get them to perform, but they’re super slow if I bump the context window up high, and I haven’t seen good orchestrators that keep context limited enough.
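To be concrete about what “keep context limited” could look like: a minimal sketch of a history-trimming loop against a local OpenAI-compatible server (the endpoint, model name, token budget, and the crude 4-chars-per-token heuristic are all assumptions; a real setup would use the model’s tokenizer):

```python
# Minimal sketch: cap the prompt size before every call to a local model
# served over an OpenAI-compatible endpoint (e.g. llama.cpp or vLLM).
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local server
MAX_PROMPT_TOKENS = 8_192  # keep prompts small so local inference stays fast

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude ~4 chars/token heuristic

def trim_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest non-system turns until the prompt fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(rough_tokens(m["content"]) for m in system + rest) > MAX_PROMPT_TOKENS:
        rest.pop(0)  # oldest turn goes first
    return system + rest

def chat(messages: list[dict]) -> str:
    resp = requests.post(ENDPOINT, json={
        "model": "local",  # placeholder model name
        "messages": trim_history(messages),
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```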
Curious how you handle sharding and KV cache pressure for a 120b model. I guess you are doing tensor parallelism across consumer cards, or is it a unified memory setup?
I don't; it fits on my card with the full context. I think the native MXFP4 weights take ~70GB of VRAM (out of 96GB available on an RTX Pro 6000), so I still have room to spare to run GPT-OSS-20B alongside for smaller tasks, plus Wayland+Gnome :)
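For reference, the KV cache at full context is smaller than people expect thanks to grouped-query attention. A napkin estimate (the architecture numbers are my best guess at the GPT-OSS-120B config, so treat them as assumptions):

```python
# Back-of-envelope KV-cache sizing for a GQA model. The numbers below are
# what I believe GPT-OSS-120B uses; treat them as assumptions.
n_layers   = 36
n_kv_heads = 8
head_dim   = 64
seq_len    = 131_072      # full 128k context
bytes_per  = 2            # fp16/bf16 KV cache

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per  # K and V
print(f"KV cache at full context: {kv_bytes / 2**30:.1f} GiB")
# -> roughly 9 GiB, which is why ~70GB of MXFP4 weights plus full context
#    still fits comfortably in 96GB on a single card.
```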
I thought the RTX 6000 Ada was 48GB? If you have 96GB available that implies a dual setup, so you must be relying on tensor parallelism to shard the model weights across the pair.
This is one of the very few non-money-laundering use cases for crypto.
I would support a “5 cents per unsolicited email” system in a similar way. If you make it a mildly enjoyable $5/hour task to read the first sentence or two of your spam folder, the internet as a whole would be better.
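Mechanically, this could be as simple as an escrowed stake per message: unknown senders attach a refundable deposit, reading the mail refunds it, marking it spam forfeits it to the reader. A toy sketch, with every name and rule here purely hypothetical (no real payment rails):

```python
# Toy sketch of "5 cents per unsolicited email": unknown senders must attach
# a stake; a wanted message refunds it, a spam verdict forfeits it.
STAKE_CENTS = 5

class Inbox:
    def __init__(self, contacts: set[str]):
        self.contacts = contacts          # known senders skip the stake
        self.escrow: dict[str, int] = {}  # message_id -> staked cents

    def receive(self, message_id: str, sender: str, stake_cents: int) -> bool:
        if sender in self.contacts:
            return True                   # solicited mail: no stake required
        if stake_cents < STAKE_CENTS:
            return False                  # unsolicited and unstaked: rejected
        self.escrow[message_id] = stake_cents
        return True

    def mark_read_and_wanted(self, message_id: str) -> int:
        # Reader judged it non-spam: stake goes back to the sender.
        return self.escrow.pop(message_id, 0)

    def mark_spam(self, message_id: str) -> int:
        # Reader keeps the stake: this is the "$5/hour to skim spam" part.
        return self.escrow.pop(message_id, 0)
```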
The Apple TV (the device) has a “stuff this user watches” app (called Apple TV) which has a tiny subset of its features dedicated to AppleTV+ (the service).
Netflix refuses to participate in “stuff this user watches”. It would be trivial to do, but Netflix jealously guards its viewership numbers, and I expect this is the main reason they don’t do it. That and… they’d rather you just browse Netflix and not watch other services.
The “stuff this user watches” app is very useful! I like it a lot, when I’m not watching Netflix stuff! It works with every service except Netflix!
But the moment the family shifts over to watching some Netflix show, it forces us out of the habit of using the TV app, and then we go back to the annoying “spend 90 seconds trying to find what we were watching on Hulu” experience, which is worse in every way.
Yeah. A City on Mars made me want to throw the book out the window so many times. It builds and tears down straw men left and right, and almost every legitimate note of caution suffers from the nirvana fallacy.