That would be heartening, if I weren’t consuming tokens 10x as fast as expected and they just had attribution bugs.
Do you have references to this being documented as the actual issue, or is this just speculation?
I want to support Anthropic, but with the Codex desktop app being *so much better* than Anthropic’s, combined with the old “5 back-and-forths with Opus and your quota is gone”, it’s hard to see myself going back.
Yeah, I think it's either a billing bug or some sort of built-in background sub-agent loop gone wild inside Claude Code. If you have a look at recent GitHub issues relating to 'limits', 'usage', and 'tokens', you'll see a lot of discussion about it: https://github.com/anthropics/claude-code/issues?q=sort%3Aup...
Yeah, Anthropic's generosity is vastly less than OpenAI's, which is itself much less than Gemini's (I've never paid Google a dime, and I get hours of use out of gemini-cli every day). I run out of my weekly quota in 2-3 days, and my 5-hour quota in ~1 hour. And this is with 1-2 tasks at a time, using Sonnet (Opus gets maybe 3 queries before my quota is gone).
Right now OpenAI is giving away fairly generous free credits to get people to try the macOS Codex client. And... it's quite good! Especially for free.
Wealth taxes are very, very different from higher income taxes.
People are mad about buy-borrow-die, so they’re proposing extraordinary new measures.
Personally, I’d just make capital gains taxes apply at the “borrow” stage; that would actually fix the problem. It would create a host of compliance issues, but they’d be localized to the finance industry, which already has an army of people figuring out compliance.
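To make the “tax at the borrow stage” idea concrete, here’s a toy calculation with made-up numbers and one plausible rule (deemed realization on the borrowed amount, capped at the unrealized gain); everything here is illustrative, not a real proposal’s mechanics:

```python
# Hypothetical numbers: borrowing against appreciated stock triggers
# a deemed realization on the borrowed amount, capped at the unrealized gain.
position_value = 10_000_000   # current value of the pledged shares
cost_basis     =  2_000_000   # what the shares were bought for
loan_amount    =  5_000_000   # cash borrowed against the position

unrealized_gain = position_value - cost_basis          # 8,000,000
deemed_realized = min(loan_amount, unrealized_gain)    # 5,000,000
ltcg_rate = 0.238  # 20% top LTCG + 3.8% NIIT, purely for illustration

tax_due_at_borrow = deemed_realized * ltcg_rate        # 1,190,000
# The basis would then step up by the deemed-realized amount,
# so the same gain isn't taxed again at an eventual sale.
print(f"Tax due when the loan closes: ${tax_due_at_borrow:,.0f}")
```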
What toolchain are you going to use with the local model? I agree that’s a strong model, but it’s so slow for me with large contexts that I’ve stopped using it for coding.
Can you tell me more about your agent harness? If it’s open source, I’d love to take it for a spin.
I would happily use local models if I could get them to perform, but they’re super slow if I bump the context window up high, and I haven’t seen good orchestrators that keep context limited enough.
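To be concrete about what “keep context limited” could look like: a minimal sketch of a history-trimming loop against a local OpenAI-compatible server (the endpoint, model name, token budget, and the crude 4-chars-per-token heuristic are all assumptions; a real setup would use the model’s tokenizer):

```python
# Minimal sketch: cap the prompt size before every call to a local model
# served over an OpenAI-compatible endpoint (e.g. llama.cpp or vLLM).
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local server
MAX_PROMPT_TOKENS = 8_192  # keep prompts small so local inference stays fast

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude ~4 chars/token heuristic

def trim_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest non-system turns until the prompt fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(rough_tokens(m["content"]) for m in system + rest) > MAX_PROMPT_TOKENS:
        rest.pop(0)  # oldest turn goes first
    return system + rest

def chat(messages: list[dict]) -> str:
    resp = requests.post(ENDPOINT, json={
        "model": "local",  # placeholder model name
        "messages": trim_history(messages),
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```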
Curious how you handle sharding and KV cache pressure for a 120b model. I guess you are doing tensor parallelism across consumer cards, or is it a unified memory setup?
I don't; it fits on my card with the full context. I think the native MXFP4 weights take ~70GB of VRAM (out of 96GB available on an RTX Pro 6000), so I still have room to spare to run GPT-OSS-20B alongside for smaller tasks, plus Wayland+Gnome :)
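For reference, the KV cache at full context is smaller than people expect thanks to grouped-query attention. A napkin estimate (the architecture numbers are my best guess at the GPT-OSS-120B config, so treat them as assumptions):

```python
# Back-of-envelope KV-cache sizing for a GQA model. The numbers below are
# what I believe GPT-OSS-120B uses; treat them as assumptions.
n_layers   = 36
n_kv_heads = 8
head_dim   = 64
seq_len    = 131_072      # full 128k context
bytes_per  = 2            # fp16/bf16 KV cache

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per  # K and V
print(f"KV cache at full context: {kv_bytes / 2**30:.1f} GiB")
# -> roughly 9 GiB, which is why ~70GB of MXFP4 weights plus full context
#    still fits comfortably in 96GB on a single card.
```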
I thought the RTX 6000 Ada was 48GB? If you have 96GB available that implies a dual setup, so you must be relying on tensor parallelism to shard the model weights across the pair.
This is one of the very few non-money-laundering use cases for crypto.
I would support a “5 cents per unsolicited email” system in a similar way. If you make it a mildly enjoyable $5/hour task to read the first sentence or two of your spam folder, the internet as a whole would be better.
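Mechanically, this could be as simple as an escrowed stake per message: unknown senders attach a refundable deposit, reading the mail refunds it, marking it spam forfeits it to the reader. A toy sketch, with every name and rule here purely hypothetical (no real payment rails):

```python
# Toy sketch of "5 cents per unsolicited email": unknown senders must attach
# a stake; a wanted message refunds it, a spam verdict forfeits it.
STAKE_CENTS = 5

class Inbox:
    def __init__(self, contacts: set[str]):
        self.contacts = contacts          # known senders skip the stake
        self.escrow: dict[str, int] = {}  # message_id -> staked cents

    def receive(self, message_id: str, sender: str, stake_cents: int) -> bool:
        if sender in self.contacts:
            return True                   # solicited mail: no stake required
        if stake_cents < STAKE_CENTS:
            return False                  # unsolicited and unstaked: rejected
        self.escrow[message_id] = stake_cents
        return True

    def mark_read_and_wanted(self, message_id: str) -> int:
        # Reader judged it non-spam: stake goes back to the sender.
        return self.escrow.pop(message_id, 0)

    def mark_spam(self, message_id: str) -> int:
        # Reader keeps the stake: this is the "$5/hour to skim spam" part.
        return self.escrow.pop(message_id, 0)
```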
The Apple TV (the device) has a “stuff this user watches” app (called Apple TV) which has a tiny subset of its features dedicated to AppleTV+ (the service).
Netflix refuses to participate in “stuff this user watches”. It would be trivial to do, but Netflix jealously guards its viewership numbers, and I expect this is the main reason they don’t do it. That and… they’d rather you just browse Netflix and not watch other services.
The “stuff this user watches” app is very useful! I like it a lot, when I’m not watching Netflix stuff! It works with every service except Netflix!
But the moment the family shifts over to watching some Netflix show, it forces us out of the habit of using the TV app, and then we go back to the annoying “spend 90 seconds trying to find what we were watching on Hulu” experience, which is worse in every way.
Yeah. A City on Mars made me want to throw the book out the window so many times. It builds and tears down straw men left and right, and almost every legitimate note of caution suffers from the nirvana fallacy.