Chinese companies, starting with CXMT, will own the consumer segment, at least until they are sanctioned/banned in the US. The rest of the world will be fine, but consumer desktop computing in the US will be akin to the cars in Cuba.
> Now I'm really curious. What field are you in that ndjson files of that size are common?
I'm not OP, but structured JSON logs can easily result in humongous ndjson files, even with a modest fleet of servers over a not-very-long period of time.
Replying here because the other comment is too deeply nested to reply.
Even if it's a once-off, some people handle a lot of once-offs, and that's exactly where you need good CLI tooling to support it.
Sure jq isn't exactly super slow, but I also have avoided it in pipelines where I just need faster throughput.
rg was insanely useful in a project I once got where they had about 5GB of source files, a lot of them auto-generated. And you needed to find stuff in there. People were using Notepad++ and waiting minutes for a query to find something in the haystack. rg returned results in seconds.
The use case could be, e.g., processing an old trove of logs into something more easily indexed and queryable, and you might want to use jq as part of that processing pipeline.
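For anyone wondering what that kind of pass looks like, here's a minimal sketch in Python rather than jq. It streams ndjson one line at a time, so memory stays flat no matter how big the file is. The `level` field and the sample records are made up for illustration; real logs will have their own schema.

```python
import io
import json
from typing import IO, Iterator


def iter_ndjson(stream: IO[str]) -> Iterator[dict]:
    """Yield one JSON object per non-empty line, skipping malformed lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # old log troves often contain truncated lines
        if isinstance(record, dict):
            yield record


def filter_level(stream: IO[str], level: str) -> Iterator[dict]:
    """Keep only records whose (hypothetical) `level` field matches."""
    for record in iter_ndjson(stream):
        if record.get("level") == level:
            yield record


# Usage with an in-memory stand-in for a large log file.
sample = io.StringIO(
    '{"level": "info", "msg": "started"}\n'
    '{"level": "error", "msg": "disk full"}\n'
    'not json at all\n'
    '{"level": "error", "msg": "timeout"}\n'
)
errors = list(filter_level(sample, "error"))
print([r["msg"] for r in errors])
```

Because both functions are generators, the same code works on an open file handle or `sys.stdin` in the middle of a shell pipeline.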
Fair, but for a once-off thing performance isn't usually a major factor.
The comment I was replying to implied this was something more regular.
EDIT: why is this being downvoted? I didn't think I was rude. The person I responded to made a good point, I was just clarifying that it wasn't quite the situation I was asking about.
At scale, low performance can very easily mean "longer than the lifetime of the universe to execute." The question isn't how quickly something will get done, but whether it can be done at all.
Good point. I said it above, but I'll repeat it here: I shouldn't have discounted how frequent once-offs can be. I've worked in support before, so I really should've known better.
Certain people/businesses deal with one-off things every day. Even for something truly one-off, if one tool is too slow it might still be the difference between being able to do it once or not at all.
> I feel like we are just inching closer and closer to a world where rapid iteration of software will be by default.
There's a lot of experimentation right now, but one thing that's guaranteed is that the data gatekeepers will slam the door shut[1] - or install a toll-booth when there's less money sloshing about, and the winners and losers are clear. At some point in the future, Atlassian and Github may not grant Anthropic access to your tickets unless you're on the relevant tier with the appropriate "NIH AI" surcharge.
1. AI does not suspend or supplant good old capitalism and the cult of profit maximization.
Models aren't just big bags of floats you imagine them to be. Those bags are there, but there's a whole layer of runtimes, caches, timers, load balancers, classifiers/sanitizers, etc. around them, all of which have tunable parameters that affect the user-perceptible output.
It's still engineering. Even magic alien tech from outer space would end up with an interface layer to manage it :).
ETA: reminds me of biology, too. In life, it turns out that the simpler some functional component looks, the more stupidly overcomplicated it is when you look at it under a microscope.
There's this[1]. Model providers have a strong incentive to switch (a part of) their inference fleet to quantized models during peak loads. From a systems perspective, it's just another lever. Better to have slightly nerfed models than complete downtime.
That isn't true. The whole point is to quickly pick up statistically significant variations, and with the volume of tests they are doing there is plenty of data.
If you turn on the 95% CI bands you can see there is plenty of statistical significance.
Anybody with more than five years in the tech industry has seen this done in all domains time and again. What evidence do you have that AI is different? That is the extraordinary claim in this case...
Real world usage suggests otherwise. It's been a known trend for a while. Anthropic even confirmed as much ~6 months ago but said it was a "bug" - one that somehow just keeps happening 4-6 months after a model is released.
Real world usage is unlikely to give you the large sample sizes needed to reliably detect the differences between models. Standard error scales as the inverse square root of sample size, so even a difference as large as 10 percentage points would require hundreds of samples.
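To put rough numbers on that, here's a sketch using the textbook normal-approximation formula for comparing two proportions. The 50% and 60% pass rates, the 5% significance level and the 80% power are my own illustrative assumptions, not anything measured.

```python
import math
from statistics import NormalDist


def standard_error(p: float, n: int) -> float:
    """Standard error of an observed pass rate p over n independent trials."""
    return math.sqrt(p * (1 - p) / n)


def samples_per_group(p1: float, p2: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate trials needed in EACH group to tell p1 from p2
    with a two-sided test at significance alpha and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Quadrupling the sample size only halves the standard error.
print(standard_error(0.5, 100), standard_error(0.5, 400))

# Hypothetical 10-point gap (50% vs 60%) and 3-point gap (50% vs 53%).
print(samples_per_group(0.50, 0.60))
print(samples_per_group(0.50, 0.53))
```

Under those assumptions the 10-point gap needs roughly 385 runs per model, and a 3-point gap needs thousands, which is far more than anyone collects through casual use.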
https://marginlab.ai/trackers/claude-code/ tries to track Claude Opus performance on SWE-Bench-Pro, but since they only sample 50 tasks per day, the confidence intervals are very wide. (This was submitted 2 months ago https://news.ycombinator.com/item?id=46810282 when they "detected" a statistically significant deviation, but that was because they used the first day's measurement as the baseline, so at some point they had enough samples to notice that this was significantly different from the long-term average. It seems like they have fixed this error by now.)
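For a sense of how wide "very wide" is at 50 tasks, here's a quick Wilson score interval. The 28-out-of-50 result is a hypothetical daily score I picked for illustration, not a number taken from the tracker, and I don't know which interval method they actually use.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)
    ) / denom
    return center - half_width, center + half_width


# Hypothetical day: 28 of 50 tasks passed (56%).
low, high = wilson_interval(28, 50)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

With those inputs the interval spans about 26 percentage points, so a single day's score moving by a few points tells you essentially nothing.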
It's hard to trust public, high-profile benchmarks, because any change to a specific model (Opus 4.5 in this case) can be rejected if it regresses on SWE-Bench-Pro, so everything that gets released would perform well on this benchmark.
Any other benchmark at that sample size would have similarly huge error bars. Unless Anthropic makes a model that works 100% of the time or writes a bug that brings it all the way to zero, it's going to work sometimes and fail sometimes, and anyone who thinks they can spot small changes in how often it works without running an astonishingly large number of tests is fooling themselves with measurement noise.
They do. I'm currently seeing a degradation on Opus 4.6 on tasks it could do without trouble a few months back. Obviously I'm a sample of n=1, but I'm also convinced a new model is around the corner and they preemptively nerf their current model so people notice the "improvement".
Well, I don't see 4.5 on there ... so I'm not sure what you're trying to say.
And today is a 53% pass rate vs. a baseline 56% pass rate. That's a huge difference. If we recall what Anthropic originally promised a "max 5" user https://github.com/anthropics/claude-code/issues/16157#issue... -- which they've since removed from their site...
50-200 prompts. That's an extra 1-6 "wrong solutions" per 5 hours ... and you have to get a lot of wrong answers to arrive at a wrong solution.
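Spelling out that arithmetic as a sketch: it assumes every prompt behaves like one benchmark task and that the pass rates carry over to real usage, which is a big simplification.

```python
def extra_failures(baseline_pct: int, current_pct: int, prompts: int) -> float:
    """Expected additional failed prompts when the pass rate drops
    from baseline_pct to current_pct over the given number of prompts."""
    return (baseline_pct - current_pct) * prompts / 100


# 56% baseline vs 53% today, over 50 and 200 prompts per 5-hour window.
for prompts in (50, 200):
    print(prompts, extra_failures(56, 53, prompts))
```

That works out to 1.5 extra failures at 50 prompts and 6 at 200.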
I think the conspiracy theories are silly, but equally I think pretending these black boxes are completely stable once they're released is incorrect as well.
Until the AI scrapers[1] come for you at 5k requests per second and you're doing operations in hard-mode.
1. Most forges have HTTP pages for discoverability. I suppose one could hypothetically set up an ssh-only forge and statically generate an HTML site periodically, but this is already advanced ops for the average Github user.
This isn't a real thing and if it ever becomes a thing you can sue them for DDOS and send Sam Altman to jail. AI scraping is in the realm of 1-5 requests per second, not 5000.
These two issues are completely different. Judea and Samaria do not equal Lebanon, ideologically or geopolitically, whatsoever.
The Israeli military launching incursions into Lebanon to fight Hezbollah and prevent them from launching rockets randomly into Israel (these rockets killing many Arabs as well) is not the same as the squabbles of a small minority of civilians in disputed territory within Israel proper.
The CIA, as its tradition demands, never meddles when the conditions are ripe to promote American interests. They just let nature take its course from afar.