overfeed · 2026-03-28T07:30:54 1774683054

Chinese companies, starting with CXMT, will own the consumer segment until they are sanctioned/banned in the US. The rest of the world will be fine, but consumer desktop computing in the US will be akin to the cars in Cuba.

overfeed · 2026-03-28T04:31:18 1774672278

I use local dev containers: the worst an agent can do is delete its working copy; it has no access to my home directory, access tokens or sudo.

overfeed · 2026-03-27T08:22:50 1774599770

> Now I'm really curious. What field are you in that ndjson files of that size are common?

I'm not OP, but structured JSON logs can easily result in humongous ndjson files, even with a modest fleet of servers over a not-very-long period of time.

messe · 2026-03-27T08:24:49 1774599889

So what's the use case for keeping them in that format rather than something more easily indexed and queryable?

I'd probably just shove it all into Postgres, but even a multi terabyte SQLite database seems more reasonable.

carlmr · 2026-03-27T08:47:13 1774601233

Replying here because the other comment is too deeply nested to reply.

Even if it's once off, some people handle a lot of once-offs, that's exactly where you need good CLI tooling to support it.

Sure jq isn't exactly super slow, but I also have avoided it in pipelines where I just need faster throughput.

rg was insanely useful in a project I once got where they had about 5GB of source files, a lot of them auto-generated. And you needed to find stuff in there. People were using Notepad++ and waiting minutes for a query to find something in the haystack. rg returned results in seconds.

messe · 2026-03-27T09:06:34 1774602394

You make some good points. I've worked in support before, so I shouldn't have discounted how frequent "once-offs" can be.

paavope · 2026-03-27T08:35:34 1774600534

The use case could be e.g. exactly processing an old trove of logs into something more easily indexed and queryable, and you might want to use jq as part of that processing pipeline
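A minimal sketch of that kind of one-off conversion, assuming Python and SQLite rather than jq. The function name `load_ndjson`, the `logs` table and the `ts` field are all hypothetical and not taken from anyone's actual pipeline:

```python
import json
import sqlite3
from typing import Iterable


def load_ndjson(lines: Iterable[str], conn: sqlite3.Connection,
                batch_size: int = 10_000) -> int:
    """Stream NDJSON lines into SQLite, one JSON object per row.

    Returns the number of records inserted. Blank lines are skipped;
    a malformed line raises json.JSONDecodeError.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs "
        "(id INTEGER PRIMARY KEY, ts TEXT, body TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS logs_ts ON logs (ts)")

    insert_sql = "INSERT INTO logs (ts, body) VALUES (?, ?)"
    inserted = 0
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "ts" is an assumed field name; records without it get NULL.
        batch.append((record.get("ts"), line))
        if len(batch) >= batch_size:
            # Insert in batches so memory use stays flat on huge files.
            conn.executemany(insert_sql, batch)
            inserted += len(batch)
            batch.clear()
    if batch:
        conn.executemany(insert_sql, batch)
        inserted += len(batch)
    conn.commit()
    return inserted
```

Passing an open file object (for example `open("logs.ndjson")`) as `lines` reads one line at a time, so the input never has to fit in memory.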

messe · 2026-03-27T08:38:24 1774600704

Fair, but for a once-off thing performance isn't usually a major factor.

The comment I was replying to implied this was something more regular.

EDIT: why is this being downvoted? I didn't think I was rude. The person I responded to made a good point, I was just clarifying that it wasn't quite the situation I was asking about.

adastra22 · 2026-03-27T09:06:27 1774602387

At scale, low performance can very easily mean "longer than the lifetime of the universe to execute." The question isn't how quickly something will get done, but whether it can be done at all.

messe · 2026-03-27T10:27:43 1774607263

Good point. I said it above, but I'll repeat it here that I shouldn't have discounted how frequent once offs can be. I've worked in support before so I really should've known better

bigDinosaur · 2026-03-27T09:06:59 1774602419

Certain people/businesses deal with one-off things every day. Even for something truly one-off, if one tool is too slow it might still be the difference between being able to do it once or not at all.

overfeed · 2026-03-27T08:15:28 1774599328

> I feel like we are just inching closer and closer to a world where rapid iteration of software will be by default.

There's a lot of experimentation right now, but one thing that's guaranteed is that the data gatekeepers will slam the door shut[1] - or install a toll-booth when there's less money sloshing about, and the winners and losers are clear. At some point in the future, Atlassian and Github may not grant Anthropic access to your tickets unless you're on the relevant tier with the appropriate "NIH AI" surcharge.

1. AI does not suspend or supplant good old capitalism and the cult of profit maximization.

overfeed · 2026-03-27T06:03:59 1774591439

What were you using 6 months ago?

withinboredom · 2026-03-27T06:23:06 1774592586

Opus 4.5 ~= Opus 4.6 high. Opus 4.5 was nerfed just before or after the release of 4.6.

hhh · 2026-03-27T08:44:48 1774601088

The models don’t change.

tornikeo · 2026-03-27T08:50:51 1774601451

On paper. There's huge financial incentive to quantize the crap out of a good model to save cash after you've hooked in subscriptions.

armchairhacker · 2026-03-27T10:17:23 1774606643

And there’s an incentive to publish evidence of this to discourage it, do you have any?

TeMPOraL · 2026-03-27T10:54:11 1774608851

Models aren't just big bags of floats you imagine them to be. Those bags are there, but there's a whole layer of runtimes, caches, timers, load balancers, classifiers/sanitizers, etc. around them, all of which have tunable parameters that affect the user-perceptible output.

natebc · 2026-03-27T11:12:46 1774609966

There really always is a man behind the curtain eh?

coldtea · 2026-03-27T12:12:35 1774613555

Often it's literally just that:

https://www.msn.com/en-us/money/other/ai-startup-backed-by-m...

TeMPOraL · 2026-03-27T11:35:12 1774611312

It's still engineering. Even magic alien tech from outer space would end up with an interface layer to manage it :).

ETA: reminds me of biology, too. In life, it turns out the simpler some functional component looks, the more stupidly overcomplicated it is if you look at it under a microscope.

woadwarrior01 · 2026-03-27T11:23:53 1774610633

There's this[1]. Model providers have a strong incentive to switch (a part of) their inference fleet to quantized models during peak loads. From a systems perspective, it's just another lever. Better to have slightly nerfed models than complete downtime.

[1]: https://marginlab.ai/trackers/claude-code/

nl · 2026-03-27T11:31:52 1774611112

So - as the charts say - no statistical difference?

Isn't this link an argument against the point you are making?

withinboredom · 2026-03-27T12:47:48 1774615668

The chart doesn't cover the 4.6 release which was in the end of December/early January time frame. So, it's hard to tell from existing data.

nl · 2026-03-29T06:42:23 1774766543

That isn't true. The whole point is to pick up statistically significant variations quickly, and with the volume of tests they are doing there is plenty of data.

If you turn on the 95% CI bands you can see there is plenty of statistical significance.

coldtea · 2026-03-27T12:11:18 1774613478

Anybody with more than five years in the tech industry has seen this done in all domains time and again. What evidence do you have that AI is different, which is the extraordinary claim in this case...

seunosewa · 2026-03-27T17:53:36 1774634016

Or just change the reasoning levels.

esskay · 2026-03-27T09:22:01 1774603321

Real world usage suggests otherwise. It's been a known trend for a while. Anthropic even confirmed as such ~6 months ago but said it was a "bug" - one that somehow just keeps happening 4-6 months after a model is released.

yorwba · 2026-03-27T11:40:08 1774611608

Real world usage is unlikely to give you the large sample sizes needed to reliably detect the differences between models. Standard error scales as the inverse square root of sample size, so even a difference as large as 10 percentage points would require hundreds of samples.

https://marginlab.ai/trackers/claude-code/ tries to track Claude Opus performance on SWE-Bench-Pro, but since they only sample 50 tasks per day, the confidence intervals are very wide. (This was submitted 2 months ago https://news.ycombinator.com/item?id=46810282 when they "detected" a statistically significant deviation, but that was because they used the first day's measurement as the baseline, so at some point they had enough samples to notice that this was significantly different from the long-term average. It seems like they have fixed this error by now.)
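A rough sketch of the arithmetic behind both points, using the normal approximation to the binomial. The 50% and 60% pass rates, the 5% significance level and the 80% power are illustrative assumptions, not numbers from the tracker:

```python
import math
from statistics import NormalDist


def standard_error(p: float, n: int) -> float:
    """Standard error of a pass rate p measured on n independent tasks."""
    return math.sqrt(p * (1.0 - p) / n)


def ci_half_width(p: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * standard_error(p, n)


def samples_per_group(p1: float, p2: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate tasks needed per group for a two-sided
    two-proportion z-test to tell p1 and p2 apart."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # One day of 50 tasks at an assumed 50% pass rate.
    print(f"95% CI half-width at n=50: {ci_half_width(0.5, 50):.3f}")
    # Tasks per group to detect an assumed 50% vs 60% difference.
    print(f"n per group for 10 points: {samples_per_group(0.5, 0.6)}")
```

Under these assumptions a single day of 50 tasks gives an interval of roughly plus or minus 14 percentage points, and telling 50% from 60% takes a bit under 400 tasks per group. Quadrupling the sample size only halves the standard error.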

nextaccountic · 2026-03-27T12:32:11 1774614731

It's hard to trust public, high profile benchmarks because any change to a specific model (Opus 4.5 in this case) can be rejected if they have regressions on SWE-Bench-Pro, so everything that gets to be released would perform well in this benchmark

yorwba · 2026-03-27T13:03:11 1774616591

Any other benchmark at that sample size would have similarly huge error bars. Unless Anthropic makes a model that works 100% of the time or writes a bug that brings it all the way to zero, it's going to work sometimes and fail sometimes, and anyone who thinks they can spot small changes in how often it works without running an astonishingly large number of tests is fooling themselves with measurement noise.

fer · 2026-03-27T09:39:38 1774604378

They do. I'm currently seeing a degradation on Opus 4.6 on tasks it could do without trouble a few months back. Obviously I'm a sample of n=1, but I'm also convinced a new model is around the corner and they preemptively nerf their current model so people notice the "improvement".

stavros · 2026-03-27T10:12:27 1774606347

Make that 2, I told my friends yesterday "Opus got dumb, new model must be coming".

arcanemachiner · 2026-03-27T10:48:28 1774608508

I swear that different sessions will route to different quants. Sometimes it's good, sometimes not.

scrollop · 2026-03-27T18:13:20 1774635200

You sure about that?

https://marginlab.ai/trackers/claude-code/

withinboredom · 2026-03-27T19:32:50 1774639970

Well, I don't see 4.5 on there ... so I'm not sure what you're trying to say.

And today is a 53% pass rate vs. a baseline 56% pass rate. That's a huge difference. If we recall what Anthropic originally promised a "max 5" user https://github.com/anthropics/claude-code/issues/16157#issue... -- which they've since removed from their site...

50-200 prompts. That's an extra 1-6 "wrong solutions" per 5 hours ... and you have to get a lot of wrong answers to arrive at a wrong solution.
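The 1-6 figure can be reproduced with a back-of-the-envelope calculation, assuming (a simplification) that each prompt fails independently at the benchmark rate. The function name is hypothetical:

```python
def extra_failures(baseline_pct: int, current_pct: int, prompts: int) -> float:
    """Expected additional failures when the pass rate drops from
    baseline_pct to current_pct over the given number of prompts.

    Percentages are whole numbers (56 means 56%) to keep the
    arithmetic exact.
    """
    return (baseline_pct - current_pct) * prompts / 100


if __name__ == "__main__":
    # 56% baseline vs. 53% today, at both ends of the 50-200 prompt range.
    for prompts in (50, 200):
        print(prompts, extra_failures(56, 53, prompts))
```

A 3-point drop works out to 1.5 extra failures at 50 prompts and 6 at 200.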

coldtea · 2026-03-27T12:09:49 1774613389

Only nominally...

pixel_popping · 2026-03-27T09:33:32 1774604012

Oh yes, they do.

girvo · 2026-03-27T09:57:20 1774605440

I think the conspiracy theories are silly, but equally I think pretending these black boxes are completely stable once they're released is incorrect as well.

coldtea · 2026-03-27T12:19:40 1774613980

No conspiracy theories. Companies being scumbags, cutting corners, and doctoring benchmarks while denying it. Happens since forever.

overfeed · 2026-03-26T19:39:33 1774553973

> ...that really isn't that hard.

Until the AI scrapers[1] come for you at 5k requests per second and you're doing operations in hard-mode.

1. Most forges have HTTP pages for discoverability. I suppose one could hypothetically set up an ssh-only forge and statically generate an HTML site periodically, but this is already advanced ops for the average Github user.

gzread · 2026-03-27T04:00:33 1774584033

This isn't a real thing and if it ever becomes a thing you can sue them for DDOS and send Sam Altman to jail. AI scraping is in the realm of 1-5 requests per second, not 5000.

NewJazz · 2026-03-26T19:46:02 1774554362

I wasn't proposing a full on forge, just a VM with a (key auth only) ssh server to push code to/from.

RadiozRadioz · 2026-03-28T14:09:23 1774706963

fail2ban

overfeed · 2026-03-26T01:52:13 1774489933

> why would israel want to annex territory in Lebanon?

Why are Israeli settlers annexing land in the West Bank? Why is the right wing government letting them?

poisonarena · 2026-03-26T03:05:47 1774494347

these two issues are completely different. judea and samaria do not equal lebanon, ideologically or geopolitically whatsoever.

Israeli military launching incursions into lebanon to fight hezbullah and prevent them from launching rockets randomly into israel (these rockets killing many arabs as well), is not the same as the squabbles of a small minority of civilians in disputed territory within israel proper.

overfeed · 2026-03-26T01:42:15 1774489335

Hence the name Samson: caving the roof over one's self while taking down the enemies.

overfeed · 2026-03-25T22:52:13 1774479133

Do go on - what were his instructions on what they ought to do after the bombing stopped?

redwood · 2026-03-26T00:08:46 1774483726

Overthrow the regime. Has the bombing stopped?

overfeed · 2026-03-25T22:48:06 1774478886

The CIA, as its tradition demands, never meddles when the conditions are ripe to promote American interests. They just let nature take its course from afar.

JumpCrisscross · 2026-03-25T22:55:18 1774479318

> CIA, as its tradition demands, never meddles when the conditions are ripe to promote American interests

Straw man. Nobody argued American interests were unrepresented on the ground.