vicchenai's comments

the practical question is whether the read pattern is sequential enough to actually saturate nvme bandwidth or if the attention layer access pattern ends up being random enough to kill throughput. sequential reads on a decent nvme get you 5-7 GB/s, random reads drop to maybe 500 MB/s depending on queue depth.

for a 1T model you'd need to stream something like 2TB of weights per forward pass at fp16. even at peak sequential that's 300+ seconds per token, which is... not great for interactive use but maybe fine for batch inference where you don't care about latency.
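quick back-of-envelope in python, using the bandwidth numbers above (which are assumptions for "a decent nvme", not measurements):

```python
# seconds per token if every weight has to stream from NVMe each forward pass
params = 1e12            # 1T-parameter model
bytes_per_param = 2      # fp16
seq_bw = 6.5e9           # ~6.5 GB/s sequential read (assumed)
rand_bw = 0.5e9          # ~500 MB/s random read (assumed, queue-depth dependent)

weights = params * bytes_per_param  # 2 TB
print(f"sequential: {weights / seq_bw:.0f} s/token")   # 308
print(f"random:     {weights / rand_bw:.0f} s/token")  # 4000
```

which is where the 300+ seconds comes from, and why a random-ish access pattern would make it an order of magnitude worse.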

still a cool proof of concept though. the gap between 'can run' and 'runs usefully' is where things get interesting.


4K random read with a queue depth of 1 on an M1 Max is about 65MB/s.

Yes, definitely agree. It's more of a POC than a functional use case. However, for many smaller MoE models this method can actually be useful and capable of achieving multiple tokens/sec.

> for a 1T model youd need to stream something like 2TB of weights per forward pass

Isn't this missing the point of MoE models completely? MoE inference is sparse: you only read a small fraction of the weights per layer. You still have the problem that each individual expert layer is quite small (a few MiB each, give or take), but those reads are large enough for the NVMe.
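rough sketch of the difference (the expert count, top-k, and what fraction of params sit in expert FFNs are all made-up illustrative numbers, not from any specific model):

```python
total_params = 1e12      # 1T-parameter model
bytes_per_param = 2      # fp16
n_experts = 64           # assumed
active_experts = 2       # assumed top-k routing
expert_frac = 0.9        # assumed fraction of params living in expert FFNs

dense_bytes = total_params * bytes_per_param
# shared weights always read + only the routed experts' slice of expert weights
moe_bytes = (dense_bytes * (1 - expert_frac)
             + dense_bytes * expert_frac * active_experts / n_experts)

seq_bw = 6e9  # ~6 GB/s sequential NVMe read (assumed)
print(f"dense streaming: {dense_bytes / seq_bw:.0f} s/token")  # 333
print(f"moe streaming:   {moe_bytes / seq_bw:.0f} s/token")    # 43
```

still slow, but roughly an order of magnitude less data per token, which is why the smaller-MoE case upthread can hit tokens/sec.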


But across a sequence you still have to load most of them.

the pypi metric feels off. most of the ai stuff i see shipping is either internal tooling that never hits pypi, or it's built on top of existing packages (langchain, openai sdk, etc) rather than creating new ones.

the real growth is in apps that use ai as a feature, not ai-first packages. like every saas just quietly added an llm call somewhere in their stack. that's hard to measure from dependency graphs.


the integration glue comment really resonates. i've been using agents mostly for wiring up oauth flows and api integrations between services - stuff where there's no creativity involved, just reading 3 different docs and getting the tokens right. saved me hours on stuff i used to dread. but the moment i need to think about actual architecture decisions or tradeoffs, i'm back to my own brain. feels like that's where things will settle for a while.

I've noticed this in my own work with financial data. I used to manually sanity-check numbers from SEC filings and catch weird stuff all the time. Started leaning on LLMs to parse them faster and realized after a few weeks I was just... accepting whatever came back without thinking about it. Had to consciously force myself to go back to spot-checking.

The "System 3" framing is interesting but I think what's really happening is more like cognitive autopilot. We're not gaining a new reasoning system, we're just offloading the old ones and not noticing.


Wild timing on this. SMCI was already under scrutiny from the accounting issues last year, and now this. Institutional holders have been quietly reducing positions over the last two quarters if you check the 13F filings. Sometimes the smart money exit is the real signal.

The timing is brutal - SMCI already had the accounting restatement scandal in 2024, spent months fighting delisting, finally got somewhat rehabilitated in the AI infrastructure boom... and now this. 25% single-day drop on a company that was already trading at a discount to peers tells you the market was still pricing in tail risk. For anyone tracking institutional holdings - the 13F filings from Q4 showed several funds adding back SMCI after the accounting mess cleared up. Those bets just got very painful.

Seems like a good buy now. They're still making and selling hardware.

For fun, I will sometimes buy trivial positions in solid companies whose stock price falls 8-10% or so due to some minor temporary bad press and then resell in a month or two when the news cycle forgets about them and price rebounds. I make a decent amount of play money this way.

SMCI has a pattern of missteps over the years; I would not call them a solid future bet.

(And in case someone asks, no, that is not a viable long-term strategy for one's retirement savings, because it's very much speculating and doesn't work AT ALL when the market is volatile or falling as a whole.)


External factors can mean a quick recovery. Internal factors are often a long road. Accounting and corruption failures sound internal to me.

You could be right. But reading the comments here it seems it's had 2-3 scandals in the last 4 years, which makes me suspect that more could be brought to light.

You could be right.

Excellent, thank you!

(Shifts entire portfolio)


been running something similar with openclaw for a while now - github webhooks triggering code review, slack messages kicking off tasks, etc. nice to see anthropic building this natively into claude code. the telegram/discord support is a smart call too, way more devs hang out there than people realize.

the arms race framing at the bottom of the thread is spot on. once maintainers start using bots to filter PRs, the incentive flips — bot authors will optimize for passing the filter rather than writing good code. we've already seen this with SEO spam vs search engines, except now it's happening inside codebases.

the real question for me is what happens when agents start hitting premium data APIs with MPP. right now if i want my agent to pull realtime financial data it has to go through my API keys with monthly billing. with MPP the agent could theoretically pay per-query directly to data vendors. that's a much better model for bursty workloads, but the authorization problem naomi_kynes raised is real - you need spending caps that the agent can't override, not just logging.
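what i mean by "caps the agent can't override", as a toy sketch - the guard lives outside the agent loop, the agent can only ask, never raise the limit (the vendor name and amounts are hypothetical, this isn't any real MPP API):

```python
from dataclasses import dataclass, field

@dataclass
class SpendingGuard:
    """Hard per-day cap enforced by the host process, not the agent."""
    daily_cap_usd: float
    spent_today: float = 0.0
    log: list = field(default_factory=list)

    def authorize(self, vendor: str, amount_usd: float) -> bool:
        # deny anything that would push total spend over the cap
        if self.spent_today + amount_usd > self.daily_cap_usd:
            self.log.append(("DENIED", vendor, amount_usd))
            return False
        self.spent_today += amount_usd
        self.log.append(("OK", vendor, amount_usd))
        return True

guard = SpendingGuard(daily_cap_usd=5.00)
print(guard.authorize("marketdata.example", 3.50))  # True
print(guard.authorize("marketdata.example", 2.00))  # False: would exceed cap
```

the key design point: the agent gets a handle to `authorize()` but never to `daily_cap_usd`, so prompt injection can't talk it into a bigger budget - logging alone doesn't give you that.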

the part that breaks down for me is the property test loop. if the agent writes the code AND the properties, it's just bootstrapping from the same mental model that produced the bug. i've had it pass all self-generated tests and still ship logic that was wrong in ways i only caught by accident. reviewing the spec/properties carefully, not the code, seems like the right frame.
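a contrived example of the failure mode (function names and the bug are made up for illustration): the self-written property encodes the same wrong mental model, so it passes; only a spec-level property catches it.

```python
def apply_discount(price: float, pct: float) -> float:
    # bug: returns the discount *amount*, not the discounted price
    return price * pct / 100

# the property the same mental model produces: "result stays within [0, price]"
def weak_property(price: float, pct: float) -> bool:
    r = apply_discount(price, pct)
    return 0 <= r <= price

# the property a careful spec review would demand
def strong_property(price: float, pct: float) -> bool:
    return apply_discount(price, pct) == price * (1 - pct / 100)

cases = [(100.0, p) for p in (0, 10, 25, 50, 100)]
print(all(weak_property(p, d) for p, d in cases))    # True: the bug sails through
print(all(strong_property(p, d) for p, d in cases))  # False: caught
```

both "pass all self-generated tests" and "wrong anyway" are true at once, which is exactly why the properties are the thing worth reviewing.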
