> Neural accelerators to get prompt prefill time down. Apple Neural Engine is a ...

pdpi · 2025-12-19T00:52:23 1766105543

> Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA is run on top of?

If you daisy chain four nodes, then traffic between nodes #1 and #4 eats up all of nodes #2 and #3's bandwidth, and you pay a big latency penalty. So, absent a switch, the fully connected mesh is the only way to have fast access to all the memory.
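To put rough numbers on the daisy-chain concern, here is a minimal sketch that counts how many node pairs have to route over each link of a chain. It assumes uniform all-to-all traffic with one flow per pair of nodes; `chain_link_load` is a hypothetical helper written for this illustration, not anything from a real RDMA or Thunderbolt stack.

```python
from itertools import combinations


def chain_link_load(n):
    """Count the node pairs whose traffic crosses each link of an n-node daisy chain.

    Nodes are numbered 1..n and link (i, i + 1) joins neighbours i and i + 1.
    Assumption: every pair of nodes exchanges the same amount of traffic.
    """
    load = {(i, i + 1): 0 for i in range(1, n)}
    for a, b in combinations(range(1, n + 1), 2):
        # Traffic between a and b crosses every link that lies between them.
        for i in range(a, b):
            load[(i, i + 1)] += 1
    return load


if __name__ == "__main__":
    for link, pairs in chain_link_load(4).items():
        print(f"link {link}: carries {pairs} node pairs")
```

In a fully connected mesh every link carries exactly one pair under the same assumption, so the per-link counts printed here give a feel for how much the middle of the chain is oversubscribed.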

Dylan16807 · 2025-12-19T17:53:26 1766166806

Obviously don't daisy chain; that wastes ports so badly. But if you connect 4 nodes into a loop, it goes fine. With traffic spread evenly across all pairs, the average distance is 1.33 hops, so relaying only adds 33% extra traffic. And what specifically are the latency numbers you have in mind?

If you have 3 links per box, then you can set up 8 nodes with a max distance of 2 hops and an average distance of 1.57 hops. That's not too bad. It's pretty close to having 2 links each to a big switch.
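The hop counts above can be checked with a short breadth-first search. This is only a sketch: the comment does not say which 8-node wiring is meant, so `ring_with_chords` assumes one possible layout (a loop plus a link from each node to the opposite node), and all function names here are made up for the illustration.

```python
from collections import deque


def hop_stats(adj):
    """Return (max hops, average hops) over all ordered pairs of distinct nodes."""
    n = len(adj)
    total, worst = 0, 0
    for src in adj:
        # Plain BFS from src; dist holds the hop count to every reachable node.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != n:
            raise ValueError("topology is not connected")
        total += sum(dist.values())
        worst = max(worst, max(dist.values()))
    return worst, total / (n * (n - 1))


def ring(n):
    """Loop of n nodes, 2 links per node."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def ring_with_chords(n):
    """Assumed 3-link layout: a loop plus a link to the opposite node (n even)."""
    return {i: [(i - 1) % n, (i + 1) % n, (i + n // 2) % n] for i in range(n)}


if __name__ == "__main__":
    for name, adj in [("4-node loop", ring(4)), ("8 nodes, 3 links", ring_with_chords(8))]:
        worst, avg = hop_stats(adj)
        print(f"{name}: max {worst} hops, average {avg:.2f} hops")
```

Under these assumptions the 4-node loop averages 4/3 hops (the 33% relaying overhead) and the 8-node layout averages 11/7 hops with a maximum of 2, which matches the 1.57 figure quoted above.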

rbanffy · 2025-12-19T12:23:37 1766147017

Can’t you make bandwidth reservations and optimise data location to prefer comms between directly connected nodes over one or two-hop paths?

KeplerBoy · 2025-12-19T12:47:54 1766148474

Sure, one could think of some kind of pipeline parallelism where you only need a fast transfer to the next step in the model. That would boost throughput but not increase model size.

fooblaster · 2025-12-18T23:54:19 1766102059

Might be helpful if they actually provided a programming model for ANE that isn't ONNX. ANE not having a native development model just means software support will not be great.

sroussey · 2025-12-19T03:13:54 1766114034

ONNX supports CoreML, is that how?

liuliu · 2025-12-19T00:14:05 1766103245

They were talking about neural accelerators (a silicon piece on GPU): https://releases.drawthings.ai/p/metal-flashattention-v25-w-...

csdreamer7 · 2025-12-19T00:21:17 1766103677

> Apple Neural Engine is a thing already, with support for multiply-accumulate on INT8 and FP16. AI inference frameworks need to add support for it.

Or, Apple could pay for the engineers to add it.

ls612 · 2025-12-19T01:07:51 1766106471

Apple already paid software engineers to add TensorFlow support for the ANE hardware.

solarkraft · 2025-12-19T04:12:37 1766117557

How much of an improvement can be expected here? It seems to me that in general most potential is pretty quickly realized on Apple platforms.