well - yes ... that's the point of occam[1] ... if it can hang, it will hang deterministically

we have to zoom out from the 1980s, when 4 CPUs were a lot ... now that you can build 40,000 CPUs (ie a 200 x 200 array) within the single reticle limit (ie the same die area as a big NVIDIA), a big MIMD machine must be coded with algorithmic patterns like map-reduce, pipelining, etc.

but the general-purpose CPU nature and HLL coding mean that it is far easier than with CUDA to get close to theoretical max performance
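to make the pipelining / map-reduce point concrete, here is a rough sketch in Go rather than occam (Go is just a stand-in; its unbuffered channels block both the sender and the receiver until they meet, which is the no-queueing behaviour in [1]). the names `generate`, `square`, `sum` and `SumOfSquares` are made up for illustration, and goroutines are scheduled by the Go runtime rather than pinned one per CPU, so this only shows the coding pattern, not the hardware mapping:

```go
package main

// generate feeds the inputs into the pipeline one at a time.
func generate(xs []int) <-chan int {
	out := make(chan int) // unbuffered: each send waits for a ready receiver
	go func() {
		defer close(out)
		for _, x := range xs {
			out <- x
		}
	}()
	return out
}

// square is a middle pipeline stage (the "map" step).
func square(in <-chan int) <-chan int {
	out := make(chan int) // unbuffered again, so no queue builds up between stages
	go func() {
		defer close(out)
		for x := range in {
			out <- x * x
		}
	}()
	return out
}

// sum drains the last stage and folds it into one value (the "reduce" step).
func sum(in <-chan int) int {
	total := 0
	for x := range in {
		total += x
	}
	return total
}

// SumOfSquares wires the three stages into one pipeline.
func SumOfSquares(xs []int) int {
	return sum(square(generate(xs)))
}
```

calling `SumOfSquares([]int{1, 2, 3})` from a `main` function pushes each value through all three stages. because nothing is buffered, a stage that never receives stalls everything upstream of it straight away instead of hiding the problem behind a queue.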

[1] or any CSP implementation where both input and output deschedule - ie unbuffered channels, no queueing