
> even more if not using Infiniband

It's interesting that the above "HPC reference architecture" shows a GPU-to-GPU InfiniBand fabric, despite Nvidia also nominally pushing NVLink Switch (https://www.nvidia.com/en-us/data-center/nvlink/) for the HPC use case.



How does NVLink work? Because I already know MPI and I’m not going to learn anything else, lol.

Edit: after googling, it looks like OpenMPI has some NVLink support, so maybe it is OK.
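
To sketch what that looks like (assuming a CUDA-aware Open MPI build, exactly two ranks, and one GPU per rank; the buffer size is arbitrary), you just hand device pointers to the ordinary MPI calls and let the library pick the transport underneath:

    /* Sketch: CUDA-aware MPI with device pointers.
     * Assumes a CUDA-aware Open MPI build, two ranks, one GPU per rank. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaSetDevice(0);                       /* one GPU per rank assumed */

        const int n = 1 << 20;
        float *dbuf;
        cudaMalloc((void **)&dbuf, n * sizeof(float));

        /* Device pointers go straight into MPI; the library chooses
         * NVLink/P2P, GPUDirect RDMA, or host staging underneath. */
        if (rank == 0)
            MPI_Send(dbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(dbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }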


There is "CUDA-aware MPI" which would let you RDMA from device to device. But the more modern way would be MPI for the host communication and their own library NCCL for the device communication. NCCL has similar collective functions a MPI but runs on the device which makes it much more efficient to integrate in the flow of your kernels. But you would still generally bootstrap your processes and data through MPI.


The naming games on the libraries are rather entertaining... NCCL, RCCL... and oneAPI (oneCCL).


If you use UCX, it does all of that automatically without you having to choose.
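
For reference, forcing the UCX PML in Open MPI looks something like this (a sketch; the UCX_TLS list is only an illustration, since UCX normally selects transports on its own):

    # let UCX pick transports automatically
    mpirun --mca pml ucx -np 8 ./app
    # or pin the transports explicitly, e.g. for CUDA + InfiniBand
    mpirun --mca pml ucx -x UCX_TLS=rc,cuda_copy,cuda_ipc,gdr_copy -np 8 ./app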


I use OpenMPI with no issues across multiple H100 and A100 nodes, with multiple InfiniBand 200G and Ethernet 100G/200G networks, and RDMA (using Mellanox rather than Broadcom cards, but AFAIK Broadcom supports this just the same). Side note: make sure you compile nvidia_peermem correctly if you want GPUDirect RDMA to work :)
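
The quick sanity check is just confirming the module is loaded (a sketch; older driver/OFED stacks shipped the equivalent module as nv_peer_mem):

    # GPUDirect RDMA needs the peer-memory kernel module
    lsmod | grep nvidia_peermem
    # load it if missing
    sudo modprobe nvidia_peermem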


No issues, except this minor bit of arcane knowledge that is missing from SO. :)



