kcb on July 13, 2024 | on: So you want to rent an NVIDIA H100 cluster? 2024 C...

There is "CUDA-aware MPI", which lets you RDMA directly from device to device. But the more modern way is to use MPI for the host communication and NVIDIA's own library, NCCL, for the device communication. NCCL has collective functions similar to MPI's, but they run on the device, which makes them much more efficient to integrate into the flow of your kernels. You would still generally bootstrap your processes and data through MPI.
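To make the "bootstrap through MPI, communicate through NCCL" pattern concrete, here is a toy model of the handshake in plain Python. It makes no real MPI or NCCL calls: `HostChannel`, `DeviceComm`, `worker`, and `run` are hypothetical stand-ins invented for illustration, and threads play the role of ranks. In a real job the three steps would roughly correspond to `ncclGetUniqueId` on rank 0, an `MPI_Bcast` of that id, then `ncclCommInitRank` and `ncclAllReduce` on every rank.

```python
import threading
import uuid

WORLD_SIZE = 4  # simulated ranks (one per GPU in a real job)


class HostChannel:
    """Toy stand-in for MPI: rank 0 broadcasts one small value to every rank."""

    def __init__(self):
        self._value = None
        self._ready = threading.Event()

    def bcast(self, value, rank, root=0):
        if rank == root:
            self._value = value
            self._ready.set()
        self._ready.wait()  # non-root ranks block until the root has published
        return self._value


class DeviceComm:
    """Toy stand-in for an NCCL communicator; ranks with the same id share one."""

    _registry = {}
    _lock = threading.Lock()

    def __init__(self, world_size):
        self.slots = [None] * world_size
        self.barrier = threading.Barrier(world_size)

    @classmethod
    def init_rank(cls, world_size, unique_id, rank):
        # Plays the role of ncclCommInitRank: the shared id is what ties ranks together.
        with cls._lock:
            if unique_id not in cls._registry:
                cls._registry[unique_id] = cls(world_size)
            return cls._registry[unique_id]

    def all_reduce_sum(self, rank, data):
        # Plays the role of a sum all-reduce: every rank gets the elementwise sum.
        self.slots[rank] = data
        self.barrier.wait()  # all contributions are in
        total = [sum(column) for column in zip(*self.slots)]
        self.barrier.wait()  # everyone has read before the slots could be reused
        return total


def worker(rank, host, results):
    # Step 1 (host side): rank 0 creates an id and shares it over the host channel.
    unique_id = uuid.uuid4().hex if rank == 0 else None
    unique_id = host.bcast(unique_id, rank)
    # Step 2: every rank joins the device communicator using that id.
    comm = DeviceComm.init_rank(WORLD_SIZE, unique_id, rank)
    # Step 3: collectives now go through the device communicator, not the host channel.
    results[rank] = comm.all_reduce_sum(rank, [float(rank), 1.0])


def run():
    host = HostChannel()
    results = [None] * WORLD_SIZE
    threads = [
        threading.Thread(target=worker, args=(rank, host, results))
        for rank in range(WORLD_SIZE)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run())
```

With four simulated ranks each contributing `[rank, 1.0]`, every rank ends up holding `[6.0, 4.0]`. The point of the split is that the host channel only carries a small id once at startup, while the bulk data moves through the device-side communicator.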

latchkey on July 13, 2024

The naming games on the libraries are rather entertaining... NCCL, RCCL... and oneAPI (oneCCL).

shaklee3 on July 13, 2024

If you use UCX, it does all that automatically without you having to choose.