In-place Operations
Contrary to MPI, NCCL does not define a special “in-place” value to replace pointers. Instead, NCCL optimizes the case where the provided pointers are effectively “in place”.
For the ncclBroadcast, ncclReduce and ncclAllReduce functions, this means that passing sendBuff == recvBuff performs the operation in place, storing the final results at the same location the initial data was read from.
For ncclReduceScatter and ncclAllGather, an operation is in place when the per-rank pointer is located at the rank offset of the global buffer. More precisely, these calls are considered in place:
ncclReduceScatter(data, data+rank*recvcount, recvcount, datatype, op, comm, stream);
ncclAllGather(data+rank*sendcount, data, sendcount, datatype, comm, stream);
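The rank-offset condition can be made concrete with a little pointer arithmetic. The following is a minimal sketch in plain C, not NCCL code: the helper names `reduce_scatter_is_in_place` and `all_gather_is_in_place` are hypothetical, and the `elem_size` parameter is an assumption standing in for the byte size of the NCCL datatype. The helpers only mirror the address relationship described above and say nothing about how NCCL detects or optimizes the case internally.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers (not part of the NCCL API). They check, in bytes,
   whether a send/receive pointer pair satisfies the in-place layout. */

/* ReduceScatter: the receive pointer sits at this rank's offset
   inside the send (global) buffer. */
static bool reduce_scatter_is_in_place(const void *sendbuff, const void *recvbuff,
                                       size_t recvcount, size_t elem_size, int rank) {
    const char *expected =
        (const char *)sendbuff + (size_t)rank * recvcount * elem_size;
    return (const char *)recvbuff == expected;
}

/* AllGather: the send pointer sits at this rank's offset
   inside the receive (global) buffer. */
static bool all_gather_is_in_place(const void *sendbuff, const void *recvbuff,
                                   size_t sendcount, size_t elem_size, int rank) {
    const char *expected =
        (const char *)recvbuff + (size_t)rank * sendcount * elem_size;
    return (const char *)sendbuff == expected;
}
```

For example, with a buffer `float data[4 * 8]` on rank 2 of 4 and 8 elements per rank, `reduce_scatter_is_in_place(data, data + 2 * 8, 8, sizeof(float), 2)` holds, while passing `data + 1 * 8` as the receive pointer does not.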