In-place Operations

Unlike MPI, which provides the special value MPI_IN_PLACE to pass in place of a buffer pointer, NCCL defines no such value. Instead, NCCL detects and optimizes the case where the pointers passed to a collective are effectively "in place".

For ncclBroadcast, ncclReduce and ncclAllReduce, this means that passing sendbuff == recvbuff performs the operation in place, storing the final result in the same buffer the input data was read from.
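The following minimal sketch illustrates the pointer-equality rule. It is CPU-only C and does not call NCCL. It models a sum all-reduce across several simulated ranks, and the names `sim_allreduce_sum` and `is_in_place` are hypothetical. With NCCL itself, the equivalent in-place call would pass the same device buffer twice, for example `ncclAllReduce(buf, buf, count, ncclFloat, ncclSum, comm, stream)`. The model reads every rank's input before writing any output, so aliased send and receive buffers are safe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COUNT 64

/* Hypothetical CPU-only model of a sum all-reduce over nranks "ranks".
 * sendbufs[r] and recvbufs[r] are rank r's input and output buffers.
 * Passing the same pointer for both makes the call in place: the reduced
 * result overwrites the input. */
static void sim_allreduce_sum(float **sendbufs, float **recvbufs,
                              size_t count, int nranks) {
    float acc[MAX_COUNT];
    assert(count <= MAX_COUNT);
    /* Read every rank's input before writing any output, so in-place
     * buffers are never overwritten while they are still being read. */
    for (size_t i = 0; i < count; i++) {
        acc[i] = 0.0f;
        for (int r = 0; r < nranks; r++)
            acc[i] += sendbufs[r][i];
    }
    /* Write the same reduced vector to every rank's output buffer. */
    for (int r = 0; r < nranks; r++)
        memcpy(recvbufs[r], acc, count * sizeof(float));
}

/* Mirrors the rule for broadcast, reduce and all-reduce: the call is in
 * place exactly when sendbuff and recvbuff are the same pointer. */
static int is_in_place(const void *sendbuff, const void *recvbuff) {
    return sendbuff == recvbuff;
}
```

If you call `sim_allreduce_sum(bufs, bufs, ...)` with the same pointer array for input and output, each rank's reduced result ends up in its original buffer.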

For ncclReduceScatter and ncclAllGather, the operation is in place when the per-rank pointer (recvbuff for ncclReduceScatter, sendbuff for ncclAllGather) points to this rank's slice of the full-size buffer, i.e. rank*count elements from its start. More precisely, these calls are considered in place:

// data is a typed pointer (e.g. float*), so the offsets below are in elements
ncclReduceScatter(data, data+rank*recvcount, recvcount, datatype, op, comm, stream);
ncclAllGather(data+rank*sendcount, data, sendcount, datatype, comm, stream);  // no reduction op
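The offset rule can be checked with plain pointer arithmetic. The following sketch is again CPU-only and not part of NCCL. It uses the hypothetical helpers `reduce_scatter_in_place` and `all_gather_in_place` to test the in-place condition, with offsets counted in elements and scaled by the element size. A third hypothetical function, `sim_all_gather_in_place`, models an all-gather in which each rank's send pointer already is its own slice of the full buffer, so that slice never needs to be copied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check for ncclReduceScatter: in place when recvbuff points
 * at offset rank*recvcount elements inside the full-size sendbuff. */
static int reduce_scatter_in_place(const void *sendbuff, const void *recvbuff,
                                   int rank, size_t recvcount, size_t elemsize) {
    return (const char *)recvbuff ==
           (const char *)sendbuff + (size_t)rank * recvcount * elemsize;
}

/* Hypothetical check for ncclAllGather: in place when sendbuff points at
 * offset rank*sendcount elements inside the full-size recvbuff. */
static int all_gather_in_place(const void *sendbuff, const void *recvbuff,
                               int rank, size_t sendcount, size_t elemsize) {
    return (const char *)sendbuff ==
           (const char *)recvbuff + (size_t)rank * sendcount * elemsize;
}

/* CPU-only model of an in-place all-gather: data[r] holds nranks*count
 * floats, and rank r contributes the slice starting at r*count. That slice
 * is already in place in rank r's own buffer, so the self-copy is skipped;
 * only the other ranks' buffers receive it. */
static void sim_all_gather_in_place(float **data, size_t count, int nranks) {
    for (int src = 0; src < nranks; src++) {
        const float *send = data[src] + (size_t)src * count;
        for (int dst = 0; dst < nranks; dst++) {
            float *slot = data[dst] + (size_t)src * count;
            if (slot != send) /* skip the no-op self-copy */
                memcpy(slot, send, count * sizeof(float));
        }
    }
}
```

A self-copy is never performed, so no rank's contributed slice is overwritten before the other ranks read it.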