nki.collectives.collective_permute_implicit_reduce
- nki.collectives.collective_permute_implicit_reduce(srcs0_by_channel, srcs1_by_channel, dsts_by_channel, replica_group, op, channel_ids=[0])
Perform an implicit collective permute with reduction in a ring, where sources and destinations are implicitly determined by the ring structure during runtime.
Combines collective_permute_implicit() with a reduction operation. Each rank reduces its local sources using op(srcs0_by_channel[i], srcs1_by_channel[i]), sends the result to its successor, and receives its predecessor’s reduced result into dsts_by_channel[i].
Since the sources and destinations are implicitly determined, use collective_permute_implicit_current_processing_rank_id() to get the rank ID whose data is currently being processed.
The outer dimension of srcs0_by_channel, srcs1_by_channel, and dsts_by_channel corresponds to channels. For each channel, the inner list contains exactly one tensor (coalesced collective communication is not currently supported).
Channels: Multiple channels enable overlapping communication, allowing concurrent data transfers. The number of available channels depends on the replica group and system connectivity (see Neuron Collectives). The maximum number of channels is 4 for replica groups containing all devices inside a node and 2 for other supported replica groups.
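The reduce-then-permute data flow described above can be illustrated with a small host-side simulation. This is a minimal sketch in plain Python, not NKI code: the function name simulate_ring_permute_reduce is hypothetical, tensors are stood in for by lists of numbers, a single channel is modeled, and the successor of rank r is assumed to be (r + 1) % n, whereas the real ring order is determined at runtime.

```python
import operator


def simulate_ring_permute_reduce(srcs0, srcs1, op):
    """Simulate one step for a single channel (hypothetical helper, not an NKI API).

    srcs0[r] and srcs1[r] are the two local sources held by rank r,
    modeled as equal-length lists of numbers. Returns dsts, where dsts[r]
    is what rank r receives from its predecessor.
    """
    n = len(srcs0)
    # Step 1: every rank reduces its two local sources elementwise.
    reduced = [
        [op(a, b) for a, b in zip(srcs0[r], srcs1[r])]
        for r in range(n)
    ]
    # Step 2: every rank sends its result to its successor.
    # ASSUMPTION: successor of r is (r + 1) % n, so rank r receives
    # the reduced result of rank (r - 1) % n.
    return [reduced[(r - 1) % n] for r in range(n)]


srcs0 = [[1, 2], [10, 20], [100, 200]]   # left operands, one list per rank
srcs1 = [[1, 1], [2, 2], [3, 3]]         # right operands, one list per rank

print(simulate_ring_permute_reduce(srcs0, srcs1, operator.add))
# [[103, 203], [2, 3], [12, 22]]
print(simulate_ring_permute_reduce(srcs0, srcs1, max))
# [[100, 200], [1, 2], [10, 20]]
```

Here operator.add and max stand in for nl.add and nl.maximum. Rank 0 ends up holding the reduced result of rank 2, rank 1 that of rank 0, and so on around the ring.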
- Parameters:
srcs0_by_channel – List of source tensor lists (left operand of reduction), one per channel. Each inner list must contain exactly one tensor.
srcs1_by_channel – List of source tensor lists (right operand of reduction), one per channel. Each inner list must contain exactly one tensor.
dsts_by_channel – List of destination tensor lists to receive predecessor’s reduced result, one per channel. Each inner list must contain exactly one tensor.
replica_group – ReplicaGroup defining rank groups for the collective
op – The reduction operation to perform (nl.add, nl.minimum, or nl.maximum)
channel_ids – List of channel IDs to use for communication (default [0] for single channel). Currently must be consecutive integers starting from 0.
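The structural constraints on these arguments can be checked on the host before issuing the collective. The following is a sketch only: check_channel_args and its max_channels parameter are hypothetical and not part of NKI, and it assumes that the outer length of each list must equal len(channel_ids), which the "one per channel" wording suggests but does not state outright.

```python
def check_channel_args(srcs0_by_channel, srcs1_by_channel, dsts_by_channel,
                       channel_ids=(0,), max_channels=4):
    """Validate the per-channel list layout (hypothetical helper, not an NKI API).

    max_channels is supplied by the caller: 4 for replica groups containing
    all devices inside a node, 2 for other supported replica groups.
    Returns the number of channels on success, raises ValueError otherwise.
    """
    n = len(channel_ids)
    # channel_ids must be consecutive integers starting from 0.
    if n == 0 or list(channel_ids) != list(range(n)):
        raise ValueError("channel_ids must be consecutive integers starting from 0")
    if n > max_channels:
        raise ValueError(f"at most {max_channels} channels are allowed, got {n}")

    named_args = (
        ("srcs0_by_channel", srcs0_by_channel),
        ("srcs1_by_channel", srcs1_by_channel),
        ("dsts_by_channel", dsts_by_channel),
    )
    for name, arg in named_args:
        # ASSUMPTION: one inner list per entry in channel_ids.
        if len(arg) != n:
            raise ValueError(f"{name} must have {n} inner lists, got {len(arg)}")
        # Each inner list holds exactly one tensor (no coalescing).
        for i, inner in enumerate(arg):
            if len(inner) != 1:
                raise ValueError(f"{name}[{i}] must contain exactly one tensor")
    return n


# Two channels, one placeholder "tensor" per inner list.
num_channels = check_channel_args(
    [["a0"], ["a1"]], [["b0"], ["b1"]], [["d0"], ["d1"]],
    channel_ids=[0, 1], max_channels=2,
)
print(num_channels)  # 2
```

The strings above are placeholders for tensors; the check inspects only list lengths and channel numbering, so it makes no assumption about tensor types or shapes.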