nki.collectives.collective_permute_implicit

nki.collectives.collective_permute_implicit(srcs_by_channel, dsts_by_channel, replica_group, channel_ids=[0])

Send and receive data between ranks in a ring, where sources and destinations are determined implicitly by the ring structure at runtime.

Each rank sends data to its successor and receives from its predecessor in the ring. This differs from collective_permute(), where the caller explicitly specifies source-target pairs.
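The successor/predecessor pattern can be illustrated with a plain-Python simulation. This is only a sketch and does not call NKI: ring_permute is a hypothetical helper, and it assumes the ring follows rank order (rank r sends to rank (r + 1) % n), which this page does not guarantee.

```python
def ring_permute(values):
    """Simulate one ring step over a list indexed by rank.

    Assumption: the successor of rank r is (r + 1) % n. The real ring
    order is chosen by the runtime and may differ.
    """
    n = len(values)
    received = [None] * n
    for rank, value in enumerate(values):
        successor = (rank + 1) % n
        received[successor] = value  # each rank sends to its successor
    return received


data = ["r0", "r1", "r2", "r3"]
print(ring_permute(data))  # ['r3', 'r0', 'r1', 'r2']
```

After one step every rank holds the value that started on its predecessor, and no source-target pairs had to be listed by the caller.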

Since the sources and destinations are implicitly determined, use collective_permute_implicit_current_processing_rank_id() to get the rank ID whose data is currently being processed.
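To see why the rank being processed has to be queried rather than assumed, consider a ring in which data is forwarded repeatedly. The plain-Python sketch below tracks which rank's original data each rank would hold after a given number of steps. It assumes a ring in rank order and repeated forwarding, and origin_after_steps is a hypothetical helper. It does not model the return value of collective_permute_implicit_current_processing_rank_id(), whose exact semantics are not specified on this page.

```python
def origin_after_steps(rank, step, ring_size):
    """Rank whose original data `rank` holds after `step` ring steps.

    Assumption: each step moves data from rank r to rank (r + 1) % ring_size.
    """
    return (rank - step) % ring_size


ring_size = 4
for step in range(ring_size):
    held = [origin_after_steps(r, step, ring_size) for r in range(ring_size)]
    print(step, held)
# 0 [0, 1, 2, 3]
# 1 [3, 0, 1, 2]
# 2 [2, 3, 0, 1]
# 3 [1, 2, 3, 0]
```

In this model the mapping changes at every step, which is why hard-coding a rank ID in the kernel would be fragile.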

The outer dimension of srcs_by_channel and dsts_by_channel corresponds to channels. For each channel, the inner list contains exactly one tensor (coalesced collective communication is not currently supported).

Channels: Multiple channels allow data transfers to run concurrently, so communication can overlap. The number of available channels depends on the replica group and system connectivity (see Neuron Collectives). The maximum number of channels is 4 for replica groups containing all devices within a node and 2 for other supported replica groups.
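The channel limits above can be turned into a small helper for building a channel_ids list. This is a sketch under stated assumptions: pick_channel_ids and the group_spans_whole_node flag are hypothetical names, not part of NKI, and the limits are upper bounds only, since the number actually available also depends on system connectivity.

```python
def pick_channel_ids(requested, group_spans_whole_node):
    """Return consecutive channel IDs starting at 0, capped at the stated maximum.

    Assumption: the caller already knows whether the replica group contains
    all devices within a node (limit 4) or is another supported group (limit 2).
    """
    if requested < 1:
        raise ValueError("at least one channel is required")
    limit = 4 if group_spans_whole_node else 2
    return list(range(min(requested, limit)))


print(pick_channel_ids(3, group_spans_whole_node=True))   # [0, 1, 2]
print(pick_channel_ids(3, group_spans_whole_node=False))  # [0, 1]
```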

Parameters:
  • srcs_by_channel – List of source tensor lists, one per channel. Each inner list must contain exactly one tensor.

  • dsts_by_channel – List of destination tensor lists, one per channel. Each inner list must contain exactly one tensor.

  • replica_group – ReplicaGroup defining the rank groups that take part in the collective.

  • channel_ids – List of channel IDs to use for communication (default [0] for single channel). Currently must be consecutive integers starting from 0.
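The argument constraints listed above can be checked before a kernel is traced. The following is a minimal sketch in plain Python, not part of NKI: check_permute_args is a hypothetical helper, strings stand in for tensors, and the requirement that the outer lists have one entry per channel ID is a reading of "one per channel" rather than a documented check.

```python
def check_permute_args(srcs_by_channel, dsts_by_channel, channel_ids=(0,)):
    """Raise ValueError if the arguments break the documented constraints."""
    channel_ids = list(channel_ids)
    # Channel IDs must be consecutive integers starting from 0.
    if not channel_ids or channel_ids != list(range(len(channel_ids))):
        raise ValueError("channel_ids must be consecutive integers starting from 0")
    for name, by_channel in (("srcs_by_channel", srcs_by_channel),
                             ("dsts_by_channel", dsts_by_channel)):
        # Assumption: one inner list per channel ID.
        if len(by_channel) != len(channel_ids):
            raise ValueError(f"{name} needs one inner list per channel")
        for inner in by_channel:
            # Coalesced communication is not supported: exactly one tensor.
            if len(inner) != 1:
                raise ValueError(f"each inner list of {name} must hold exactly one tensor")


# Two channels, one placeholder "tensor" per channel.
check_permute_args([["src0"], ["src1"]], [["dst0"], ["dst1"]], channel_ids=[0, 1])
```

The call at the end passes silently; a non-consecutive channel_ids such as [1, 2], or an inner list with two tensors, raises ValueError.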