nki.collectives.collective_permute_implicit_current_processing_rank_id#

nki.collectives.collective_permute_implicit_current_processing_rank_id(iteration_id, replica_group, channel_id=0, num_channels=1)[source]#

Returns the rank ID of the data to be processed in the current ring iteration.

This function is intended to be used in conjunction with collective_permute_implicit() or collective_permute_implicit_reduce(). Since the sources and destinations are implicitly determined in ring algorithms, the rank ID of received data can only be determined at runtime.

At iteration 0, this returns the current rank’s own ID (processing local data). In subsequent iterations, it returns the rank ID of data received from predecessors, progressing around the ring.

The returned rank ID is a scalar register. To compute the offset of the received data chunk within a tensor, use register ALU operations (e.g., multiply the rank ID by the chunk size), then use a dynamic access pattern (tensor.ap()) in ISA compute operations (e.g., nisa.nc_matmul()).
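The arithmetic above can be modeled in plain Python. This is an illustrative sketch, not NKI device code: the modular formula and the names (my_rank, ring_size, chunk_size) are assumptions about a unidirectional ring where each rank forwards data to its successor, and the multiply mirrors the register ALU step done on device.

```python
def processed_rank_id(my_rank: int, iteration_id: int, ring_size: int) -> int:
    """Rank whose data is processed at a given ring iteration.

    Iteration 0 processes the local data; iteration i processes the
    chunk originally owned by the i-th predecessor in the ring.
    """
    return (my_rank - iteration_id) % ring_size


def chunk_offset(rank_id: int, chunk_size: int) -> int:
    """Offset of that rank's chunk within the full tensor (the multiply
    that would be performed with register ALU operations on device)."""
    return rank_id * chunk_size


# Rank 2 in a 4-rank ring, chunks of 128 elements:
# iteration 0 -> processes rank 2's own chunk, at offset 256
# iteration 1 -> processes rank 1's chunk, at offset 128
# iteration 3 -> processes rank 3's chunk, at offset 384
```

On device, the resulting offset would feed a dynamic access pattern (tensor.ap()) rather than Python indexing.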

Typical usage pattern: in each iteration of a ring algorithm, the compute kernel calls this function to identify which rank's data is being processed, computes on that data, and concurrently triggers the next communication step to send already-computed chunks to the successor.
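The resulting schedule can be sketched in plain Python. This is a hedged model, not NKI code: it assumes a unidirectional ring where each rank sends to (rank + 1) % ring_size, so after ring_size iterations every rank has processed every chunk exactly once.

```python
def ring_schedule(ring_size: int) -> list[dict[int, int]]:
    """For each iteration, map each rank to the rank ID of the data it
    processes (the value this function would return at runtime)."""
    return [
        {rank: (rank - it) % ring_size for rank in range(ring_size)}
        for it in range(ring_size)
    ]


schedule = ring_schedule(4)
# Iteration 0: every rank processes its own local chunk.
# Iteration 1: every rank processes the chunk received from its predecessor.
```

Over the full schedule, each rank visits all ring_size chunks, which is what lets the compute on the current chunk overlap with the send of the previous one.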

Channels: Multiple channels allow concurrent data transfers, overlapping communication across the ring. The number of available channels depends on the replica group and system connectivity (see Neuron Collectives). The maximum is 4 channels for replica groups containing all devices inside a node and 2 for other supported replica groups.

Parameters:
  • iteration_id – Current ring step (typically the loop counter).

  • replica_group – ReplicaGroup defining the ring topology.

  • channel_id – Channel ID for the communication (0 to num_channels-1).

  • num_channels – Total number of channels (use 1 for single-channel communication).

Returns:

Scalar register containing the rank ID of the data to be processed