nki.isa.sendrecv

nki.isa.sendrecv(src, dst, send_to_rank, recv_from_rank, pipe_id, name=None)

Perform point-to-point communication between NeuronCores by sending and receiving data simultaneously using DMA engines.

Note

Available only on NeuronCore-v2 or newer.

This instruction enables bidirectional data exchange between two NeuronCores within a Logical NeuronCore (LNC) configuration. The current NeuronCore sends its src tile to the dst location on the NeuronCore specified by send_to_rank. At the same time, it receives data from the NeuronCore specified by recv_from_rank into its own dst tile.
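To make the data flow concrete, here is a minimal host-side model of one exchange. It is plain Python, not the NKI API. The function `simulate_sendrecv` and its dict-based arguments are hypothetical. Each rank's `dst` ends up holding the `src` of the rank it receives from, and every receive must be matched by the peer sending to that rank. With two NeuronCores, this reduces to a swap.

```python
# Host-side model of sendrecv semantics (plain Python, not the NKI API).
from typing import Dict, List


def simulate_sendrecv(
    src: Dict[int, List[float]],
    send_to_rank: Dict[int, int],
    recv_from_rank: Dict[int, int],
) -> Dict[int, List[float]]:
    """Return every rank's dst contents after one exchange.

    src[r] is rank r's (flattened) source tile; send_to_rank[r] and
    recv_from_rank[r] are the arguments rank r passes to sendrecv.
    """
    dst: Dict[int, List[float]] = {}
    for rank, peer in recv_from_rank.items():
        # A receive is only valid if the peer actually sends to this rank.
        if send_to_rank[peer] != rank:
            raise ValueError(f"rank {peer} does not send to rank {rank}")
        # Copy so the model's dst does not alias the peer's src list.
        dst[rank] = list(src[peer])
    return dst


# Two NeuronCores in one LNC: each sends to and receives from the other.
result = simulate_sendrecv(
    src={0: [1.0, 2.0], 1: [3.0, 4.0]},
    send_to_rank={0: 1, 1: 0},
    recv_from_rank={0: 1, 1: 0},
)
print(result)  # {0: [3.0, 4.0], 1: [1.0, 2.0]}
```

The model ignores timing and DMA details. It captures only which tile lands where after the exchange completes.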

This is useful when NeuronCores must exchange data in distributed computation patterns, such as all-gather or other collective operations in which cores coordinate their work by exchanging tiles.
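As an illustration of the all-gather case, the sketch below models a two-core all-gather on the host. One exchange is enough: each core keeps its own shard, receives the other core's shard (the role `dst` plays in sendrecv), and concatenates both in rank order. This is plain Python, and `all_gather_two_cores` is a hypothetical name. It is not an NKI collective.

```python
# Host-side model of a two-core all-gather built from a single exchange.
from typing import Dict, List


def all_gather_two_cores(shards: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Each rank ends with both shards concatenated in rank order."""
    if sorted(shards) != [0, 1]:
        raise ValueError("expected exactly ranks 0 and 1")
    gathered: Dict[int, List[int]] = {}
    for rank in (0, 1):
        peer = 1 - rank            # the only other core in the pair
        received = shards[peer]    # what the exchange would write into dst
        # Order the local and received shards by rank index.
        parts = {rank: shards[rank], peer: received}
        gathered[rank] = parts[0] + parts[1]
    return gathered


print(all_gather_two_cores({0: [10, 11], 1: [20, 21]}))
# {0: [10, 11, 20, 21], 1: [10, 11, 20, 21]}
```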

This instruction is only allowed on NeuronCore-v3 or newer when LNC (Logical NeuronCore) is enabled. Communication occurs between the NeuronCores that share the same HBM stack within the LNC configuration, so send_to_rank and recv_from_rank must each be either 0 or 1.
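The rank constraint can be expressed as a small host-side check. Both `check_ranks` and `peer_of` are hypothetical helpers written for illustration and are not part of NKI. Only the rule they enforce, that both ranks must be 0 or 1, comes from this page.

```python
# Hypothetical host-side helpers illustrating the rank rule (not part of NKI).
VALID_RANKS = (0, 1)  # the two NeuronCores that share an HBM stack under LNC


def check_ranks(send_to_rank: int, recv_from_rank: int) -> None:
    """Raise ValueError if either rank argument is outside {0, 1}."""
    for name, value in (("send_to_rank", send_to_rank),
                        ("recv_from_rank", recv_from_rank)):
        if value not in VALID_RANKS:
            raise ValueError(f"{name} must be 0 or 1, got {value}")


def peer_of(rank: int) -> int:
    """Return the other rank of the pair."""
    if rank not in VALID_RANKS:
        raise ValueError(f"rank must be 0 or 1, got {rank}")
    return 1 - rank


# Core 0 exchanging with core 1 passes the check.
check_ranks(peer_of(0), peer_of(0))
```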

The pipe_id parameter provides synchronization control by grouping sendrecv operations. Operations with the same pipe_id form a logical group where all operations in the group must complete before any can proceed. Operations with different pipe_id values can progress independently without blocking each other.
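The grouping rule above can be modeled as "a pipe is released only when every operation in it has completed". The toy model below takes `(pipe_id, completed)` pairs and returns the pipe_ids whose groups may proceed. It is a hypothetical host-side illustration of the rule as described here. It does not model DMA timing or the hardware synchronization mechanism.

```python
# Toy model of pipe_id grouping (plain Python, not the NKI runtime).
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def released_pipes(ops: Iterable[Tuple[int, bool]]) -> Set[int]:
    """Return the pipe_ids whose operations have all completed.

    ops holds (pipe_id, completed) pairs. A group is released only when
    every operation in it has completed; groups never wait on each other.
    """
    status: Dict[int, bool] = defaultdict(lambda: True)
    for pipe_id, completed in ops:
        status[pipe_id] = status[pipe_id] and completed
    return {pipe_id for pipe_id, done in status.items() if done}


ops = [(0, True), (0, False), (1, True), (1, True)]
print(released_pipes(ops))  # {1}
```

Pipe 0 is still blocked by its unfinished operation, while pipe 1 proceeds independently.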

Memory types.

Both src and dst tiles must be in SBUF.

Data types.

src and dst must have the same data type, which can be any data type supported by NKI.

Layout.

src and dst must have the same shape and layout.

Tile size.

src and dst must have the same partition dimension size and the same number of elements per partition.
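The memory-type, data-type, layout, and tile-size rules above can be collected into a single host-side check. `TileSpec` and `check_sendrecv_tiles` are hypothetical stand-ins, not NKI types. Tile metadata is reduced to a shape, a dtype string, and a buffer name. By convention here, `shape[0]` is the partition dimension.

```python
# Hypothetical host-side check of the documented src/dst constraints.
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TileSpec:
    """Stand-in for a tile's metadata (not an NKI type)."""
    shape: Tuple[int, ...]  # shape[0] is the partition dimension
    dtype: str              # e.g. "float32"
    buffer: str             # memory type, e.g. "sbuf" or "hbm"


def check_sendrecv_tiles(src: TileSpec, dst: TileSpec) -> None:
    """Raise ValueError if src and dst violate the sendrecv constraints."""
    # Memory types: both tiles must be in SBUF.
    if src.buffer != "sbuf" or dst.buffer != "sbuf":
        raise ValueError("src and dst must both be in SBUF")
    # Data types: identical dtype.
    if src.dtype != dst.dtype:
        raise ValueError("src and dst must have the same data type")
    # Tile size: same partition size and same elements per partition.
    if src.shape[0] != dst.shape[0]:
        raise ValueError("src and dst must have the same partition dimension size")
    if math.prod(src.shape[1:]) != math.prod(dst.shape[1:]):
        raise ValueError("src and dst must have the same number of elements per partition")
    # Layout: identical shape.
    if src.shape != dst.shape:
        raise ValueError("src and dst must have the same shape and layout")


ok = TileSpec((128, 512), "float32", "sbuf")
check_sendrecv_tiles(ok, ok)  # passes
```

A `(128, 256, 2)` destination has the same number of elements per partition as `(128, 512)` but a different shape, so it still fails the layout check.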

Parameters:
  • src – the source tile on the current NeuronCore to be sent to the target NeuronCore

  • dst – the destination tile on the current NeuronCore where received data will be stored

  • send_to_rank – rank ID of the target NeuronCore to send data to

  • recv_from_rank – rank ID of the source NeuronCore to receive data from

  • pipe_id – synchronization identifier that groups sendrecv operations; operations with the same pipe_id are synchronized

Example:

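import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl

# Exchange data between two cores in a ring pattern.
# Assumes this code runs inside an NKI kernel executed on both NeuronCores of the LNC pair.
num_cores = 2
current_rank = nl.program_id(0)
next_rank = (current_rank + 1) % num_cores
prev_rank = (current_rank - 1) % num_cores

# Example tile shape (the partition dimension must not exceed 128 in SBUF)
batch_size, hidden_dim = 128, 512

# Data to send and buffer to receive: same shape and dtype, both in SBUF
send_data = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)
recv_buffer = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)
# (send_data would normally be filled first, e.g. with nl.load or a compute instruction)

# Perform the bidirectional exchange
nisa.sendrecv(
    src=send_data,
    dst=recv_buffer,
    send_to_rank=next_rank,
    recv_from_rank=prev_rank,
    pipe_id=0,
)

# Now recv_buffer contains data from the previous core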
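With `num_cores = 2`, the ring degenerates to a pairwise swap. For each core, the next and previous rank are both the other core, so each core's `recv_buffer` holds the other core's `send_data`. The plain-Python sketch below evaluates the example's rank arithmetic on the host for both ranks. It does not call NKI.

```python
# Host-side evaluation of the example's ring arithmetic (plain Python, no NKI).
num_cores = 2
pairs = {}
for current_rank in range(num_cores):
    next_rank = (current_rank + 1) % num_cores
    prev_rank = (current_rank - 1) % num_cores  # Python's % never returns a negative here
    pairs[current_rank] = (next_rank, prev_rank)
print(pairs)  # {0: (1, 1), 1: (0, 0)}
```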