This document is relevant for: Trn2, Trn3

nki.isa.sendrecv

nki.isa.sendrecv(src, dst, send_to_rank, recv_from_rank, pipe_id, dma_engine=dma_engine.dma, name=None)

Perform point-to-point communication between NeuronCores by sending and receiving data simultaneously using DMA engines.

Note

Available only on NeuronCore-v3 or newer.

This instruction enables bidirectional data exchange between two NeuronCores within a Logical NeuronCore (LNC) configuration. The current NeuronCore sends its src tile to the dst location of the target NeuronCore specified by send_to_rank, while simultaneously receiving data from recv_from_rank into its own dst tile.
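The exchange semantics described above can be modeled in plain Python, without Neuron hardware. This is only an illustrative sketch: `simulate_sendrecv` and its dictionary-based tiles are hypothetical and not part of NKI, and the check that every send has a matching receive is an assumption about how the two sides pair up, not documented behavior.

```python
def simulate_sendrecv(src_by_rank, send_to, recv_from):
    """Pure-Python model of one sendrecv step (hypothetical helper, not NKI).

    src_by_rank: {rank: list}, the src tile held by each rank
    send_to:     {rank: rank}, the send_to_rank chosen by each rank
    recv_from:   {rank: rank}, the recv_from_rank chosen by each rank
    Returns {rank: list}, the dst tile of each rank after the exchange.
    """
    # Assumption: each send must be matched by a receive on the target rank.
    for rank, target in send_to.items():
        if recv_from[target] != rank:
            raise ValueError(f"rank {target} does not receive from rank {rank}")
    # Each rank's dst becomes a copy of the src held by the rank it receives from.
    return {rank: list(src_by_rank[peer]) for rank, peer in recv_from.items()}


src = {0: [1.0, 2.0], 1: [3.0, 4.0]}
peers = {0: 1, 1: 0}  # with two ranks, each core's only peer is the other one
dst = simulate_sendrecv(src, send_to=peers, recv_from=peers)
print(dst)  # {0: [3.0, 4.0], 1: [1.0, 2.0]}
```

After the call, each rank's dst holds the other rank's src, which is the swap the instruction performs between the two cores.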

A typical use case is distributed computation in which NeuronCores exchange tiles to coordinate their work, such as all-gather communication or other collective operations.

This instruction is allowed only on NeuronCore-v3 or newer, and only when LNC is enabled. Communication occurs between NeuronCores that share the same HBM stack within the LNC configuration, so send_to_rank and recv_from_rank must each be either 0 or 1.

The pipe_id parameter provides synchronization control by grouping sendrecv operations. Operations with the same pipe_id form a logical group where all operations in the group must complete before any can proceed. Operations with different pipe_id values can progress independently without blocking each other.
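To make the grouping concrete, the sketch below shows which operations would fall into the same synchronization group. It is plain Python bookkeeping only: the operation labels are hypothetical, and the code does not model the hardware synchronization itself.

```python
from collections import defaultdict

# Hypothetical list of issued sendrecv operations: (label, pipe_id).
ops = [("exchange_a", 0), ("exchange_b", 1), ("exchange_c", 0)]

# Operations that share a pipe_id belong to the same logical group.
groups = defaultdict(list)
for label, pipe_id in ops:
    groups[pipe_id].append(label)

for pipe_id, labels in sorted(groups.items()):
    print(pipe_id, labels)
```

Here exchange_a and exchange_c share pipe_id 0 and are synchronized together, while exchange_b on pipe_id 1 can progress independently of them.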

The dma_engine parameter specifies which DMA transfer mechanism to use:

  • nisa.dma_engine.dma (default): Uses the standard DMA engine with CoreBarrier synchronization. Can be triggered from any engine.

  • nisa.dma_engine.gpsimd_dma: Uses the GPSIMD’s internal DMA engine for low-latency SB-to-SB swaps in LNC=2. Implies GPSIMD as the trigger engine. In this mode, the data size per partition must not exceed:

    • 1024 bytes for 32-bit types

    • 512 bytes for 16-bit types

    • 256 bytes for 8-bit types
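The gpsimd_dma size limits above can be checked before issuing the instruction. The helper below is a minimal sketch in plain Python: `fits_gpsimd_dma` and `GPSIMD_DMA_MAX_BYTES` are hypothetical names, not NKI APIs, and the table simply restates the three documented limits.

```python
# Per-partition byte limits for gpsimd_dma, keyed by element size in bytes,
# restating the limits listed above.
GPSIMD_DMA_MAX_BYTES = {4: 1024, 2: 512, 1: 256}


def fits_gpsimd_dma(elements_per_partition, itemsize):
    """Return True if the per-partition payload is within the documented limit."""
    if itemsize not in GPSIMD_DMA_MAX_BYTES:
        raise ValueError(f"no documented limit for {itemsize}-byte elements")
    return elements_per_partition * itemsize <= GPSIMD_DMA_MAX_BYTES[itemsize]


print(fits_gpsimd_dma(256, 4))  # 256 * 4 = 1024 bytes, within the 1024-byte limit
print(fits_gpsimd_dma(512, 2))  # 512 * 2 = 1024 bytes, above the 512-byte limit
```

All three limits work out to the same count: 256 elements per partition.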

Constraints:

  • src and dst tiles must both be in SBUF.

  • src and dst must have the same data type, which can be any data type supported in NKI.

  • src and dst must have the same shape and layout.

  • src and dst must have the same partition dimension size and the same number of elements per partition.
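These constraints can be expressed as a pre-flight check. The sketch below uses plain Python and a hypothetical `TileSpec` record in place of real NKI tiles; it only compares metadata and does not model layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileSpec:
    """Hypothetical stand-in for a tile's metadata (not an NKI type)."""
    shape: tuple   # (partition_size, elements_per_partition)
    dtype: str     # e.g. "float32"
    buffer: str    # e.g. "sbuf", "psum", "hbm"


def sendrecv_constraint_errors(src, dst):
    """Return a list of violated constraints; an empty list means none found."""
    errors = []
    for name, tile in (("src", src), ("dst", dst)):
        if tile.buffer != "sbuf":
            errors.append(f"{name} must be in SBUF, got {tile.buffer}")
    if src.dtype != dst.dtype:
        errors.append(f"dtype mismatch: {src.dtype} vs {dst.dtype}")
    if src.shape != dst.shape:
        errors.append(f"shape mismatch: {src.shape} vs {dst.shape}")
    return errors


good = sendrecv_constraint_errors(
    TileSpec((128, 512), "float32", "sbuf"),
    TileSpec((128, 512), "float32", "sbuf"),
)
bad = sendrecv_constraint_errors(
    TileSpec((128, 512), "float32", "sbuf"),
    TileSpec((128, 256), "bfloat16", "psum"),
)
print(good)
print(bad)
```

Comparing full shapes covers both the partition dimension size and the number of elements per partition.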

Parameters:
  • src – the source tile on the current NeuronCore to be sent to the target NeuronCore

  • dst – the destination tile on the current NeuronCore where received data will be stored

  • send_to_rank – rank ID of the target NeuronCore to send data to

  • recv_from_rank – rank ID of the source NeuronCore to receive data from

  • pipe_id – synchronization identifier that groups sendrecv operations; operations with the same pipe_id are synchronized

  • dma_engine – the DMA transfer mode; defaults to nisa.dma_engine.dma

Example:

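import nki.isa as nisa
import nki.language as nl

# Exchange data between two cores in a ring pattern.
# This fragment is meant to run inside an NKI kernel with LNC enabled.
num_cores = 2
current_rank = nl.program_id(axis=0)
next_rank = (current_rank + 1) % num_cores
prev_rank = (current_rank - 1) % num_cores

# Illustrative tile shape (assumed values, not mandated by the API)
batch_size, hidden_dim = 128, 512

# Data to send and buffer to receive: both in SBUF, same shape and dtype
send_data = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)
recv_buffer = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)

# Perform bidirectional exchange using the default DMA engine
nisa.sendrecv(
    src=send_data,
    dst=recv_buffer,
    send_to_rank=next_rank,
    recv_from_rank=prev_rank,
    pipe_id=0,
)

# Now recv_buffer contains data from the previous core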
