This document is relevant for: Trn2, Trn3
nki.isa.sendrecv
- nki.isa.sendrecv(src, dst, send_to_rank, recv_from_rank, pipe_id, dma_engine=dma_engine.dma, name=None)
Perform point-to-point communication between NeuronCores by sending and receiving data simultaneously using DMA engines.
Note
Available only on NeuronCore-v3 or newer.
This instruction enables bidirectional data exchange between two NeuronCores within a Logical NeuronCore (LNC) configuration. The current NeuronCore sends its src tile to the dst location of the target NeuronCore specified by send_to_rank, while simultaneously receiving data from recv_from_rank into its own dst tile.
The typical use case is distributed computation in which NeuronCores must exchange tiles to coordinate their work, such as all-gather communication or other collective operations.
This instruction is only allowed on NeuronCore-v3 or newer when LNC (Logical NeuronCore) is enabled. The communication occurs between NeuronCores that share the same HBM stack within the LNC configuration. Therefore, send_to_rank and recv_from_rank must each be either 0 or 1.
The pipe_id parameter provides synchronization control by grouping sendrecv operations. Operations with the same pipe_id form a logical group where all operations in the group must complete before any can proceed. Operations with different pipe_id values can progress independently without blocking each other.
The dma_engine parameter specifies which DMA transfer mechanism to use:
- nisa.dma_engine.dma (default): Uses the standard DMA engine with CoreBarrier synchronization. Can be triggered from any engine.
- nisa.dma_engine.gpsimd_dma: Uses the GPSIMD’s internal DMA engine for low-latency SB-to-SB swaps in LNC=2. Implies GPSIMD as the trigger engine. This mode has restrictions: the partition dimension size of src/dst must be a multiple of 16, and the data size per partition must not exceed 1024 bytes for 32-bit types, 512 bytes for 16-bit types, or 256 bytes for 8-bit types.
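The gpsimd_dma restrictions above can be checked ahead of time. The following is a minimal sketch in plain Python, not part of NKI: the helper name `gpsimd_dma_violations` and its arguments are hypothetical, and the limits are copied from the text above.

```python
# Hypothetical helper, not part of NKI. Limits are copied from the text above.
# Maps element size in bytes to the maximum data size per partition in bytes.
GPSIMD_DMA_MAX_BYTES = {4: 1024, 2: 512, 1: 256}


def gpsimd_dma_violations(partition_size, elems_per_partition, itemsize):
    """Return a list of violated restrictions (empty if none are violated)."""
    problems = []
    if partition_size % 16 != 0:
        problems.append(f"partition size {partition_size} is not a multiple of 16")
    limit = GPSIMD_DMA_MAX_BYTES.get(itemsize)
    if limit is None:
        problems.append(f"no limit is stated for {itemsize}-byte elements")
    elif elems_per_partition * itemsize > limit:
        problems.append(
            f"{elems_per_partition * itemsize} bytes per partition exceeds {limit}"
        )
    return problems


print(gpsimd_dma_violations(128, 256, 4))  # 32-bit type at the stated limit
print(gpsimd_dma_violations(100, 512, 2))  # breaks both restrictions
```

Dividing each byte limit by its element size (1024/4, 512/2, 256/1) gives 256 in all three cases, so the stated limits amount to at most 256 elements per partition.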
Memory types. Both src and dst tiles must be in SBUF.
Data types. src and dst must have the same data type, which can be any data type supported by NKI.
Layout. src and dst must have the same shape and layout.
Tile size. src and dst must have the same partition dimension size and the same number of elements per partition.
- Parameters:
src – the source tile on the current NeuronCore to be sent to the target NeuronCore
dst – the destination tile on the current NeuronCore where received data will be stored
send_to_rank – rank ID of the target NeuronCore to send data to
recv_from_rank – rank ID of the source NeuronCore to receive data from
pipe_id – synchronization identifier that groups sendrecv operations; operations with the same pipe_id are synchronized
dma_engine – the DMA transfer mode; defaults to nisa.dma_engine.dma
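The constraints on src, dst, and the rank arguments described above can be summarized as a pre-flight check. This is only a sketch in plain Python: `TileSpec` and `check_sendrecv_args` are hypothetical names introduced for illustration, not NKI types or functions, and the string values for dtype and buffer are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileSpec:
    """Hypothetical stand-in for a tile's metadata; not an NKI type."""
    shape: tuple
    dtype: str
    buffer: str  # placeholder label, e.g. "sbuf"


def check_sendrecv_args(src, dst, send_to_rank, recv_from_rank):
    """Raise ValueError if the documented constraints are not met."""
    if src.buffer != "sbuf" or dst.buffer != "sbuf":
        raise ValueError("src and dst must both be in SBUF")
    if src.dtype != dst.dtype:
        raise ValueError("src and dst must have the same data type")
    if src.shape != dst.shape:
        raise ValueError("src and dst must have the same shape")
    for name, rank in (("send_to_rank", send_to_rank),
                       ("recv_from_rank", recv_from_rank)):
        if rank not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {rank}")


tile = TileSpec(shape=(128, 64), dtype="float32", buffer="sbuf")
check_sendrecv_args(tile, tile, send_to_rank=1, recv_from_rank=1)  # passes
```

The check covers shape but not memory layout, which a simple metadata record like this cannot express.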
Example:
    import nki.isa as nisa
    import nki.language as nl

    # Exchange data between two cores in a ring pattern
    num_cores = 2
    current_rank = nl.program_id()
    next_rank = (current_rank + 1) % num_cores
    prev_rank = (current_rank - 1) % num_cores

    # Data to send and buffer to receive
    # (batch_size and hidden_dim are assumed to be defined by the enclosing kernel)
    send_data = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)
    recv_buffer = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)

    # Perform bidirectional exchange
    nisa.sendrecv(
        src=send_data,
        dst=recv_buffer,
        send_to_rank=next_rank,
        recv_from_rank=prev_rank,
        pipe_id=0
    )

    # Now recv_buffer contains data from the previous core
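To make the ring indexing in the example concrete, here is a small host-side simulation in plain Python. It does not call NKI; `simulate_ring_exchange` is a hypothetical function that only models which rank ends up holding which data.

```python
def simulate_ring_exchange(send_data):
    """send_data[r] is what rank r sends; returns what each rank receives."""
    num_cores = len(send_data)
    received = [None] * num_cores
    for rank in range(num_cores):
        next_rank = (rank + 1) % num_cores
        # Rank `rank` sends to its successor in the ring.
        received[next_rank] = send_data[rank]
    return received


# With two cores, the next and previous ranks are the same core.
for rank in range(2):
    assert (rank + 1) % 2 == (rank - 1) % 2

print(simulate_ring_exchange(["tile from rank 0", "tile from rank 1"]))
```

With num_cores = 2 the ring pattern reduces to a pairwise swap, which matches the restriction that both rank arguments are 0 or 1.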