nki.isa.sendrecv
- nki.isa.sendrecv(src, dst, send_to_rank, recv_from_rank, pipe_id, name=None)
Perform point-to-point communication between NeuronCores by sending and receiving data simultaneously using DMA engines.
Note
Available only on NeuronCore-v2 or newer.
This instruction enables bidirectional data exchange between two NeuronCores within a Logical NeuronCore (LNC) configuration. The current NeuronCore sends its src tile to the dst location of the target NeuronCore specified by send_to_rank, while simultaneously receiving data from recv_from_rank into its own dst tile.
A typical use case is distributed computation in which NeuronCores must exchange tiles to coordinate their work, such as all-gather communication or other collective operations.
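The exchange described above can be pictured with a small host-side model. This is a sketch only: it is plain Python rather than NKI code, `simulate_sendrecv` is a hypothetical helper, tiles are stood in for by lists, and the check that every send is matched by a receive on the target rank is an assumption of the model rather than documented behavior.

```python
# Toy host-side model of the exchange; not NKI code and not run on a NeuronCore.
def simulate_sendrecv(src_by_rank, send_to, recv_from):
    """Return the dst contents each rank would hold after the exchange.

    src_by_rank[r] is rank r's src tile (a plain list in this model);
    send_to[r] and recv_from[r] are rank r's send_to_rank and recv_from_rank.
    """
    num_ranks = len(src_by_rank)
    # Model assumption: each send is matched by a receive on the target rank.
    for r in range(num_ranks):
        if recv_from[send_to[r]] != r:
            raise ValueError(f"rank {send_to[r]} does not receive from rank {r}")
    # Rank r's dst is filled with a copy of the src tile of the rank it receives from.
    return [list(src_by_rank[recv_from[r]]) for r in range(num_ranks)]


dst = simulate_sendrecv(
    src_by_rank=[[1.0, 2.0], [3.0, 4.0]],
    send_to=[1, 0],
    recv_from=[1, 0],
)
print(dst)  # [[3.0, 4.0], [1.0, 2.0]]
```

In the model, each rank ends up holding the other rank's src data in its dst, while its own src is left unchanged.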
This instruction is only allowed in NeuronCore-v3 or newer when LNC (Logical NeuronCore) is enabled. The communication occurs between NeuronCores that share the same HBM stack within the LNC configuration. Therefore, send_to_rank and recv_from_rank must be either 0 or 1.
The pipe_id parameter provides synchronization control by grouping sendrecv operations. Operations with the same pipe_id form a logical group where all operations in the group must complete before any can proceed. Operations with different pipe_id values can progress independently without blocking each other.
Memory types.
Both src and dst tiles must be in SBUF.
Data types.
src and dst must have the same data type, which can be any data type supported by NKI.
Layout.
src and dst must have the same shape and layout.
Tile size.
src and dst must have the same partition dimension size and the same number of elements per partition.
Parameters:
src – the source tile on the current NeuronCore to be sent to the target NeuronCore
dst – the destination tile on the current NeuronCore where received data will be stored
send_to_rank – rank ID of the target NeuronCore to send data to
recv_from_rank – rank ID of the source NeuronCore to receive data from
pipe_id – synchronization identifier that groups sendrecv operations; operations with the same pipe_id are synchronized
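The constraints on src, dst, and the rank arguments can be collected into a pre-flight check. The following is a minimal sketch in plain Python, assuming a hypothetical `TileSpec` record that describes a tile by shape, dtype name, and buffer name. Neither `TileSpec` nor `check_sendrecv_args` is part of NKI, and layout is approximated here by shape equality only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileSpec:
    """Hypothetical host-side description of a tile; not an NKI type."""
    shape: tuple   # (partition dimension size, elements per partition)
    dtype: str     # e.g. "float32"
    buffer: str    # e.g. "sbuf"


def check_sendrecv_args(src, dst, send_to_rank, recv_from_rank):
    """Return a list of violated constraints (empty if none are violated)."""
    problems = []
    if src.buffer != "sbuf" or dst.buffer != "sbuf":
        problems.append("src and dst must both be in SBUF")
    if src.dtype != dst.dtype:
        problems.append("src and dst must have the same data type")
    if src.shape != dst.shape:
        problems.append("src and dst must have the same shape")
    for name, rank in (("send_to_rank", send_to_rank),
                       ("recv_from_rank", recv_from_rank)):
        if rank not in (0, 1):
            problems.append(f"{name} must be 0 or 1")
    return problems


good = TileSpec(shape=(128, 512), dtype="float32", buffer="sbuf")
bad = TileSpec(shape=(128, 256), dtype="bfloat16", buffer="sbuf")
print(check_sendrecv_args(good, good, 1, 1))  # []
print(check_sendrecv_args(good, bad, 1, 2))
```

Returning a list of messages rather than raising on the first failure lets a caller report every violated constraint at once.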
Example:
# Exchange data between two cores in a ring pattern.
# Assumes nl refers to nki.language and sendrecv to nki.isa.sendrecv,
# and that batch_size and hidden_dim are defined by the enclosing kernel.
num_cores = 2
current_rank = nl.program_id(0)  # rank along axis 0 of the launch grid (assumed)
next_rank = (current_rank + 1) % num_cores
prev_rank = (current_rank - 1) % num_cores

# Data to send and buffer to receive
send_data = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)
recv_buffer = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)

# Perform bidirectional exchange
sendrecv(
    src=send_data,
    dst=recv_buffer,
    send_to_rank=next_rank,
    recv_from_rank=prev_rank,
    pipe_id=0
)
# Now recv_buffer contains data from the previous core
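The ring arithmetic in the example can be checked on the host with ordinary Python integers. In this sketch, `ring_neighbors` is a hypothetical helper and not an NKI function. It shows that with two cores the next and previous ranks are the same core, so each core sends to and receives from its single peer.

```python
def ring_neighbors(rank, num_cores):
    """Return (next_rank, prev_rank) for a ring of num_cores ranks."""
    next_rank = (rank + 1) % num_cores
    # Python's % yields a non-negative result for a positive modulus,
    # so rank 0 wraps around to num_cores - 1.
    prev_rank = (rank - 1) % num_cores
    return next_rank, prev_rank


for rank in range(2):
    print(rank, ring_neighbors(rank, 2))
# 0 (1, 1)
# 1 (0, 0)
```

Both results stay within the documented rank range of 0 or 1 when num_cores is 2.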