nki.isa.dma_copy
- nki.isa.dma_copy(dst, src, dst_rmw_op=None, oob_mode=oob_mode.error, dge_mode=dge_mode.unknown, name=None)
Copy data from src to dst using DMA engines with optional read-modify-write operations.

This instruction performs data movement between memory locations (SBUF or HBM) using DMA engines. The basic operation copies data from the source tensor to the destination tensor: dst = src. Optionally, a read-modify-write operation can be performed where the source data is combined with existing destination data using a specified operation: dst = dst_rmw_op(dst, src).

Currently, only np.add is supported for dst_rmw_op when performing read-modify-write operations. When dst_rmw_op=None, the source data directly overwrites the destination data.

nisa.dma_copy supports different modes of DMA descriptor generation (DGE):
- nisa.dge_mode.none: Neuron Runtime generates DMA descriptors and stores them into HBM before NEFF execution.
- nisa.dge_mode.swdge: Gpsimd Engine generates DMA descriptors as part of the nisa.dma_copy instruction during NEFF execution.
- nisa.dge_mode.hwdge: Sync Engine or Scalar Engine sequencers invoke the DGE hardware block to generate DMA descriptors as part of the nisa.dma_copy instruction during NEFF execution.
See Trainium2 arch guide and Introduction to DMA with NKI for more discussion.
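The overwrite and read-modify-write behaviors described above can be modeled on the host with NumPy. This is a minimal sketch for reasoning about results, not the NKI API: dma_copy_reference is a hypothetical helper, it assumes dst and src have identical shapes (stricter than the documented element-count rule), and it applies the float32 intermediate casts described under Data types below.

```python
import numpy as np

def dma_copy_reference(dst, src, dst_rmw_op=None):
    """NumPy model of the documented dma_copy semantics (hypothetical helper)."""
    # Simplification of this sketch: require identical shapes, which is
    # stricter than the documented "same number of elements" rule.
    if dst.shape != src.shape:
        raise ValueError("sketch requires dst and src to have the same shape")
    if dst_rmw_op is None:
        if src.dtype == dst.dtype:
            dst[...] = src  # plain overwrite, no conversion needed
        else:
            # Documented two-step conversion: src.dtype -> float32 -> dst.dtype
            dst[...] = src.astype(np.float32).astype(dst.dtype)
    else:
        # Documented RMW path: inputs are cast to float32, combined, then
        # the float32 result is cast to the output data type.
        acc = dst_rmw_op(dst.astype(np.float32), src.astype(np.float32))
        dst[...] = acc.astype(dst.dtype)
    return dst

dst = np.array([1.0, 2.0, 3.0], dtype=np.float16)
src = np.array([10.0, 20.0, 30.0], dtype=np.float32)

overwritten = dma_copy_reference(dst.copy(), src)                     # dst = src
accumulated = dma_copy_reference(dst.copy(), src, dst_rmw_op=np.add)  # dst = dst + src
```

In this model, overwritten holds the source values cast to float16, while accumulated holds the element-wise sum of the old destination and the source.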
When either swdge or hwdge mode is used, the src and dst tensors can have a dynamic start address which depends on a variable that cannot be resolved at compile time. When swdge is selected, nisa.dma_copy can also perform a gather or scatter operation, using a list of unique dynamic indices from SBUF. In both of these dynamic modes, out-of-bounds address checking is turned on automatically during execution. By default, a runtime error is raised (oob_mode=oob_mode.error is the default setting). Developers can disable this error and make the nisa.dma_copy instruction skip the DMA transfer for a given dynamic address or index when it is out of bounds by using oob_mode=oob_mode.skip. If dst_rmw_op is specified for these dynamic modes, only oob_mode.error is allowed. See Beta2 NKI kernel migration guide for the latest syntax to handle dynamic addresses or indices.

nisa.dma_copy also supports non-unique scatter indices when dge_mode=nisa.dge_mode.none and dst_rmw_op=nl.add are set. An example use case for this is performing embedding table entry updates after a training backward pass to calculate embedding table gradients.

Memory types.
Both src and dst tiles can be in HBM or SBUF. However, if both tiles are in SBUF, consider using nisa.tensor_copy instead for better performance.

Data types.
Both src and dst tiles can be any supported NKI data type (see Supported Data Types for more information).

The DMA engines automatically handle data type conversion when src and dst have different data types. The conversion is performed through a two-step process: first casting from src.dtype to float32, then from float32 to dst.dtype.

If dst_rmw_op is used, the DMA engines automatically cast input data types to float32 before performing the read-modify-write computation, and the final float32 result is cast to the output data type in a pipelined fashion.

Layout.
If dst_rmw_op is used, the computation is done element-wise between src and dst.

Tile size.
The total number of data elements in src must match that of dst.

- Parameters:
dst – the destination tensor to copy data into
src – the source tensor to copy data from
dst_rmw_op – optional read-modify-write operation (currently only np.add is supported)
dge_mode – (optional) specify which Descriptor Generation Engine (DGE) mode to use for DMA descriptor generation: nki.isa.dge_mode.none (turn off DGE), nki.isa.dge_mode.swdge (software DGE), nki.isa.dge_mode.hwdge (hardware DGE), or nki.isa.dge_mode.unknown (the default, which lets the compiler select the best DGE mode). Hardware-based DGE is only supported for NeuronCore-v3 or newer. See Trainium2 arch guide for more information.
oob_mode – (optional) Specifies how to handle out-of-bounds (oob) array indices during indirect access operations. Valid modes are:
- oob_mode.error: (Default) Raises an error when encountering out-of-bounds indices.
- oob_mode.skip: Silently skips any operations involving out-of-bounds indices.
For example, when using indirect gather/scatter operations, out-of-bounds indices can occur if the index array contains values that exceed the dimensions of the target array.
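The two oob modes, and the non-unique scatter case mentioned in the description, can be illustrated with a host-side NumPy model. This is a rough sketch under stated assumptions, not NKI code: gather_reference and scatter_add_reference are hypothetical helpers, the strings "error" and "skip" stand in for oob_mode.error and oob_mode.skip, and the sketch assumes that repeated scatter indices accumulate every contribution, which is what the embedding-gradient use case suggests.

```python
import numpy as np

def gather_reference(dst, table, indices, oob_mode="error"):
    """Copy table[indices[i]] into dst[i], modeling the two documented oob modes."""
    for i, idx in enumerate(indices):
        in_bounds = 0 <= idx < table.shape[0]
        if not in_bounds:
            if oob_mode == "error":
                raise IndexError(f"index {idx} is out of bounds for {table.shape[0]} rows")
            continue  # "skip": the transfer is dropped, dst[i] keeps its old value
        dst[i] = table[idx]
    return dst

def scatter_add_reference(table, indices, updates):
    """Accumulate updates into table rows; repeated indices all contribute (assumed)."""
    np.add.at(table, indices, updates)  # unbuffered add, so duplicates are not lost
    return table

table = np.arange(8, dtype=np.float32).reshape(4, 2)  # 4 rows: indices 0..3 are valid
indices = [2, 7, 0]                                    # 7 exceeds the row count

# Skip mode: the out-of-bounds entry leaves its destination row untouched.
skipped = gather_reference(np.full((3, 2), -1.0, dtype=np.float32),
                           table, indices, oob_mode="skip")

# Error mode (the default): the same indices raise instead.
try:
    gather_reference(np.zeros((3, 2), dtype=np.float32), table, indices)
    raised = False
except IndexError:
    raised = True

# Non-unique scatter with add: row 1 receives two updates, row 3 receives one.
grads = scatter_add_reference(np.zeros((4, 2), dtype=np.float32),
                              [1, 1, 3], np.ones((3, 2), dtype=np.float32))
```

The sketch uses np.add.at rather than table[indices] += updates because the fancy-indexed form applies only one update per repeated index in NumPy, which would not model accumulation.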