This document is relevant for: Inf2, Trn1, Trn2
nki.isa.dma_copy
nki.isa.dma_copy(*, dst, src, mask=None, dst_rmw_op=None, oob_mode=oob_mode.error, dge_mode=dge_mode.unknown)
Copy data from src to dst using the DMA engine. Both src and dst tiles can be in device memory (HBM) or SBUF. However, if both src and dst tiles are in SBUF, consider using nisa.tensor_copy instead for better performance.

Parameters:
src – the source of the copy.
dst – the destination of the copy.
dst_rmw_op – the read-modify-write operation to be performed at the destination. Currently only np.add is supported, which adds the source data to the existing destination data. If None, the source data directly overwrites the destination. If dst_rmw_op is specified, only oob_mode=oob_mode.error is allowed. For best performance with the Descriptor Generation Engine (DGE), unique dynamic offsets must be used to access dst; multiple accesses to the same offset cause a data hazard. If duplicated offsets are present, the compiler automatically adds synchronization to avoid the hazard, which slows down computation.
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
oob_mode –
(optional) Specifies how to handle out-of-bounds (OOB) array indices during indirect access operations. Valid modes are:
oob_mode.error: (Default) Raises an error when encountering out-of-bounds indices.
oob_mode.skip: Silently skips any operations involving out-of-bounds indices.
For example, when using indirect gather/scatter operations, out-of-bounds indices can occur if the index array contains values that exceed the dimensions of the target array.
dge_mode – (optional) specify which Descriptor Generation Engine (DGE) mode to use for the copy:
nki.isa.dge_mode.none (turn off DGE), nki.isa.dge_mode.swdge (software DGE), nki.isa.dge_mode.hwdge (hardware DGE), or nki.isa.dge_mode.unknown (the default, which lets the compiler select the best DGE mode). HWDGE is only supported on NeuronCore-v3+.
A cast will happen if src and dst have different dtypes.

Example:
import neuronxcc.nki.isa as nisa

############################################################################
# Example 1: Copy over the tensor to another tensor
############################################################################
nisa.dma_copy(dst=b, src=a)
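The implicit dtype cast noted above applies to a plain copy like Example 1 whenever b and a have different dtypes. As a host-side NumPy sketch of that behavior (an analogy only, not NKI code; the device's exact rounding/conversion behavior may differ):

```python
import numpy as np

# Analogy: copying a float32 "tile" into an int32 "tile" casts elementwise.
a = np.array([1.7, -2.3, 3.9], dtype=np.float32)
b = a.astype(np.int32)  # stands in for the implicit cast during the copy
print(b)                # [ 1 -2  3]
```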
import numpy as np
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

############################################################################
# Example 2: Load elements from HBM with indirect addressing. If addressing
# results in out-of-bounds access, the operation will fail.
############################################################################
...
n, m = in_tensor.shape
ix, iy = nl.mgrid[0:n//2, 0:m]
expr_arange = 2*nl.arange(n//2)[:, None]
idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)
out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)
nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.error)
import numpy as np
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

############################################################################
# Example 3: Load elements from HBM with indirect addressing. If addressing
# results in out-of-bounds access, the operation will fail.
############################################################################
...
n, m = in_tensor.shape
ix, iy = nl.mgrid[0:n//2, 0:m]
# indices are out of range on purpose to demonstrate the error
expr_arange = 3*nl.arange(n//2)[:, None]
idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)
out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)
nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.error)
import numpy as np
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

############################################################################
# Example 4: Load elements from HBM with indirect addressing. If addressing
# results in out-of-bounds access, the operation will skip those indices.
############################################################################
...
n, m = in_tensor.shape
ix, iy = nl.mgrid[0:n//2, 0:m]
# indices are out of range on purpose
expr_arange = 3*nl.arange(n//2)[:, None]
idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)
out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)
nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.skip)
import numpy as np
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

############################################################################
# Example 5: Store elements to HBM with indirect addressing and with a
# read-modify-write operation.
############################################################################
...
n, m = in_tensor.shape
ix, iy = nl.mgrid[0:n, 0:m]
expr_arange = 2*nl.arange(n)[:, None]
inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])
idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)
out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=1, dtype=in_tensor.dtype)
nl.store(out_tensor, value=out_tile)
nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, dst_rmw_op=np.add)
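The dst_rmw_op=np.add semantics used in Example 5 can be sketched with a host-side NumPy analogy (not NKI code; np.add.at is used here only to illustrate accumulate-versus-overwrite). It also shows why duplicated destination offsets create a hazard that a plain overwriting copy does not:

```python
import numpy as np

dst = np.ones(6, dtype=np.int32)                  # existing destination data
idx = np.array([0, 2, 2, 4])                      # destination offsets (2 is duplicated)
src = np.array([10, 20, 30, 40], dtype=np.int32)

# With a read-modify-write add, source data accumulates into the destination.
np.add.at(dst, idx, src)
print(dst)       # [11  1 51  1 41  1] -- duplicates at offset 2 are summed

# Without a rmw op, later writes to a duplicated offset simply overwrite.
dst2 = np.ones(6, dtype=np.int32)
dst2[idx] = src
print(dst2)      # [10  1 30  1 40  1]
```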
import numpy as np
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

############################################################################
# Example 6: Store elements to HBM with indirect addressing. If indirect
# addressing results in out-of-bounds access, the operation will fail.
############################################################################
...
n, m = in_tensor.shape
ix, iy = nl.mgrid[0:n, 0:m]
expr_arange = 2*nl.arange(n)[:, None]
inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])
idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)
out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=-1, dtype=in_tensor.dtype)
nl.store(out_tensor, value=out_tile)
nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, oob_mode=nisa.oob_mode.error)
import numpy as np
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

############################################################################
# Example 7: Store elements to HBM with indirect addressing. If indirect
# addressing results in out-of-bounds access, the operation will fail.
############################################################################
...
n, m = in_tensor.shape
ix, iy = nl.mgrid[0:n, 0:m]
# indices are out of range on purpose to demonstrate the error
expr_arange = 3*nl.arange(n)[:, None]
inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])
idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)
out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=-1, dtype=in_tensor.dtype)
nl.store(out_tensor, value=out_tile)
nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, oob_mode=nisa.oob_mode.error)
import numpy as np
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

############################################################################
# Example 8: Store elements to HBM with indirect addressing. If indirect
# addressing results in out-of-bounds access, the operation will skip those
# indices.
############################################################################
...
n, m = in_tensor.shape
ix, iy = nl.mgrid[0:n, 0:m]
# indices are out of range on purpose
expr_arange = 3*nl.arange(n)[:, None]
inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])
idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)
out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=-1, dtype=in_tensor.dtype)
nl.store(out_tensor, value=out_tile)
nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, oob_mode=nisa.oob_mode.skip)
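The oob_mode.skip behavior shown in Examples 4 and 8 can be sketched on the host with plain NumPy (an analogy only, not NKI code): out-of-bounds indices are dropped, and the corresponding destination elements keep their memset value.

```python
import numpy as np

src = np.arange(12, dtype=np.int32).reshape(4, 3)  # stand-in for the HBM tensor
out = np.full((2, 3), -1, dtype=np.int32)          # memset destination

idx = np.array([1, 5])                             # row 5 is out of bounds
valid = idx < src.shape[0]                         # skip semantics: drop OOB rows
out[valid] = src[idx[valid]]
print(out)   # row 0 gathered from src row 1; row 1 keeps its memset value -1
```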