nki.isa.dma_transpose

nki.isa.dma_transpose(dst, src, axes=None, dge_mode=dge_mode.unknown, oob_mode=oob_mode.error, name=None)

Perform a transpose on input src using the DMA Engine.

The permutation of the transpose follows the rules described below:

For a 2-d input tile, the permutation is [1, 0]
For a 3-d input tile, the permutation is [2, 1, 0]
For a 4-d input tile, the permutation is [3, 1, 2, 0]
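
The permutation rules above can be checked against a plain NumPy reference. This is a minimal sketch that only models the index mapping on the host; it does not call NKI or run on a DMA engine. The helper name reference_transpose and the DEFAULT_AXES table are hypothetical names introduced for illustration.

```python
import numpy as np

# Permutation per input rank, copied from the rules listed above.
DEFAULT_AXES = {2: (1, 0), 3: (2, 1, 0), 4: (3, 1, 2, 0)}

def reference_transpose(src, axes=None):
    """NumPy model of the documented permutation (hypothetical helper, not part of NKI)."""
    if axes is None:
        axes = DEFAULT_AXES[src.ndim]
    # np.transpose follows the same convention as the axes parameter:
    # axis i of the result corresponds to axis axes[i] of the source.
    return np.transpose(src, axes)

src = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
out = reference_transpose(src)
print(out.shape)  # (5, 3, 4, 2): the first and last axes are swapped
```

Note that the 4-d rule swaps only the outermost and innermost axes and leaves the two middle axes in place, so out[a, b, c, d] equals src[d, b, c, a].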

DMA Transpose Constraints

The only valid dge_mode values are unknown and hwdge. If hwdge is selected, this instruction is lowered to a Hardware DGE transpose, which has additional restrictions:

src.shape[0] == 16
src.shape[-1] % 128 == 0
dtype is 2 bytes
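
The three Hardware DGE restrictions above can be validated on the host before a kernel is launched. The following is a sketch under stated assumptions: check_hwdge_transpose is a hypothetical helper, not an NKI function, and it assumes the "dtype is 2 bytes" rule refers to the element size of src.

```python
import numpy as np

def check_hwdge_transpose(src_shape, src_dtype):
    """Return the Hardware DGE restrictions that the given source violates.

    Hypothetical host-side helper; an empty list means all three documented
    restrictions are satisfied.
    """
    problems = []
    if src_shape[0] != 16:
        problems.append("src.shape[0] must be 16")
    if src_shape[-1] % 128 != 0:
        problems.append("src.shape[-1] must be a multiple of 128")
    # Assumption: the 2-byte rule applies to the source element type.
    if np.dtype(src_dtype).itemsize != 2:
        problems.append("dtype must be 2 bytes wide")
    return problems

print(check_hwdge_transpose((16, 256), np.float16))  # []
print(check_hwdge_transpose((8, 100), np.float32))   # all three restrictions violated
```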

DMA Indirect Transpose Constraints

The only valid dge_mode values are unknown and swdge. If swdge is selected, this instruction is lowered to a Software DGE transpose, which has additional restrictions:

src is a 3-d tile
src.shape[-1] == 128
src.dtype is 2 bytes
indices.shape[1] == 1
indices.shape[0] % 16 == 0
indices.dtype is np.uint32
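
The Software DGE restrictions above can be expressed as a similar host-side check. This is only a sketch: check_swdge_transpose is a hypothetical helper, and it assumes the shapes and dtypes of src and of the index tile used for indirect access are known before the kernel is traced.

```python
import numpy as np

def check_swdge_transpose(src_shape, src_dtype, indices_shape, indices_dtype):
    """Return the Software DGE restrictions that the given inputs violate.

    Hypothetical host-side helper mirroring the list above; an empty list
    means every documented restriction is satisfied.
    """
    problems = []
    if len(src_shape) != 3:
        problems.append("src must be a 3-d tile")
    if src_shape[-1] != 128:
        problems.append("src.shape[-1] must be 128")
    if np.dtype(src_dtype).itemsize != 2:
        problems.append("src.dtype must be 2 bytes wide")
    if len(indices_shape) < 2 or indices_shape[1] != 1:
        problems.append("indices.shape[1] must be 1")
    if indices_shape[0] % 16 != 0:
        problems.append("indices.shape[0] must be a multiple of 16")
    if np.dtype(indices_dtype) != np.dtype(np.uint32):
        problems.append("indices.dtype must be np.uint32")
    return problems

print(check_swdge_transpose((4, 32, 128), np.float16, (32, 1), np.uint32))  # []
```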

Parameters:

src – the source of transpose, must be a tile in HBM or SBUF.
axes – transpose axes, where the i-th axis of the transposed tile corresponds to axis axes[i] of the source. Supported axes are (1, 0), (2, 1, 0), and (3, 1, 2, 0).
dge_mode – (optional) specifies which Descriptor Generation Engine (DGE) mode to use for DMA descriptor generation: nki.isa.dge_mode.none (turn off DGE), nki.isa.dge_mode.swdge (software DGE), nki.isa.dge_mode.hwdge (hardware DGE), or nki.isa.dge_mode.unknown (the default, which lets the compiler select the best DGE mode). Hardware-based DGE is only supported on NeuronCore-v3 or newer. See the Trainium2 arch guide for more information.
oob_mode – (optional) specifies how to handle out-of-bounds (oob) array indices during indirect access operations. Valid modes are:
- oob_mode.error: (Default) Raises an error when encountering out-of-bounds indices.
- oob_mode.skip: Silently skips any operations involving out-of-bounds indices.
For example, when using indirect gather/scatter operations, out-of-bounds indices can occur if the index array contains values that exceed the dimensions of the target array.
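
The difference between the two oob_mode values can be illustrated with a NumPy emulation of an indirect row gather. This sketch only models the behavior described above and is not how NKI implements it. The function gather_rows and its string-valued oob_mode argument are hypothetical, and leaving the destination row untouched (zero here) under "skip" is an assumption about what skipping means.

```python
import numpy as np

def gather_rows(table, indices, oob_mode="error"):
    """Emulate an indirect row gather with the two documented oob modes.

    Hypothetical helper for illustration only; not part of NKI.
    """
    if oob_mode not in ("error", "skip"):
        raise ValueError(f"unsupported oob_mode: {oob_mode!r}")
    out = np.zeros((len(indices),) + table.shape[1:], dtype=table.dtype)
    for i, idx in enumerate(indices):
        if 0 <= idx < table.shape[0]:
            out[i] = table[idx]
        elif oob_mode == "error":
            raise IndexError(f"index {idx} is out of bounds for {table.shape[0]} rows")
        # oob_mode == "skip": assumed to leave out[i] as it was (zero here).
    return out

table = np.arange(6).reshape(3, 2)
skipped = gather_rows(table, [0, 5, 2], oob_mode="skip")
print(skipped.tolist())  # [[0, 1], [0, 0], [4, 5]]
```

With the default "error" mode, the same call raises IndexError because index 5 exceeds the three rows of table.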
