This document is relevant for: Inf2, Trn1, Trn2
nki.isa.dma_transpose
- nki.isa.dma_transpose(src, *, axes=None, mask=None, dtype=None, **kwargs)
Perform a transpose on the input src using the DMA engine. The transpose permutation follows the rules below:
For 2-d input tile, the permutation will be [1, 0]
For 3-d input tile, the permutation will be [2, 1, 0]
For 4-d input tile, the permutation will be [3, 1, 2, 0]
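The rules above can be sketched as a small host-side shape calculation. This is only an illustrative model, not NKI code: the names `DEFAULT_AXES_BY_RANK`, `default_axes`, and `transposed_shape` are hypothetical, and the sketch assumes that output axis i takes its extent from source axis axes[i], as the axes parameter describes.

```python
# Hypothetical host-side helpers (not part of NKI): they only model the
# permutation rules listed above and never touch a NeuronCore.
DEFAULT_AXES_BY_RANK = {
    2: (1, 0),
    3: (2, 1, 0),
    4: (3, 1, 2, 0),
}


def default_axes(rank):
    """Return the permutation documented for an input tile of this rank."""
    if rank not in DEFAULT_AXES_BY_RANK:
        raise ValueError(f"no documented permutation for rank {rank}")
    return DEFAULT_AXES_BY_RANK[rank]


def transposed_shape(shape):
    """Result shape, assuming output axis i takes source axis axes[i]."""
    axes = default_axes(len(shape))
    return tuple(shape[a] for a in axes)


print(transposed_shape((128, 64)))     # (64, 128)
print(transposed_shape((2, 3, 4)))     # (4, 3, 2)
print(transposed_shape((2, 3, 4, 5)))  # (5, 3, 4, 2)
```

Note that the 4-d rule swaps only the first and last axes; the two middle axes stay in place.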
- Parameters:
src – the source of the transpose; must be a tile in HBM or SBUF.
axes – transpose axes, where the i-th axis of the transposed tile corresponds to axes[i] of the source. Supported axes are (1, 0), (2, 1, 0), and (3, 1, 2, 0).
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details).
dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.
dge_mode – (optional) which Descriptor Generation Engine (DGE) mode to use for the copy: nki.isa.dge_mode.none (turn off DGE), nki.isa.dge_mode.swdge (software DGE), nki.isa.dge_mode.hwdge (hardware DGE), or nki.isa.dge_mode.unknown (the default; lets the compiler select the best DGE mode). HWDGE is only supported on NeuronCore-v3+.
- Returns:
a tile with transposed content
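To make the axes convention concrete, here is a minimal pure-Python reference model of the element mapping. It is a sketch under stated assumptions, not the hardware behavior: `reference_transpose` and `SUPPORTED_AXES` are hypothetical names, the data is assumed to be a flat row-major list, and nothing here models masking, dtype casting, or DGE modes. Each supported permutation is its own inverse, so the result would be the same under either reading of "axis i corresponds to axes[i]".

```python
from itertools import product

# The three permutations the documentation lists as supported.
SUPPORTED_AXES = {(1, 0), (2, 1, 0), (3, 1, 2, 0)}


def reference_transpose(flat, shape, axes):
    """Host-side model: out[i_0, ..., i_n] = src[j] with j[axes[k]] = i_k.

    `flat` is the source data in row-major order (an assumption of this
    sketch). Returns (transposed flat data, transposed shape).
    """
    axes = tuple(axes)
    if axes not in SUPPORTED_AXES or len(axes) != len(shape):
        raise ValueError(f"unsupported axes {axes} for shape {shape}")

    # Row-major strides of the source tile.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]

    out_shape = tuple(shape[a] for a in axes)
    out = []
    # Walk the output in row-major order; output axis k indexes source
    # axis axes[k], so it advances by that source axis's stride.
    for out_idx in product(*(range(n) for n in out_shape)):
        offset = sum(i * strides[axes[k]] for k, i in enumerate(out_idx))
        out.append(flat[offset])
    return out, out_shape


# A 2x3 tile [[0, 1, 2], [3, 4, 5]] becomes the 3x2 tile [[0, 3], [1, 4], [2, 5]].
data, shape = reference_transpose([0, 1, 2, 3, 4, 5], (2, 3), (1, 0))
print(shape, data)  # (3, 2) [0, 3, 1, 4, 2, 5]
```

A model like this can serve as a host-side check when comparing kernel output against expected values.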
Example:
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl

############################################################################
# Example 1: Simple 2D transpose (HBM->SB)
############################################################################
def nki_dma_transpose_2d_hbm2sb(a):
    # a is a tile in HBM; the transposed result is returned directly
    b = nisa.dma_transpose(a)
    return b

############################################################################
# Example 2: Simple 2D transpose (SB->SB)
############################################################################
def nki_dma_transpose_2d_sb2sb(a):
    # Load the input into SBUF first, then transpose the SBUF tile
    a_sb = nl.load(a)
    b = nisa.dma_transpose(a_sb)
    return b