nki.isa.dma_transpose
- nki.isa.dma_transpose(dst, src, axes=None, dge_mode=dge_mode.unknown, name=None)
Perform a transpose on input src using the DMA Engine. The permutation of the transpose follows the rules described below:
For a 2-d input tile, the permutation is [1, 0]
For a 3-d input tile, the permutation is [2, 1, 0]
For a 4-d input tile, the permutation is [3, 1, 2, 0]
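The permutation rules above can be illustrated with a small NumPy sketch. This is only a host-side reference for the axis ordering, not the NKI instruction itself: `reference_transpose` and `DEFAULT_AXES` are hypothetical names introduced here, and the sketch assumes that `axes=None` selects the per-rank permutation listed above.

```python
import numpy as np

# Permutation per input rank, as listed in the rules above.
# Assumption: axes=None selects the entry matching the input rank.
DEFAULT_AXES = {2: (1, 0), 3: (2, 1, 0), 4: (3, 1, 2, 0)}


def reference_transpose(src, axes=None):
    """NumPy reference for the permutation only (hypothetical helper)."""
    if axes is None:
        axes = DEFAULT_AXES[src.ndim]
    # np.transpose uses the same convention as the axes parameter:
    # the i-th axis of the result corresponds to axes[i] of the source.
    return np.transpose(src, axes)


src = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
out = reference_transpose(src)
print(out.shape)  # (5, 3, 4, 2)
```

Note that the 4-d permutation swaps only the first and last axes and leaves the two middle axes in place, so `out[l, j, k, i]` equals `src[i, j, k, l]`.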
The only valid dge_modes are unknown and hwdge. If hwdge, this instruction will be lowered to a Hardware DGE transpose. This has additional restrictions:
src.shape[0] == 16
src.shape[-1] % 128 == 0
dtype is 2 bytes
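The Hardware DGE restrictions stated above can be checked on the host before choosing hwdge. The following is a minimal sketch, assuming the shape and dtype are available as plain Python and NumPy values; `hwdge_violations` is a hypothetical helper and not part of the NKI API.

```python
import numpy as np


def hwdge_violations(shape, dtype):
    """Return the Hardware DGE restrictions that a source tile violates.

    Hypothetical helper: it only encodes the three restrictions stated
    in this page and does not call into NKI.
    """
    problems = []
    if shape[0] != 16:
        problems.append("src.shape[0] must be 16")
    if shape[-1] % 128 != 0:
        problems.append("src.shape[-1] must be a multiple of 128")
    if np.dtype(dtype).itemsize != 2:
        problems.append("dtype must be 2 bytes wide")
    return problems


print(hwdge_violations((16, 256), np.float16))  # []
print(len(hwdge_violations((8, 100), np.float32)))  # 3
```

An empty list means the tile satisfies the restrictions listed here; it does not guarantee that the compiler will accept the instruction, since other constraints may apply.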
- Parameters:
src – the source of transpose, must be a tile in HBM or SBUF.
axes – transpose axes where the i-th axis of the transposed tile will correspond to axes[i] of the source. Supported axes are (1, 0), (2, 1, 0), and (3, 1, 2, 0).
dge_mode – (optional) specify which Descriptor Generation Engine (DGE) mode to use for DMA descriptor generation: nki.isa.dge_mode.none (turn off DGE), nki.isa.dge_mode.swdge (software DGE), nki.isa.dge_mode.hwdge (hardware DGE), or nki.isa.dge_mode.unknown (the default, which lets the compiler select the best DGE mode). For this instruction, only unknown and hwdge are valid. Hardware based DGE is only supported for NeuronCore-v3 or newer. See Trainium2 arch guide for more information.