This document is relevant for: Inf2, Trn1, Trn2

nki.isa.dma_transpose

nki.isa.dma_transpose(src, *, axes=None, mask=None, dtype=None, **kwargs)

Performs a transpose of the input src using the DMA engine.

The transpose permutation depends on the rank of the input tile:

  1. For a 2-D input tile, the permutation is [1, 0].

  2. For a 3-D input tile, the permutation is [2, 1, 0].

  3. For a 4-D input tile, the permutation is [3, 1, 2, 0].
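The effect of these permutations on tile shape can be sketched in plain Python. This is only an illustrative model, not part of the NKI API: the names `DEFAULT_PERMUTATION` and `transposed_shape` are hypothetical and introduced here for the example. Because each listed permutation is its own inverse, the resulting shape does not depend on which direction the permutation is read in.

```python
# Illustrative model only; these names are NOT part of nki.isa.
# Permutations copied from the list above, keyed by input rank.
DEFAULT_PERMUTATION = {
    2: (1, 0),
    3: (2, 1, 0),
    4: (3, 1, 2, 0),
}


def transposed_shape(src_shape):
    """Return the output shape implied by the permutation for this rank."""
    perm = DEFAULT_PERMUTATION.get(len(src_shape))
    if perm is None:
        raise ValueError(f"unsupported rank: {len(src_shape)}")
    # Output axis i takes its extent from source axis perm[i].
    return tuple(src_shape[p] for p in perm)


print(transposed_shape((128, 64)))     # (64, 128)
print(transposed_shape((2, 3, 4)))     # (4, 3, 2)
print(transposed_shape((2, 3, 4, 5)))  # (5, 3, 4, 2)
```

Note that in the 4-D case only the first and last axes swap; the two middle axes keep their positions.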

Parameters:
  • src – the source tile to transpose; must reside in HBM or SBUF.

  • axes – (optional) transpose axes, where the i-th axis of the transposed tile corresponds to axis axes[i] of the source. Supported axes are (1, 0), (2, 1, 0), and (3, 1, 2, 0).

  • mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details).

  • dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.

  • dge_mode – (optional) the Descriptor Generation Engine (DGE) mode to use for the copy: nki.isa.dge_mode.none (DGE turned off), nki.isa.dge_mode.swdge (software DGE), nki.isa.dge_mode.hwdge (hardware DGE), or nki.isa.dge_mode.unknown (the default, which lets the compiler select the best DGE mode). HWDGE is supported only on NeuronCore-v3 and later.
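The `axes` convention described above (output axis i corresponds to source axis `axes[i]`, the same convention NumPy uses for its `transpose` axes argument) can be made concrete with a small pure-Python model. This is a sketch for reasoning about element placement only; `reference_transpose` and `SUPPORTED_AXES` are hypothetical names, the tensor is modelled as a dict from index tuples to values, and the `mask`, `dtype`, and `dge_mode` parameters are not modelled.

```python
from itertools import product

# Illustrative model only; these names are NOT part of nki.isa.
SUPPORTED_AXES = {(1, 0), (2, 1, 0), (3, 1, 2, 0)}


def reference_transpose(src, src_shape, axes):
    """Model a transpose on a tensor stored as {index_tuple: value}."""
    axes = tuple(axes)
    if axes not in SUPPORTED_AXES or len(axes) != len(src_shape):
        raise ValueError(f"unsupported axes {axes} for shape {src_shape}")
    out_shape = tuple(src_shape[a] for a in axes)
    out = {}
    for out_idx in product(*(range(n) for n in out_shape)):
        # Output axis i corresponds to source axis axes[i], so coordinate
        # out_idx[i] goes to position axes[i] of the source index.
        src_idx = [0] * len(axes)
        for i, a in enumerate(axes):
            src_idx[a] = out_idx[i]
        out[out_idx] = src[tuple(src_idx)]
    return out_shape, out


# Usage: a 2x3 tensor whose value encodes its own (row, col) position.
shape = (2, 3)
src = {(r, c): 10 * r + c for r in range(2) for c in range(3)}
out_shape, out = reference_transpose(src, shape, (1, 0))
print(out_shape)    # (3, 2)
print(out[(2, 1)])  # 12, the value stored at src[(1, 2)]
```

A model like this can be handy for checking expected kernel output on small inputs before running on hardware.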

Returns:

a tile with transposed content

Example:

import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl

############################################################################
# Example 1: Simple 2D transpose (HBM->SB)
############################################################################
def nki_dma_transpose_2d_hbm2sb(a):
  # a is a 2D tensor in HBM; for 2D input the permutation is [1, 0]
  b = nisa.dma_transpose(a)
  return b

############################################################################
# Example 2: Simple 2D transpose (SB->SB)
############################################################################
def nki_dma_transpose_2d_sb2sb(a):
  # Load the input from HBM into SBUF first, so the transpose source is an SBUF tile
  a_sb = nl.load(a)
  b = nisa.dma_transpose(a_sb)
  return b
