This document is relevant for: Inf2, Trn1, Trn1n
nki.isa.nc_transpose
- nki.isa.nc_transpose(data, mask=None, dtype=None, engine=None, **kwargs)
Perform a 2D transpose between the partition axis and the free axis of input data, i.e., a PF-transpose, using Tensor or Vector Engine. If the data tile has more than one free axis, this API implicitly collapses all free axes into one axis and then performs a 2D PF-transpose.

In NeuronCore, both Tensor and Vector Engine can perform a PF-transpose, but they support different input shapes. Tensor Engine nc_transpose can handle an input tile of shape (128, 128) or smaller, while Vector Engine can handle shape (32, 32) or smaller. Therefore, when the input tile shape is (32, 32) or smaller, it can run on either engine, as controlled by the engine field. If no engine is specified, Neuron Compiler will automatically select an engine based on the input shape. Note that, similar to other Tensor Engine instructions, the Tensor Engine nc_transpose must read the input tile from SBUF and write the transposed result to PSUM. Vector Engine nc_transpose, on the other hand, can read from and write to either SBUF or PSUM.

Note that PF-transpose on Tensor Engine is done by performing a matrix multiplication between data as the stationary tensor and an identity matrix as the moving tensor. See architecture guide for more information. On NeuronCore-v2, such matmul-style transpose is not bit-accurate if the input data contains NaN/Inf. You may consider replacing NaN/Inf with regular floats (float_max/float_min/zeros) in the input matrix before calling nc_transpose(engine=nki.isa.tensor_engine).

Estimated instruction cost:
Vector Engine: N Vector Engine cycles, where N is the number of elements per partition in data.
Tensor Engine (assuming many back-to-back nc_transpose of the same shape on Tensor Engine): max(P, min(64, F)) Tensor Engine cycles, where P is partition axis size of data and F is the number of elements per partition in data.
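The shape limits and cost formulas above can be sketched in plain Python to compare engines before writing a kernel. This is only an illustrative model: the helper names (collapse, eligible_engines, estimated_cycles) are hypothetical and not part of NKI, and it assumes the (128, 128) and (32, 32) limits apply to the tile shape after all free axes are collapsed into one.

```python
from math import prod

# Shape limits quoted from the description above (partition size, free size).
TENSOR_MAX = (128, 128)
VECTOR_MAX = (32, 32)


def collapse(shape):
    """Return (P, F): partition axis size and elements per partition.

    All axes after the first are treated as free axes and collapsed.
    """
    partition, *free = shape
    return partition, prod(free)


def eligible_engines(shape):
    """List the engines whose shape limit admits this tile (hypothetical helper)."""
    p, f = collapse(shape)
    engines = []
    if p <= TENSOR_MAX[0] and f <= TENSOR_MAX[1]:
        engines.append("tensor")
    if p <= VECTOR_MAX[0] and f <= VECTOR_MAX[1]:
        engines.append("vector")
    return engines


def estimated_cycles(shape, engine):
    """Apply the documented cost estimate for the chosen engine."""
    p, f = collapse(shape)
    if engine == "vector":
        return f                    # N Vector Engine cycles
    if engine == "tensor":
        return max(p, min(64, f))   # max(P, min(64, F)) Tensor Engine cycles
    raise ValueError(f"unknown engine: {engine!r}")
```

For a (32, 2) tile this model reports both engines as eligible, with an estimate of 2 Vector Engine cycles versus 32 Tensor Engine cycles. Cycles on different engines are not directly comparable, so treat the numbers as a rough guide only.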
- Parameters:
data – the input tile to be transposed
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
dtype – if specified and different from the data type of input tile data, an additional nki.isa.cast instruction will be inserted to cast the transposed data into the target dtype (see Supported Data Types for more information)
engine – specify which engine to use for transpose: nki.isa.tensor_engine or nki.isa.vector_engine; by default, the best engine will be selected for the given input tile shape
- Returns:
a tile with the transposed result of the input data tile
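As a host-side reference for what the returned tile contains, the collapse-then-transpose behavior can be modeled with nested Python lists. This is a sketch under stated assumptions, not the NKI implementation: flatten and reference_pf_transpose are hypothetical names, and the free axes are assumed to collapse in row-major order.

```python
def flatten(x):
    """Flatten arbitrarily nested lists into one flat list (row-major order)."""
    if isinstance(x, list):
        out = []
        for item in x:
            out.extend(flatten(item))
        return out
    return [x]


def reference_pf_transpose(tile):
    """Model a PF-transpose on nested lists.

    tile[p] holds the free-axis data of partition p. Each partition's free
    axes are collapsed into one axis of length F, then the resulting
    (P, F) matrix is transposed into (F, P).
    """
    rows = [flatten(partition) for partition in tile]  # shape (P, F)
    return [list(col) for col in zip(*rows)]           # shape (F, P)
```

Under this model, a tile of shape (2, 2, 3) is treated as a (2, 6) matrix and comes back with shape (6, 2).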
Example:
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
...
##################################################################
# Example 1: transpose tile a of shape (128, 64)
##################################################################
i_p_a = nl.arange(128)[:, None]
i_f_a = nl.arange(64)[None, :]
aT = nisa.nc_transpose(a[i_p_a, i_f_a])

##################################################################
# Example 2: transpose tile b of shape (32, 2) using Vector Engine
##################################################################
i_p_b = nl.arange(32)[:, None]
i_f_b = nl.arange(2)[None, :]
bT = nisa.nc_transpose(b[i_p_b, i_f_b], engine=nisa.vector_engine)
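The NaN/Inf caveat for Tensor Engine transposes can be illustrated off-device with ordinary IEEE floating-point arithmetic. The sketch below is a plain-Python model of a matmul-style transpose (data as stationary, identity as moving), not a simulation of the hardware. The function names are hypothetical, and the replacement policy in sanitize (NaN to zero, Inf to plus or minus the float32 maximum) is just one possible choice among those the description suggests.

```python
import math

FLOAT32_MAX = 3.4028234663852886e38  # largest finite float32 value


def matmul_style_transpose(data):
    """Transpose a (P, F) matrix by computing data^T @ identity.

    Every output element sums P products, all but one of which multiply
    by zero. Because NaN * 0 and Inf * 0 are NaN, one bad input value
    contaminates a whole output row.
    """
    p_size, f_size = len(data), len(data[0])
    identity = [[1.0 if i == j else 0.0 for j in range(p_size)]
                for i in range(p_size)]
    return [[sum(data[p][f] * identity[p][j] for p in range(p_size))
             for j in range(p_size)]
            for f in range(f_size)]


def sanitize(data, float_max=FLOAT32_MAX):
    """Replace NaN with 0 and +/-Inf with +/-float_max (one possible policy)."""
    def fix(x):
        if math.isnan(x):
            return 0.0
        if math.isinf(x):
            return math.copysign(float_max, x)
        return x
    return [[fix(x) for x in row] for row in data]
```

In this model, a single NaN in data turns every element of the corresponding output row into NaN, including positions where an exact transpose would hold a finite value. Sanitizing the input first keeps the remaining values intact.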