nki.isa.nc_transpose
nki.isa.nc_transpose(dst, data, engine=engine.unknown, name=None)
Perform a 2D transpose between the partition axis and the free axis of the input data tile using the Tensor Engine or Vector Engine.
If the data tile has more than one free axis, this API implicitly flattens all free axes into one axis and then performs a 2D transpose.
2D transpose on Tensor Engine is implemented by performing a matrix multiplication between data as the stationary tensor and an identity matrix as the moving tensor. This is equivalent to calling nisa.nc_matmul directly with is_transpose=True. See the architecture guide for more information. On NeuronCore-v2, Tensor Engine transpose is not bit-accurate if the input data contains NaN/Inf. Consider replacing NaN/Inf with regular floats (float_max/float_min/zeros) in the input matrix. Starting with NeuronCore-v3, all Tensor Engine transposes are bit-accurate.
Memory types.
Tensor Engine nc_transpose must read the input tile from SBUF and write the transposed result to PSUM. Vector Engine nc_transpose can read/write from/to either SBUF or PSUM.
Data types.
The input data tile can be any valid NKI data type (see Supported Data Types for more information). The output dst tile must have the same data type as that of data.
Layout.
The partition dimension of the data tile becomes the free dimension of the dst tile. Similarly, the free dimension of the data tile becomes the partition dimension of the dst tile.
Tile size.
Tensor Engine nc_transpose can handle an input tile of shape [128, 128] or smaller, while Vector Engine can handle shape [32, 32] or smaller. If no engine is specified, Neuron Compiler will automatically select an engine based on the input shape.
Parameters:
dst – the transpose output
data – the input tile to be transposed
engine – specify which engine to use for transpose: nki.isa.tensor_engine or nki.isa.vector_engine; by default, the best engine will be selected for the given input tile shape
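The semantics described above can be modeled on the host with plain NumPy. The following is a minimal sketch, not NKI code: it does not call nki.isa at all. It assumes axis 0 of the array stands for the partition axis, and that the Tensor Engine matmul contracts over the partition axis (stationary transposed times moving). The helper names reference_transpose, transpose_via_identity, and engines_that_fit are hypothetical and exist only for illustration. The last helper assumes the tile-size limits apply to the flattened 2D shape, and it only reports which engines could accept a shape; it does not reproduce the compiler's actual selection rule.

```python
import numpy as np

# Tile-size limits quoted in the text above: (partition size, free size).
TENSOR_ENGINE_MAX = (128, 128)
VECTOR_ENGINE_MAX = (32, 32)


def reference_transpose(data: np.ndarray) -> np.ndarray:
    """Model of the documented result: flatten all free axes into one,
    then swap the partition axis (axis 0) with the flattened free axis."""
    flat = data.reshape(data.shape[0], -1)
    return flat.T.copy()


def transpose_via_identity(data: np.ndarray) -> np.ndarray:
    """Model of the Tensor Engine approach: a matmul that contracts over
    the partition axis, with data as the stationary operand and an
    identity matrix as the moving operand (assumed contraction layout)."""
    flat = data.reshape(data.shape[0], -1)           # [P, F]
    identity = np.eye(flat.shape[0], dtype=flat.dtype)  # [P, P]
    # out[f, p] = sum_k flat[k, f] * identity[k, p]
    return np.einsum("kf,kp->fp", flat, identity)


def engines_that_fit(shape: tuple) -> list:
    """Hypothetical helper: list the engines whose documented size limit
    covers the flattened 2D tile shape."""
    partition, free = shape[0], int(np.prod(shape[1:]))
    fits = []
    if partition <= VECTOR_ENGINE_MAX[0] and free <= VECTOR_ENGINE_MAX[1]:
        fits.append("vector_engine")
    if partition <= TENSOR_ENGINE_MAX[0] and free <= TENSOR_ENGINE_MAX[1]:
        fits.append("tensor_engine")
    return fits


# A tile with 2 partitions and two free axes (3 x 4), flattened to 12.
tile = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
dst = reference_transpose(tile)
print(dst.shape)  # (12, 2)

# In this model, a single NaN spreads across a whole output row when the
# transpose is done by multiplication, because NaN * 0 is NaN.
bad = tile.copy()
bad[0, 0, 0] = np.nan
print(np.isnan(transpose_via_identity(bad)).sum())
print(np.isnan(reference_transpose(bad)).sum())
```

In this model the multiplication-based transpose matches the plain transpose exactly for finite inputs, while a NaN input contaminates neighboring output elements. That is one plausible way to read the NeuronCore-v2 bit-accuracy caveat, though the hardware details are not stated in the text above.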