nki.isa.nc_transpose

nki.isa.nc_transpose(dst, data, engine=engine.unknown, name=None)

Perform a 2D transpose between the partition axis and the free axis of the input data using the Tensor Engine or the Vector Engine.

If the data tile has more than one free axis, this API implicitly flattens all free axes into one axis and then performs a 2D transpose.
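The flattening rule can be illustrated with a small pure-Python model of the data movement. This is only a sketch of the semantics, not NKI code: `flatten_free_axes` and `transpose_2d` are hypothetical helpers, tiles are modeled as nested lists with the partition axis first, and row-major flattening order is an assumption that the description above does not state.

```python
def flatten_free_axes(tile):
    """Collapse every axis after the first (partition) axis into one free axis.

    Assumption: the free axes are flattened in row-major order.
    """
    def leaves(node):
        # Walk nested lists depth-first and yield the scalar elements.
        if isinstance(node, list):
            for child in node:
                yield from leaves(child)
        else:
            yield node

    return [list(leaves(row)) for row in tile]


def transpose_2d(tile):
    """Swap the partition axis (rows) with the single free axis (columns)."""
    return [list(column) for column in zip(*tile)]


# A tile with 2 partitions and free axes of shape (2, 2).
data = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]

flat = flatten_free_axes(data)  # shape [2, 4]
dst = transpose_2d(flat)        # shape [4, 2]
```

In this model the 2-partition input with 4 flattened free elements becomes a 4-partition output with 2 free elements.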

A 2D transpose on Tensor Engine is implemented as a matrix multiplication with data as the stationary tensor and an identity matrix as the moving tensor. This is equivalent to calling nisa.nc_matmul directly with is_transpose=True. See the architecture guide for more information.

On NeuronCore-v2, Tensor Engine transpose is not bit-accurate if the input data contains NaN/Inf. Consider replacing NaN/Inf with regular floats (float_max/float_min/zeros) in the input matrix. Starting with NeuronCore-v3, all Tensor Engine transposes are bit-accurate.
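A minimal sketch of the identity-matmul formulation, using plain Python floats, may help show why NaN/Inf inputs are a problem. It does not model the Tensor Engine datapath; it only shows one plausible mechanism under ordinary IEEE 754 arithmetic, where NaN or Inf multiplied by the zeros of the identity matrix yields NaN. The helper names and the exact replacement mapping in `sanitize` are illustrative assumptions.

```python
import math
import sys


def transpose_via_identity(data):
    """Compute data^T @ I with plain Python floats.

    data is [K, M] (stationary), the identity is [K, K] (moving),
    and the result is [M, K].
    """
    k, m = len(data), len(data[0])
    identity = [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]
    return [[sum(data[p][i] * identity[p][j] for p in range(k))
             for j in range(k)]
            for i in range(m)]


def sanitize(data):
    """Replace NaN with 0.0 and +/-Inf with the largest finite float64.

    Assumption: this particular mapping is only one possible choice.
    """
    def fix(x):
        if math.isnan(x):
            return 0.0
        if math.isinf(x):
            return math.copysign(sys.float_info.max, x)
        return x

    return [[fix(x) for x in row] for row in data]


clean = transpose_via_identity([[1.0, 2.0], [3.0, 4.0]])
# NaN * 0.0 is NaN, so a single NaN contaminates its whole output row.
dirty = transpose_via_identity([[1.0, math.nan], [3.0, 4.0]])
# Replacing Inf with a finite value before the matmul avoids this.
safe = transpose_via_identity(sanitize([[1.0, math.inf], [3.0, 4.0]]))
```

In this model the second output row of `dirty` is entirely NaN, including the position where an exact transpose would hold 4.0, while `safe` keeps that element intact.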

Memory types.

Tensor Engine nc_transpose must read the input tile from SBUF and write the transposed result to PSUM. Vector Engine nc_transpose can read from and write to either SBUF or PSUM.

Data types.

The input data tile can be any valid NKI data type (see Supported Data Types for more information). The output dst tile must have the same data type as data.

Layout.

The partition dimension of the data tile becomes the free dimension of the dst tile. Similarly, the free dimension of the data tile becomes the partition dimension of the dst tile.

Tile size.

Tensor Engine nc_transpose can handle an input tile of shape [128, 128] or smaller, while Vector Engine nc_transpose can handle an input tile of shape [32, 32] or smaller. If no engine is specified, the Neuron Compiler automatically selects an engine based on the input shape.
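The memory-type and tile-size constraints above can be collected into a small feasibility check. This is a hedged sketch in plain Python, not part of NKI: `ENGINE_RULES` and `engines_that_fit` are hypothetical names, the memory labels are plain strings, and it assumes the size limits apply to the tile shape after the free axes have been flattened.

```python
from math import prod

# Limits and memory rules as stated in this page.
ENGINE_RULES = {
    "tensor": {"max_dim": 128, "src": {"sbuf"}, "dst": {"psum"}},
    "vector": {"max_dim": 32, "src": {"sbuf", "psum"}, "dst": {"sbuf", "psum"}},
}


def engines_that_fit(shape, src_mem, dst_mem):
    """Return the engines whose documented limits admit this transpose.

    shape is the input tile shape with the partition axis first.
    Assumption: limits are checked against the flattened 2D shape.
    """
    partition, free = shape[0], prod(shape[1:])
    return [
        name
        for name, rule in ENGINE_RULES.items()
        if max(partition, free) <= rule["max_dim"]
        and src_mem in rule["src"]
        and dst_mem in rule["dst"]
    ]


print(engines_that_fit((32, 32), "sbuf", "psum"))    # both engines qualify
print(engines_that_fit((128, 64), "sbuf", "psum"))   # only the Tensor Engine
print(engines_that_fit((128, 128), "sbuf", "sbuf"))  # no engine qualifies
```

The check only reports which engines are feasible. Which engine the compiler actually picks when more than one qualifies is not specified here.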

Parameters:
  • dst – the transpose output

  • data – the input tile to be transposed

  • engine – specify which engine to use for transpose: nki.isa.tensor_engine or nki.isa.vector_engine; by default, the best engine will be selected for the given input tile shape