This document is relevant for: Inf2, Trn1, Trn1n

nki.isa

NKI ISA

nc_matmul

Compute the matrix multiplication stationary.T @ moving using Tensor Engine.

nc_transpose

Perform a 2D transpose between the partition axis and the free axis of input data, i.e., a PF-transpose, using Tensor or Vector Engine.

activation

Apply an activation function on every element of the input tile using Scalar Engine.

tensor_reduce

Apply a reduction operation to the free axes of an input data tile using Vector Engine.

tensor_tensor

Perform an element-wise operation on two input tiles using Vector Engine.

tensor_tensor_scan

Perform a scan operation of two input tiles using Vector Engine.

tensor_scalar

Apply up to two math operators to the input data tile by broadcasting scalar/vector operands in the free dimension using Vector or Scalar Engine: (data <op0> operand0) <op1> operand1.

reciprocal

Compute reciprocal of the input data tile.

iota

Build a constant literal in SBUF using GpSimd Engine, rather than transferring the constant literal values from the host to device.

dropout

Randomly replace some elements of the input tile with zeros based on input probabilities using Vector Engine.

affine_select

Select elements between an input tile on_true_tile and a scalar value on_false_value according to a boolean predicate tile using GpSimd Engine.

memset

Initialize a tile filled with a compile-time constant value using Vector Engine.

bn_stats

Compute mean- and variance-related statistics for each partition of an input tile in parallel using Vector Engine.

bn_aggr

Aggregate one or multiple bn_stats outputs to generate a mean and variance per partition using Vector Engine.

local_gather

Gather SBUF data in src_buffer using index on GpSimd Engine.
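Two of the entries above state their semantics as formulas: nc_matmul computes stationary.T @ moving, and tensor_scalar computes (data <op0> operand0) <op1> operand1. The following is a minimal host-side sketch of those two formulas in plain Python, not NKI code and not the library's implementation. The helper names (matmul_reference, tensor_scalar_reference, _broadcast) are hypothetical, tiles are modeled as lists of rows with one row per partition, and a vector operand is assumed to carry one value per partition.

```python
import operator


def matmul_reference(stationary, moving):
    """Return stationary.T @ moving for tiles given as lists of rows.

    stationary has shape [K, M] and moving has shape [K, N];
    the result has shape [M, N].
    """
    k = len(stationary)
    if len(moving) != k:
        raise ValueError("stationary and moving must have the same number of rows")
    m = len(stationary[0])
    n = len(moving[0])
    return [
        [sum(stationary[p][i] * moving[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def _broadcast(operand, partition, width):
    """Expand a scalar or per-partition operand across the free dimension."""
    if isinstance(operand, (int, float)):
        return [operand] * width
    # Assumption: a vector operand holds one value per partition (row).
    return [operand[partition]] * width


def tensor_scalar_reference(data, op0, operand0, op1=None, operand1=None):
    """Compute (data <op0> operand0) <op1> operand1 row by row.

    op1 and operand1 are optional, matching "up to two math operators".
    """
    out = []
    for p, row in enumerate(data):
        first = _broadcast(operand0, p, len(row))
        result = [op0(x, y) for x, y in zip(row, first)]
        if op1 is not None:
            second = _broadcast(operand1, p, len(row))
            result = [op1(x, y) for x, y in zip(result, second)]
        out.append(result)
    return out


if __name__ == "__main__":
    stationary = [[1, 2], [3, 4], [5, 6]]  # shape [K=3, M=2]
    moving = [[1, 0], [0, 1], [1, 1]]      # shape [K=3, N=2]
    print(matmul_reference(stationary, moving))  # [[6, 8], [8, 10]]

    data = [[1, 2, 3], [4, 5, 6]]
    # (data * 2) + per-partition bias [10, 20]
    print(tensor_scalar_reference(data, operator.mul, 2, operator.add, [10, 20]))
    # [[12, 14, 16], [28, 30, 32]]
```

A reference like this can be handy for checking kernel outputs on small inputs; it says nothing about how the hardware engines schedule or tile the work.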

Engine Types

tensor_engine

Tensor Engine

vector_engine

Vector Engine

scalar_engine

Scalar Engine

gpsimd_engine

GpSimd Engine

dma_engine

DMA Engine

unknown_engine

Unknown Engine
