This document is relevant for: Inf2, Trn1, Trn2

nki.isa

NKI ISA

nc_matmul

Compute the matrix multiplication stationary.T @ moving using Tensor Engine.
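The .T on stationary means both operands are contracted over their shared leading (partition) dimension. The following is a minimal sketch of the arithmetic only. It assumes 2D operands stored as [partition][free] Python lists. The function name matmul_reference is hypothetical and is not part of nki.isa.

```python
# Plain-Python reference model of the math nc_matmul describes:
# result = stationary.T @ moving (not NKI code; names are hypothetical).
# Both operands are indexed [partition][free]; the sum runs over the
# partition axis, which the two operands share.

def matmul_reference(stationary, moving):
    k = len(stationary)            # shared partition (contraction) size
    assert len(moving) == k, "operands must share the partition dimension"
    m = len(stationary[0])         # free size of stationary -> rows of result
    n = len(moving[0])             # free size of moving -> columns of result
    return [
        [sum(stationary[p][i] * moving[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]

stationary = [[1, 2],
              [3, 4],
              [5, 6]]   # K=3 partitions, M=2 free elements
moving = [[1, 0],
          [0, 1],
          [1, 1]]       # K=3 partitions, N=2 free elements
print(matmul_reference(stationary, moving))  # [[6, 8], [8, 10]]
```

The result has shape (M, N): each output element is a dot product of one column of stationary with one column of moving.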

nc_transpose

Perform a 2D transpose between the partition axis and the free axis of the input data (a PF-transpose) using Tensor Engine or Vector Engine.
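A PF-transpose swaps what lives on the partition axis with what lives on the free axis. This sketch is a plain-Python model of that index mapping only, using a hypothetical pf_transpose_reference helper that is not an NKI function.

```python
# Plain-Python model of a PF-transpose (not NKI code):
# input element [p][f] becomes output element [f][p], so the free axis
# becomes the partition axis and vice versa.

def pf_transpose_reference(tile):
    num_partitions = len(tile)
    free_size = len(tile[0])
    return [[tile[p][f] for p in range(num_partitions)] for f in range(free_size)]

tile = [[1, 2, 3],
        [4, 5, 6]]                   # 2 partitions x 3 free elements
print(pf_transpose_reference(tile))  # [[1, 4], [2, 5], [3, 6]]
```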

activation

Apply an activation function on every element of the input tile using Scalar Engine.

activation_reduce

Perform the same computation as nisa.activation and additionally reduce the activation result along the free dimension, using Scalar Engine.
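activation_reduce produces two things: the element-wise activation result and a per-partition reduction of it. This sketch models the dataflow in plain Python. It makes three assumptions:

- act stands in for whichever activation function is selected; squaring is used here only for illustration.
- The reduction defaults to addition.
- The helper name activation_reduce_reference is hypothetical.

```python
# Plain-Python model of activation followed by a free-axis reduction
# (not NKI code). `act` is any element-wise function; `reduce_fn`
# collapses each partition's row to a single value (sum by default).

def activation_reduce_reference(tile, act, reduce_fn=sum):
    activated = [[act(x) for x in row] for row in tile]
    reduced = [reduce_fn(row) for row in activated]  # one value per partition
    return activated, reduced

tile = [[0.0, 1.0],
        [2.0, 3.0]]
out, row_sums = activation_reduce_reference(tile, lambda x: x * x)
print(out)       # [[0.0, 1.0], [4.0, 9.0]]
print(row_sums)  # [1.0, 13.0]
```

Fusing the two steps means the activated tile does not need a second pass just to compute its row totals.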

tensor_reduce

Apply a reduction operation to the free axes of an input data tile using Vector Engine.

tensor_partition_reduce

Apply a reduction operation across partitions of an input data tile using GpSimd Engine.
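Unlike tensor_reduce, which collapses the free axes, tensor_partition_reduce collapses the partition axis and leaves a single row. The plain-Python sketch below models only that difference. The helper partition_reduce_reference is hypothetical, and op is any binary function.

```python
# Plain-Python model of a reduction across partitions (not NKI code):
# column f of the result combines element [p][f] over every partition p.

def partition_reduce_reference(tile, op):
    result = list(tile[0])
    for row in tile[1:]:
        result = [op(a, b) for a, b in zip(result, row)]
    return result  # a single row, one value per free-axis position

tile = [[1, 5, 2],
        [4, 0, 7],
        [3, 3, 3]]
print(partition_reduce_reference(tile, max))                  # [4, 5, 7]
print(partition_reduce_reference(tile, lambda a, b: a + b))   # [8, 8, 12]
```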

tensor_tensor

Perform an element-wise operation on two input tiles using Vector Engine or GpSimd Engine.

tensor_tensor_scan

Perform a scan operation on two input tiles using Vector Engine.

scalar_tensor_tensor

Apply up to two math operators using Vector Engine: (data <op0> operand0) <op1> operand1.
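The formula (data <op0> operand0) <op1> operand1 can be read as "scale, then combine with a second tile." This plain-Python sketch evaluates it under two assumptions not stated above:

- operand0 holds one value per partition.
- operand1 is a full tile with the same shape as data.

The function name is hypothetical.

```python
import operator

# Plain-Python model of (data <op0> operand0) <op1> operand1 (not NKI code).
# Assumption: operand0 has one value per partition; operand1 matches data's shape.

def scalar_tensor_tensor_reference(data, op0, operand0, op1, operand1):
    return [
        [op1(op0(x, operand0[p]), y) for x, y in zip(data[p], operand1[p])]
        for p in range(len(data))
    ]

data = [[1, 2],
        [3, 4]]
scale = [10, 100]          # one value per partition (assumed layout)
bias = [[1, 1],
        [2, 2]]            # same shape as data
print(scalar_tensor_tensor_reference(data, operator.mul, scale, operator.add, bias))
# [[11, 21], [302, 402]]
```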

tensor_scalar

Apply up to two math operators to the input data tile by broadcasting scalar/vector operands in the free dimension using Vector or Scalar Engine: (data <op0> operand0) <op1> operand1.
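"Broadcasting in the free dimension" means each operand is either a single scalar or one value per partition, reused across every free-axis element of that partition. The sketch below is a plain-Python model of that rule. It makes the second operator optional to mirror "up to two." The helper names are hypothetical and not part of nki.isa.

```python
import operator

# Plain-Python model of (data <op0> operand0) <op1> operand1 with operands
# broadcast along the free axis (not NKI code; names are hypothetical).

def _per_partition(operand, p):
    # A plain number applies to every partition; a list holds one value per partition.
    return operand[p] if isinstance(operand, list) else operand

def tensor_scalar_reference(data, op0, operand0, op1=None, operand1=None):
    out = []
    for p, row in enumerate(data):
        a = _per_partition(operand0, p)
        new_row = [op0(x, a) for x in row]
        if op1 is not None:                  # the second operator is optional
            b = _per_partition(operand1, p)
            new_row = [op1(x, b) for x in new_row]
        out.append(new_row)
    return out

data = [[1, 2, 3],
        [4, 5, 6]]
# Subtract a per-partition value, then multiply everything by the scalar 2.
print(tensor_scalar_reference(data, operator.sub, [1, 4], operator.mul, 2))
# [[0, 2, 4], [0, 2, 4]]
```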

tensor_scalar_reduce

Perform the same computation as nisa.tensor_scalar with a single math operator and additionally reduce the result along the free dimension, using Vector Engine.

tensor_copy

Create a copy of the source tile within SBUF/PSUM using Vector, Scalar or GpSimd Engine.

tensor_copy_dynamic_src

Create a copy of the src tile within NeuronCore on-chip SRAMs using Vector Engine, where src is located at a dynamic offset within each partition.

reciprocal

Compute the reciprocal of each element in the input data tile using Scalar Engine or Vector Engine.

iota

Build a constant literal in SBUF using GpSimd Engine, rather than transferring the constant literal values from the host to the device.
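A typical use of an on-device constant is an index ramp, where each element's value is derived from its own partition and free-axis position. The plain-Python sketch below shows what such a generated tile could look like. The affine form (base + p * partition_step + f * free_step) and all parameter names are illustrative assumptions, not the iota signature.

```python
# Plain-Python illustration of a generated index pattern (not NKI code).
# Hypothetical parameters: each element is base + p * partition_step + f * free_step.

def iota_reference(num_partitions, free_size, base=0, partition_step=1, free_step=1):
    return [
        [base + p * partition_step + f * free_step for f in range(free_size)]
        for p in range(num_partitions)
    ]

print(iota_reference(2, 4, partition_step=4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Generating such values on the device avoids sending a tile of known, computable numbers over the host-to-device link.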

dropout

Randomly replace some elements of the input tile with zeros, based on the input probabilities, using Vector Engine.
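This sketch models the described behavior in plain Python. It assumes that prob is the probability of zeroing an element and that surviving elements are passed through unchanged. The description mentions only the zeroing, so no 1/(1 - prob) rescaling is applied here. The function name and the seeded random.Random generator are illustrative choices, not the hardware's random source.

```python
import random

# Plain-Python model of dropout as described (not NKI code).
# Assumption: `prob` is the chance of replacing an element with zero;
# kept elements are NOT rescaled.

def dropout_reference(tile, prob, rng=None):
    rng = rng or random.Random(0)   # seeded for reproducibility in this sketch
    return [[0.0 if rng.random() < prob else x for x in row] for row in tile]

tile = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0]]
print(dropout_reference(tile, 0.5, random.Random(42)))
```

With prob = 0 the tile is returned unchanged, and with prob = 1 every element becomes zero, since random() always returns a value below 1.0.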

affine_select

Select elements between an input tile on_true_tile and a scalar value on_false_value according to a boolean predicate tile using GpSimd Engine.
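A common application is masking, for example the causal mask in attention, where positions "in the future" are replaced by a constant. This plain-Python sketch builds the predicate from the condition f <= p purely as an illustrative choice. It then applies the selection rule described above. The helper name is hypothetical.

```python
# Plain-Python model of predicate-based selection (not NKI code):
# keep on_true_tile[p][f] where the predicate is True, else on_false_value.

def affine_select_reference(predicate, on_true_tile, on_false_value):
    return [
        [t if keep else on_false_value for keep, t in zip(pred_row, true_row)]
        for pred_row, true_row in zip(predicate, on_true_tile)
    ]

scores = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
# Causal-style mask: keep element [p][f] only where f <= p.
predicate = [[f <= p for f in range(3)] for p in range(3)]
masked = affine_select_reference(predicate, scores, float("-inf"))
print(masked)  # [[1.0, -inf, -inf], [4.0, 5.0, -inf], [7.0, 8.0, 9.0]]
```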

memset

Initialize a tile with a compile-time constant value using Vector Engine.

bn_stats

Compute mean- and variance-related statistics for each partition of an input tile in parallel using Vector Engine.

bn_aggr

Aggregate one or multiple bn_stats outputs to generate a mean and variance per partition using Vector Engine.
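bn_stats and bn_aggr split the mean/variance computation into per-chunk statistics followed by an aggregation step. The exact fields bn_stats emits are not listed here. The plain-Python sketch below therefore assumes a hypothetical per-chunk summary of (count, mean, sum of squared deviations) and merges the chunks with Chan et al.'s parallel variance update. It shows why chunked statistics can be combined exactly; it does not show how the hardware encodes them. It returns the population variance (dividing by the element count), which is the usual convention for batch normalization. Whether bn_aggr uses the same convention is an assumption.

```python
# Plain-Python illustration of chunked mean/variance aggregation (not NKI code).
# Hypothetical per-chunk summary: (count, mean, M2), where M2 is the sum of
# squared deviations from the chunk mean.

def chunk_stats(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2

def aggregate(stats):
    # Chan et al. pairwise update: fold each chunk into the running totals.
    n_total, mean_total, m2_total = 0, 0.0, 0.0
    for n, mean, m2 in stats:
        delta = mean - mean_total
        combined = n_total + n
        mean_total += delta * n / combined
        m2_total += m2 + delta * delta * n_total * n / combined
        n_total = combined
    return mean_total, m2_total / n_total   # mean, population variance

row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # one partition's data
stats = [chunk_stats(row[:4]), chunk_stats(row[4:])]
mean, var = aggregate(stats)
print(mean, var)  # 3.5 and 17.5 / 6, identical to computing over the whole row
```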

local_gather

Gather SBUF data from src_buffer according to index, using GpSimd Engine.

dma_copy

Copy data from src to dst using DMA Engine.

Engine Types

tensor_engine

Tensor Engine

vector_engine

Vector Engine

scalar_engine

Scalar Engine

gpsimd_engine

GpSimd Engine

dma_engine

DMA Engine

unknown_engine

Unknown Engine

Target

nc_version

NeuronCore version

get_nc_version

Returns the nc_version of the current target context.
