This document is relevant for: Inf2, Trn1, Trn2

nki.isa

NKI ISA

nc_matmul

Compute the matrix multiplication stationary.T @ moving using Tensor Engine.
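The contraction in stationary.T @ moving can be illustrated in plain Python, independent of NKI. This is a reference model of the arithmetic only, not the nki.isa API: the helper name `matmul_stationary_t` is hypothetical, and tiles are modeled as nested lists whose first index is assumed to be the shared contraction axis.

```python
def matmul_stationary_t(stationary, moving):
    """Reference model of stationary.T @ moving on nested lists.

    stationary has shape [K, M], moving has shape [K, N];
    the result has shape [M, N]. (Hypothetical helper, not an NKI call.)
    """
    k = len(stationary)
    if k != len(moving):
        raise ValueError("stationary and moving must share the contraction size K")
    m = len(stationary[0])
    n = len(moving[0])
    # result[i][j] = sum over p of stationary[p][i] * moving[p][j]
    return [
        [sum(stationary[p][i] * moving[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


stationary = [[1, 2], [3, 4], [5, 6]]  # K=3, M=2
moving = [[1, 0], [0, 1], [1, 1]]      # K=3, N=2
result = matmul_stationary_t(stationary, moving)
print(result)  # [[6, 8], [8, 10]]
```

Note that the transpose is implicit: both operands are indexed by the contraction axis first, so no separate transpose step appears in the model.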

nc_transpose

Perform a 2D transpose between the partition axis and the free axis of input data, i.e., a PF-transpose, using Tensor or Vector Engine.

activation

Apply an activation function on every element of the input tile using Scalar Engine.

activation_reduce

Perform the same computation as nisa.activation and also a reduction along the free dimension of the nisa.activation result using Scalar Engine.

tensor_reduce

Apply a reduction operation to the free axes of an input data tile using Vector Engine.

tensor_partition_reduce

Apply a reduction operation across partitions of an input data tile using GpSimd Engine.
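The two reductions above differ only in the axis they collapse. A minimal plain-Python sketch of that difference follows, assuming a tile is modeled as a list of partitions, each holding a list of free-axis elements. The helper names `reduce_free_axis` and `reduce_partition_axis` are hypothetical and are not nki.isa functions.

```python
from functools import reduce
import operator


def reduce_free_axis(tile, op):
    """Collapse the free axis: one result per partition (row)."""
    return [reduce(op, row) for row in tile]


def reduce_partition_axis(tile, op):
    """Collapse the partition axis: one result per free-axis position (column)."""
    return [reduce(op, column) for column in zip(*tile)]


tile = [
    [1, 2, 3],  # partition 0
    [4, 5, 6],  # partition 1
]

print(reduce_free_axis(tile, operator.add))       # [6, 15]
print(reduce_partition_axis(tile, operator.add))  # [5, 7, 9]
print(reduce_free_axis(tile, max))                # [3, 6]
```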

tensor_tensor

Perform an element-wise operation on two input tiles using Vector Engine or GpSimd Engine.

tensor_tensor_scan

Perform a scan operation of two input tiles using Vector Engine.

scalar_tensor_tensor

Apply up to two math operators using Vector Engine: (data <op0> operand0) <op1> operand1.
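The expression (data <op0> operand0) <op1> operand1 can be read as a fused two-step element-wise computation. The plain-Python sketch below models it under two assumptions that this summary does not state: operand0 is a single scalar, and operand1 is a tile with the same shape as data. The helper `scalar_tensor_tensor_ref` is hypothetical and only mirrors the arithmetic, not the real call signature.

```python
import operator


def scalar_tensor_tensor_ref(data, op0, operand0, op1, operand1):
    """Reference model of (data <op0> operand0) <op1> operand1.

    Assumes operand0 is a scalar and operand1 matches the shape of data.
    """
    return [
        [op1(op0(x, operand0), y) for x, y in zip(data_row, operand1_row)]
        for data_row, operand1_row in zip(data, operand1)
    ]


data = [[1.0, 2.0], [3.0, 4.0]]
bias = [[10.0, 20.0], [30.0, 40.0]]

# (data * 2.0) + bias, computed element by element
out = scalar_tensor_tensor_ref(data, operator.mul, 2.0, operator.add, bias)
print(out)  # [[12.0, 24.0], [36.0, 48.0]]
```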

tensor_scalar

Apply up to two math operators to the input data tile by broadcasting scalar/vector operands in the free dimension using Vector, Scalar, or GpSimd Engine: (data <op0> operand0) <op1> operand1.

tensor_scalar_reduce

Perform the same computation as nisa.tensor_scalar with one math operator and also a reduction along the free dimension of the nisa.tensor_scalar result using Vector Engine.

tensor_copy

Create a copy of src tile within NeuronCore on-chip SRAMs using Vector, Scalar or GpSimd Engine.

tensor_copy_dynamic_src

Create a copy of src tile within NeuronCore on-chip SRAMs using Vector Engine, with src located at a dynamic offset within each partition.

tensor_copy_dynamic_dst

Create a copy of src tile within NeuronCore on-chip SRAMs using Vector Engine, with dst located at a dynamic offset within each partition.

tensor_copy_predicated

Conditionally copy elements from the src tile to the destination tile on SBUF / PSUM based on a predicate using Vector Engine.

reciprocal

Compute reciprocal of each element in the input data tile using Vector Engine.

iota

Build a constant literal in SBUF using GpSimd Engine, rather than transferring the constant literal values from the host to device.

dropout

Randomly replace some elements of the input data tile with zeros based on input probabilities using Vector Engine.

affine_select

Select elements between an input tile on_true_tile and a scalar value on_false_value according to a boolean predicate tile using GpSimd Engine.

memset

Initialize a tile filled with a compile-time constant value using Vector or GpSimd Engine.

bn_stats

Compute mean- and variance-related statistics for each partition of an input data tile in parallel using Vector Engine.

bn_aggr

Aggregate one or multiple bn_stats outputs to generate a mean and variance per partition using Vector Engine.

local_gather

Gather SBUF data in src_buffer using index on GpSimd Engine.

dma_copy

Copy data from src to dst using DMA engine.

max8

Find the 8 largest values in each partition of the source tile.
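A per-partition top-8 selection can be sketched in plain Python as follows. This is an illustration of the idea only: the helper `max8_ref` is hypothetical, the tile is modeled as a list of partitions, and returning the values in descending order is a choice made by this sketch rather than something this summary specifies.

```python
import heapq


def max8_ref(tile):
    """Return the 8 largest values of each partition (row), largest first.

    Assumes every row holds at least 8 elements.
    """
    return [heapq.nlargest(8, row) for row in tile]


tile = [
    list(range(10)),                      # partition 0: 0..9
    [5, 1, 9, 3, 7, 2, 8, 6, 4, 0, 11],   # partition 1
]

top8 = max8_ref(tile)
print(top8[0])  # [9, 8, 7, 6, 5, 4, 3, 2]
print(top8[1])  # [11, 9, 8, 7, 6, 5, 4, 3]
```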

nc_find_index8

Find indices of the 8 given vals in each partition of the data tensor.

nc_match_replace8

Replace the first occurrence of each value in vals with imm in data using Vector Engine.

nc_stream_shuffle

Apply cross-partition data movement within a quadrant of 32 partitions from source tile src to destination tile dst using Vector Engine.

Accumulation Command

reduce_cmd

Engine Register Reduce commands

Engine Types

tensor_engine

Tensor Engine

vector_engine

Vector Engine

scalar_engine

Scalar Engine

gpsimd_engine

GpSimd Engine

dma_engine

DMA Engine

unknown_engine

Unknown Engine

Target

engine

Neuron Device engines

nc_version

NeuronCore version

get_nc_version

Returns the nc_version of the current target context.
