This document is relevant for: Inf2, Trn1, Trn2

nki.isa

NKI ISA

`nc_matmul`	Compute `stationary.T @ moving` matrix multiplication using Tensor Engine.
`nc_transpose`	Perform a 2D transpose between the partition axis and the free axis of input `data`, i.e., a PF-transpose, using Tensor or Vector Engine.
`activation`	Apply an activation function on every element of the input tile using Scalar Engine.
`activation_reduce`	Perform the same computation as `nisa.activation` and also a reduction along the free dimension of the `nisa.activation` result using Scalar Engine.
`tensor_reduce`	Apply a reduction operation to the free axes of an input `data` tile using Vector Engine.
`tensor_partition_reduce`	Apply a reduction operation across partitions of an input `data` tile using GpSimd Engine.
`tensor_tensor`	Perform an element-wise operation on two input tiles using Vector Engine or GpSimd Engine.
`tensor_tensor_scan`	Perform a scan operation of two input tiles using Vector Engine.
`scalar_tensor_tensor`	Apply up to two math operators using Vector Engine: `(data <op0> operand0) <op1> operand1`.
`tensor_scalar`	Apply up to two math operators to the input `data` tile by broadcasting scalar/vector operands in the free dimension using Vector, Scalar, or GpSimd Engine: `(data <op0> operand0) <op1> operand1`.
`tensor_scalar_reduce`	Perform the same computation as `nisa.tensor_scalar` with one math operator and also a reduction along the free dimension of the `nisa.tensor_scalar` result using Vector Engine.
`tensor_copy`	Create a copy of `src` tile within NeuronCore on-chip SRAMs using Vector, Scalar or GpSimd Engine.
`tensor_copy_dynamic_src`	Create a copy of `src` tile within NeuronCore on-chip SRAMs using Vector, Scalar, or GpSimd Engine, with `src` located at a dynamic offset within each partition.
`tensor_copy_dynamic_dst`	Create a copy of `src` tile within NeuronCore on-chip SRAMs using Vector, Scalar, or GpSimd Engine, with `dst` located at a dynamic offset within each partition.
`tensor_copy_predicated`	Conditionally copy elements from the `src` tile to the destination tile on SBUF / PSUM based on a `predicate` using Vector Engine.
`reciprocal`	Compute reciprocal of each element in the input `data` tile using Vector Engine.
`iota`	Build a constant literal in SBUF using GpSimd Engine, rather than transferring the constant literal values from the host to device.
`dropout`	Randomly replace some elements of the input tile `data` with zeros based on input probabilities using Vector Engine.
`affine_select`	Select elements between an input tile `on_true_tile` and a scalar value `on_false_value` according to a boolean predicate tile using GpSimd Engine.
`range_select`	Select elements from `on_true_tile` based on comparison with bounds using Vector Engine.
`memset`	Initialize a tile filled with a compile-time constant value using Vector or GpSimd Engine.
`bn_stats`	Compute mean- and variance-related statistics for each partition of an input tile `data` in parallel using Vector Engine.
`bn_aggr`	Aggregate one or multiple `bn_stats` outputs to generate a mean and variance per partition using Vector Engine.
`local_gather`	Gather SBUF data in `src_buffer` using `index` on GpSimd Engine.
`dma_copy`	Copy data from `src` to `dst` using DMA engine.
`max8`	Find the 8 largest values in each partition of the source tile.
`nc_find_index8`	Find the indices of the 8 given `vals` in each partition of the `data` tensor.
`nc_match_replace8`	Replace the first occurrence of each value in `vals` with `imm` in `data` using Vector Engine and return the replaced tensor.
`nc_stream_shuffle`	Apply cross-partition data movement within a quadrant of 32 partitions from source tile `src` to destination tile `dst` using Vector Engine.
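
As a rough aid for reading the one-line summaries above, the sketch below restates a few of them as plain NumPy. It is not NKI code and does not call `nki.isa`: the `ref_*` function names are hypothetical, tiles are modeled as 2D arrays with the partition axis first and the free axis second, and real instructions carry tile-size, dtype, and engine constraints that are not modeled here. The descending order returned by `ref_max8` is an assumption, since the table only says "the 8 largest values".

```python
import numpy as np

# NumPy reference sketches only; the ref_* names are hypothetical
# and none of these functions exist in nki.isa.

def ref_nc_matmul(stationary, moving):
    """Models the `stationary.T @ moving` contract from the table."""
    return stationary.T @ moving

def ref_tensor_scalar(data, op0, operand0, op1=None, operand1=None):
    """Models `(data <op0> operand0) <op1> operand1`.

    A scalar operand, or a (partitions, 1) vector operand, is broadcast
    along the free axis by ordinary NumPy broadcasting.
    """
    out = op0(data, operand0)
    return out if op1 is None else op1(out, operand1)

def ref_tensor_reduce(op, data):
    """Models a reduction along the free axis (axis 1); `op` is a NumPy ufunc."""
    return op.reduce(data, axis=1)

def ref_max8(data):
    """Models "the 8 largest values in each partition".

    Assumption: values are returned in descending order.
    """
    return -np.sort(-data, axis=1)[:, :8]

# Small worked inputs.
stationary = np.arange(6, dtype=np.float32).reshape(2, 3)  # (K=2, M=3)
moving = np.ones((2, 4), dtype=np.float32)                 # (K=2, N=4)
mm = ref_nc_matmul(stationary, moving)                     # shape (M=3, N=4)

data = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
scale = np.array([[10], [100]], dtype=np.float32)          # one value per partition
ts = ref_tensor_scalar(data, np.multiply, scale, np.add, 1.0)

row_sum = ref_tensor_reduce(np.add, data)
top8 = ref_max8(np.arange(20).reshape(2, 10))
```

Here `mm` has one row per column of `stationary`, `ts` multiplies each partition by its own scale before adding 1, and `top8` keeps 8 of the 10 values in each row.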

NKI ISA Config Enums

`engine`	Neuron Device engines
`reduce_cmd`	Engine Register Reduce commands
`dge_mode`	Neuron Descriptor Generation Engine Mode

Target

`nc_version`	NeuronCore version
`get_nc_version`	Returns the `nc_version` of the current target context.
