nki.isa
NKI ISA
Compute |
|
Compute matrix multiplication of MXFP8/MXFP4 quantized matrices with integrated dequantization using Tensor Engine. |
|
Perform a 2D transpose between the partition axis and the free axis of input |
|
Apply an activation function on every element of the input tile using Scalar Engine, with an optional scale/bias operation before the activation and an optional reduction operation after the activation in the same instruction. |
|
Perform the same computation as |
|
Apply a reduction operation to the free axes of an input |
|
Apply a reduction operation across partitions of an input |
|
Perform an element-wise operation on two input tiles using Vector Engine or GpSimd Engine. |
|
Perform a scan operation of two input tiles using Vector Engine. |
|
Apply two math operators in sequence using Vector Engine: |
|
Apply up to two math operators to the input |
|
Perform the same computation as |
|
Create a copy of |
|
Create a copy of |
|
Create a copy of |
|
Conditionally copy elements from the |
|
Compute element-wise reciprocal (1.0/x) of the input |
|
Quantize FP16/BF16 data to MXFP8 tensors (both data and scales) using Vector Engine. |
|
Generate a constant literal pattern into SBUF using GpSimd Engine. |
|
Randomly replace some elements of the input tile |
|
Select elements between an input tile |
|
Select elements from |
|
Selectively copy elements from either |
|
Compute the sequence bounds for a given set of segment IDs using GpSimd Engine. |
|
Initialize a tile filled with a compile-time constant value using Vector or GpSimd Engine. |
|
Compute mean- and variance-related statistics for each partition of an input tile |
|
Aggregate one or multiple |
|
Gather SBUF data in |
|
Copy data from |
|
Perform a transpose on input |
|
Perform math operations using compute logic inside DMA engines with element-wise scaling and reduction. |
|
Find the 8 largest values in each partition of the source tile. |
|
Find indices of the 8 given vals in each partition of the data tensor. |
|
Replace first occurrence of each value in |
|
Apply cross-partition data movement within a quadrant of 32 partitions from source tile |
|
Allocate a virtual register and optionally initialize it with an integer value |
|
Load a scalar value from memory (HBM or SBUF) into a virtual register. |
|
Move a compile-time constant integer value into a virtual register. |
|
Store the value from a virtual register into memory (HBM/SBUF). |
|
Synchronize execution across multiple NeuronCores by implementing a barrier mechanism. |
|
Perform point-to-point communication between NeuronCores by sending and receiving data simultaneously using DMA engines. |
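The descriptions above are phrased in terms of a tile's partition axis and free axis. As a rough illustration of what a few of them mean, the following plain-Python sketch models a tile as a list of partitions, each holding a list of free-axis elements. It is not NKI code and does not call any `nki.isa` function: every function name below is hypothetical, the descending output order of the top-8 helper is an assumption, and hardware details such as data types, tile size limits, and engine selection are ignored.

```python
# Plain-Python reference model of a few operations described above.
# NOT NKI code: all names here are hypothetical, chosen for illustration.
# A "tile" is modeled as a list of partitions; each partition is a list of
# free-axis elements.


def transpose_2d(tile):
    """Swap the partition axis and the free axis of a 2D tile."""
    return [list(column) for column in zip(*tile)]


def reduce_free_axis(tile, op, initial):
    """Fold the binary function `op` over the free axis of each partition."""
    results = []
    for partition in tile:
        acc = initial
        for value in partition:
            acc = op(acc, value)
        results.append(acc)
    return results


def reciprocal(tile):
    """Element-wise 1.0 / x."""
    return [[1.0 / value for value in partition] for partition in tile]


def largest8_per_partition(tile):
    """Return the 8 largest values of each partition.

    Assumption: results are listed in descending order.
    """
    return [sorted(partition, reverse=True)[:8] for partition in tile]
```

For example, with `tile = [[1.0, 2.0, 4.0], [8.0, 0.5, 0.25]]` (2 partitions, 3 free-axis elements each), `reduce_free_axis(tile, lambda a, b: a + b, 0.0)` produces one sum per partition, while `transpose_2d(tile)` produces a tile with 3 partitions of 2 elements each.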
NKI ISA Config Enums
Neuron Device engines |
|
Engine Register Reduce commands |
|
Neuron Descriptor Generation Engine Mode |
Target
NeuronCore version |
|
Returns the |