This document is relevant for: Inf2, Trn1, Trn2
Memory operations#
Load a tensor from device memory (HBM) into on-chip memory (SBUF). |
Store into a tensor on device memory (HBM) from on-chip memory (SBUF). |
Load a tensor from device memory (HBM) and 2D-transpose the data before storing into on-chip memory (SBUF). |
Perform an atomic read-modify-write operation on HBM data. |
Create a copy of the src tile. |
Creation operations#
Create a new tensor of given shape and dtype on the specified buffer. |
Create a new tensor of given shape and dtype on the specified buffer, filled with zeros. |
Create a new tensor of zeros with the same shape and type as a given tensor. |
Create a new tensor of given shape and dtype on the specified buffer, filled with ones. |
Create a new tensor of given shape and dtype on the specified buffer, filled with initial value. |
Generate a tile of given shape and dtype, filled with random values that are sampled from a uniform distribution between 0 and 1. |
Set a user-specified seed for the hardware random number generator. |
Create a new tensor filled with the data specified by the data array. |
Create a new identity tensor with specified data type. |
Math operations#
Add the inputs, element-wise. |
Subtract the inputs, element-wise. |
Multiply the inputs, element-wise. |
Divide the inputs, element-wise. |
Elements of x raised to powers of y, element-wise. |
Maximum of the inputs, element-wise. |
Minimum of the inputs, element-wise. |
Maximum of elements along the specified axis (or axes) of the input. |
Minimum of elements along the specified axis (or axes) of the input. |
Arithmetic mean along the specified axis (or axes) of the input. |
Variance along the specified axis (or axes) of the input. |
Sum of elements along the specified axis (or axes) of the input. |
Product of elements along the specified axis (or axes) of the input. |
Whether all elements along the specified axis (or axes) evaluate to True. |
Absolute value of the input, element-wise. |
Numerical negative of the input, element-wise. |
Sign of each element of the input, element-wise. |
Truncated value of the input, element-wise. |
Floor of the input, element-wise. |
Ceiling of the input, element-wise. |
Exponential of the input, element-wise. |
Natural logarithm of the input, element-wise. |
Cosine of the input, element-wise. |
Sine of the input, element-wise. |
Tangent of the input, element-wise. |
Hyperbolic tangent of the input, element-wise. |
Inverse tangent of the input, element-wise. |
Non-negative square-root of the input, element-wise. |
Reciprocal of the square-root of the input, element-wise. |
Logistic sigmoid activation function on the input, element-wise. |
Rectified Linear Unit activation function on the input, element-wise. |
Gaussian Error Linear Unit activation function on the input, element-wise. |
Derivative of Gaussian Error Linear Unit (gelu) on the input, element-wise. |
Gaussian Error Linear Unit activation function on the input, element-wise, with tanh approximation. |
Sigmoid Linear Unit activation function on the input, element-wise. |
Derivative of Sigmoid Linear Unit activation function on the input, element-wise. |
Error function of the input, element-wise. |
Derivative of the Error function (erf) on the input, element-wise. |
Softplus activation function on the input, element-wise. |
Mish activation function on the input, element-wise. |
Square of the input, element-wise. |
Softmax activation function on the input, computed along the specified axis. |
Apply Root Mean Square Layer Normalization. |
Randomly zeroes some of the elements of the input tile given a probability rate. |
Transposes a 2D tile between its partition and free dimension. |
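The activation and normalization entries above correspond to well-known formulas. As a reading aid, here is a minimal plain-Python sketch of those textbook definitions. It does not call NKI, the function names and the `eps` parameter are illustrative assumptions, and whether the hardware evaluates exactly these expressions (for example, the `0.044715` constant in the tanh approximation) is not stated in this document.

```python
import math

def sigmoid(x):
    # Logistic sigmoid: 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_dx(x):
    # d/dx GELU = Phi(x) + x * phi(x), with phi the standard normal PDF
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

def gelu_tanh(x):
    # Common tanh approximation of GELU
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def silu(x):
    # SiLU: x * sigmoid(x)
    return x * sigmoid(x)

def silu_dx(x):
    # d/dx SiLU = s + x * s * (1 - s), with s = sigmoid(x)
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

def erf_dx(x):
    # d/dx erf(x) = 2 / sqrt(pi) * exp(-x^2)
    return 2.0 / math.sqrt(math.pi) * math.exp(-x * x)

def softplus(x):
    # Softplus: log(1 + e^x)
    return math.log1p(math.exp(x))

def mish(x):
    # Mish: x * tanh(softplus(x))
    return x * math.tanh(softplus(x))

def softmax(row):
    # Softmax over one row; subtracting the max keeps exp() from overflowing
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rms_norm(row, weight, eps=1e-6):
    # RMS layer norm: scale each element by 1 / sqrt(mean(x^2) + eps), then by weight
    mean_sq = sum(v * v for v in row) / len(row)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(row, weight)]
```

The derivative helpers can be checked against a central finite difference of the corresponding forward function, which is a quick way to validate a hand-written backward pass.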
Bitwise operations#
Bitwise AND of the two inputs, element-wise. |
Bitwise OR of the two inputs, element-wise. |
Bitwise XOR of the two inputs, element-wise. |
Bitwise NOT of the input, element-wise. |
Bitwise left-shift x by y, element-wise. |
Bitwise right-shift x by y, element-wise. |
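To make the element-wise bitwise semantics concrete, the following plain-Python sketch applies each operation to small integer lists. It is illustrative only and does not use NKI. The 8-bit unsigned mask on NOT is an assumption chosen for the example, since Python integers are unbounded, and the hardware's treatment of signed right shifts is not covered here.

```python
x = [0b1100, 0b1010, 0b0001]
y = [0b1010, 0b0110, 0b0011]
shifts = [1, 2, 3]

bit_and = [a & b for a, b in zip(x, y)]        # [8, 2, 1]
bit_or = [a | b for a, b in zip(x, y)]         # [14, 14, 3]
bit_xor = [a ^ b for a, b in zip(x, y)]        # [6, 12, 2]

# Assumption: 8-bit unsigned elements, so NOT is masked to the low 8 bits
bit_not = [~a & 0xFF for a in x]               # [243, 245, 254]

left = [a << s for a, s in zip(x, shifts)]     # [24, 40, 8]
right = [a >> s for a, s in zip(x, shifts)]    # [6, 2, 0]
```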
Logical operations#
Element-wise boolean result of x == y. |
Element-wise boolean result of x != y. |
Element-wise boolean result of x > y. |
Element-wise boolean result of x >= y. |
Element-wise boolean result of x < y. |
Element-wise boolean result of x <= y. |
Element-wise boolean result of x AND y. |
Element-wise boolean result of x OR y. |
Element-wise boolean result of x XOR y. |
Element-wise boolean result of NOT x. |
Tensor manipulation operations#
Construct a dynamic slice for simple tensor indexing. |
Return contiguous values within a given interval, used for indexing a tensor to define a tile. |
Same as NumPy mgrid: "An instance which returns a dense (or fleshed out) mesh-grid when indexed, so that each returned argument has the same shape." |
Expand the shape of a tile. |
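The mesh-grid idea is easier to see with a small example. The sketch below is plain Python, not NKI: `mgrid_2d` and `gather_tile` are hypothetical helpers that build a dense pair of index grids with the same shape and then use them, plus an offset, to pick a tile out of a larger 2D tensor.

```python
def mgrid_2d(n_rows, n_cols):
    # Dense mesh-grid: two index grids, both of shape (n_rows, n_cols)
    row_idx = [[r for _ in range(n_cols)] for r in range(n_rows)]
    col_idx = [[c for c in range(n_cols)] for _ in range(n_rows)]
    return row_idx, col_idx

def gather_tile(tensor, row_idx, col_idx, row_off=0, col_off=0):
    # Read tensor[row_off + i, col_off + j] for every (i, j) pair in the grids
    return [
        [tensor[r + row_off][c + col_off] for r, c in zip(rows, cols)]
        for rows, cols in zip(row_idx, col_idx)
    ]

# A 4x6 tensor whose entries encode their own coordinates as 10*row + col
tensor = [[10 * r + c for c in range(6)] for r in range(4)]

i_p, i_f = mgrid_2d(2, 3)
tile = gather_tile(tensor, i_p, i_f, row_off=1, col_off=2)
# tile == [[12, 13, 14], [22, 23, 24]]
```

Because both grids share one shape, the shape of the resulting tile is simply the shape of the index grids.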
Sorting/Searching operations#
Return elements chosen from x or y depending on condition. |
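As an illustration of the selection semantics described above, here is a minimal plain-Python sketch, not an NKI call. The `where` helper is a hypothetical stand-in that picks from `x` where the condition holds and from `y` otherwise, position by position.

```python
def where(condition, x, y):
    # Element-wise select over equally shaped 2D lists
    return [
        [xv if c else yv for c, xv, yv in zip(c_row, x_row, y_row)]
        for c_row, x_row, y_row in zip(condition, x, y)
    ]

x = [[1.0, -2.0], [-3.0, 4.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
mask = [[v > 0 for v in row] for row in x]

relu_like = where(mask, x, zeros)
# relu_like == [[1.0, 0.0], [0.0, 4.0]]
```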
Collective communication operations#
Apply reduce operation over multiple SPMD programs. |
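Conceptually, an all-reduce combines one value per program with a reduce operation and hands the same result back to every program. The plain-Python sketch below models that idea with lists. The `all_reduce` helper and its signature are assumptions for illustration and say nothing about how the collective is scheduled on hardware.

```python
from functools import reduce
import operator

def all_reduce(local_values, op):
    # local_values[i] is the data held by program i; reduce position by position
    reduced = [reduce(op, column) for column in zip(*local_values)]
    # Every program receives its own copy of the reduced result
    return [list(reduced) for _ in local_values]

per_program = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]

summed = all_reduce(per_program, operator.add)
# each entry of summed is [111, 222, 333]

maxed = all_reduce(per_program, max)
# each entry of maxed is [100, 200, 300]
```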
Iterators#
Create a sequence of numbers for use as loop iterators in NKI, resulting in a fully unrolled loop. |
Create a sequence of numbers for use as parallel loop iterators in NKI. |
Create a sequence of numbers for use as sequential loop iterators in NKI. |
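The distinction between parallel and sequential loop iterators comes down to whether one iteration depends on the result of an earlier one. The plain-Python sketch below illustrates that difference under that assumption. It does not use NKI, and the mapping of each pattern to a specific iterator is a reading of the descriptions above rather than a documented guarantee.

```python
import random

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Independent iterations: each output depends only on its own index,
# so any execution order (shuffled here) gives the same result.
squares = [0] * len(data)
order = list(range(len(data)))
random.Random(0).shuffle(order)
for i in order:
    squares[i] = data[i] * data[i]

# Loop-carried dependency: each step reads the running total produced by
# the previous step, so the iterations must execute in order.
prefix = []
running = 0
for i in range(len(data)):
    running += data[i]
    prefix.append(running)

# squares == [9, 1, 16, 1, 25, 81, 4, 36]
# prefix  == [3, 4, 8, 9, 14, 23, 25, 31]
```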
Memory Hierarchy#
Mark a dimension explicitly as a partition dimension. |
PSUM - Only visible to each individual kernel instance in the SPMD grid, alias of |
State Buffer - Only visible to each individual kernel instance in the SPMD grid, alias of |
HBM - Alias of private_hbm. |
HBM - Only visible to each individual kernel instance in the SPMD grid. |
Shared HBM - Visible to all kernel instances in the SPMD grid. |
Others#
Index of the current SPMD program along the given axis in the launch grid. |
Number of SPMD programs along the given axes in the launch grid. |
Number of dimensions in the SPMD launch grid. |
Create a dimension in the SPMD launch grid of an NKI kernel with sub-dimension tiling. |
Create a logical neuron core dimension in the launch grid. |
Print a message with a String |
Apply reduce operation over a loop. |
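The launch-grid entries above (program index, program count, grid rank) can be pictured with a small simulation. The sketch below is plain Python, not NKI: `launch` and `tile_origin_kernel` are hypothetical names, and the tile sizes 128 and 512 are arbitrary values chosen for the example. It shows how each program instance could derive the origin of the tile it owns from its index in the grid.

```python
from itertools import product

def launch(kernel, grid):
    # Run one kernel instance per point in the launch grid, in arbitrary order
    results = {}
    for program_id in product(*(range(n) for n in grid)):
        results[program_id] = kernel(program_id, grid)
    return results

def tile_origin_kernel(program_id, grid, tile_rows=128, tile_cols=512):
    # Each instance owns one tile; its grid index picks the tile's top-left corner
    row = program_id[0] * tile_rows
    col = program_id[1] * tile_cols
    return (row, col)

grid = (2, 3)                      # 2 programs along axis 0, 3 along axis 1
origins = launch(tile_origin_kernel, grid)

num_programs = len(origins)        # 6
program_ndim = len(grid)           # 2
# origins[(1, 2)] == (128, 1024)
```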
Data Types#
32-bit floating-point number in TensorFloat-32 format (1S,8E,10M) |
16-bit floating-point number (1S,8E,7M) |
8-bit floating-point number (1S,4E,3M) |
8-bit floating-point number (1S,5E,2M) |
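The (sign, exponent, mantissa) notation above fully determines how finite values are encoded. As a worked illustration, the plain-Python sketch below decodes a bit pattern for any such layout using the usual bias of 2^(E-1) - 1. The helper names are hypothetical, special encodings (infinities and NaNs, which differ between formats) are deliberately not handled, and the float32-to-bfloat16 conversion truncates rather than rounds, which may differ from what the hardware does.

```python
import struct

def decode_float(bits, exp_bits, man_bits):
    # Decode a finite value from a 1S / exp_bits E / man_bits M bit pattern
    sign = (bits >> (exp_bits + man_bits)) & 1
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    fraction = mantissa / (1 << man_bits)
    if exponent == 0:
        # Subnormal: no implicit leading 1
        value = fraction * 2.0 ** (1 - bias)
    else:
        value = (1.0 + fraction) * 2.0 ** (exponent - bias)
    return -value if sign else value

def float32_to_bfloat16_bits(x):
    # bfloat16 keeps the top 16 bits of the float32 pattern (truncation, no rounding)
    (f32_bits,) = struct.unpack("<I", struct.pack("<f", x))
    return f32_bits >> 16

one_e4m3 = decode_float(0x38, exp_bits=4, man_bits=3)      # 1.0
one_e5m2 = decode_float(0x3C, exp_bits=5, man_bits=2)      # 1.0
neg_two_e4m3 = decode_float(0xC0, exp_bits=4, man_bits=3)  # -2.0

bf16_bits = float32_to_bfloat16_bits(3.14159)              # 0x4049
bf16_value = decode_float(bf16_bits, exp_bits=8, man_bits=7)  # 3.140625
```

The last two lines show the precision cost of a 7-bit mantissa: 3.14159 becomes 3.140625 after truncation to bfloat16.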
Constants#
Tile size constants. |