This document is relevant for: Inf2, Trn1, Trn2
nki.language#
Memory operations#
Load a tensor from device memory (HBM) into on-chip memory (SBUF).
Store into a tensor on device memory (HBM) from on-chip memory (SBUF).
Load a tensor from device memory (HBM) and 2D-transpose the data before storing into on-chip memory (SBUF).
Perform an atomic read-modify-write operation on HBM data.
Create a copy of the src tile.
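The transposing load described above can be pictured with a small plain-Python model. This is only a sketch of the data movement, not NKI code: `load_transpose2d_model` is a hypothetical helper, and nested lists stand in for HBM tensors and SBUF tiles.

```python
# Plain-Python model of a 2D-transposing load; not NKI code.
# load_transpose2d_model is a hypothetical name used for illustration only.
def load_transpose2d_model(hbm_rows):
    """Return the transpose of a rectangular list of lists."""
    return [list(col) for col in zip(*hbm_rows)]

hbm_tensor = [[1, 2, 3],
              [4, 5, 6]]                         # shape (2, 3), stands in for HBM
sbuf_tile = load_transpose2d_model(hbm_tensor)   # shape (3, 2), stands in for SBUF
print(sbuf_tile)  # [[1, 4], [2, 5], [3, 6]]
```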
Creation operations#
Create a new tensor of given shape and dtype on the specified buffer.
Create a new tensor of given shape and dtype on the specified buffer, filled with zeros.
Create a new tensor of zeros with the same shape and type as a given tensor.
Create a new tensor of given shape and dtype on the specified buffer, filled with ones.
Create a new tensor of given shape and dtype on the specified buffer, filled with initial value.
Generate a tile of given shape and dtype, filled with random values that are sampled from a uniform distribution between 0 and 1.
Sets a seed, specified by user, to the random number generator on HW.
Create a new tensor filled with the data specified by data array.
Create a new identity tensor with specified data type.
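As a rough illustration of what the creation operations produce, the sketch below builds the same values with nested lists. All helper names ending in `_model` are hypothetical; dtype and buffer placement, which the real operations take as arguments, are deliberately left out.

```python
# Plain-Python models of the filled-tensor constructors; not NKI code.
# Every *_model name is hypothetical; dtype and buffer arguments are omitted.
def full_model(shape, fill_value):
    rows, cols = shape
    return [[fill_value for _ in range(cols)] for _ in range(rows)]

def zeros_model(shape):
    return full_model(shape, 0)

def ones_model(shape):
    return full_model(shape, 1)

def zeros_like_model(tensor):
    # Same shape as the given 2D tensor, filled with zeros.
    return zeros_model((len(tensor), len(tensor[0])))

def identity_model(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

print(full_model((2, 3), 7))                  # [[7, 7, 7], [7, 7, 7]]
print(zeros_like_model(ones_model((2, 2))))   # [[0, 0], [0, 0]]
print(identity_model(3))                      # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```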
Math operations#
Add the inputs, element-wise.
Subtract the inputs, element-wise.
Multiply the inputs, element-wise.
Divide the inputs, element-wise.
Elements of x raised to powers of y, element-wise.
Maximum of the inputs, element-wise.
Minimum of the inputs, element-wise.
Maximum of elements along the specified axis (or axes) of the input.
Minimum of elements along the specified axis (or axes) of the input.
Arithmetic mean along the specified axis (or axes) of the input.
Variance along the specified axis (or axes) of the input.
Sum of elements along the specified axis (or axes) of the input.
Product of elements along the specified axis (or axes) of the input.
Whether all elements along the specified axis (or axes) evaluate to True.
Absolute value of the input, element-wise.
Numerical negative of the input, element-wise.
Sign of the numbers of the input, element-wise.
Truncated value of the input, element-wise.
Floor of the input, element-wise.
Ceiling of the input, element-wise.
Exponential of the input, element-wise.
Natural logarithm of the input, element-wise.
Cosine of the input, element-wise.
Sine of the input, element-wise.
Tangent of the input, element-wise.
Hyperbolic tangent of the input, element-wise.
Inverse tangent of the input, element-wise.
Non-negative square-root of the input, element-wise.
Reciprocal of the square-root of the input, element-wise.
Logistic sigmoid activation function on the input, element-wise.
Rectified Linear Unit activation function on the input, element-wise.
Gaussian Error Linear Unit activation function on the input, element-wise.
Derivative of Gaussian Error Linear Unit (gelu) on the input, element-wise.
Gaussian Error Linear Unit activation function on the input, element-wise, with tanh approximation.
Sigmoid Linear Unit activation function on the input, element-wise.
Derivative of Sigmoid Linear Unit activation function on the input, element-wise.
Error function of the input, element-wise.
Derivative of the Error function (erf) on the input, element-wise.
Softplus activation function on the input, element-wise.
Mish activation function on the input, element-wise.
Square of the input, element-wise.
Softmax activation function on the input, element-wise.
Apply Root Mean Square Layer Normalization.
Randomly zeroes some of the elements of the input tile given a probability rate.
Transposes a 2D tile between its partition and free dimension.
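For readers who want the formulas behind several of the activation entries above, the following scalar reference functions use their standard textbook definitions. This is a sketch in plain Python with the `math` module, not NKI code. The function names, the `eps` default, and the row-wise treatment of softmax and RMS normalization are assumptions made for illustration; hardware results may differ slightly in precision.

```python
import math

# Scalar reference definitions (standard formulas); names are illustrative only.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    return x * sigmoid(x)

def softplus(x):
    return math.log1p(math.exp(x))

def mish(x):
    return x * math.tanh(softplus(x))

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation of GELU.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def softmax(row):
    # Normalizes along one row; subtracting the max keeps exp() stable.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rms_norm(row, weight, eps=1e-5):
    # eps is an assumed default, not taken from the reference above.
    mean_sq = sum(v * v for v in row) / len(row)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(row, weight)]

print(sigmoid(0.0))                     # 0.5
print(abs(gelu(1.0) - gelu_tanh(1.0)) < 1e-3)  # True
```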
Bitwise operations#
Bitwise AND of the two inputs, element-wise.
Bitwise OR of the two inputs, element-wise.
Bitwise XOR of the two inputs, element-wise.
Bitwise NOT of the input, element-wise.
Bitwise left-shift x by y, element-wise.
Bitwise right-shift x by y, element-wise.
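The element-wise bitwise semantics can be checked with ordinary Python integers. This is a minimal sketch, not NKI code; it assumes unsigned 8-bit elements for the NOT case, which is why the result is masked with `0xFF`.

```python
# Element-wise bitwise operations modelled on Python ints; not NKI code.
x = [0b1100, 0b1010, 0b0111]      # 12, 10, 7
y = [0b1010, 0b0110, 0b0010]      # 10, 6, 2
shift = [1, 2, 3]

and_out = [a & b for a, b in zip(x, y)]      # [8, 2, 2]
or_out = [a | b for a, b in zip(x, y)]       # [14, 14, 7]
xor_out = [a ^ b for a, b in zip(x, y)]      # [6, 12, 5]
not_out = [~a & 0xFF for a in x]             # [243, 245, 248], assuming uint8
lshift_out = [a << s for a, s in zip(x, shift)]   # [24, 40, 56]
rshift_out = [a >> s for a, s in zip(x, shift)]   # [6, 2, 0]
```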
Logical operations#
Element-wise boolean result of x == y.
Element-wise boolean result of x != y.
Element-wise boolean result of x > y.
Element-wise boolean result of x >= y.
Element-wise boolean result of x < y.
Element-wise boolean result of x <= y.
Element-wise boolean result of x AND y.
Element-wise boolean result of x OR y.
Element-wise boolean result of x XOR y.
Element-wise boolean result of NOT x.
Tensor manipulation operations#
Construct a dynamic slice for simple tensor indexing.
Return contiguous values within a given interval, used for indexing a tensor to define a tile.
Same as NumPy mgrid: "An instance which returns a dense (or fleshed out) mesh-grid when indexed, so that each returned argument has the same shape."
Expand the shape of a tile.
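The dense mesh-grid idea quoted from NumPy can be sketched without any library. The hypothetical `mgrid_2d_model` below returns two index arrays of identical shape, matching what `numpy.mgrid[0:rows, 0:cols]` yields, and then uses them to pick a tile out of a row-major buffer. The buffer layout and stride are assumptions for the example.

```python
# Plain-Python model of a dense 2D mesh-grid; not NKI code.
# mgrid_2d_model is a hypothetical helper.
def mgrid_2d_model(rows, cols):
    i = [[r for _ in range(cols)] for r in range(rows)]   # row index at each position
    j = [[c for c in range(cols)] for _ in range(rows)]   # column index at each position
    return i, j

i, j = mgrid_2d_model(2, 3)
print(i)  # [[0, 0, 0], [1, 1, 1]]
print(j)  # [[0, 1, 2], [0, 1, 2]]

# Use the grids to gather a (2, 3) tile from a 3x4 row-major buffer.
flat = list(range(12))
row_stride = 4
tile = [[flat[i[r][c] * row_stride + j[r][c]] for c in range(3)] for r in range(2)]
print(tile)  # [[0, 1, 2], [4, 5, 6]]
```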
Sorting/Searching operations#
Return elements chosen from x or y depending on condition. |
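Conditional selection pairs naturally with the comparison operations listed under Logical operations. A minimal plain-Python sketch (not NKI code, with illustrative variable names) that builds a boolean mask and then selects per element:

```python
# Element-wise select driven by a comparison mask; not NKI code.
x = [1, 5, 3]
y = [4, 2, 3]

condition = [a > b for a, b in zip(x, y)]                    # [False, True, False]
selected = [a if c else b for c, a, b in zip(condition, x, y)]
print(selected)  # [4, 5, 3]
```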
Collective communication operations#
Apply reduce operation over multiple SPMD programs. |
Iterators#
Create a sequence of numbers for use as loop iterators in NKI, resulting in a fully unrolled loop.
Create a sequence of numbers for use as parallel loop iterators in NKI.
Create a sequence of numbers for use as sequential loop iterators in NKI.
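One way to think about the difference between parallel and sequential iterators is whether the loop body carries a dependency from one iteration to the next. The sketch below is plain Python, not NKI code, and reflects an interpretation rather than a statement of compiler behaviour: the first loop has independent iterations (so any order gives the same result), while the second depends on the previous iteration and must run in order.

```python
# Plain-Python illustration of independent vs. loop-carried iterations.
data = [1, 2, 3, 4]

# Independent iterations: each writes only its own slot, so order is irrelevant.
doubled = [0] * len(data)
for idx in reversed(range(len(data))):     # deliberately run backwards
    doubled[idx] = 2 * data[idx]

# Loop-carried dependency: each iteration needs the previous running total.
running = [0] * len(data)
acc = 0
for idx in range(len(data)):
    acc += data[idx]
    running[idx] = acc

print(doubled)  # [2, 4, 6, 8]
print(running)  # [1, 3, 6, 10]
```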
Memory Hierarchy#
Mark a dimension explicitly as a partition dimension.
PSUM - Only visible to each individual kernel instance in the SPMD grid, alias of
State Buffer - Only visible to each individual kernel instance in the SPMD grid, alias of
HBM - Alias of private_hbm
HBM - Only visible to each individual kernel instance in the SPMD grid
Shared HBM - Visible to all kernel instances in the SPMD grid
Others#
Index of the current SPMD program along the given axis in the launch grid.
Number of SPMD programs along the given axes in the launch grid.
Number of dimensions in the SPMD launch grid.
Create a dimension in the SPMD launch grid of a NKI kernel with sub-dimension tiling.
Create a logical neuron core dimension in launch grid.
Print a message with a String
Apply reduce operation over a loop.
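The launch-grid queries above (program index, program count, grid rank) follow the usual SPMD pattern: every program runs the same kernel and uses its index to choose which slice of the data to process. The following is a plain-Python simulation of that pattern, not NKI code. `run_spmd_model` and `scale_rows_kernel` are hypothetical names, and the programs are simply executed one after another.

```python
from itertools import product

# Plain-Python simulation of an SPMD launch grid; not NKI code.
def run_spmd_model(grid, kernel, *args):
    """Call kernel once per program index in the grid (serially, for illustration)."""
    for program_index in product(*(range(n) for n in grid)):
        kernel(program_index, grid, *args)

def scale_rows_kernel(program_index, grid, src, dst, factor):
    pid = program_index[0]          # index of this program along axis 0
    num_programs = grid[0]          # number of programs along axis 0
    rows_per_program = len(src) // num_programs
    start = pid * rows_per_program
    for r in range(start, start + rows_per_program):
        dst[r] = [factor * v for v in src[r]]

src = [[1, 2], [3, 4], [5, 6], [7, 8]]
dst = [None] * len(src)
run_spmd_model((2,), scale_rows_kernel, src, dst, 10)   # 2 programs, 2 rows each
print(dst)  # [[10, 20], [30, 40], [50, 60], [70, 80]]
```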
Data Types#
32-bit floating-point number (1S,8E,10M)
16-bit floating-point number (1S,8E,7M)
8-bit floating-point number (1S,4E,3M)
8-bit floating-point number (1S,5E,2M)
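The (sign, exponent, mantissa) notation above gives the bit budget of each format. To make the effect of a narrower mantissa concrete, the sketch below starts from an IEEE 754 float32 value and clears the low mantissa bits, keeping 10 or 7 of them as in the first two entries. This is plain Python, not NKI code; `truncate_mantissa` is a hypothetical helper, and truncation is an assumption made for simplicity, since real hardware conversions may round instead.

```python
import struct

def f32_to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def truncate_mantissa(x, kept_bits):
    """Keep the sign, 8 exponent bits and the top kept_bits mantissa bits of a float32."""
    drop = 23 - kept_bits                     # float32 has 23 mantissa bits
    return bits_to_f32((f32_to_bits(x) >> drop) << drop)

x = 1.0 + 2 ** -9 + 2 ** -12                  # exactly representable in float32

as_10m = truncate_mantissa(x, 10)             # (1S,8E,10M): keeps 2**-9, drops 2**-12
as_7m = truncate_mantissa(x, 7)               # (1S,8E,7M): drops both small terms
print(as_10m)  # 1.001953125
print(as_7m)   # 1.0
```

The same bit-counting applies to the two 8-bit formats: 1 + 4 + 3 and 1 + 5 + 2 both total 8 bits, trading mantissa precision for exponent range.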
Constants#
Tile size constants. |