NKI API Common Fields
Supported Data Types
The table Supported Data Types by NKI below lists all data types that NKI supports.
Almost all NKI APIs accept a data type field, dtype, which can be either
a NumPy equivalent type or a nki.language data type.
| Category | Data Type | Accepted dtype |
|---|---|---|
| Integer | 8-bit unsigned integer | |
| | 8-bit signed integer | |
| | 16-bit unsigned integer | |
| | 16-bit signed integer | |
| | 32-bit unsigned integer | |
| | 32-bit signed integer | |
| Float | float8_e4m3 (1S,4E,3M) [2] | |
| | float8_e5m2 (1S,5E,2M) | |
| | float16 (1S,5E,10M) | |
| | bfloat16 (1S,8E,7M) | |
| | tfloat32 (1S,8E,10M) | |
| | float32 (1S,8E,23M) | |
| Boolean | boolean stored as uint8 | |
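The (sign, exponent, mantissa) notation in the Float rows can be checked by splitting an encoded value into its bit fields. The following is a minimal sketch for float16 (1S,5E,10M), assuming the standard IEEE 754 half-precision layout and using only the Python standard library. The helper name float16_fields is hypothetical and is not part of NKI.

```python
import struct


def float16_fields(x):
    """Return (sign, exponent, mantissa) bit fields of x encoded as float16.

    Assumes the IEEE 754 half-precision layout: 1 sign bit,
    5 exponent bits, 10 mantissa bits (1S,5E,10M).
    """
    # Pack as half precision ("e"), then reinterpret the 2 bytes as uint16.
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15                # top bit
    exponent = (bits >> 10) & 0x1F   # next 5 bits (biased exponent)
    mantissa = bits & 0x3FF          # low 10 bits
    return sign, exponent, mantissa


print(float16_fields(1.0))   # (0, 15, 0)
print(float16_fields(-2.0))  # (1, 16, 0)
print(float16_fields(1.5))   # (0, 15, 512)
```

The same shift-and-mask approach applies to the other formats in the table once the field widths are swapped in.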
Supported Math Operators for NKI ISA
The table Supported Math Operators by NKI ISA below lists all the mathematical operator primitives supported by NKI.
Many nki.isa APIs (instructions) allow programmable operators through the op field.
The supported operators fall into two categories: bitvec and arithmetic. In general, instructions
using bitvec operators expect integer data types and treat input elements as bit patterns. In contrast,
instructions using arithmetic operators accept any valid NKI data type and convert input elements
into float32 before applying the operator.
| Category | Operator | op | Legal Reduction | Supported Engine |
|---|---|---|---|---|
| Bitvec | Bitwise Not | | N | Vector |
| | Bitwise And | | Y | Vector |
| | Bitwise Or | | Y | Vector |
| | Bitwise Xor | | Y | Vector |
| | Arithmetic Shift Left | | N | Vector |
| | Arithmetic Shift Right | Not supported | N | Vector |
| | Logical Shift Left | | N | Vector |
| | Logical Shift Right | | N | Vector |
| Arithmetic | Add | | Y | Vector/GpSIMD/Scalar |
| | Subtract | | Y | Vector |
| | Multiply | | Y | Vector/GpSIMD/Scalar |
| | Max | | Y | Vector |
| | Min | | Y | Vector |
| | Is Equal to | | N | Vector |
| | Is Not Equal to | | N | Vector |
| | Is Greater than or Equal to | | N | Vector |
| | Is Greater than | | N | Vector |
| | Is Less than or Equal to | | N | Vector |
| | Is Less than | | N | Vector |
| | Logical Not | | N | Vector |
| | Logical And | | Y | Vector |
| | Logical Or | | Y | Vector |
| | Logical Xor | | Y | Vector |
| | Reverse Square Root | | N | GpSIMD/Scalar |
| | Reciprocal | | N | Vector/Scalar |
| | Absolute | | N | Vector/Scalar |
| | Power | | N | GpSIMD |
Note: Add and Multiply are supported on Scalar Engine only starting with NeuronCore-v3. 32-bit integer Add and Multiply are supported only on GpSIMD Engine.
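To make the two operator categories concrete, here is a small pure-Python emulation. It is a sketch only, not a model of any specific engine. The helper names are hypothetical, and rounding the result back to float32 is an assumption that the text above does not state. A bitvec operator works directly on bit patterns, while an arithmetic operator first converts its inputs to float32, which can lose precision for large integer values.

```python
import struct


def to_float32(x):
    """Round a Python number to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def bitvec_and(a, b, width=8):
    """Bitwise And on two unsigned integers treated as bit patterns."""
    mask = (1 << width) - 1
    return (a & b) & mask


def arithmetic_add(a, b):
    """Add after converting both inputs to float32 (emulation only).

    Assumption: the result is also rounded to float32.
    """
    return to_float32(to_float32(a) + to_float32(b))


print(bitvec_and(0b11001100, 0b10101010))  # 136, i.e. 0b10001000
print(arithmetic_add(3, 4))                # 7.0
# 16777217 = 2**24 + 1 is not representable in float32, so the
# float32 path differs from exact integer arithmetic (16777218).
print(arithmetic_add(16777217, 1))         # 16777216.0
```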
Supported Activation Functions for NKI ISA
The table Supported Activation Functions by NKI ISA below lists all the activation functions supported by the nki.isa.activation API. These
activation functions are approximated with piecewise polynomials on Scalar Engine.
NOTE: If input values fall outside the Valid Input Range listed below,
the Scalar Engine will generate invalid output results.
| Function Name | Accepted op | Valid Input Range |
|---|---|---|
| Identity | | |
| Square | | |
| Sigmoid | | |
| Relu | | |
| Gelu | | |
| Gelu Derivative | | |
| Gelu with Tanh Approximation | | |
| Gelu with Sigmoid Approximation | | |
| Silu | | |
| Silu Derivative | | |
| Tanh | | |
| Softplus | | |
| Mish | | |
| Erf | | |
| Erf Derivative | | |
| Exponential | | |
| Natural Log | | |
| Sine | | |
| Arctan | | |
| Square Root | | |
| Reverse Square Root | | |
| Reciprocal | | |
| Sign | | |
| Absolute | | |
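Because out-of-range inputs produce invalid results, it can help to validate host-side data before it reaches an activation function. The sketch below is a plain-Python check, not an NKI API. The helper name and the bounds used in the example are hypothetical placeholders, so substitute the Valid Input Range of the function actually in use.

```python
import math


def values_outside_range(values, low, high):
    """Return the inputs that are NaN or fall outside the closed range [low, high]."""
    return [v for v in values if math.isnan(v) or not (low <= v <= high)]


# Hypothetical bounds for illustration only; they are NOT taken from the table.
HYPOTHETICAL_LOW, HYPOTHETICAL_HIGH = -10.0, 10.0

bad = values_outside_range([-12.5, 0.0, 3.0, 42.0],
                           HYPOTHETICAL_LOW, HYPOTHETICAL_HIGH)
print(bad)  # [-12.5, 42.0]
```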
NKI Engine Selection for Operators Supported on Multiple Engines
Operators that can run on more than one engine trade precision against speed depending on the engine chosen. Users can select the engine that best fits their needs. The two examples below, reciprocal and reverse square root, illustrate the tradeoff.
Reciprocal can run on Scalar Engine or Vector Engine:
Reciprocal can run on Vector Engine with nki.isa.reciprocal or on Scalar Engine with nki.isa.activation(nl.reciprocal). Vector Engine performs reciprocal at a higher precision compared to Scalar Engine; however, the computation throughput of reciprocal on Vector Engine is about 8x lower than Scalar Engine for large input tiles. For input tiles with a small number of elements per partition (less than 64, processed one per cycle), instruction initiation interval (roughly 64 cycles) dominates performance so Scalar Engine and Vector Engine have comparable performance. In this case, we suggest using Vector Engine to achieve better precision.
Estimated cycles on different engines:
| Cost (Engine Cycles) | Condition |
|---|---|
| max(MIN_II, N) | mapped to Scalar Engine (nki.isa.scalar_engine) |
| max(MIN_II, 8*N) | mapped to Vector Engine (nki.isa.vector_engine) |
where,
N is the number of elements per partition in the input tile.
MIN_II is the minimum instruction initiation interval for small input tiles. MIN_II is roughly 64 engine cycles.
Note: nki.isa.activation(op=nl.reciprocal) doesn’t support setting bias on NeuronCore-v2.
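The cost estimates above can be written as a small helper for comparing the two engines at a given tile size. This is a sketch of the stated formulas only. The function name and the string engine labels are hypothetical, and MIN_II is fixed at the approximate value of 64 cycles given above.

```python
MIN_II = 64  # approximate minimum instruction initiation interval, in engine cycles


def estimated_reciprocal_cycles(n, engine):
    """Estimate engine cycles for reciprocal on a tile with n elements per partition.

    engine is "scalar" or "vector" (labels chosen for this sketch).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if engine == "scalar":
        return max(MIN_II, n)
    if engine == "vector":
        return max(MIN_II, 8 * n)
    raise ValueError(f"unknown engine: {engine!r}")


for n in (8, 64, 512):
    print(n,
          estimated_reciprocal_cycles(n, "scalar"),
          estimated_reciprocal_cycles(n, "vector"))
# 8 64 64
# 64 64 512
# 512 512 4096
```

With N = 8 both estimates equal MIN_II, while at N = 512 the Vector Engine estimate is 8x the Scalar Engine estimate.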
Reverse square root can run on GpSIMD Engine or Scalar Engine:
Reverse square root can run on GpSIMD Engine with nki.isa.tensor_scalar(op0=nl.rsqrt, operand0=0.0) or on Scalar Engine with nki.isa.activation(nl.rsqrt). GpSIMD Engine performs reverse square root at a higher precision compared to Scalar Engine; however, the computation throughput of reverse square root on GpSIMD Engine is 4x lower than Scalar Engine.
Footnotes