nki.isa.activation

nki.isa.activation(dst, op, data, bias=None, scale=1.0, reduce_op=None, reduce_res=None, reduce_cmd=reduce_cmd.idle, name=None)

Apply an activation function to every element of the input tile using the Scalar Engine, with an optional scale/bias operation before the activation and an optional reduction operation after the activation, all in the same instruction.

The activation function is specified in the op input field (see Supported Activation Functions for NKI ISA for a list of supported activation functions and their valid input ranges).

nisa.activation can optionally multiply the input data by a scalar or vector scale and then add a bias (a vector, or a scalar on NeuronCore-v3 or newer) before the activation function is applied.

After the activation function is applied, the Scalar Engine can also reduce the activated data along the free dimensions of each lane using the reduce_op operation. reduce_op must be nl.add.

The reduction result is then either stored into or reduced on top of a set of internal engine registers called reduce_regs (one 32-bit register per compute lane, 128 registers in total), controlled by the reduce_cmd field:

  • nisa.reduce_cmd.reset: Reset reduce_regs to zero only.

  • nisa.reduce_cmd.idle: Do not modify reduce_regs.

  • nisa.reduce_cmd.reduce: Reduce activated data over existing values in reduce_regs.

  • nisa.reduce_cmd.reset_reduce: Reset reduce_regs to zero and then store the reduction result of the activated data.

nisa.activation can also emit another instruction to read out reduce_regs when an SBUF/PSUM tile is passed in the reduce_res argument. The reduce_regs state persists across multiple nisa.activation instructions, so it does not need to be evicted back to SBUF/PSUM (the reduce_res tile) after each instruction.

The following is the pseudo code for nisa.activation:

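\[
\begin{aligned}
&\mathrm{output} = \mathrm{op}(\mathrm{data} \times \mathrm{scale} + \mathrm{bias}) \\
&\textbf{if}\ \mathrm{reduce\_cmd} \in \{\mathrm{reset},\ \mathrm{reset\_reduce}\}:\quad \mathrm{reduce\_regs} = 0 \\
&\mathrm{result} = \mathrm{reduce\_op}(\mathrm{output},\ \mathrm{axis} = \langle\mathrm{FreeAxis}\rangle) \\
&\textbf{if}\ \mathrm{reduce\_cmd} \in \{\mathrm{reduce},\ \mathrm{reset\_reduce}\}:\quad \mathrm{reduce\_regs} = \mathrm{reduce\_op}(\mathrm{reduce\_regs},\ \mathrm{result}) \\
&\textbf{if}\ \mathrm{reduce\_res}\ \text{is provided}:\quad \mathrm{reduce\_res} = \mathrm{reduce\_regs}
\end{aligned}
\]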
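The reduce_cmd behaviors described above can be modeled on the host with a small plain-Python sketch. This is not NKI code: `ReduceCmd`, `ScalarEngineModel`, the `want_reduce_res` flag, and the list-of-rows tile representation are hypothetical names introduced only for illustration, and reduce_op is fixed to addition.

```python
from enum import Enum


class ReduceCmd(Enum):
    """Hypothetical stand-in for nisa.reduce_cmd (illustration only)."""
    idle = 0
    reset = 1
    reduce = 2
    reset_reduce = 3


class ScalarEngineModel:
    """Toy model with one accumulator (reduce_regs entry) per lane."""

    def __init__(self, num_lanes=128):
        self.reduce_regs = [0.0] * num_lanes

    def activation(self, op, data, bias=None, scale=1.0,
                   reduce_cmd=ReduceCmd.idle, want_reduce_res=False):
        """data: list of rows (one per partition); scale/bias: scalar or per-row list."""
        n = len(data)
        scales = scale if isinstance(scale, list) else [scale] * n
        if bias is None:
            biases = [0.0] * n
        else:
            biases = bias if isinstance(bias, list) else [bias] * n

        # output = op(data * scale + bias), element-wise within each lane
        out = [[op(x * scales[p] + biases[p]) for x in row]
               for p, row in enumerate(data)]

        if reduce_cmd in (ReduceCmd.reset, ReduceCmd.reset_reduce):
            for p in range(n):
                self.reduce_regs[p] = 0.0
        if reduce_cmd in (ReduceCmd.reduce, ReduceCmd.reset_reduce):
            # reduce_op is addition along the free dimension
            for p in range(n):
                self.reduce_regs[p] += sum(out[p])

        # want_reduce_res models passing a reduce_res tile (hypothetical flag)
        reduce_res = self.reduce_regs[:n] if want_reduce_res else None
        return out, reduce_res


def relu(v):
    return max(v, 0.0)


# Accumulate row sums over two chunks without reading the registers in between.
engine = ScalarEngineModel(num_lanes=2)
chunk0 = [[1.0, -2.0], [3.0, 4.0]]
chunk1 = [[5.0, 6.0], [-7.0, 8.0]]
engine.activation(relu, chunk0, reduce_cmd=ReduceCmd.reset_reduce)
_, row_sums = engine.activation(relu, chunk1, reduce_cmd=ReduceCmd.reduce,
                                want_reduce_res=True)
```

In this model the first call uses reset_reduce to start from a clean accumulator, later calls use reduce, and only the last call reads the registers out, which mirrors how the register state can persist across instructions.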

These optional operations incur no additional performance penalty compared to applying only the activation function, except that reading out reduce_regs into reduce_res adds a small overhead due to the extra instruction.

Memory types.

The input data tile can be an SBUF or PSUM tile. Similarly, the instruction can write the output dst tile into either SBUF or PSUM.

Data types.

Both input data and output dst tiles can be in any valid NKI data type (see Supported Data Types for more information). The Scalar Engine always performs the math operations in float32 precision. Therefore, the engine automatically casts the input data tile to float32 before performing multiply/add/activate specified in the activation instruction. The engine is also capable of casting the float32 math results into another output data type in dst at no additional performance cost. The scale parameter must have a float32 data type, while the bias parameter can be float32/float16/bfloat16.
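The float32 compute path and the final cast can be approximated on the host with Python's standard `struct` module. This is a rough sketch under assumptions: `to_f32`, `to_f16`, and `activation_f16_out` are hypothetical helpers, rounding to float32 after every step is an assumption about ordering, and the hardware's exact rounding behavior is not specified here.

```python
import struct


def to_f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f16(x):
    """Round a Python float to the nearest IEEE float16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def activation_f16_out(op, data_f16, scale, bias):
    """Model: float16 input -> float32 math -> float16 output."""
    out = []
    for x in data_f16:
        x32 = to_f32(x)  # input is cast up to float32
        y32 = to_f32(to_f32(x32 * to_f32(scale)) + to_f32(bias))
        out.append(to_f16(to_f32(op(y32))))  # result is cast to the dst dtype
    return out


def relu(v):
    return max(v, 0.0)


data = [to_f16(0.1), to_f16(-1.5)]  # 0.1 is not exactly representable in float16
result = activation_f16_out(relu, data, scale=2.0, bias=0.25)
```

The point of the sketch is only that precision is lost at the input and output casts, while the multiply, add, and activation operate on float32 values.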

Layout.

The scale can be either a compile-time constant scalar or a [N, 1] vector from SBUF/PSUM. N must be the same as the partition dimension size of data. In NeuronCore-v2, the bias must be a [N, 1] vector; starting with NeuronCore-v3, the bias can be either a compile-time constant scalar or a [N, 1] vector, similar to scale.

When the scale (or, similarly, the bias) is a scalar, the scalar is broadcast to all elements of the input data tile. When the scale (or bias) is a vector, the value in each partition is broadcast along the free dimension of the data tile.
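The two broadcast modes can be illustrated with a small plain-Python sketch. `broadcast_operand` is a hypothetical helper, and tiles are modeled as lists of rows (one row per partition) purely for illustration.

```python
def broadcast_operand(operand, num_partitions, free_size):
    """Expand a scalar or an [N, 1] column into a full [N, free_size] tile."""
    if isinstance(operand, (int, float)):
        # Scalar: the same value is used for every element of the tile.
        return [[float(operand)] * free_size for _ in range(num_partitions)]
    # Vector: operand[p][0] is reused along the free dimension of partition p.
    if len(operand) != num_partitions or any(len(row) != 1 for row in operand):
        raise ValueError("vector operand must have shape [N, 1]")
    return [[float(row[0])] * free_size for row in operand]


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]           # 2 partitions, 3 elements each
scale = broadcast_operand([[10.0], [100.0]], 2, 3)  # per-partition scale
bias = broadcast_operand(1.0, 2, 3)                 # scalar bias (NeuronCore-v3 or newer)

# data * scale + bias, the value fed to the activation function
pre_activation = [[d * s + b for d, s, b in zip(d_row, s_row, b_row)]
                  for d_row, s_row, b_row in zip(data, scale, bias)]
```

Here partition 0 is scaled by 10 and partition 1 by 100, while the scalar bias of 1 is added to every element.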

Tile size.

The partition dimension size of input data and output dst tiles must be the same and must not exceed 128. The number of elements per partition of data and dst tiles must be the same and must not exceed the physical size of each SBUF partition.
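These shape rules can be checked on the host before a kernel is written. The following is a minimal sketch: `check_activation_tiles` is a hypothetical helper, shapes are given as (partition, free...) tuples, and the per-partition capacity is a caller-supplied parameter because its value depends on the target hardware and is not stated here. Comparing a byte count against that capacity is an interpretation of the size limit, not a documented rule.

```python
import math

MAX_PARTITIONS = 128  # stated limit on the partition dimension


def check_activation_tiles(data_shape, dst_shape, elem_bytes, partition_bytes_limit):
    """Raise ValueError if the data/dst tile shapes violate the stated rules."""
    data_free = math.prod(data_shape[1:])  # elements per partition of data
    dst_free = math.prod(dst_shape[1:])    # elements per partition of dst
    if data_shape[0] != dst_shape[0]:
        raise ValueError("data and dst must have the same partition size")
    if data_shape[0] > MAX_PARTITIONS:
        raise ValueError("partition size must not exceed 128")
    if data_free != dst_free:
        raise ValueError("data and dst must have the same elements per partition")
    if dst_free * elem_bytes > partition_bytes_limit:
        raise ValueError("tile does not fit in one partition")


# Passes silently: 128 partitions, 512 float32 elements each, 4096-byte budget
# (the budget value is an arbitrary number chosen for this example).
check_activation_tiles((128, 512), (128, 512), elem_bytes=4,
                       partition_bytes_limit=4096)
```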

Parameters:
  • dst – the activation output

  • op – an activation function (see Supported Activation Functions for NKI ISA for supported functions)

  • data – the input tile; layout: (partition axis <= 128, free axis)

  • scale – a scalar or a vector for multiplication

  • bias – a scalar (NeuronCore-v3 or newer) or a vector for addition

  • reduce_op – the reduce operation to perform on the free dimension of the activated data (must be nl.add)

  • reduce_res – a tile of shape (data.shape[0], 1) to hold the final state of reduce_regs

  • reduce_cmd – an enum member from nisa.reduce_cmd to control the state of reduce_regs