This document is relevant for: Inf2, Trn1, Trn1n
nki.isa.activation
- nki.isa.activation(op, data, bias=None, scale=1.0, mask=None, dtype=None, **kwargs)
Apply an activation function on every element of the input tile using Scalar Engine. The activation function is specified in the op input field (see Supported Activation Functions for a list of supported activation functions).

The activation instruction can optionally multiply the input data by a scalar or vector scale and then add another vector bias before the activation function is applied, at no additional performance cost:

\[output = f_{act}(data * scale + bias)\]

When the scale is a scalar, it must be a compile-time constant. In this case, the scale is broadcast to all the elements in the input data tile. When the scale/bias is a vector, it must have the same partition axis size as the input data tile and only one element per partition. In this case, the element of scale/bias within each partition is broadcast to the elements of the input data tile in the same partition.

Note that the Scalar Engine always performs the math operations in float32 precision. Therefore, the engine automatically casts the input data tile to float32 before performing the multiply/add/activate specified in the activation instruction. The engine is also capable of casting the float32 math results into another output data type specified by the dtype field at no additional performance cost. If the dtype field is not specified, Neuron Compiler will set the output data type of the instruction to be the same as the input data type of data. On the other hand, the scale parameter must have a float32 data type, while the bias parameter can be float32/float16/bfloat16.

The input data tile can be an SBUF or PSUM tile. Similarly, the instruction can write the output tile into either SBUF or PSUM, which is specified using the buffer field. If not specified, nki.language.sbuf is selected by default.

Estimated instruction cost: N Scalar Engine cycles, where N is the number of elements per partition in data.

- Parameters:
op – an activation function (see Supported Activation Functions for supported functions)
data – the input tile; layout: (partition axis <= 128, free axis)
bias – a vector with the same partition axis size as data for broadcast add (after broadcast multiply with scale)
scale – a scalar or a vector with the same partition axis size as data for broadcast multiply
dtype – (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
- Returns:
output tile of the activation instruction; layout: same as input data tile
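
The semantics described above (broadcast multiply, broadcast add, float32 math, optional output cast) can be sketched as a plain NumPy model. This is only an illustrative reference, not the NKI implementation: activation_reference is a hypothetical helper name, it runs on the host rather than on the Scalar Engine, and it assumes a vector scale or bias is given as an array of shape (partitions, 1).

```python
import numpy as np

def activation_reference(op, data, bias=None, scale=1.0, dtype=None):
    """Hypothetical NumPy model of output = op(data * scale + bias).

    Not part of NKI; it only mirrors the documented semantics.
    """
    # Output dtype defaults to the input dtype when dtype is not given.
    out_dtype = data.dtype if dtype is None else dtype

    # The math is modeled in float32, matching the documented behavior.
    x = data.astype(np.float32)
    num_partitions = x.shape[0]

    if np.isscalar(scale):
        # A scalar scale is broadcast to every element.
        s = np.float32(scale)
    else:
        # A vector scale has one element per partition.
        s = np.asarray(scale, dtype=np.float32)
        assert s.shape == (num_partitions, 1)
    x = x * s

    if bias is not None:
        # A vector bias has one element per partition.
        b = np.asarray(bias).astype(np.float32)
        assert b.shape == (num_partitions, 1)
        x = x + b

    # Apply the activation, then cast the float32 result to the output dtype.
    return op(x).astype(out_dtype)

# Small usage example: 4 partitions, 8 elements per partition.
data = np.ones((4, 8), dtype=np.float16)
bias = np.arange(4, dtype=np.float32).reshape(4, 1)
out = activation_reference(np.square, data, bias=bias, scale=2.0)
```

Because dtype is omitted in the usage example, the result keeps the float16 type of data even though the intermediate math is done in float32.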
Example:
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
import numpy as np
...

##################################################################
# Example 1: perform exponential function on matrix a of shape (128, 1024)
##################################################################
i_p_a = nl.arange(128)[:, None]
i_f_a = nl.arange(1024)[None, :]
activated_a = nisa.activation(op=nl.exp, data=a[i_p_a, i_f_a])

##################################################################
# Example 2: perform the following operations to matrix b of shape (128, 512)
# using a single activation instruction:
# 1) multiply all elements in b by 2.0,
# 2) add a user-input vector c to the 1) result,
# 3) apply a square function on the 2) result
# 4) cast 3) results into bfloat16
##################################################################
i_p_b = i_p_c = nl.arange(128)[:, None]
i_f_b = nl.arange(512)[None, :]
i_f_c = nl.arange(1)[None, :]
activated_b = nisa.activation(op=np.square, data=b[i_p_b, i_f_b],
                              bias=c[i_p_c, i_f_c], scale=2.0,
                              dtype=nl.bfloat16)
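
When validating a kernel like Example 2, it can help to compute the expected values on the host. The following is a minimal NumPy sketch under stated assumptions: the inputs b and c are random stand-ins for the tiles in the example, float16 stands in for bfloat16 (NumPy has no native bfloat16 type), and estimated_cycles simply restates the documented cost of N cycles, where N is the number of elements per partition.

```python
import numpy as np

# Random stand-ins for the (128, 512) matrix b and the (128, 1) vector c.
rng = np.random.default_rng(0)
b = rng.standard_normal((128, 512)).astype(np.float32)
c = rng.standard_normal((128, 1)).astype(np.float32)

# Steps 1-3 of Example 2 in float32: square(b * 2.0 + c), with c broadcast
# along the free axis. Step 4 is approximated with a float16 cast because
# NumPy does not provide bfloat16.
expected_b = np.square(b * np.float32(2.0) + c).astype(np.float16)

# Documented cost estimate: N Scalar Engine cycles, N = elements per partition.
estimated_cycles = b.shape[1]
```

Since bfloat16 and float16 differ in mantissa and exponent width, a comparison against real device output would need a tolerance chosen for bfloat16 rather than an exact match.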