This document is relevant for: Inf2, Trn1, Trn1n
nki.isa.tensor_scalar
- nki.isa.tensor_scalar(data, op0, operand0, reverse0=False, op1=None, operand1=None, reverse1=False, dtype=None, mask=None, **kwargs)
Apply up to two math operators to the input data tile by broadcasting scalar/vector operands in the free dimension using the Vector or Scalar Engine: (data <op0> operand0) <op1> operand1.

The input data tile can be an SBUF or PSUM tile. Both operand0 and operand1 can be SBUF or PSUM tiles of shape (data.shape[0], 1), i.e., vectors, or compile-time constant scalars. op1 and operand1 are optional, but must be None (the default values) when unused. Note that performing one operator has the same performance cost as performing two operators in the instruction.

When an operator is non-commutative (e.g., subtract), we can reverse the ordering of its inputs through:

- reverse0 = True: tmp_res = operand0 <op0> data
- reverse1 = True: operand1 <op1> tmp_res
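For intuition, the composition of the two operators and the reverse flags can be emulated with plain NumPy. This is a semantic sketch only, not NKI code, and `tensor_scalar_ref` is a hypothetical helper name:

```python
import numpy as np

def tensor_scalar_ref(data, op0, operand0, reverse0=False,
                      op1=None, operand1=None, reverse1=False):
    # Reference semantics of (data <op0> operand0) <op1> operand1.
    # The reverse flags swap the lhs/rhs of the corresponding operator.
    tmp = op0(operand0, data) if reverse0 else op0(data, operand0)
    if op1 is not None:
        tmp = op1(operand1, tmp) if reverse1 else op1(tmp, operand1)
    return tmp

data = np.full((4, 8), 3.0)

# reverse0=False: data - 1.0 -> 2.0 everywhere
a = tensor_scalar_ref(data, np.subtract, 1.0)

# reverse0=True: 1.0 - data -> -2.0 everywhere
b = tensor_scalar_ref(data, np.subtract, 1.0, reverse0=True)

# Two fused operators: (data * 2.0) + 1.0 -> 7.0 everywhere
c = tensor_scalar_ref(data, np.multiply, 2.0, op1=np.add, operand1=1.0)
```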
The tensor_scalar instruction supports two types of operators: 1) bitvec operators (e.g., bitwise_and) and 2) arithmetic operators (e.g., add). See Supported Math Operators for the full list of supported operators. The two operators, op0 and op1, in a tensor_scalar instruction must be of the same type (both bitvec or both arithmetic).

If bitvec operators are used, the tensor_scalar instruction must run on the Vector Engine. Also, the input/output data types must be integer types, and input elements are treated as bit patterns without any data type casting.

If arithmetic operators are used, the tensor_scalar instruction can run on the Vector or Scalar Engine. However, the Scalar Engine only supports a subset of the operator combinations:

- op0=np.multiply and op1=np.add
- op0=np.multiply and op1=None
- op0=np.add and op1=None
Currently, the compiler instruction scheduler selects the engine automatically based on the operator types.
Also, arithmetic operators impose no restriction on the input/output data types, but the engine automatically casts input data types to float32 and performs the operators in float32 math. The float32 computation results are cast to the target data type specified in the dtype field before being written into the output tile, at no additional performance cost. If the dtype field is not specified, it defaults to the same data type as the input tile.

Estimated instruction cost:

N Vector or Scalar Engine cycles, depending on which engine the compiler assigns the instruction to, where N is the number of elements per partition in data.
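The float32 compute-then-cast behavior described for arithmetic operators can be sketched in NumPy. This is an emulation of the documented semantics, not the actual engine implementation, and `tensor_scalar_cast_ref` is a hypothetical helper name:

```python
import numpy as np

def tensor_scalar_cast_ref(data, op0, operand0, dtype=None):
    # Inputs are upcast to float32, the operator runs in float32 math,
    # and the result is cast to `dtype` (default: the input tile's dtype).
    out_dtype = dtype if dtype is not None else data.dtype
    res = op0(data.astype(np.float32), np.float32(operand0))
    return res.astype(out_dtype)

# int16 input, float32 math, float16 output
x = np.arange(6, dtype=np.int16).reshape(2, 3)
y = tensor_scalar_cast_ref(x, np.multiply, 0.5, dtype=np.float16)

# Omitting dtype keeps the input dtype: int16 math result truncated on cast
z = tensor_scalar_cast_ref(x, np.multiply, 0.5)
```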
- Parameters:
  - data – the input tile
  - op0 – the first math operator used with operand0 (see Supported Math Operators for supported operators)
  - operand0 – a scalar constant or a tile of shape (data.shape[0], 1), where data.shape[0] is the partition axis size of the input data tile
  - reverse0 – reverse ordering of inputs to op0; if false, operand0 is the rhs of op0; if true, operand0 is the lhs of op0
  - op1 – (optional) the second math operator used with operand1 (see Supported Math Operators for supported operators)
  - operand1 – a scalar constant or a tile of shape (data.shape[0], 1), where data.shape[0] is the partition axis size of the input data tile
  - reverse1 – reverse ordering of inputs to op1; if false, operand1 is the rhs of op1; if true, operand1 is the lhs of op1
  - dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the same data type as the input tile
  - mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
- Returns:
  an output tile of the (data <op0> operand0) <op1> operand1 computation
Example:

```python
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
import numpy as np

...

##################################################################
# Example 1: subtract 1.0 from all elements of tile a of
# shape (128, 512) and get the output tile in b
##################################################################
i_p = nl.arange(128)[:, None]
i_f = nl.arange(512)[None, :]

b = nisa.tensor_scalar(a[i_p, i_f], np.subtract, 1.0)

##################################################################
# Example 2: broadcast 1.0 into a shape of (128, 512) and subtract
# it with tile c to get output tile d
##################################################################
i_p = nl.arange(128)[:, None]
i_f = nl.arange(512)[None, :]

d = nisa.tensor_scalar(c[i_p, i_f], np.subtract, 1.0, reverse0=True)

##################################################################
# Example 3: broadcast multiply tile e with vector f and
# then broadcast add with scalar 2.5;
# tile e has a shape of (64, 1024) and vector f has a shape of (64, 1)
##################################################################
i_p_ef = nl.arange(64)[:, None]
i_f_e = nl.arange(1024)[None, :]
i_f_f = nl.arange(1)[None, :]

g = nisa.tensor_scalar(e[i_p_ef, i_f_e], op0=np.multiply,
                       operand0=f[i_p_ef, i_f_f],
                       op1=np.add, operand1=2.5)
```