nki.isa.scalar_tensor_tensor
- nki.isa.scalar_tensor_tensor(dst, data, op0, operand0, op1, operand1, reverse0=False, reverse1=False, name=None)
Apply two math operators in sequence using the Vector Engine: (data <op0> operand0) <op1> operand1.

This instruction is equivalent to running two operations back-to-back:

1. temp_result = tensor_scalar(data, op0, operand0) - broadcast operand0 and apply op0
2. dst = tensor_tensor(temp_result, op1, operand1) - element-wise operation with operand1

The operand0 can be either a compile-time constant scalar, which is broadcast across all elements of data, or a tile of shape (data.shape[0], 1), which is broadcast along the free dimension. The operand1 tile must have the same shape as data for element-wise operation.

The scalar broadcasting in the first operation is performed at no additional performance cost, so this instruction has approximately the same latency as a regular tensor_tensor instruction.

Both op0 and op1 must be arithmetic operators (see Supported Math Operators for NKI ISA for supported operators). Bitvec operators are not supported. When the operators are non-commutative (e.g., subtract), operand ordering can be reversed using the reverse0 and reverse1 flags.

Memory types.
The input data tile can be an SBUF or PSUM tile. The operand0 can be an SBUF or PSUM tile or a compile-time constant scalar. The operand1 must be an SBUF or PSUM tile. However, data and operand1 cannot both reside in PSUM. The output dst tile can be written to either SBUF or PSUM.

Data types.
All input tiles can be any supported NKI data type (see Supported Data Types for more information). The Vector Engine automatically casts input data types to float32 and performs all computations in float32 math. The float32 results are cast to the data type of output dst.

Layout.
The parallel computation dimension of nisa.scalar_tensor_tensor is along the partition dimension.

Tile size.
The partition dimension size of the input data, operand1, and output dst tiles must be the same and must not exceed 128. The total number of elements per partition of the input data, operand1, and output dst tiles must be the same and must not exceed the physical size of each SBUF partition. If operand0 is not a scalar, the partition dimension size of operand0 must be the same as that of data, and the number of elements per partition of operand0 must be 1.

- Parameters:
dst – the output tile
data – the input tile
op0 – the first math operator used with operand0 (see Supported Math Operators for NKI ISA for supported operators)
operand0 – a scalar constant or a tile of shape (data.shape[0], 1), where data.shape[0] is the partition axis size of the input data tile
reverse0 – reverse ordering of inputs to op0; if false, operand0 is the rhs of op0; if true, operand0 is the lhs of op0
op1 – the second math operator used with operand1 (see Supported Math Operators for NKI ISA for supported operators)
operand1 – a tile with the same size as data for element-wise operation
reverse1 – reverse ordering of inputs to op1; if false, operand1 is the rhs of op1; if true, operand1 is the lhs of op1
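
The semantics described above can be sketched as a host-side NumPy reference model. This is an illustrative assumption, not NKI code: the helper name scalar_tensor_tensor_ref and its dst_dtype argument are hypothetical, NumPy ufuncs such as np.subtract stand in for the NKI math operators, and data is assumed to be a 2D array of shape (partitions, free). It may be useful for checking expected values, in particular how reverse0 and reverse1 change operand order.

```python
import numpy as np


def scalar_tensor_tensor_ref(data, op0, operand0, op1, operand1,
                             reverse0=False, reverse1=False,
                             dst_dtype=np.float32):
    """Hypothetical NumPy model of (data <op0> operand0) <op1> operand1.

    Not part of NKI. op0 and op1 are binary NumPy ufuncs standing in for
    the arithmetic operators of the real instruction.
    """
    # Inputs are cast to float32 and all math is done in float32.
    data32 = np.asarray(data, dtype=np.float32)
    operand1_32 = np.asarray(operand1, dtype=np.float32)
    operand0_32 = np.asarray(operand0, dtype=np.float32)

    if data32.ndim != 2:
        raise ValueError("this sketch assumes a 2D (partition, free) tile")
    if data32.shape[0] > 128:
        raise ValueError("partition dimension must not exceed 128")
    if operand1_32.shape != data32.shape:
        raise ValueError("operand1 must have the same shape as data")
    # operand0 is either a scalar or a (data.shape[0], 1) tile.
    if operand0_32.ndim != 0 and operand0_32.shape != (data32.shape[0], 1):
        raise ValueError("operand0 must be a scalar or (data.shape[0], 1)")

    # Step 1: broadcast operand0 and apply op0.
    # reverse0=False keeps operand0 on the rhs; True moves it to the lhs.
    if reverse0:
        temp = op0(operand0_32, data32)
    else:
        temp = op0(data32, operand0_32)

    # Step 2: element-wise op1 with operand1, same rhs/lhs rule for reverse1.
    if reverse1:
        out = op1(operand1_32, temp)
    else:
        out = op1(temp, operand1_32)

    # The float32 result is cast to the output data type.
    return out.astype(dst_dtype)


data = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
per_partition = np.array([[10], [20]], dtype=np.float32)  # shape (2, 1)
twos = np.full_like(data, 2.0)

# (data - per_partition) * twos
default_order = scalar_tensor_tensor_ref(
    data, np.subtract, per_partition, np.multiply, twos)

# (per_partition - data) * twos
reversed_first = scalar_tensor_tensor_ref(
    data, np.subtract, per_partition, np.multiply, twos, reverse0=True)

# twos - (data - 1.0), with a scalar operand0
reversed_second = scalar_tensor_tensor_ref(
    data, np.subtract, 1.0, np.subtract, twos, reverse1=True)
```

With subtract as op0, flipping reverse0 negates the intermediate result, so default_order and reversed_first differ only in sign here. The model does not capture memory placement rules (SBUF versus PSUM) or the SBUF partition size limit, which only apply on the device.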