nki.isa.tensor_scalar
- nki.isa.tensor_scalar(dst, data, op0, operand0, reverse0=False, op1=None, operand1=None, reverse1=False, engine=engine.unknown, name=None)
Apply up to two math operators to the input data tile by broadcasting scalar/vector operands in the free dimension, using the Vector, Scalar, or GpSimd Engine: (data <op0> operand0) <op1> operand1.

The input data tile can be an SBUF or PSUM tile. Both operand0 and operand1 can be SBUF or PSUM tiles of shape (data.shape[0], 1), i.e., vectors, or compile-time constant scalars.

op1 and operand1 are optional, but must be None (default values) when unused. Note that performing one operator has the same performance cost as performing two operators in the instruction.

When the operators are non-commutative (e.g., subtract), we can reverse the ordering of the inputs for each operator through:
- reverse0 = True: tmp_res = operand0 <op0> data
- reverse1 = True: operand1 <op1> tmp_res
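The semantics above can be sketched on the host with NumPy. This is only a reference model for checking expected values, not the NKI implementation: tensor_scalar_ref is a hypothetical helper name, NumPy ufuncs stand in for the operators, and row 0 of the array stands in for the partition axis.

```python
import numpy as np


def tensor_scalar_ref(data, op0, operand0, reverse0=False,
                      op1=None, operand1=None, reverse1=False):
    """Host-side NumPy model of the semantics (hypothetical helper, not an NKI API)."""
    data = np.asarray(data)

    def as_operand(x):
        # Scalars are used as-is and broadcast over the whole tile.
        if np.ndim(x) == 0:
            return x
        # Vectors must hold exactly one value per partition (row).
        x = np.asarray(x)
        if x.shape != (data.shape[0], 1):
            raise ValueError("vector operand must have shape (data.shape[0], 1)")
        # Broadcast along every free dimension of data.
        return x.reshape((data.shape[0],) + (1,) * (data.ndim - 1))

    a = as_operand(operand0)
    tmp_res = op0(a, data) if reverse0 else op0(data, a)

    if op1 is None:
        if operand1 is not None:
            raise ValueError("operand1 must be None when op1 is None")
        return tmp_res
    if operand1 is None:
        raise ValueError("operand1 is required when op1 is given")

    b = as_operand(operand1)
    return op1(b, tmp_res) if reverse1 else op1(tmp_res, b)


data = np.arange(6, dtype=np.float32).reshape(2, 3)   # 2 partitions, 3 free elements
scale = np.array([[2.0], [10.0]], dtype=np.float32)   # one value per partition

# (data * scale) + 1.0
fused = tensor_scalar_ref(data, np.multiply, scale, op1=np.add, operand1=1.0)

# reverse0=True computes 10.0 - data instead of data - 10.0
reversed_sub = tensor_scalar_ref(data, np.subtract, 10.0, reverse0=True)
```

Here fused holds [[1, 3, 5], [31, 41, 51]] and reversed_sub holds [[10, 9, 8], [7, 6, 5]].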
The tensor_scalar instruction supports two types of operators: 1) bitvec operators (e.g., bitwise_and) and 2) arithmetic operators (e.g., add). See Supported Math Operators for NKI ISA for the full list of supported operators. The two operators, op0 and op1, in a tensor_scalar instruction must be of the same type (both bitvec or both arithmetic).

If bitvec operators are used, the tensor_scalar instruction must run on the Vector Engine. Also, the input/output data types must be integer types, and input elements are treated as bit patterns without any data type casting.

If arithmetic operators are used, the tensor_scalar instruction can run on the Vector, Scalar, or GpSimd Engine. However, each engine supports only a limited set of arithmetic operators (see the operator table in Supported Math Operators for NKI ISA). The Scalar Engine on trn2 only supports a subset of the operator combinations:
- op0=np.multiply and op1=np.add
- op0=np.multiply and op1=None
- op0=np.add and op1=None
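As a rough illustration, the three Scalar Engine combinations listed for trn2 can be captured in a small lookup used to pre-check a call before pinning it to that engine. The names SCALAR_ENGINE_TRN2_COMBOS and scalar_engine_trn2_supports are hypothetical, and the set only mirrors the list above; it is not queried from the compiler.

```python
import numpy as np

# (op0, op1) pairs the text lists for the Scalar Engine on trn2.
SCALAR_ENGINE_TRN2_COMBOS = {
    (np.multiply, np.add),
    (np.multiply, None),
    (np.add, None),
}


def scalar_engine_trn2_supports(op0, op1=None):
    """Return True if (op0, op1) is one of the combinations listed above."""
    return (op0, op1) in SCALAR_ENGINE_TRN2_COMBOS


ok_fused = scalar_engine_trn2_supports(np.multiply, np.add)     # listed
ok_swapped = scalar_engine_trn2_supports(np.add, np.multiply)   # not listed
```

For a combination that is not listed, one option is to leave the engine parameter at its default so the compiler picks the engine.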
Also, arithmetic operators impose no restriction on the input/output data types, but the engine automatically casts input data types to float32 and performs the operators in float32 math. The float32 computation results are cast to the target data type specified in the dtype field before being written into the output tile, at no additional performance cost. If the dtype field is not specified, it defaults to the data type of the input tile.
- Parameters:
  - dst – an output tile of the (data <op0> operand0) <op1> operand1 computation
  - data – the input tile
  - op0 – the first math operator, used with operand0 (see Supported Math Operators for NKI ISA for supported operators)
  - operand0 – a scalar constant or a tile of shape (data.shape[0], 1), where data.shape[0] is the partition axis size of the input data tile
  - reverse0 – reverse the ordering of inputs to op0; if false, operand0 is the rhs of op0; if true, operand0 is the lhs of op0
  - op1 – the second math operator, used with operand1 (see Supported Math Operators for NKI ISA for supported operators); this operator is optional
  - operand1 – a scalar constant or a tile of shape (data.shape[0], 1), where data.shape[0] is the partition axis size of the input data tile
  - reverse1 – reverse the ordering of inputs to op1; if false, operand1 is the rhs of op1; if true, operand1 is the lhs of op1
  - engine – (optional) the engine to use for the operation: nki.isa.vector_engine, nki.isa.scalar_engine, nki.isa.gpsimd_engine (only allowed for rsqrt) or nki.isa.unknown_engine (default, let compiler select best engine based on the input tile shape).
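The float32 casting described for arithmetic operators can be approximated on the host with NumPy. This is a sketch under stated assumptions, not the hardware behavior: arithmetic_f32_ref is a hypothetical helper, it handles scalar operands only, and the examples stick to values whose final cast is exact, because the rounding mode of the hardware cast is not specified here.

```python
import numpy as np


def arithmetic_f32_ref(data, op, operand, out_dtype=None):
    """Hypothetical NumPy helper: compute in float32, then cast to the output dtype."""
    data = np.asarray(data)
    # The default output dtype follows the input tile, as the text describes.
    out_dtype = data.dtype if out_dtype is None else np.dtype(out_dtype)
    # Inputs are cast to float32 and the operator runs in float32 math.
    result_f32 = op(data.astype(np.float32), np.float32(operand))
    # The float32 result is cast to the target dtype before being written out.
    return result_f32.astype(out_dtype)


ints = np.array([1, 2, 3], dtype=np.int32)
doubled = arithmetic_f32_ref(ints, np.multiply, 2)                          # int32 in, int32 out
halved = arithmetic_f32_ref(ints, np.multiply, 0.5, out_dtype=np.float32)   # int32 in, float32 out

# float32 has a 24-bit significand, so 2**24 + 1 cannot be represented exactly.
big = np.array([2**24 + 1], dtype=np.int32)
rounded = arithmetic_f32_ref(big, np.add, 0)
```

In this model rounded holds 16777216 rather than 16777217, which suggests that large integer inputs can lose precision when passed through arithmetic operators even though the output tile is an integer type.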