nki.isa.scalar_tensor_tensor

nki.isa.scalar_tensor_tensor(dst, data, op0, operand0, op1, operand1, reverse0=False, reverse1=False, name=None)

Apply two math operators in sequence using the Vector Engine: (data <op0> operand0) <op1> operand1.

This instruction is equivalent to running two operations back-to-back:

1. temp_result = tensor_scalar(data, op0, operand0): broadcast operand0 and apply op0
2. dst = tensor_tensor(temp_result, op1, operand1): element-wise operation with operand1
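The two-step decomposition can be sketched as a small pure-Python reference model. This is not NKI code: tiles are modeled as nested lists with one row per partition, Python's operator functions stand in for NKI math operators, and the helper names tensor_scalar_step and tensor_tensor_step are hypothetical. Only a scalar operand0 is handled here.

```python
import operator

# Hypothetical pure-Python model of the two documented steps (not NKI code).
# A tile is a list of rows, one row per partition.

def tensor_scalar_step(data, op0, operand0):
    # Step 1: broadcast the scalar operand0 and apply op0 to every element.
    return [[op0(x, operand0) for x in row] for row in data]

def tensor_tensor_step(temp_result, op1, operand1):
    # Step 2: apply op1 element-wise between two tiles of the same shape.
    return [[op1(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(temp_result, operand1)]

data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]            # 2 partitions x 3 elements
operand1 = [[10.0, 10.0, 10.0],
            [20.0, 20.0, 20.0]]

temp_result = tensor_scalar_step(data, operator.mul, 2.0)      # data * 2
dst = tensor_tensor_step(temp_result, operator.add, operand1)  # ... + operand1
print(dst)  # [[12.0, 14.0, 16.0], [28.0, 30.0, 32.0]]
```

On hardware both steps run as a single instruction; the model only makes the order of evaluation explicit.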

operand0 can be either a compile-time constant scalar, which is broadcast across all elements of data, or a tile of shape (data.shape[0], 1), which is broadcast along the free dimension. operand1 must be a tile with the same shape as data, because op1 is applied element-wise.
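The two accepted forms of operand0 can be illustrated by expanding operand0 to the full shape of data. This is a hypothetical helper (broadcast_operand0 is not an NKI function) that uses nested lists as tiles; the hardware performs this broadcast implicitly rather than materializing it.

```python
def broadcast_operand0(operand0, data):
    # Hypothetical helper: expand operand0 to data's shape, mirroring the two
    # documented forms (a scalar, or a (data.shape[0], 1) column).
    num_partitions = len(data)
    free_size = len(data[0])
    if isinstance(operand0, (int, float)):
        # Scalar: the same value for every element of data.
        return [[operand0] * free_size for _ in range(num_partitions)]
    # Column tile: exactly one value per partition.
    if len(operand0) != num_partitions or any(len(row) != 1 for row in operand0):
        raise ValueError("operand0 must be a scalar or shaped (data.shape[0], 1)")
    # Each partition's single value is repeated along the free dimension.
    return [[row[0]] * free_size for row in operand0]

data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
print(broadcast_operand0(0.5, data))               # [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
print(broadcast_operand0([[1.0], [-1.0]], data))   # [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
```

A column operand0 is useful when each partition needs its own constant, for example a per-row bias or scale.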

Broadcasting operand0 in the first operation adds no performance cost, so this instruction has approximately the same latency as a single tensor_tensor instruction.

Both op0 and op1 must be arithmetic operators (see Supported Math Operators for NKI ISA for supported operators); bitvec operators are not supported. For non-commutative operators (e.g., subtract), the operand order can be reversed with the reverse0 and reverse1 flags.
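The effect of a reverse flag on a non-commutative operator can be shown for a single element. This is a hypothetical sketch (apply_op is not part of NKI) that follows the reverse0 and reverse1 parameter descriptions: with the flag off the operand is the right-hand side, with the flag on it is the left-hand side. Python's operator.sub and operator.truediv stand in for NKI operators.

```python
import operator

def apply_op(op, x, operand, reverse=False):
    # Hypothetical helper modeling one step for one element:
    #   reverse=False: operand is the rhs -> op(x, operand)
    #   reverse=True:  operand is the lhs -> op(operand, x)
    return op(operand, x) if reverse else op(x, operand)

x = 3.0
print(apply_op(operator.sub, x, 10.0))                # 3 - 10 = -7.0
print(apply_op(operator.sub, x, 10.0, reverse=True))  # 10 - 3 = 7.0

# Both steps for one element: (10 - x) / 2, reversing only the first step.
temp = apply_op(operator.sub, x, 10.0, reverse=True)  # 7.0
print(apply_op(operator.truediv, temp, 2.0))          # 3.5
```

For commutative operators such as add or multiply, the flags make no difference to the result.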

Memory types.

The input data tile can be an SBUF or PSUM tile. operand0 can be an SBUF or PSUM tile or a compile-time constant scalar. operand1 must be an SBUF or PSUM tile; however, data and operand1 cannot both reside in PSUM. The output dst tile can be written to either SBUF or PSUM.
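The placement rules above can be restated as a check. This is a hypothetical sketch only: buffer locations are modeled as the strings "SBUF" and "PSUM", operand0_loc=None stands for a compile-time constant scalar, and check_memory is not part of NKI.

```python
VALID_LOCATIONS = {"SBUF", "PSUM"}

def check_memory(data_loc, operand1_loc, dst_loc, operand0_loc=None):
    # Hypothetical restatement of the documented memory rules.
    # operand0_loc=None models a compile-time constant scalar operand0.
    for name, loc in (("data", data_loc), ("operand1", operand1_loc), ("dst", dst_loc)):
        if loc not in VALID_LOCATIONS:
            raise ValueError(f"{name} must be in SBUF or PSUM, got {loc}")
    if operand0_loc is not None and operand0_loc not in VALID_LOCATIONS:
        raise ValueError(f"operand0 must be a scalar or in SBUF/PSUM, got {operand0_loc}")
    if data_loc == "PSUM" and operand1_loc == "PSUM":
        raise ValueError("data and operand1 cannot both reside in PSUM")

check_memory("SBUF", "PSUM", "SBUF")          # allowed: only operand1 in PSUM
check_memory("PSUM", "SBUF", "PSUM", "SBUF")  # allowed: only data in PSUM
try:
    check_memory("PSUM", "PSUM", "SBUF")
except ValueError as err:
    print(err)  # data and operand1 cannot both reside in PSUM
```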

Data types.

All input tiles can be any supported NKI data type (see Supported Data Types for more information). The Vector Engine automatically casts input data types to float32 and performs all computations in float32 math. The float32 results are cast to the data type of output dst.
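Computing in float32 means intermediate results are rounded to float32 precision, which can differ from double-precision math. The sketch below is a hypothetical per-element model (to_f32 and stt_element_f32 are illustrative names, not NKI functions): it rounds the inputs and each intermediate to float32 using the standard struct module, and assumes a float32 dst so no final cast is shown.

```python
import operator
import struct

def to_f32(x):
    # Round a Python float (an IEEE double) to the nearest float32 value.
    return struct.unpack("f", struct.pack("f", x))[0]

def stt_element_f32(x, op0, a, op1, b):
    # Hypothetical per-element model: cast inputs to float32, then round
    # after each operator so every intermediate stays in float32.
    x, a, b = to_f32(x), to_f32(a), to_f32(b)
    temp = to_f32(op0(x, a))
    return to_f32(op1(temp, b))

big = 2.0 ** 24  # 16777216: above this, float32 cannot represent every integer
print((big + 1.0) - big)                                            # 1.0 (float64 math)
print(stt_element_f32(big, operator.add, 1.0, operator.sub, big))  # 0.0 (float32 math)
```

In float32, 16777216 + 1 rounds back to 16777216, so subtracting 16777216 yields 0 rather than 1.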

Layout.

The parallel computation dimension of nisa.scalar_tensor_tensor is along the partition dimension.

Tile size.

The partition dimension size of input data, operand1, and output dst tiles must be the same and must not exceed 128. The total number of elements per partition of input data, operand1, and output dst tiles must be the same and must not exceed the physical size of each SBUF partition. If operand0 is not a scalar, the partition dimension size of operand0 must be the same as that of data and the number of elements per partition of operand0 must be 1.
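The tile-size rules can be expressed as a shape check. In this hypothetical sketch (check_tile_sizes and free_size are illustrative names, not NKI functions), shapes are tuples whose first entry is the partition dimension, operand0_shape=None models a scalar operand0, and the per-partition SBUF capacity check is deliberately left out because that limit depends on the target hardware and data type.

```python
import math

MAX_PARTITIONS = 128  # documented upper bound on the partition dimension

def free_size(shape):
    # Elements per partition: product of all dimensions after the first.
    return math.prod(shape[1:])

def check_tile_sizes(data_shape, operand1_shape, dst_shape, operand0_shape=None):
    # Hypothetical restatement of the documented tile-size rules.
    tiles = (data_shape, operand1_shape, dst_shape)
    if len({shape[0] for shape in tiles}) != 1:
        raise ValueError("data, operand1 and dst need the same partition size")
    if data_shape[0] > MAX_PARTITIONS:
        raise ValueError("partition dimension must not exceed 128")
    if len({free_size(shape) for shape in tiles}) != 1:
        raise ValueError("data, operand1 and dst need the same elements per partition")
    if operand0_shape is not None:
        if operand0_shape[0] != data_shape[0] or free_size(operand0_shape) != 1:
            raise ValueError("operand0 tile must have shape (data.shape[0], 1)")

check_tile_sizes((128, 512), (128, 512), (128, 512), (128, 1))  # valid
try:
    check_tile_sizes((129, 4), (129, 4), (129, 4))
except ValueError as err:
    print(err)  # partition dimension must not exceed 128
```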

Parameters:
  • dst – the output tile

  • data – the input tile

  • op0 – the first math operator used with operand0 (see Supported Math Operators for NKI ISA for supported operators)

  • operand0 – a scalar constant or a tile of shape (data.shape[0], 1), where data.shape[0] is the partition axis size of the input data tile

  • reverse0 – reverse ordering of inputs to op0; if False, operand0 is the rhs of op0 (data <op0> operand0); if True, operand0 is the lhs of op0 (operand0 <op0> data)

  • op1 – the second math operator used with operand1 (see Supported Math Operators for NKI ISA for supported operators)

  • operand1 – a tile with the same shape as data, for element-wise operation

  • reverse1 – reverse ordering of inputs to op1; if False, operand1 is the rhs of op1; if True, operand1 is the lhs of op1
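The parameters can be tied together in a whole-tile reference model whose keyword arguments mirror the signature. This is a hypothetical pure-Python stand-in, not the NKI API: dst is returned instead of being written, name is omitted, tiles are nested lists, and Python's operator functions stand in for NKI operators. The example computes operand1 - (8 / data), which needs both reverse flags because both operators are non-commutative.

```python
import operator

def scalar_tensor_tensor_ref(data, op0, operand0, op1, operand1,
                             reverse0=False, reverse1=False):
    # Hypothetical reference model of (data <op0> operand0) <op1> operand1.
    dst = []
    for p, (row, row1) in enumerate(zip(data, operand1)):
        # operand0: scalar, or a (data.shape[0], 1) column -> one value per partition
        s = operand0 if isinstance(operand0, (int, float)) else operand0[p][0]
        out_row = []
        for x, y in zip(row, row1):
            t = op0(s, x) if reverse0 else op0(x, s)               # step 1
            out_row.append(op1(y, t) if reverse1 else op1(t, y))   # step 2
        dst.append(out_row)
    return dst

data = [[1.0, 2.0], [4.0, 8.0]]
operand1 = [[5.0, 5.0], [1.0, 1.0]]
dst = scalar_tensor_tensor_ref(data, operator.truediv, 8.0,
                               operator.sub, operand1,
                               reverse0=True, reverse1=True)
print(dst)  # [[-3.0, 1.0], [-1.0, 0.0]]
```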