nki.isa.tensor_scalar_cumulative

nki.isa.tensor_scalar_cumulative(dst, src, op0, op1, imm0, imm1=None, reduce_cmd=reduce_cmd.reset_reduce)

Perform a tensor-scalar arithmetic operation followed by a cumulative reduction, using the Vector Engine.

The operation applies the scalar operation op0 (with operand imm0) to each element of src, then accumulates the results with op1 and writes every running (cumulative) value to dst.
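For the default reset_reduce case with a commutative reduction such as add, the result should match an element-wise map followed by a running fold. The sketch below models only those semantics with the Python standard library; it is not NKI code, and the choice of multiply for op0 and add for op1 is purely illustrative.

```python
import operator
from itertools import accumulate

# Semantic model only: op0 = multiply by imm0, op1 = add, accumulator reset to 0.
src = [1.0, 2.0, 3.0, 4.0]
imm0 = 2.0

scaled = [operator.mul(x, imm0) for x in src]  # op0 applied to each element
dst = list(accumulate(scaled, operator.add))    # running op1 over the results
print(dst)  # [2.0, 6.0, 12.0, 20.0]
```

Starting the fold at 0 and starting it at the first element give the same output for add, which is why accumulate needs no initial value here.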

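The operation can be expressed in pseudocode as:

    # Initialize the accumulator register according to reduce_cmd.
    if reduce_cmd == reset_reduce:
        if op1 == add or op1 == subtract:
            reg = 0
        elif op1 == mult:
            reg = 1
        elif op1 == max:
            reg = -inf
        elif op1 == min:
            reg = +inf
    elif reduce_cmd == reduce:
        reg = reg        # keep the current accumulator value
    elif reduce_cmd == load_reduce:
        reg = imm1

    # Apply op0 to each element, fold it into reg with op1, and store every running value.
    # reverse0 (not a parameter in the signature above) swaps the operand order of op0.
    for i in range(len(src)):
        if not reverse0:
            reg = op1(op0(src[i], imm0), reg)
        else:
            reg = op1(op0(imm0, src[i]), reg)
        dst[i] = reg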
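To check expected values on the host, the pseudocode can be turned into a small reference model. This is a hypothetical pure-Python sketch for a single 1-D row, not the NKI API: operator.add, operator.sub, operator.mul and the built-ins max/min stand in for the NKI operators, reduce_cmd is a plain string, and the function name and the reg/reverse0 arguments are illustrative assumptions. Operand order for op1 follows the pseudocode as written, which matters only for subtract.

```python
import math
import operator

# Starting values used by reset_reduce, keyed by op1 (from the pseudocode).
_RESET_VALUES = {
    operator.add: 0.0,
    operator.sub: 0.0,
    operator.mul: 1.0,
    max: -math.inf,
    min: math.inf,
}


def tensor_scalar_cumulative_ref(src, op0, op1, imm0, imm1=None,
                                 reduce_cmd="reset_reduce", reg=None,
                                 reverse0=False):
    """Hypothetical reference model of the pseudocode for one 1-D row.

    Returns (dst, reg) so the final accumulator can be fed to a later
    call that uses reduce_cmd="reduce".
    """
    if reduce_cmd == "reset_reduce":
        reg = _RESET_VALUES[op1]
    elif reduce_cmd == "load_reduce":
        reg = imm1
    elif reduce_cmd == "reduce":
        if reg is None:
            raise ValueError("reduce needs an existing accumulator value")
    else:
        raise ValueError(f"unknown reduce_cmd: {reduce_cmd!r}")

    dst = []
    for x in src:
        y = op0(imm0, x) if reverse0 else op0(x, imm0)  # scalar operation
        reg = op1(y, reg)                               # cumulative reduction
        dst.append(reg)
    return dst, reg


# Running maximum of (src + 1).
out, acc = tensor_scalar_cumulative_ref([3.0, 1.0, 4.0, 1.0], operator.add, max, imm0=1.0)
print(out)  # [4.0, 4.0, 5.0, 5.0]
```

Returning the accumulator alongside dst makes it easy to model the reduce command, where a call continues from the value a previous call left behind.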

Operation constraints:

  • Scalar operation (op0) must be an arithmetic op (e.g., add, mult, max)

  • Reduction operation (op1) is limited to add, subtract, mult, max, min

  • Input and output dtypes are restricted to BF16, FP16, FP32, FP8, UINT8, UINT16, INT8, and INT16
    • INT32 and UINT32 are not supported as input or output dtypes (ISA limitation)
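The reduction-op and dtype constraints above can be encoded as a host-side check before a kernel is built. The helper below is hypothetical and not part of nki.isa: ops and dtypes are plain lowercase strings for illustration, and "fp8" stands in for whichever FP8 variants the target supports. The op0 constraint is not checked because the text only gives examples of allowed ops.

```python
# Hypothetical validation helper; names are illustrative strings, not NKI objects.
ALLOWED_REDUCE_OPS = {"add", "subtract", "mult", "max", "min"}
ALLOWED_DTYPES = {"bf16", "fp16", "fp32", "fp8", "uint8", "uint16", "int8", "int16"}


def check_cumulative_args(op1: str, src_dtype: str, dst_dtype: str) -> None:
    """Raise ValueError if the arguments violate the documented constraints."""
    if op1 not in ALLOWED_REDUCE_OPS:
        raise ValueError(f"op1 must be one of {sorted(ALLOWED_REDUCE_OPS)}, got {op1!r}")
    for role, dtype in (("src", src_dtype), ("dst", dst_dtype)):
        if dtype not in ALLOWED_DTYPES:
            # int32/uint32 are rejected here: unsupported as input/output by the ISA.
            raise ValueError(f"{role} dtype {dtype!r} is not supported")


check_cumulative_args("add", "bf16", "fp32")  # passes silently
try:
    check_cumulative_args("add", "int32", "fp32")
except ValueError as err:
    print(err)  # src dtype 'int32' is not supported
```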

Accumulator behavior:

The Vector Engine keeps an internal accumulator register whose starting value for each call is selected by reduce_cmd:

  • reset_reduce: Reset the accumulator to the starting value for op1: 0 for add and subtract, 1 for mult, -inf for max, +inf for min

  • load_reduce: Initialize the accumulator with imm1

  • reduce: Continue from the value already held in the accumulator
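Because reduce continues from whatever the accumulator already holds, a long row can in principle be processed in chunks: the first chunk with reset_reduce, later chunks with reduce (or with load_reduce, passing the last value as imm1). The sketch below models only that register behavior in plain Python. AccumulatorModel is a hypothetical class with op0 fixed to multiplication by imm0 and op1 fixed to add; whether the hardware accumulator survives between your particular instructions should be confirmed against the NKI documentation.

```python
class AccumulatorModel:
    """Hypothetical model of one accumulator register; op0 = mult, op1 = add."""

    def __init__(self):
        self.reg = None

    def run(self, src, imm0, reduce_cmd="reset_reduce", imm1=None):
        if reduce_cmd == "reset_reduce":
            self.reg = 0.0   # starting value for add
        elif reduce_cmd == "load_reduce":
            self.reg = imm1  # seed the accumulator with imm1
        elif reduce_cmd != "reduce":
            raise ValueError(f"unknown reduce_cmd: {reduce_cmd!r}")
        # "reduce" leaves self.reg unchanged and keeps accumulating.
        out = []
        for x in src:
            self.reg = x * imm0 + self.reg
            out.append(self.reg)
        return out


row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

single = AccumulatorModel().run(row, imm0=0.5)

acc = AccumulatorModel()
chunked = acc.run(row[:3], imm0=0.5) + acc.run(row[3:], imm0=0.5, reduce_cmd="reduce")
print(single == chunked)  # True
```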

Parameters:
  • dst – The destination tensor to write cumulative results to

  • src – The source tensor to process

  • op0 – Arithmetic operation applied to each element of src together with imm0

  • op1 – Arithmetic operation used for the cumulative reduction (add, subtract, mult, max, or min)

  • imm0 – Scalar or vector operand for op0. Must be FP32

  • imm1 – (optional) Initial scalar or vector value loaded into the accumulator when reduce_cmd is load_reduce. Must be FP32

  • reduce_cmd – (optional) Controls accumulator behavior; one of the nisa.reduce_cmd values. Defaults to reset_reduce