nki.isa.tensor_scalar_cumulative

nki.isa.tensor_scalar_cumulative(dst, src, op0, op1, imm0, imm1=None, reduce_cmd=reduce_cmd.reset_reduce)

Perform a tensor-scalar arithmetic operation with cumulative reduction on the Vector Engine.

The operation applies a scalar operation (op0) to each tensor element, folds the result into a running accumulator with the reduction operation (op1), and writes the accumulator value after each element to the destination tensor.

The operation can be expressed in pseudocode as:

# Initialize the accumulator register according to reduce_cmd
if reduce_cmd == reset_reduce:
    # Starting value depends on the reduction op
    if op1 == add or op1 == subtract:
        reg = 0
    elif op1 == mult:
        reg = 1
    elif op1 == max:
        reg = -inf
    elif op1 == min:
        reg = +inf
elif reduce_cmd == reduce:
    reg = reg  # keep the existing accumulator value
elif reduce_cmd == load_reduce:
    reg = imm1

# reverse0 swaps the operand order of op0
for i in range(len(src)):
    if not reverse0:
        reg = op1(op0(src[i], imm0), reg)
    else:
        reg = op1(op0(imm0, src[i]), reg)
    dst[i] = reg
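
To experiment with these semantics off-device, the pseudocode can be modeled in plain Python. The following is a minimal sketch under stated assumptions, not the NKI implementation: `cumulative_reference`, the string op names, and the explicit `reg` and `reverse0` arguments are hypothetical stand-ins introduced for illustration. It works on a 1-D list of Python floats and ignores dtypes and tensor layout.

```python
import math
import operator

# Hypothetical op table for illustration; the real instruction takes NKI op objects.
OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "mult": operator.mul,
    "max": max,
    "min": min,
}

# Accumulator value selected by reset_reduce for each reduction op.
RESET_VALUE = {
    "add": 0.0,
    "subtract": 0.0,
    "mult": 1.0,
    "max": -math.inf,
    "min": math.inf,
}


def cumulative_reference(src, op0, op1, imm0, imm1=None,
                         reduce_cmd="reset_reduce", reg=None, reverse0=False):
    """Model the pseudocode on a 1-D sequence; returns (dst, final accumulator)."""
    if reduce_cmd == "reset_reduce":
        reg = RESET_VALUE[op1]
    elif reduce_cmd == "load_reduce":
        reg = imm1
    elif reduce_cmd == "reduce":
        if reg is None:
            raise ValueError("reduce needs an existing accumulator value")
    else:
        raise ValueError(f"unknown reduce_cmd: {reduce_cmd}")

    dst = []
    for x in src:
        # reverse0 swaps the operand order of op0.
        scaled = OPS[op0](imm0, x) if reverse0 else OPS[op0](x, imm0)
        # Operand order follows the pseudocode literally: op1(op0 result, reg).
        reg = OPS[op1](scaled, reg)
        dst.append(reg)
    return dst, reg


# Scale by 2, then running sum: 2, 2+4, 2+4+6
prefix_sums, _ = cumulative_reference([1.0, 2.0, 3.0], "mult", "add", 2.0)

# Add 0 (a no-op), then running maximum
running_max, _ = cumulative_reference([3.0, 1.0, 4.0, 1.0, 5.0], "add", "max", 0.0)
```

With these inputs, `prefix_sums` is `[2.0, 6.0, 12.0]` and `running_max` is `[3.0, 3.0, 4.0, 4.0, 5.0]`. Returning the final accumulator alongside the results is a choice of this sketch so that a later call can pass it back in with `reduce_cmd="reduce"`.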

Operation constraints:

Scalar operation (op0) must be an arithmetic op (e.g., add, mult, max)
Reduction operation (op1) is limited to add, subtract, mult, max, min
Input / output dtypes are restricted to BF16, FP16, FP32, FP8, UINT8, UINT16, INT8, INT16
INT32 and UINT32 are not supported as input / output dtypes (ISA limitation)

Accumulator behavior:

The Vector Engine maintains internal accumulator registers controlled via reduce_cmd:

reset_reduce: Reset the accumulator to the starting value for op1 (0 for add and subtract, 1 for mult, -inf for max, +inf for min)
load_reduce: Initialize accumulator with imm1 value
reduce: Continue with existing accumulator value
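
One pattern these commands suggest is carrying a running result across several calls, for example when a long row is processed in chunks. The sketch below illustrates that idea in plain Python. It is an assumption-laden model, not NKI code: the `Accumulator` class and its `cumulative_add` method are hypothetical, op0 is fixed to mult and op1 to add, and the hardware register is represented by an ordinary attribute.

```python
class Accumulator:
    """Hypothetical stand-in for one accumulator register (op0 = mult, op1 = add)."""

    def __init__(self):
        self.reg = 0.0

    def cumulative_add(self, src, imm0, reduce_cmd="reset_reduce", imm1=None):
        if reduce_cmd == "reset_reduce":
            self.reg = 0.0      # starting value for add
        elif reduce_cmd == "load_reduce":
            self.reg = imm1     # caller-supplied starting value
        elif reduce_cmd != "reduce":
            raise ValueError(f"unknown reduce_cmd: {reduce_cmd}")
        # "reduce": keep whatever the previous call left in the register.

        dst = []
        for x in src:
            self.reg = x * imm0 + self.reg
            dst.append(self.reg)
        return dst


acc = Accumulator()
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Whole row in one call.
one_pass = acc.cumulative_add(data, 1.0)

# Same row in two chunks: reset on the first chunk, continue on the second.
first = acc.cumulative_add(data[:3], 1.0, reduce_cmd="reset_reduce")
second = acc.cumulative_add(data[3:], 1.0, reduce_cmd="reduce")
chunked = first + second

# Start from a caller-supplied value instead of the reset value.
seeded = acc.cumulative_add(data[:3], 1.0, reduce_cmd="load_reduce", imm1=10.0)
```

In this model `chunked` equals `one_pass` (`[1.0, 3.0, 6.0, 10.0, 15.0, 21.0]`), and `seeded` is `[11.0, 13.0, 16.0]`.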

Parameters:

dst – The destination tensor to write cumulative results to
src – The source tensor to process
op0 – Scalar arithmetic operation to apply to each element
op1 – Reduction operation applied cumulatively across elements (add, subtract, mult, max, or min)
imm0 – Scalar or vector value for tensor-scalar operation. Must be FP32 datatype
imm1 – (optional) Initial scalar or vector value for the accumulator when load_reduce is specified as the reduce_cmd. Must be FP32 datatype
reduce_cmd – (optional) Control accumulator behavior using nisa.reduce_cmd values, defaults to reset_reduce
