nki.isa.tensor_scalar_cumulative
nki.isa.tensor_scalar_cumulative(dst, src, op0, op1, imm0, imm1=None, reduce_cmd=reduce_cmd.reset_reduce)
Perform a tensor-scalar arithmetic operation with cumulative reduction on the Vector Engine.
The operation applies the scalar operation op0 to each tensor element, then folds the result into a running reduction with op1, storing every intermediate (cumulative) result in the destination tensor.
The operation can be expressed in pseudocode as follows, where in_tensor and out_tensor correspond to src and dst:

    if reduce_cmd == reset_reduce:
        if op1 == add or op1 == subtract:
            reg = 0
        elif op1 == mult:
            reg = 1
        elif op1 == max:
            reg = -inf
        elif op1 == min:
            reg = +inf
    elif reduce_cmd == reduce:
        reg = reg
    elif reduce_cmd == load_reduce:
        reg = imm1

    for i in range(len(in_tensor)):
        if not reverse0:
            reg = op1(op0(in_tensor[i], imm0), reg)
        else:
            reg = op1(op0(imm0, in_tensor[i]), reg)
        out_tensor[i] = reg
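
The pseudocode above can be emulated on the host in plain Python to check expected values by hand. This is only a minimal sketch, not the NKI implementation: the function name `emulate_tensor_scalar_cumulative`, the string op names, and the explicit `reg` argument (standing in for the engine's internal accumulator) are hypothetical. The `reverse0` flag is taken from the pseudocode, and the operand order `op1(value, reg)` follows the pseudocode literally.

```python
import math
import operator

# Hypothetical op table; the keys mirror the op names used in the pseudocode.
OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "mult": operator.mul,
    "max": max,
    "min": min,
}

# Accumulator reset values per reduction op, as listed in the pseudocode.
RESET = {
    "add": 0.0,
    "subtract": 0.0,
    "mult": 1.0,
    "max": -math.inf,
    "min": math.inf,
}


def emulate_tensor_scalar_cumulative(src, op0, op1, imm0, imm1=None,
                                     reduce_cmd="reset_reduce", reg=None,
                                     reverse0=False):
    """Return (dst, reg) for a 1-D list, following the pseudocode literally."""
    if reduce_cmd == "reset_reduce":
        reg = RESET[op1]
    elif reduce_cmd == "load_reduce":
        reg = imm1
    elif reduce_cmd == "reduce":
        if reg is None:
            raise ValueError("reduce needs an existing accumulator value")
    else:
        raise ValueError(f"unknown reduce_cmd: {reduce_cmd}")

    dst = []
    for x in src:
        # op0 combines the element with imm0; reverse0 swaps the operand order.
        value = OPS[op0](imm0, x) if reverse0 else OPS[op0](x, imm0)
        # op1 folds the result into the running accumulator.
        reg = OPS[op1](value, reg)
        dst.append(reg)
    return dst, reg


dst, reg = emulate_tensor_scalar_cumulative([1.0, 2.0, 3.0], "mult", "add", 2.0)
print(dst)  # [2.0, 6.0, 12.0]
```

Here each element is first multiplied by `imm0 = 2.0` (giving 2, 4, 6) and then accumulated with `add`, so the output is a running sum of the scaled inputs.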
Operation constraints:
- Scalar operation (op0) must be an arithmetic op (e.g., add, mult, max)
- Reduction operation (op1) is limited to add, subtract, mult, max, min
- Input / output dtypes are restricted to BF16, FP16, FP32, FP8, UINT8, UINT16, INT8, INT16
- INT32/UINT32 are not supported as input/output dtypes (ISA limitation)
Accumulator behavior:
The Vector Engine maintains internal accumulator registers controlled via reduce_cmd:
- reset_reduce: Reset accumulator based on reduction operation type
- load_reduce: Initialize accumulator with imm1 value
- reduce: Continue with existing accumulator value
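
One way to read these modes is that a long sequence can be processed in chunks: the first chunk with reset_reduce, later chunks with reduce, or with load_reduce when the carry-in value is supplied explicitly through imm1. The sketch below illustrates this in plain Python under the assumption that the accumulator persists between consecutive calls, as the description of reduce suggests. The class `CumulativeAddModel` is hypothetical and models only op0 = mult with op1 = add; it is not part of NKI.

```python
class CumulativeAddModel:
    """Hypothetical host-side model of one accumulator register (op1 = add)."""

    def __init__(self):
        self.reg = 0.0

    def run(self, src, imm0, reduce_cmd="reset_reduce", imm1=None):
        if reduce_cmd == "reset_reduce":
            self.reg = 0.0           # additive identity
        elif reduce_cmd == "load_reduce":
            self.reg = imm1          # caller-supplied starting value
        elif reduce_cmd != "reduce":
            raise ValueError(f"unknown reduce_cmd: {reduce_cmd}")
        # For "reduce" the register is left untouched and carries over.

        out = []
        for x in src:
            self.reg = x * imm0 + self.reg
            out.append(self.reg)
        return out


data = [1.0, 2.0, 3.0, 4.0]
model = CumulativeAddModel()

# Single pass over the whole sequence.
whole = model.run(data, imm0=1.0)

# Same sequence in two chunks: reset on the first, continue on the second.
first = model.run(data[:2], imm0=1.0, reduce_cmd="reset_reduce")
second = model.run(data[2:], imm0=1.0, reduce_cmd="reduce")

# Equivalent restart of the second chunk with an explicit carry-in of 3.0.
resumed = model.run(data[2:], imm0=1.0, reduce_cmd="load_reduce", imm1=3.0)

print(whole)           # [1.0, 3.0, 6.0, 10.0]
print(first + second)  # [1.0, 3.0, 6.0, 10.0]
```

In this model the chunked run and the single pass produce the same cumulative values, and load_reduce with the carried total reproduces the second chunk.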
- Parameters:
dst – The destination tensor to write cumulative results to
src – The source tensor to process
op0 – Scalar arithmetic operation to apply to each element
op1 – Cumulative arithmetic operation for cumulative computation
imm0 – Scalar or vector value for tensor-scalar operation. Must be FP32 datatype
imm1 – (optional) Initial scalar or vector value for the accumulator when load_reduce is specified as the reduce_cmd. Must be FP32 datatype
reduce_cmd – (optional) Control accumulator behavior using nisa.reduce_cmd values, defaults to reset_reduce