nki.isa.exponential

nki.isa.exponential(dst, src, max_value=0.0, reduce_res=None, reduce_cmd=reduce_cmd.idle, reduce_init=0.0)

Apply the exponential function to each element of src after subtracting max_value, using the Vector Engine.

Note

Available only on NeuronCore-v4 and newer.

This instruction computes exp(src - max_value) for each element. The instruction can optionally maintain a running sum of the exponential values using shared internal reduction registers in the Vector Engine.

The exponential operation is performed as:

dst[i] = exp(src[i] - max_value)

When accumulation is enabled through reduce_cmd, the instruction also computes:

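reduce_res[p] = sum(dst[p, :])

Here p indexes the partition dimension and the sum runs over the free dimensions, which matches the (src.shape[0], 1) shape required of reduce_res.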

The Vector Engine performs the computation in float32 precision internally and can output results in various data types as specified by the dst dtype field.
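The two formulas above can be illustrated with a minimal pure-Python reference model. This is a host-side sketch only, not NKI code, and it does not model dtype casting. The helper name `exponential_reference` is hypothetical, and the nested-list layout (outer list = partitions, inner list = free-dimension elements) is an assumption made for the example.

```python
import math

def exponential_reference(src, max_value=0.0):
    """Hypothetical host-side model; returns (dst, row_sums).

    Each inner list of src stands in for one partition, and its
    entries for the free-dimension elements (an assumed layout).
    """
    # dst[i] = exp(src[i] - max_value), applied element by element
    dst = [[math.exp(x - max_value) for x in row] for row in src]
    # One sum per partition, mirroring the (src.shape[0], 1) shape of reduce_res
    row_sums = [sum(row) for row in dst]
    return dst, row_sums

dst, sums = exponential_reference([[0.0, 1.0], [2.0, 2.0]], max_value=2.0)
```

In this run the second row equals max_value everywhere, so each of its outputs is exp(0) and its row sum is the element count.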

Constraints

Supported engines: Vector.
src and dst must have the same number of elements in the partition dimension.
src and dst must have the same number of elements in the free dimensions.
src and dst can each be a tensor of up to 4 dimensions.
reduce_init should be unset or set to 0.0 when reduce_cmd is not load_reduce.

Parameters:

dst – The output tile with exponential function applied. Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, tfloat32, int8, int16, int32, uint8, uint16.
src – The input tile to apply exponential function on. Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, int8, int16, int32, uint8, uint16, uint32.
max_value – The maximum value to subtract from each element before applying exponential (for numerical stability). Can be a scalar or vector of shape (src.shape[0], 1). Supported dtypes: float32.
reduce_res – Optional tile to store reduction results (sum of exponentials). Must have shape (src.shape[0], 1). Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, tfloat32.
reduce_cmd – Control the state of reduction registers for accumulating exponential results. Supported: idle, reset_reduce, reduce, load_reduce.
reduce_init – Initial value for reduction when using reduce_cmd.load_reduce. Supported dtypes: float32.
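To show why max_value is described as a numerical-stability aid, the sketch below builds a row-wise softmax from the same two quantities this instruction produces: exp(x - max) and the sum of those exponentials. It is plain Python rather than NKI code, `stable_softmax_rows` is a hypothetical helper, and using one maximum per row is an assumption that corresponds to passing max_value as a vector of shape (src.shape[0], 1).

```python
import math

def stable_softmax_rows(src):
    """Hypothetical sketch: per-row softmax built from exp(x - row_max)."""
    out = []
    for row in src:
        row_max = max(row)                           # role of max_value for this partition
        exps = [math.exp(x - row_max) for x in row]  # role of dst
        total = sum(exps)                            # role of reduce_res
        out.append([e / total for e in exps])
    return out

probs = stable_softmax_rows([[1000.0, 1001.0], [0.0, 0.0]])
```

Without the subtraction, `math.exp(1000.0)` raises OverflowError in Python; after subtracting the row maximum, every exponent is at most 0 and the exponentials stay in (0, 1].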

Accumulator behavior:

The Vector Engine maintains internal accumulator registers that can be controlled via the reduce_cmd parameter:

reduce_cmd.reset_reduce: Reset accumulators to 0, then accumulate the current results.
reduce_cmd.reduce: Continue accumulating without resetting (useful for multi-step reductions).
reduce_cmd.load_reduce: Load the values from reduce_init into the accumulator, then accumulate the current result on top of it.
reduce_cmd.idle: (default) No accumulation performed, accumulator state unknown.
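The four commands above can be sketched as a small state machine covering several consecutive calls. This is an illustrative host-side model for a single partition, not the hardware implementation: `ReduceCmd` and `AccumulatorModel` are hypothetical names, the unknown state is represented as `None`, and raising an error when `REDUCE` follows an unknown state is a modeling choice made here to surface misuse (the hardware is not documented to do so).

```python
import math
from enum import Enum

class ReduceCmd(Enum):  # hypothetical stand-in for reduce_cmd
    IDLE = 0
    RESET_REDUCE = 1
    REDUCE = 2
    LOAD_REDUCE = 3

class AccumulatorModel:
    """Hypothetical model of one partition's accumulator register."""

    def __init__(self):
        self.value = None  # state is unknown until reset or loaded

    def exponential(self, src, max_value=0.0, cmd=ReduceCmd.IDLE, reduce_init=0.0):
        dst = [math.exp(x - max_value) for x in src]
        if cmd is ReduceCmd.IDLE:
            self.value = None  # idle may disturb the register, so treat it as unknown
            return dst
        if cmd is ReduceCmd.RESET_REDUCE:
            self.value = 0.0
        elif cmd is ReduceCmd.LOAD_REDUCE:
            self.value = reduce_init
        elif self.value is None:  # REDUCE with no known starting point
            raise RuntimeError("reduce issued while accumulator state is unknown")
        self.value += sum(dst)
        return dst

acc = AccumulatorModel()
acc.exponential([0.0, 0.0], cmd=ReduceCmd.RESET_REDUCE)  # first tile of a reduction
acc.exponential([0.0], cmd=ReduceCmd.REDUCE)             # second tile continues it
```

The pattern shown, `RESET_REDUCE` on the first tile followed by `REDUCE` on later tiles, is the multi-step reduction that the reduce command is described as supporting.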

Note

Even when reduce_cmd is set to idle, the accumulator state may still be modified. Always use reset_reduce after any Vector Engine operation that ran with idle mode to ensure consistent behavior.

Note

The accumulator registers are shared with other Vector Engine accumulation instructions, such as nki.isa.range_select, nki.isa.select_reduce, and nki.isa.tensor_scalar_cumulative.

Behavior

# Initialize reduction if requested
if reduce_cmd == reduce_cmd.reset_reduce:
    accumulator = 0
elif reduce_cmd == reduce_cmd.load_reduce:
    accumulator = reduce_init
elif reduce_cmd == reduce_cmd.idle:
    accumulator = undefined  # Not used
# reduce_cmd.reduce: accumulator keeps the value left by the previous instruction

# Process each element
for i in range(num_elements):
    dst[i] = exp(src[i] - max_value)

    # Update reduction if active
    if reduce_cmd != reduce_cmd.idle:
        accumulator += dst[i]
