nki.isa.exponential

nki.isa.exponential(dst, src, max_value=0.0, reduce_res=None, reduce_cmd=reduce_cmd.idle, reduce_init=0.0)

Apply the exponential function to each element after subtracting max_value, using the Vector Engine.

Note

Available only on NeuronCore-v4 and newer.

This instruction computes exp(src - max_value) for each element. The instruction can optionally maintain a running sum of the exponential values using shared internal reduction registers in the Vector Engine.

The exponential operation is performed as:

dst[i] = exp(src[i] - max_value)

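When accumulation is enabled through reduce_cmd, the instruction also sums the results along the free dimensions, producing one value per partition p:

reduce_res[p] = sum over i of dst[p, i]

With reduce and load_reduce, this sum is added to the value already held in the accumulator.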

The Vector Engine performs the computation in float32 precision internally and can output results in various data types as specified by the dst dtype field.
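The two formulas above are the building blocks of a numerically stable softmax. The following is a minimal pure-Python sketch of those semantics, not NKI code: `exponential_ref` is a hypothetical helper, it models a 2D input as `src[partition][free]`, and it runs in Python's float64 rather than the float32 used by the hardware.

```python
import math

def exponential_ref(src, max_value=0.0):
    """Hypothetical reference model (not an NKI API).

    src is a 2D list indexed as src[partition][free].
    Returns (dst, reduce_res), where reduce_res has shape (partitions, 1).
    """
    dst = [[math.exp(x - max_value) for x in row] for row in src]
    reduce_res = [[sum(row)] for row in dst]
    return dst, reduce_res

row = [1000.0, 999.0, 998.0]

# math.exp(1000.0) overflows a float; subtracting the row maximum keeps
# every exponent at or below zero, so each result lies in (0, 1].
dst, reduce_res = exponential_ref([row], max_value=max(row))

# Dividing by the per-partition sum turns the exponentials into a softmax.
softmax = [v / reduce_res[0][0] for v in dst[0]]
```

The largest input always maps to exp(0) = 1, which is why passing the row maximum as max_value avoids overflow regardless of the input magnitude.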

Constraints

  • Supported engines: Vector.

  • src, dst must have the same number of elements in the partition dimension.

  • src, dst must have the same number of elements in the free dimensions.

  • src and dst can be tensors of up to 4 dimensions.

  • reduce_init should be unset or set to 0.0 when reduce_cmd is not load_reduce.

Parameters:
  • dst – The output tile with exponential function applied. Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, tfloat32, int8, int16, int32, uint8, uint16.

  • src – The input tile to apply exponential function on. Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, int8, int16, int32, uint8, uint16, uint32.

  • max_value – The maximum value to subtract from each element before applying exponential (for numerical stability). Can be a scalar or vector of shape (src.shape[0], 1). Supported dtypes: float32.

  • reduce_res – Optional tile to store reduction results (sum of exponentials). Must have shape (src.shape[0], 1). Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, tfloat32.

  • reduce_cmd – Control the state of reduction registers for accumulating exponential results. Supported: idle, reset_reduce, reduce, load_reduce.

  • reduce_init – Initial value for reduction when using reduce_cmd.load_reduce. Supported dtypes: float32.
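To illustrate the two accepted forms of max_value, the sketch below models a scalar and a per-partition column of shape (src.shape[0], 1). It is a plain-Python illustration under stated assumptions: `exponential_checked` is a hypothetical helper, not part of NKI, and it handles only 2D inputs even though the instruction accepts up to 4D.

```python
import math

def exponential_checked(src, max_value=0.0):
    """Hypothetical reference helper (not an NKI API) for 2D src[partition][free]."""
    partitions = len(src)
    if isinstance(max_value, (int, float)):
        # Scalar: the same value is subtracted in every partition.
        per_partition = [float(max_value)] * partitions
    else:
        # Vector: must be a column of shape (partitions, 1).
        if len(max_value) != partitions or any(len(r) != 1 for r in max_value):
            raise ValueError("max_value must have shape (src.shape[0], 1)")
        per_partition = [r[0] for r in max_value]
    return [[math.exp(x - m) for x in row]
            for row, m in zip(src, per_partition)]

src = [[0.0, 1.0], [2.0, 3.0]]

# Per-partition maxima as a (2, 1) column: each row is shifted by its own max.
row_max = [[max(row)] for row in src]
dst_vec = exponential_checked(src, row_max)

# A single scalar shifts every partition by the same amount.
dst_scalar = exponential_checked(src, 3.0)
```

With the column form, every partition's largest element maps to 1.0; with the scalar form, only elements equal to the scalar do.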

Accumulator behavior:

The Vector Engine maintains internal accumulator registers that can be controlled via the reduce_cmd parameter:

  • reduce_cmd.reset_reduce: Reset accumulators to 0, then accumulate the current results.

  • reduce_cmd.reduce: Continue accumulating without resetting (useful for multi-step reductions).

  • reduce_cmd.load_reduce: Load the values from reduce_init into the accumulator, then accumulate the current result on top of it.

  • reduce_cmd.idle: (default) No accumulation performed, accumulator state unknown.
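The commands above can be modeled as a small state machine that persists across calls. The sketch below is a single-partition, pure-Python model under stated assumptions: `ReduceCmd` and `VectorEngineModel` are hypothetical stand-ins rather than NKI types, and idle is modeled as leaving the accumulator in an unusable state (`None`) to reflect that its contents are unknown afterwards.

```python
import math
from enum import Enum

class ReduceCmd(Enum):
    """Hypothetical stand-in for nki.isa.reduce_cmd."""
    idle = 0
    reset_reduce = 1
    reduce = 2
    load_reduce = 3

class VectorEngineModel:
    """Hypothetical single-partition model of the shared accumulator."""

    def __init__(self):
        self.accumulator = None  # unknown until reset or loaded

    def exponential(self, src_row, max_value=0.0,
                    reduce_cmd=ReduceCmd.idle, reduce_init=0.0):
        if reduce_cmd is ReduceCmd.reset_reduce:
            self.accumulator = 0.0
        elif reduce_cmd is ReduceCmd.load_reduce:
            self.accumulator = reduce_init
        elif reduce_cmd is ReduceCmd.idle:
            # Modeling choice: treat the state as unknown after idle.
            self.accumulator = None
        dst = [math.exp(x - max_value) for x in src_row]
        if reduce_cmd is not ReduceCmd.idle:
            self.accumulator += sum(dst)
        return dst

row = [0.5, 1.5, 2.5, 3.5]
m = max(row)
one_shot = sum(math.exp(x - m) for x in row)

# Multi-step reduction: reset on the first chunk, keep accumulating on the second.
engine = VectorEngineModel()
engine.exponential(row[:2], m, ReduceCmd.reset_reduce)
engine.exponential(row[2:], m, ReduceCmd.reduce)
chunked_sum = engine.accumulator

# Same result by seeding the accumulator with a previously computed partial sum.
partial = sum(math.exp(x - m) for x in row[:2])
engine.exponential(row[2:], m, ReduceCmd.load_reduce, reduce_init=partial)
loaded_sum = engine.accumulator

# After an idle call the model no longer trusts the accumulator.
engine.exponential(row, m)
state_after_idle = engine.accumulator
```

Splitting a long row into chunks this way gives the same sum as a single pass, up to floating-point rounding from the different summation order.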

Note

Even when reduce_cmd is set to idle, the accumulator state may still be modified. Always use reset_reduce after any Vector Engine operation that ran in idle mode to ensure consistent behavior.

Note

The accumulator registers are shared with other Vector Engine accumulation instructions, such as nki.isa.range_select, nki.isa.select_reduce, and nki.isa.tensor_scalar_cumulative.

Behavior

# Initialize reduction if requested
if reduce_cmd == reduce_cmd.reset_reduce:
    accumulator = 0
elif reduce_cmd == reduce_cmd.load_reduce:
    accumulator = reduce_init
elif reduce_cmd == reduce_cmd.idle:
    accumulator = undefined  # Not used
# reduce_cmd.reduce: accumulator keeps its current value

# Process each element
for i in range(num_elements):
    dst[i] = exp(src[i] - max_value)

    # Update reduction if active
    if reduce_cmd != reduce_cmd.idle:
        accumulator += dst[i]