nki.isa.exponential
- nki.isa.exponential(dst, src, max_value=0.0, reduce_res=None, reduce_cmd=reduce_cmd.idle, reduce_init=0.0)
Apply the exponential function to each element after subtracting max_value, using the Vector Engine.
Note
Available only on NeuronCore-v4 and newer.
This instruction computes exp(src - max_value) for each element. The instruction can optionally maintain a running sum of the exponential values using shared internal reduction registers in the Vector Engine.
The exponential operation is performed as:
dst[i] = exp(src[i] - max_value)
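The reason for subtracting max_value can be illustrated with a plain Python sketch. This is a host-side illustration only, not NKI code; the helper name exp_shifted is hypothetical, and Python doubles stand in for the engine's float32 arithmetic.

```python
import math

def exp_shifted(row, max_value=0.0):
    """Emulate dst[i] = exp(src[i] - max_value) for one row of values."""
    return [math.exp(x - max_value) for x in row]

row = [1000.0, 999.0, 998.0]

# Without a shift, exp(1000) exceeds the double-precision range.
try:
    exp_shifted(row)
    overflowed = False
except OverflowError:
    overflowed = True

# Subtracting the row maximum keeps every exponent at or below zero,
# so every result lies in (0, 1].
shifted = exp_shifted(row, max_value=max(row))
```

With the shift applied, the largest element maps to exactly 1.0 and the others decay from there, which is the usual first step of a numerically stable softmax.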
When accumulation is enabled through reduce_cmd, the instruction also computes:
reduce_res[i] = sum(dst[i])
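As a rough illustration of how dst and reduce_res fit together, the sketch below emulates both outputs in plain Python and uses them to normalize a softmax. It assumes the sum runs over the free-dimension elements of each partition (row), which is inferred from reduce_res having shape (src.shape[0], 1); the function name exp_with_row_sum is hypothetical and this is not NKI code.

```python
import math

def exp_with_row_sum(src, max_values):
    """Return (dst, reduce_res) for a 2D list whose rows model partitions."""
    dst = [[math.exp(x - m) for x in row] for row, m in zip(src, max_values)]
    # One sum per row, stored as a column: shape (num_partitions, 1).
    reduce_res = [[sum(row)] for row in dst]
    return dst, reduce_res

src = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
row_max = [max(row) for row in src]
dst, reduce_res = exp_with_row_sum(src, row_max)

# Dividing each exponential by its row sum gives a softmax per row.
softmax = [[v / s[0] for v in row] for row, s in zip(dst, reduce_res)]
```

Producing the exponentials and their sum in one pass is what makes the running-sum option attractive for softmax-style kernels, since the normalizer is available without a second reduction over dst.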
The Vector Engine performs the computation in float32 precision internally and can output results in various data types as specified by the dst dtype field.
Constraints
Supported engines: Vector.
src, dst must have the same number of elements in the partition dimension.
src, dst must have the same number of elements in the free dimensions.
src, dst can be up to 4D tensors.
reduce_init should be unset or set to 0.0 when reduce_cmd is not load_reduce.
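The constraints above could be checked on the host before issuing the instruction. The following is a minimal sketch under stated assumptions: the helper check_exponential_args is hypothetical, shapes are plain tuples with the partition dimension first, and reduce_cmd is modelled as a string rather than the real enum.

```python
from math import prod

def check_exponential_args(src_shape, dst_shape, reduce_cmd="idle", reduce_init=0.0):
    """Raise ValueError if the documented constraints are violated.

    Hypothetical helper: shapes are tuples, reduce_cmd is a plain string.
    """
    if len(src_shape) > 4 or len(dst_shape) > 4:
        raise ValueError("src and dst can be at most 4D")
    if src_shape[0] != dst_shape[0]:
        raise ValueError("partition dimension sizes differ")
    if prod(src_shape[1:]) != prod(dst_shape[1:]):
        raise ValueError("free dimension element counts differ")
    if reduce_cmd != "load_reduce" and reduce_init != 0.0:
        raise ValueError("reduce_init must be 0.0 unless reduce_cmd is load_reduce")

# Passes: 128 partitions and 512 free elements on both sides.
check_exponential_args((128, 512), (128, 4, 128))
```

Note that the free dimensions only need to match in total element count in this sketch, which follows the wording of the constraint rather than requiring identical shapes.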
- Parameters:
dst – The output tile with exponential function applied. Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, tfloat32, int8, int16, int32, uint8, uint16.
src – The input tile to apply exponential function on. Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, int8, int16, int32, uint8, uint16, uint32.
max_value – The maximum value to subtract from each element before applying exponential (for numerical stability). Can be a scalar or vector of shape (src.shape[0], 1). Supported dtypes: float32.
reduce_res – Optional tile to store reduction results (sum of exponentials). Must have shape (src.shape[0], 1). Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, tfloat32.
reduce_cmd – Control the state of reduction registers for accumulating exponential results. Supported: idle, reset_reduce, reduce, load_reduce.
reduce_init – Initial value for reduction when using reduce_cmd.load_reduce. Supported dtypes: float32.
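The two accepted forms of max_value, a scalar or a vector of shape (src.shape[0], 1), can be pictured as follows. This is an illustrative plain Python model, not the NKI call itself; exp_minus_max is a hypothetical name and nested lists stand in for tiles.

```python
import math

def exp_minus_max(src, max_value=0.0):
    """src: list of rows (one per partition).

    max_value: a single number applied to every row, or a column
    [[m0], [m1], ...] with one entry per row.
    """
    if isinstance(max_value, (int, float)):
        per_row = [float(max_value)] * len(src)
    else:
        if len(max_value) != len(src) or any(len(m) != 1 for m in max_value):
            raise ValueError("max_value vector must have shape (len(src), 1)")
        per_row = [m[0] for m in max_value]
    return [[math.exp(x - m) for x in row] for row, m in zip(src, per_row)]

src = [[1.0, 3.0], [10.0, 20.0]]
scalar_out = exp_minus_max(src, 20.0)              # same shift for every row
vector_out = exp_minus_max(src, [[3.0], [20.0]])   # one shift per row
```

A per-row vector lets each partition use its own maximum, so the largest element of every row maps to 1.0; a scalar shift only guarantees that for rows whose maximum equals the scalar.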
Accumulator behavior:
The Vector Engine maintains internal accumulator registers that can be controlled via the reduce_cmd parameter:
reduce_cmd.reset_reduce: Reset accumulators to 0, then accumulate the current results.
reduce_cmd.reduce: Continue accumulating without resetting (useful for multi-step reductions).
reduce_cmd.load_reduce: Load the values from reduce_init into the accumulator, then accumulate the current result on top of it.
reduce_cmd.idle: (default) No accumulation performed, accumulator state unknown.
Note
Even when reduce_cmd is set to idle, the accumulator state may still be modified. Always use reset_reduce after any Vector Engine operation that ran with idle mode to ensure consistent behavior.
Note
The accumulator registers are shared with other Vector Engine accumulation instructions such as nki.isa.range_select, nki.isa.select_reduce, and nki.isa.tensor_scalar_cumulative.
Behavior
# Initialize reduction if requested
if reduce_cmd == reduce_cmd.reset_reduce:
    accumulator = 0
elif reduce_cmd == reduce_cmd.load_reduce:
    accumulator = reduce_init
elif reduce_cmd == reduce_cmd.idle:
    accumulator = undefined  # Not used
# Process each element
for i in range(num_elements):
    dst[i] = exp(src[i] - max_value)
    # Update reduction if active
    if reduce_cmd != reduce_cmd.idle:
        accumulator += dst[i]
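To make the behavior across several calls concrete, here is a toy Python model of a single partition's accumulator. It is a sketch, not the NKI API: the class VectorAccumulatorModel is hypothetical, reduce_cmd is modelled as a string, and treating the register as unknown (None) after an idle call is a modelling choice based on the note that idle may still modify the accumulator.

```python
import math

class VectorAccumulatorModel:
    """Toy model of one partition's reduction register (not the NKI API)."""

    COMMANDS = ("idle", "reset_reduce", "reduce", "load_reduce")

    def __init__(self):
        self.value = None  # unknown until reset_reduce or load_reduce

    def exponential(self, src, max_value=0.0, reduce_cmd="idle", reduce_init=0.0):
        if reduce_cmd not in self.COMMANDS:
            raise ValueError(f"unknown reduce_cmd: {reduce_cmd}")
        dst = [math.exp(x - max_value) for x in src]
        if reduce_cmd == "reset_reduce":
            self.value = 0.0
        elif reduce_cmd == "load_reduce":
            self.value = reduce_init
        elif reduce_cmd == "reduce" and self.value is None:
            raise RuntimeError("reduce used before reset_reduce or load_reduce")
        if reduce_cmd == "idle":
            self.value = None  # modelled as unknown after an idle call
        else:
            self.value += sum(dst)
        return dst

acc = VectorAccumulatorModel()
acc.exponential([0.0, 0.0], reduce_cmd="reset_reduce")      # accumulator: 2.0
acc.exponential([0.0, 0.0, 0.0], reduce_cmd="reduce")       # accumulator: 5.0
chunked_total = acc.value
acc.exponential([0.0], reduce_cmd="load_reduce", reduce_init=10.0)
loaded_total = acc.value                                     # 11.0
```

The first two calls show a multi-step reduction over two chunks of one logical row: reset_reduce starts the sum and reduce continues it. Raising an error when reduce follows an unknown state is stricter than the hardware description, and is there only to surface the hazard the notes warn about.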