This document is relevant for: Inf2, Trn1, Trn2
nki.isa.bn_aggr
- nki.isa.bn_aggr(data, *, mask=None, dtype=None, **kwargs)
Aggregate one or multiple bn_stats outputs to generate a mean and variance per partition using Vector Engine.

The input data tile effectively has an array of (count, mean, variance*count) tuples per partition produced by bn_stats instructions. Therefore, the number of elements per partition of data must be a multiple of three.

Note, if you need to aggregate multiple bn_stats instruction outputs, it is recommended to declare an SBUF tensor and then make each bn_stats instruction write its output into the SBUF tensor at different offsets (see example implementation in Example 2 in bn_stats).

Vector Engine performs the statistics aggregation in float32 precision. Therefore, the engine automatically casts the input data tile to float32 before performing float32 computation and is capable of casting the float32 computation results into another data type specified by the dtype field, at no additional performance cost. If the dtype field is not specified, the instruction will cast the float32 results back to the same data type as the input data tile.

Estimated instruction cost:

max(MIN_II, 13*(N/3)) Vector Engine cycles, where N is the number of elements per partition in data and MIN_II is the minimum instruction initiation interval for small input tiles. MIN_II is roughly 64 engine cycles.

- Parameters:
data – an input tile with results of one or more bn_stats
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.
- Returns:
an output tile with two elements per partition: a mean followed by a variance
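
The aggregation described above can be sketched in NumPy as a host-side reference model. This is a minimal sketch, not the Vector Engine implementation: it assumes the variance is a population variance (divided by count), that each partition has a nonzero total count, and that combining tuples with the standard parallel-variance formula matches the instruction's result. The names bn_stats_reference, bn_aggr_reference and bn_aggr_cost_cycles are hypothetical helpers, not part of NKI. In particular, bn_stats_reference emits a single (count, mean, variance*count) tuple per call, which may differ from the layout the real bn_stats instruction produces.

```python
import numpy as np

# Approximate minimum initiation interval in engine cycles, taken from the text above.
MIN_II = 64


def bn_stats_reference(x):
    """Hypothetical stand-in for bn_stats: one (count, mean, variance*count) tuple per partition."""
    x = np.asarray(x, dtype=np.float32)
    count = np.full(x.shape[0], x.shape[1], dtype=np.float32)
    mean = x.mean(axis=1)
    var = x.var(axis=1)  # population variance (assumption)
    return np.stack([count, mean, var * count], axis=1)


def bn_aggr_reference(data, dtype=None):
    """Reference model: combine (count, mean, variance*count) tuples into [mean, variance]."""
    data = np.asarray(data)
    if data.ndim != 2 or data.shape[1] % 3 != 0:
        raise ValueError("elements per partition must be a multiple of three")
    out_dtype = data.dtype if dtype is None else dtype

    # Compute in float32, mirroring the precision described in the text.
    stats = data.astype(np.float32).reshape(data.shape[0], -1, 3)
    count, mean, m2 = stats[:, :, 0], stats[:, :, 1], stats[:, :, 2]

    total = count.sum(axis=1)
    agg_mean = (count * mean).sum(axis=1) / total
    # Sum of squared deviations about the combined mean (parallel-variance formula).
    agg_m2 = (m2 + count * (mean - agg_mean[:, None]) ** 2).sum(axis=1)
    agg_var = agg_m2 / total

    # Output: two elements per partition, a mean followed by a variance.
    return np.stack([agg_mean, agg_var], axis=1).astype(out_dtype)


def bn_aggr_cost_cycles(n_elems, min_ii=MIN_II):
    """Estimated cost from the text: max(MIN_II, 13*(N/3)) Vector Engine cycles."""
    return max(min_ii, 13 * (n_elems / 3))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 96)).astype(np.float32)
    # Split the free dimension into three chunks and write each chunk's
    # statistics at a different offset of one buffer, as the note recommends.
    data = np.concatenate(
        [bn_stats_reference(c) for c in np.split(x, 3, axis=1)], axis=1
    )
    mean_var = bn_aggr_reference(data)
    print(mean_var.shape, bn_aggr_cost_cycles(data.shape[1]))
```

Under these assumptions, the aggregated mean and variance should agree with computing x.mean(axis=1) and x.var(axis=1) directly on the unsplit input, up to float32 rounding. The cost helper also shows that for small tiles the estimate is dominated by MIN_II rather than by the number of tuples.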