This document is relevant for: Inf2, Trn1, Trn1n
nki.isa.bn_aggr
- nki.isa.bn_aggr(data, mask=None, dtype=None, **kwargs)
Aggregate one or more bn_stats outputs to generate a mean and variance per partition using Vector Engine.

The input data tile effectively holds an array of (count, mean, variance*count) tuples per partition, produced by bn_stats instructions. Therefore, the number of elements per partition of data must be a multiple of three.

Note: if you need to aggregate multiple bn_stats instruction outputs, it is recommended to declare an SBUF tensor and then make each bn_stats instruction write its output into the SBUF tensor at a different offset (see the example implementation in Example 2 in bn_stats).

Vector Engine performs the statistics aggregation in float32 precision. The engine therefore automatically casts the input data tile to float32 before computing, and it can cast the float32 results into another data type specified by the dtype field at no additional performance cost. If the dtype field is not specified, the instruction casts the float32 results back to the data type of the input data tile.

Estimated instruction cost: 13*(N/3) Vector Engine cycles, where N is the number of elements per partition in data.

- Parameters:
data – an input tile with results of one or more bn_stats
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.
- Returns:
an output tile with two elements per partition: a mean followed by a variance
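To make the aggregation semantics concrete, here is a minimal host-side sketch in plain Python. It is not NKI code and is not claimed to match how Vector Engine computes the result internally. It assumes that "variance" means the population variance (sum of squared deviations divided by count), which is what the variance*count field of each tuple suggests, and it uses the standard pairwise-merge formula for combining per-chunk statistics. The names bn_aggr_reference and estimated_cycles are hypothetical helpers introduced only for illustration.

```python
def bn_aggr_reference(row):
    """Combine a flat list of (count, mean, variance*count) triples
    for ONE partition into a single (mean, variance) pair.

    Assumption: variance is the population variance, so the third field
    of each triple is the sum of squared deviations of that chunk.
    """
    if len(row) % 3 != 0:
        raise ValueError("elements per partition must be a multiple of three")

    # First pass: total element count and overall mean.
    total_count = 0.0
    weighted_sum = 0.0
    for i in range(0, len(row), 3):
        count, chunk_mean = row[i], row[i + 1]
        total_count += count
        weighted_sum += count * chunk_mean
    if total_count == 0:
        raise ValueError("total count must be non-zero")
    mean = weighted_sum / total_count

    # Second pass: each chunk contributes its own sum of squared deviations
    # plus a correction for the shift from its mean to the overall mean.
    sum_sq_dev = 0.0
    for i in range(0, len(row), 3):
        count, chunk_mean, chunk_sum_sq_dev = row[i], row[i + 1], row[i + 2]
        sum_sq_dev += chunk_sum_sq_dev + count * (chunk_mean - mean) ** 2

    return mean, sum_sq_dev / total_count


def estimated_cycles(n_elements_per_partition):
    """Documented cost estimate: 13 * (N / 3) Vector Engine cycles."""
    if n_elements_per_partition % 3 != 0:
        raise ValueError("N must be a multiple of three")
    return 13 * (n_elements_per_partition // 3)
```

For example, the values 1, 2, 3 summarize to the tuple (3, 2.0, 2.0) and the values 4, 5, 6, 7 summarize to (4, 5.5, 5.0). Calling bn_aggr_reference([3, 2.0, 2.0, 4, 5.5, 5.0]) returns (4.0, 4.0), which matches the mean and population variance of 1 through 7 computed directly. With N = 6 elements per partition, estimated_cycles(6) gives 26.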