nki.isa.bn_aggr
- nki.isa.bn_aggr(dst, data, name=None)
Aggregate one or multiple bn_stats outputs to generate a mean and variance per partition using Vector Engine.
The input data tile effectively has an array of (count, mean, variance*count) tuples per partition produced by bn_stats instructions. Therefore, the number of elements per partition of data must be a multiple of three.
Note, if you need to aggregate multiple bn_stats instruction outputs, it is recommended to declare a SBUF tensor and then make each bn_stats instruction write its output into the SBUF tensor at different offsets.
Vector Engine performs the statistics aggregation in float32 precision. Therefore, the engine automatically casts the input data tile to float32 before performing float32 computation and is capable of casting the float32 computation results into another data type specified by the dtype field, at no additional performance cost. If the dtype field is not specified, the instruction will cast the float32 results back to the same data type as the input data tile.
- Parameters:
dst – an output tile with two elements per partition: a mean followed by a variance
data – an input tile with results of one or more bn_stats
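The aggregation described above can be illustrated with a small host-side reference model. This is a sketch in plain Python and does not call the NKI API or reproduce the Vector Engine implementation. The helper names bn_stats_reference and bn_aggr_reference are hypothetical, and the use of population variance (dividing by count) is an assumption inferred from the (count, mean, variance*count) tuple layout.

```python
def bn_stats_reference(values):
    """Hypothetical helper: build one (count, mean, variance*count) tuple."""
    count = len(values)
    mean = sum(values) / count
    # Sum of squared deviations, i.e. (population variance) * count.
    var_times_count = sum((v - mean) ** 2 for v in values)
    return [float(count), mean, var_times_count]


def bn_aggr_reference(data_row):
    """Hypothetical helper: aggregate one partition's flat list of tuples.

    data_row is a flat list laid out as
    [count0, mean0, var0*count0, count1, mean1, var1*count1, ...].
    Returns (mean, variance), mirroring the two elements per partition of dst.
    """
    if len(data_row) % 3 != 0:
        raise ValueError("elements per partition must be a multiple of three")
    tuples = [data_row[i:i + 3] for i in range(0, len(data_row), 3)]

    total_count = sum(count for count, _, _ in tuples)
    if total_count == 0:
        raise ValueError("total count must be non-zero")

    # Count-weighted mean over all chunks.
    mean_all = sum(count * mean for count, mean, _ in tuples) / total_count

    # Combine within-chunk and between-chunk squared deviations.
    m2_all = sum(
        m2 + count * (mean - mean_all) ** 2 for count, mean, m2 in tuples
    )
    return mean_all, m2_all / total_count


# Two chunks of the same partition, aggregated as if written at
# different offsets of a single buffer.
row = bn_stats_reference([1, 2, 3]) + bn_stats_reference([4, 5, 6, 7, 8])
mean, variance = bn_aggr_reference(row)
```

Concatenating the two tuples into one flat row mirrors the recommendation to have each bn_stats instruction write into the same SBUF tensor at a different offset, so that a single aggregation covers all of them.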