nki.isa.bn_aggr

nki.isa.bn_aggr(dst, data, name=None)

Aggregate one or more bn_stats outputs into a mean and variance per partition using the Vector Engine.

The input data tile effectively holds an array of (count, mean, variance*count) tuples per partition, produced by bn_stats instructions. Therefore, the number of elements per partition of data must be a multiple of three.
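The aggregation described above can be sketched as a NumPy reference model. This is only an illustration of the arithmetic, not the hardware implementation: the function name bn_aggr_reference is hypothetical, and the sketch assumes that the variance is the population variance (divided by count) and that tuples are laid out contiguously along the free dimension.

```python
import numpy as np

def bn_aggr_reference(data):
    """Hypothetical NumPy model of the aggregation (not the NKI API).

    data: array of shape (partitions, 3 * k), where each row holds k
    (count, mean, variance*count) tuples laid out back to back.
    Returns an array of shape (partitions, 2): mean, then variance.
    """
    data = np.asarray(data, dtype=np.float32)
    if data.ndim != 2 or data.shape[1] % 3 != 0:
        raise ValueError("elements per partition must be a multiple of three")

    tuples = data.reshape(data.shape[0], -1, 3)
    count = tuples[:, :, 0]
    mean = tuples[:, :, 1]
    m2 = tuples[:, :, 2]  # variance * count, i.e. sum of squared deviations

    total = count.sum(axis=1)
    agg_mean = (count * mean).sum(axis=1) / total

    # Total squared deviation = within-chunk part + between-chunk part.
    between = (count * (mean - agg_mean[:, None]) ** 2).sum(axis=1)
    agg_var = (m2.sum(axis=1) + between) / total  # assumed population variance

    return np.stack([agg_mean, agg_var], axis=1)
```

Under these assumptions, feeding the model the statistics of several chunks of a row gives the same mean and variance as computing them over the whole row at once.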

Note: to aggregate the outputs of multiple bn_stats instructions, it is recommended to declare an SBUF tensor and have each bn_stats instruction write its output into that tensor at a different offset.
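The offset pattern in the note above can be illustrated with a plain NumPy emulation. Nothing here calls the NKI API: chunk_stats is a hypothetical stand-in for one bn_stats instruction, the stats array stands in for the SBUF tensor, and the sizes are arbitrary. The sketch also assumes one tuple per call, whereas the real instruction may emit more than one tuple per call.

```python
import numpy as np

def chunk_stats(chunk):
    """Hypothetical stand-in for one bn_stats call.

    Returns one (count, mean, variance*count) tuple per partition (row).
    """
    n = chunk.shape[1]
    return np.stack(
        [
            np.full(chunk.shape[0], n, dtype=np.float32),
            chunk.mean(axis=1),
            chunk.var(axis=1) * n,  # assumed population variance
        ],
        axis=1,
    ).astype(np.float32)

# Arbitrary illustrative sizes.
partitions, width, chunk_width = 4, 1024, 256
x = np.random.default_rng(0).standard_normal((partitions, width)).astype(np.float32)

# One shared buffer (standing in for the SBUF tensor) sized for all tuples.
num_chunks = width // chunk_width
stats = np.empty((partitions, 3 * num_chunks), dtype=np.float32)

# Each emulated bn_stats call writes its tuple at its own offset.
for i in range(num_chunks):
    chunk = x[:, i * chunk_width:(i + 1) * chunk_width]
    stats[:, 3 * i:3 * i + 3] = chunk_stats(chunk)
```

The resulting buffer has a multiple of three elements per partition, so it has the shape that a single aggregation step expects as its data input.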

The Vector Engine performs the statistics aggregation in float32 precision. The engine automatically casts the input data to float32 before performing the computation. The float32 results are then cast to dst.dtype at no additional performance cost.

Parameters:
  • dst – an output tile with two elements per partition: a mean followed by a variance

  • data – an input tile with results of one or more bn_stats