nki.isa.bn_stats
- nki.isa.bn_stats(dst, data, name=None)
Compute mean- and variance-related statistics for each partition of an input tile data in parallel using the Vector Engine.
The output tile of the instruction has 6 elements per partition:
- the count of the even elements (of the input tile elements from the same partition)
- the mean of the even elements
- variance * count of the even elements
- the count of the odd elements
- the mean of the odd elements
- variance * count of the odd elements
To get the final mean and variance of the input tile, pass the output of the bn_stats instruction into the bn_aggr instruction, which outputs two elements per partition:
- mean (of the original input tile elements from the same partition)
- variance
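The relationship between the 6-element statistics and the final mean and variance can be illustrated with a plain-Python emulation. This is only a sketch of the arithmetic, not NKI code: the helper names bn_stats_ref and bn_aggr_ref are hypothetical, "even" and "odd" are assumed to refer to element positions along the free dimension, and the variance is assumed to be the population variance (divided by the count).

```python
def bn_stats_ref(tile):
    """Emulate the 6-element per-partition layout described above.

    tile: list of partitions, each a list of floats (the free dimension).
    Hypothetical reference helper, not part of NKI.
    """
    out = []
    for row in tile:
        stats = []
        # Assumption: "even"/"odd" mean even and odd positions in the row.
        for part in (row[0::2], row[1::2]):
            n = len(part)
            mean = sum(part) / n if n else 0.0
            m2 = sum((x - mean) ** 2 for x in part)  # variance * count
            stats += [float(n), mean, m2]
        out.append(stats)
    return out


def bn_aggr_ref(stats):
    """Merge (count, mean, variance * count) triples into [mean, variance]."""
    out = []
    for row in stats:
        triples = [row[i:i + 3] for i in range(0, len(row), 3)]
        total = sum(c for c, _, _ in triples)
        mean = sum(c * m for c, m, _ in triples) / total
        # Each group contributes its own spread plus its offset from the global mean.
        m2 = sum(s + c * (m - mean) ** 2 for c, m, s in triples)
        out.append([mean, m2 / total])  # population variance (assumed)
    return out


tile = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0]]
stats = bn_stats_ref(tile)
print(stats[0])            # [2.0, 2.0, 2.0, 2.0, 3.0, 2.0]
print(bn_aggr_ref(stats))  # [[2.5, 1.25], [10.0, 0.0]]
```

Keeping variance * count rather than the variance itself is what makes the merge step a simple sum over groups.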
Due to a hardware limitation, the number of elements per partition (i.e., the free dimension size) of the input data must not exceed 512 (nl.tile_size.bn_stats_fmax). To calculate the per-partition mean/variance of a tensor with more than 512 elements in the free dimension, invoke bn_stats on each tile of up to 512 elements and use a single bn_aggr instruction to aggregate the bn_stats outputs from all the tiles.
The Vector Engine performs the above statistics calculation in float32 precision. The engine automatically casts the input data to float32 before performing the computation. The float32 computation results are cast to dst.dtype at no additional performance cost.
- Parameters:
dst – an output tile with 6-element statistics per partition
data – the input tile (up to 512 elements per partition)
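The tiling scheme for inputs longer than 512 elements per partition can be sketched in plain Python for a single partition. This is an emulation of the arithmetic only, not NKI code: chunk_stats, aggregate, and mean_var_tiled are hypothetical helpers, "even" and "odd" are assumed to mean element positions, the variance is assumed to be the population variance, and Python floats are used here instead of the float32 arithmetic of the Vector Engine.

```python
FMAX = 512  # per-partition limit stated above (nl.tile_size.bn_stats_fmax)


def chunk_stats(chunk):
    """Six statistics for one chunk: (count, mean, variance * count) x 2."""
    stats = []
    for part in (chunk[0::2], chunk[1::2]):  # even positions, then odd (assumed)
        n = len(part)
        mean = sum(part) / n if n else 0.0
        stats += [float(n), mean, sum((x - mean) ** 2 for x in part)]
    return stats


def aggregate(stats):
    """Single aggregation step over the statistics gathered from all chunks."""
    triples = [stats[i:i + 3] for i in range(0, len(stats), 3)]
    total = sum(c for c, _, _ in triples)
    mean = sum(c * m for c, m, _ in triples) / total
    m2 = sum(s + c * (m - mean) ** 2 for c, m, s in triples)
    return mean, m2 / total  # population variance (assumed)


def mean_var_tiled(row, fmax=FMAX):
    """Collect statistics chunk by chunk, then aggregate once at the end."""
    stats = []
    for start in range(0, len(row), fmax):
        stats += chunk_stats(row[start:start + fmax])
    return aggregate(stats)


row = [float(i % 7) for i in range(1200)]  # 1200 elements: chunks of 512, 512, 176
mean, var = mean_var_tiled(row)
print(mean, var)
```

The aggregation runs once over all collected triples, mirroring the single bn_aggr call over the outputs of several bn_stats calls.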