This document is relevant for: Trn2, Trn3
nki.isa.quantize_mx
- nki.isa.quantize_mx(dst, src, dst_scale, name=None)
Quantize FP16/BF16 data to MXFP8 tensors (both data and scales) using the Vector Engine.
Note
Available only on NeuronCore-v4 and newer.
The resulting MXFP8 tensors, dst and dst_scale, are as defined in the OCP Microscaling standard. This instruction calculates the required scale for each group of 32 values in src, divides the values by the calculated scale, and casts them to the target MXFP8 datatype. The output layout is suitable for direct consumption by the nisa.nc_matmul_mx API running on the Tensor Engine.

Memory types.
The input src and output tiles (dst and dst_scale) must all be in SBUF.

Data types.
The input src tile must be float16 or bfloat16. The output dst tile must be float8_e5m2_x4 or float8_e4m3fn_x4 (4-packed FP8 data types). The dst_scale tile must be uint8. The 4-packed data types (float8_e5m2_x4/float8_e4m3fn_x4) are 32-bit data types that pack four 8-bit float8_e5m2/float8_e4m3fn values, respectively.
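The per-group scale computation can be sketched as a plain NumPy reference model. This is a simplified software illustration, not the hardware instruction: the final cast to FP8 (with its rounding and saturation behavior) is omitted, and the E4M3 maximum-exponent constant and E8M0 bias are assumptions taken from the OCP Microscaling specification rather than from this API.

```python
import numpy as np

E4M3_MAX_EXP = 8   # exponent of the largest float8_e4m3fn magnitude (448 < 2^9); OCP MX assumption
E8M0_BIAS = 127    # bias used to store the shared exponent as a uint8 scale byte; OCP MX assumption

def quantize_mx_reference(src, group_size=32):
    """Software sketch of MX quantization: one shared power-of-two scale
    per group of `group_size` values.

    Returns (scaled, e8m0) where `scaled` holds the pre-cast values
    (the actual FP8 rounding step is omitted here) and `e8m0` holds the
    biased shared exponents as stored in dst_scale.
    """
    groups = src.astype(np.float32).reshape(-1, group_size)
    max_abs = np.max(np.abs(groups), axis=1)
    # Shared exponent: floor(log2(group max)) minus the element format's max exponent,
    # so the largest group member lands near the top of the FP8 range.
    safe_max = np.maximum(max_abs, np.finfo(np.float32).tiny)  # avoid log2(0)
    shared_exp = np.where(max_abs > 0.0,
                          np.floor(np.log2(safe_max)) - E4M3_MAX_EXP,
                          0.0)
    scale = np.exp2(shared_exp)
    scaled = groups / scale[:, None]              # values to be cast to FP8
    e8m0 = (shared_exp + E8M0_BIAS).astype(np.uint8)  # biased exponent bytes
    return scaled, e8m0
```

Because the scale is a pure power of two, multiplying the quantized group back by its scale recovers the original values exactly whenever they are themselves representable in the element format.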
Layout.
The quantization operates on groups of 32 elements from the input src tile, where each group consists of 8 partitions × 4 elements per partition. For each 32-element group, the instruction produces:
- Quantized FP8 data in dst
- One shared scale value per group in dst_scale
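The 8 × 4 grouping above can be illustrated with a small NumPy sketch. The mapping of a group index (i, j) to partition rows [8i, 8i+8) and free-dimension columns [4j, 4j+4) is an illustrative assumption consistent with the stated grouping, not a documented addressing formula.

```python
import numpy as np

P, F = 32, 8  # hypothetical src tile: 32 partitions x 8 free elements
src = np.arange(P * F, dtype=np.float32).reshape(P, F)

def mx_group(tile, i, j):
    """Return the 32 elements of group (i, j): 8 partitions x 4 elements
    per partition (assumed row-major group tiling)."""
    return tile[8 * i:8 * (i + 1), 4 * j:4 * (j + 1)]

# This tile contains (P // 8) * (F // 4) = 4 * 2 = 8 groups of 32 elements,
# so quantizing it would produce 8 shared scale values in dst_scale.
g = mx_group(src, 0, 0)
assert g.shape == (8, 4) and g.size == 32
```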
Tile size.
The partition dimension size of src must be a multiple of 32 and must not exceed 128. The free dimension size of src must be a multiple of 4 and must not exceed the physical size of each SBUF partition. The dst tile has the same partition dimension size as src, but its free dimension size is 1/4 of the src free dimension size due to the 4-packed FP8 data types.
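The tile-size constraints can be captured in a small helper that validates a src shape and derives the dst shape. This is a hypothetical convenience function, not part of the NKI API, and the per-partition SBUF byte capacity used as a default is an assumption (the real physical size depends on the NeuronCore generation).

```python
def mx_tile_shapes(src_par, src_free, sbuf_partition_bytes=192 * 1024):
    """Check quantize_mx tile-size constraints and return the dst shape.

    src_par  -- partition dimension size of the FP16/BF16 src tile
    src_free -- free dimension size (in elements) of the src tile
    sbuf_partition_bytes -- assumed per-partition SBUF capacity (hypothetical)
    """
    assert src_par % 32 == 0 and src_par <= 128, \
        "src partition dim must be a multiple of 32 and at most 128"
    assert src_free % 4 == 0, "src free dim must be a multiple of 4"
    # FP16/BF16 elements are 2 bytes each; src must fit within one partition.
    assert src_free * 2 <= sbuf_partition_bytes, "src free dim exceeds SBUF partition"
    # dst keeps the partition dim; 4-packed FP8 shrinks the free dim by 4x.
    return src_par, src_free // 4
```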
- Parameters:
dst – the quantized MXFP8 output tile
src – the input FP16/BF16 tile to be quantized
dst_scale – the output scale tile