nki.isa.max8

nki.isa.max8(dst, src, name=None)

Find the 8 largest values in each partition of the source tile.

This instruction reads the input elements, converts them to fp32 internally, and writes the 8 largest values of each partition to the output in descending order. By default, the output has the same dtype as the input tensor.
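As an illustration only, the per-partition behavior described above can be modeled in plain Python. This is a hypothetical reference model, not NKI code: `to_fp32` and `max8_reference` are names introduced here, the tile is assumed to be a 2D list already shaped [par_dim, free], and the final cast back to the output dtype is omitted.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max8_reference(src: list[list[float]]) -> list[list[float]]:
    """Hypothetical model of max8 on a 2D [par_dim, free] tile.

    Each partition (row) is converted to fp32, then its 8 largest
    values are returned in descending order.
    """
    out = []
    for row in src:
        fp32_row = [to_fp32(v) for v in row]
        out.append(sorted(fp32_row, reverse=True)[:8])
    return out


# Two partitions with 10 elements each (within the 8..16,384 limit).
tile = [
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
    [0.1, -2.0, 7.5, 7.5, 8.0, 0.0, -1.0, 2.5, 3.5, 4.5],
]
print(max8_reference(tile)[0])  # [9.0, 6.0, 5.0, 5.0, 4.0, 3.0, 3.0, 2.0]
```

Duplicates are kept (both 5.0 values appear), and in the second partition the value 0.1 comes back as its fp32 approximation, which is what the internal fp32 conversion implies.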

The source tile can be up to 5-dimensional, while the output tile is always 2-dimensional. The number of elements read per partition must be between 8 and 16,384 inclusive. The output always contains exactly 8 elements per partition. The source and output must have the same partition dimension size:

  • source: [par_dim, …]

  • output: [par_dim, 8]
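The shape rules above can be expressed as a small validation helper. This is a hypothetical sketch (`check_max8_shapes` is not part of nki.isa), and it assumes that the number of elements read per partition is simply the product of the source's free (non-partition) dimensions, with no access-pattern striding.

```python
from math import prod


def check_max8_shapes(src_shape: tuple[int, ...], dst_shape: tuple[int, ...]) -> None:
    """Raise ValueError if the shapes violate the documented max8 constraints."""
    if not 1 <= len(src_shape) <= 5:
        raise ValueError(f"src must be at most 5-dimensional, got rank {len(src_shape)}")
    par_dim = src_shape[0]
    # Assumption: elements read per partition = product of the free dimensions.
    free_elems = prod(src_shape[1:])
    if not 8 <= free_elems <= 16_384:
        raise ValueError(f"need 8..16384 elements per partition, got {free_elems}")
    if tuple(dst_shape) != (par_dim, 8):
        raise ValueError(f"dst must have shape ({par_dim}, 8), got {tuple(dst_shape)}")


check_max8_shapes((128, 4, 512), (128, 8))  # OK: 2,048 elements per partition
```

Under this model, a source of shape (128, 128, 256) would be rejected because it reads 32,768 elements per partition, and a destination of shape (64, 8) paired with a 128-partition source would be rejected for mismatched partition sizes.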

Parameters:
  • dst – the 2D output tile, with shape [par_dim, 8], that receives the 8 largest values of each partition in descending order

  • src – the source tile from which to find the maximum values
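The roles of dst and src can be illustrated with a small in-place model. In this hypothetical sketch (`max8_into` and `flatten` are illustrative names, not part of nki.isa), src is a nested list whose first axis is the partition dimension, any remaining free axes are flattened per partition in row-major order, and dst is a preallocated [par_dim][8] buffer that is overwritten in place. The fp32 conversion is skipped for brevity.

```python
import heapq


def flatten(x):
    """Yield scalars from an arbitrarily nested list in row-major order."""
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    else:
        yield x


def max8_into(dst, src):
    """Hypothetical model: write the 8 largest values of each src partition into dst."""
    if len(dst) != len(src):
        raise ValueError("src and dst must have the same partition dimension size")
    for p, part in enumerate(src):
        if len(dst[p]) != 8:
            raise ValueError("each dst partition must hold exactly 8 elements")
        # heapq.nlargest returns its results sorted in descending order.
        dst[p][:] = heapq.nlargest(8, flatten(part))


# Source shape [2, 3, 3]: 2 partitions, 9 elements read per partition.
src = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[9, 9, 0], [1, 1, 1], [2, 3, 4]],
]
dst = [[0] * 8 for _ in range(2)]  # preallocated output of shape [2, 8]
max8_into(dst, src)
print(dst[1])  # [9, 9, 4, 3, 2, 1, 1, 1]
```

Passing the destination explicitly mirrors the signature above, where the result is written to dst rather than returned.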