nki.isa.nc_n_gather
- nki.isa.nc_n_gather(dst, data, indices, name=None)
Gather elements from data according to indices using GpSimd Engine.
This instruction performs a gather operation in which elements are selected from the input data tile based on the flattened indices specified in the indices tile. The free dimensions of data are treated as if they were flattened into a single dimension for indexing purposes, while the partition dimension defines the parallel compute boundary.
The gather operation works independently within each partition. For each partition, the free dimensions of data are conceptually flattened, and elements are gathered according to the corresponding flattened indices from the same partition in indices. To gather elements across partitions (within groups of partitions), consider using nisa.local_gather instead.
The n in nc_n_gather indicates that this instruction corresponds to n groups of instructions in the underlying ISA, where n = ceil(elems_per_partition / 512).
Alternatively, elements could be gathered by calling nisa.dma_copy with an indirect access pattern derived from indices. However, this is less efficient than nc_n_gather, which uses GpSimd Engine to move data locally within SBUF without involving the DMA engines.
Memory types.
All input and output tiles (data, indices, and dst) must be in SBUF. GpSimd Engine cannot access PSUM (see NeuronCore-v2 Compute Engines for details).
Data types.
The input data tile can be any valid NKI data type (see Supported Data Types for more information). The output dst tile must have the same data type as data. The indices tile must be uint32.
Layout.
The partition dimension of data, indices, and dst must be the same. Within each partition, the free dimensions of data are flattened for indexing. The free dimensions of indices determine the shape of the output dst.
Tile size.
The partition dimension size of data, indices, and dst must be the same and must not exceed 128. The number of elements per partition in dst must match the number of elements per partition in indices. Each index value must be within the range [0, data.size / data.shape[0]), that is, less than the number of elements per partition in data.
- Parameters:
dst – output tile containing the gathered elements
data – the input tile to gather elements from
indices – the indices tile (uint32) specifying which elements to gather
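To make the per-partition semantics concrete, here is a minimal NumPy reference model. It is a sketch under stated assumptions, not NKI code: it does not call nki.isa, it runs on the host, and the names nc_n_gather_reference and num_instruction_groups are hypothetical. Axis 0 of each array stands in for the partition dimension and the remaining axes are free dimensions. The documentation does not say which tile elems_per_partition is measured on, so the helper for n simply takes that count as an argument.

```python
import numpy as np

MAX_PARTITIONS = 128  # partition-dimension limit from the Tile size rules


def nc_n_gather_reference(data: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Hypothetical host-side NumPy model of nc_n_gather (not an NKI API).

    Axis 0 stands in for the partition dimension; all remaining axes are
    free dimensions.
    """
    if data.shape[0] != indices.shape[0]:
        raise ValueError("data and indices must have the same partition size")
    if data.shape[0] > MAX_PARTITIONS:
        raise ValueError("partition dimension must not exceed 128")
    if indices.dtype != np.uint32:
        raise TypeError("indices must be uint32")

    num_partitions = data.shape[0]
    # Flatten the free dimensions of data within each partition.
    flat_data = data.reshape(num_partitions, -1)
    elems_per_partition = flat_data.shape[1]  # == data.size // data.shape[0]
    flat_idx = indices.reshape(num_partitions, -1)

    # uint32 cannot be negative, so only the upper bound needs checking.
    if flat_idx.size and int(flat_idx.max()) >= elems_per_partition:
        raise IndexError("index out of range [0, data.size / data.shape[0])")

    # Each partition gathers only from its own row of flat_data.
    gathered = np.take_along_axis(flat_data, flat_idx.astype(np.intp), axis=1)
    # The result keeps data's dtype and takes the shape of indices.
    return gathered.reshape(indices.shape)


def num_instruction_groups(elems_per_partition: int) -> int:
    """n = ceil(elems_per_partition / 512), using integer arithmetic."""
    return (elems_per_partition + 511) // 512


# Partition 0 holds 0..5 and partition 1 holds 6..11 once flattened.
data = np.arange(12, dtype=np.float32).reshape(2, 2, 3)
indices = np.array([[5, 0], [1, 3]], dtype=np.uint32)
dst = nc_n_gather_reference(data, indices)
print(dst.tolist())  # [[5.0, 0.0], [7.0, 9.0]]
print(num_instruction_groups(1024), num_instruction_groups(1025))  # 2 3
```

Applying np.take_along_axis to the per-partition flattened view mirrors the rule that each partition reads only its own elements: index 1 in partition 1 selects 7.0, not 1.0. Gathering across partitions falls outside this model and corresponds to nisa.local_gather instead.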