This document is relevant for: Inf2, Trn1, Trn2

nki.isa.tensor_copy_dynamic_dst

nki.isa.tensor_copy_dynamic_dst(*, dst, src, mask=None, dtype=None, **kwargs)

Copy the src tile within the NeuronCore on-chip SRAMs using the Vector Engine, writing it to dst at a dynamic offset within each partition.

Both source and destination tiles can be in either SBUF or PSUM.

The source and destination tiles must also have the same number of partitions and the same number of elements per partition.

The dynamic offset must be a scalar value residing in SBUF. To scatter tiles in SBUF/PSUM according to a list of dynamic offsets, loop over the offsets and call tensor_copy_dynamic_dst once per offset.
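The per-offset loop described above can be pictured with a small host-side model. This is only a sketch in plain Python, not NKI code: the helper copy_to_dynamic_offset, the nested-list buffer, and the bounds checks are hypothetical stand-ins, and the model assumes the offset selects a starting element along each partition's free dimension.

```python
def copy_to_dynamic_offset(buffer, src, offset):
    """Hypothetical host-side model of one dynamic-destination copy.

    `buffer` models on-chip storage as one list per partition; the
    destination tile is the window buffer[p][offset:offset + width].
    """
    if len(buffer) != len(src):
        raise ValueError("src and dst must have the same number of partitions")
    width = len(src[0])
    for dst_row, src_row in zip(buffer, src):
        if offset < 0 or offset + width > len(dst_row):
            raise ValueError("copy would fall outside the destination partition")
        # Same number of elements per partition on both sides of the copy.
        dst_row[offset:offset + width] = src_row
    return buffer


# Scatter three 2-partition tiles into one buffer, one call per offset.
buffer = [[0] * 8 for _ in range(2)]
tiles = [
    [[1, 1], [2, 2]],
    [[3, 3], [4, 4]],
    [[5, 5], [6, 6]],
]
offsets = [4, 0, 6]  # on device, each offset would be a scalar held in SBUF
for tile, offset in zip(tiles, offsets):
    copy_to_dynamic_offset(buffer, tile, offset)
```

In this model every call writes the same window position in all partitions, which mirrors the single scalar offset per instruction.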

Estimated instruction cost:

max(MIN_II_DYNAMIC, N) engine cycles, where:

  • N is the number of elements per partition in the src tile,

  • MIN_II_DYNAMIC is the minimum instruction initiation interval for instructions with dynamic destination location. MIN_II_DYNAMIC is roughly 600 engine cycles.
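The cost estimate above can be written as a small helper for back-of-the-envelope planning. This is a minimal sketch: the function names are hypothetical, the value 600 is the rough figure quoted above rather than an exact constant, and the scatter total assumes the per-offset calls run back to back with no overlap.

```python
MIN_II_DYNAMIC = 600  # "roughly 600 engine cycles"; treated as exact only for illustration


def estimated_cycles(elements_per_partition, min_ii=MIN_II_DYNAMIC):
    """Estimated engine cycles for one call: max(MIN_II_DYNAMIC, N)."""
    if elements_per_partition < 0:
        raise ValueError("elements_per_partition must be non-negative")
    return max(min_ii, elements_per_partition)


def estimated_scatter_cycles(elements_per_partition, num_offsets):
    """Estimated total for one call per offset (assumes no overlap between calls)."""
    return num_offsets * estimated_cycles(elements_per_partition)


small = estimated_cycles(128)    # below the initiation interval, so the floor applies
large = estimated_cycles(2048)   # above it, so N dominates
scatter = estimated_scatter_cycles(128, 4)
```

Under this estimate, a copy of fewer than about 600 elements per partition costs the same as one of 600, so scattering many small tiles is dominated by the fixed initiation interval.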

Parameters:
  • dst – the destination of the copy; must be a tile in SBUF or PSUM that is dynamically indexed within each dimension.

  • src – the source of the copy; must be a tile in SBUF or PSUM.

  • mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)

  • dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.
