This document is relevant for: Inf2, Trn1, Trn2
nki.isa.tensor_copy_dynamic_dst
- nki.isa.tensor_copy_dynamic_dst(*, dst, src, mask=None, dtype=None, **kwargs)
Create a copy of the src tile within NeuronCore on-chip SRAMs using the Vector Engine, with dst located at a dynamic offset within each partition. Both source and destination tiles can be in either SBUF or PSUM.
The source and destination tiles must also have the same number of partitions and the same number of elements per partition.
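To make the shape rule concrete, here is a minimal pure-Python reference model of what one call computes. It is only a sketch under stated assumptions, not the NKI API: the function name tensor_copy_dynamic_dst_model is hypothetical, tiles are modeled as lists of partitions, the destination tile is modeled as the N-element window starting at offset inside a larger buffer, and dtype is stood in for by an optional Python conversion such as float.

```python
def tensor_copy_dynamic_dst_model(dst_buffer, src, offset, cast=None):
    """Hypothetical pure-Python model of the copy (not the NKI API).

    dst_buffer: list of partitions (lists) that the destination tile lives in.
    src:        list of partitions; every partition holds N elements.
    offset:     scalar start position of the destination tile in each partition
                (on hardware this value resides in SBUF).
    cast:       optional conversion standing in for ``dtype``; None keeps the
                source values unchanged, like the default dtype behavior.
    """
    if len(dst_buffer) != len(src):
        raise ValueError("src and dst must have the same number of partitions")
    n = len(src[0]) if src else 0
    for p, (dst_row, src_row) in enumerate(zip(dst_buffer, src)):
        if len(src_row) != n:
            raise ValueError("all src partitions must hold the same number of elements")
        # The destination tile is the N-element window starting at `offset`,
        # so it has the same number of elements per partition as src.
        if offset < 0 or offset + n > len(dst_row):
            raise IndexError(f"partition {p}: window [{offset}, {offset + n}) is out of range")
        for i, value in enumerate(src_row):
            dst_row[offset + i] = value if cast is None else cast(value)
    return dst_buffer


buf = [[0] * 8 for _ in range(2)]
tensor_copy_dynamic_dst_model(buf, [[1, 2, 3], [4, 5, 6]], offset=4, cast=float)
print(buf)  # [[0, 0, 0, 0, 1.0, 2.0, 3.0, 0], [0, 0, 0, 0, 4.0, 5.0, 6.0, 0]]
```

The same offset applies to every partition, which is why the model uses a single scalar rather than one offset per partition.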
The dynamic offset must be a scalar value residing in SBUF. If you have a list of dynamic offsets for scattering tiles in SBUF/PSUM, you can loop over the offsets and call tensor_copy_dynamic_dst once per offset.

Estimated instruction cost: max(MIN_II_DYNAMIC, N) engine cycles, where N is the number of elements per partition in the src tile, and MIN_II_DYNAMIC is the minimum instruction initiation interval for instructions with a dynamic destination location. MIN_II_DYNAMIC is roughly 600 engine cycles.
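The loop-over-offsets pattern and the cost formula can be combined into a rough estimate for a scatter. The sketch below is a pure-Python model, not NKI code: scatter_with_offsets and estimated_cycles are hypothetical helpers, the slice assignment stands in for one tensor_copy_dynamic_dst call, MIN_II_DYNAMIC uses the approximate value of 600 cycles quoted above, and the total assumes the calls are issued back to back while ignoring every other cost (for example, loading the offsets into SBUF).

```python
MIN_II_DYNAMIC = 600  # approximate minimum initiation interval, in engine cycles


def estimated_cycles(n_elems_per_partition):
    """Estimated cost of one call: max(MIN_II_DYNAMIC, N) engine cycles."""
    return max(MIN_II_DYNAMIC, n_elems_per_partition)


def scatter_with_offsets(dst_buffer, tiles, offsets):
    """Hypothetical model of the loop pattern: one dynamic-dst copy per offset.

    Writes each tile into dst_buffer at its offset and returns the summed
    per-call estimate. Assumes every partition of a tile holds N elements.
    """
    if len(tiles) != len(offsets):
        raise ValueError("need exactly one offset per tile")
    total = 0
    for tile, offset in zip(tiles, offsets):
        if len(tile) != len(dst_buffer):
            raise ValueError("tile and destination must have the same number of partitions")
        n = len(tile[0])
        for dst_row, src_row in zip(dst_buffer, tile):
            if offset < 0 or offset + n > len(dst_row):
                raise IndexError(f"window [{offset}, {offset + n}) is out of range")
            dst_row[offset:offset + n] = src_row  # stands in for one copy call
        total += estimated_cycles(n)  # one instruction per offset
    return total


buf = [[0] * 8 for _ in range(2)]
tiles = [[[1, 1], [2, 2]], [[3, 3], [4, 4]]]
print(scatter_with_offsets(buf, tiles, offsets=[6, 0]))  # 1200
print(buf)  # [[3, 3, 0, 0, 0, 0, 1, 1], [4, 4, 0, 0, 0, 0, 2, 2]]
print(estimated_cycles(2048))  # 2048
```

Because every call costs at least MIN_II_DYNAMIC cycles under this formula, tiles with fewer than roughly 600 elements per partition all cost about the same, so with many small tiles the per-call minimum dominates the estimate; for large tiles the cost tracks N instead.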
- Parameters:
dst – the destination of the copy; must be a tile in SBUF or PSUM that is dynamically indexed within each dimension.
src – the source of the copy; must be a tile in SBUF or PSUM.
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
dtype – (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.