This document is relevant for: Inf2, Trn1, Trn2

nki.isa.tensor_copy_dynamic_src

nki.isa.tensor_copy_dynamic_src(src, *, mask=None, dtype=None, **kwargs)

Copy a source tile, applying a dynamic offset along a free dimension (any dimension other than the partition dimension), using the Vector, Scalar, or GpSimd Engine.

Consider combining this instruction with nl.mgrid to specify a static grid, and use an index tensor (containing offsets) to specify the dynamic offsets. The source tile src must be in SBUF or PSUM. The index tensor that specifies the offsets must be in SBUF.

Because the GpSimd Engine cannot access PSUM, the Scalar or Vector Engine must be chosen when the input or output tile is in PSUM (see NeuronCore-v2 Compute Engines for details). By default, this API returns a tile in SBUF, unless the returned value is assigned to a pre-declared PSUM tile.
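
The behavior described above can be modeled on the host with NumPy. This is a minimal sketch of the intended semantics for a 2D tile, not the hardware implementation: the function name `tensor_copy_dynamic_src_ref` and its arguments are hypothetical, and it assumes each index element selects a single column along axis 1 (the free dimension) while all partitions (axis 0) are copied unchanged.

```python
import numpy as np


def tensor_copy_dynamic_src_ref(src, idx, dtype=None):
    """Hypothetical NumPy model: out[:, b] = src[:, idx[0, b]].

    src : 2D array, axis 0 plays the role of the partition dimension.
    idx : array of shape [1, N] holding offsets along the free dimension.
    dtype : optional output dtype; defaults to the dtype of src.
    """
    num_partitions = src.shape[0]
    num_idx = idx.shape[1]
    out_dtype = src.dtype if dtype is None else dtype
    out = np.empty((num_partitions, num_idx), dtype=out_dtype)
    # One copy per index element, mirroring one instruction per offset.
    for b in range(num_idx):
        offset = int(idx[0, b])
        out[:, b] = src[:, offset]
    return out


# Shapes chosen to match the first example on this page.
src = np.arange(128 * 512, dtype=np.float32).reshape(128, 512)
idx = np.arange(4, 68, dtype=np.int32).reshape(1, 64)
out = tensor_copy_dynamic_src_ref(src, idx)
```

With consecutive offsets such as these, the result equals the static slice `src[:, 4:68]`; the dynamic form is only needed when the offsets are not known at compile time.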

Estimated instruction cost:

Each index element specifying the offset along the free dimension will trigger a tensor_copy instruction. In addition, since the index is dynamic, an overhead of approximately 140 cycles is incurred to read index elements from SBUF / PSUM into the engine.
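
As a rough illustration of the estimate above, the sketch below adds up the cost for a given number of index elements. It rests on two assumptions that this page does not confirm: that the roughly 140-cycle overhead is paid once per index element, and that the cost of each underlying tensor_copy is known and supplied by the caller. The function name and the 60-cycle figure in the usage line are hypothetical placeholders.

```python
# Approximate overhead quoted in the cost description above.
DYNAMIC_INDEX_OVERHEAD_CYCLES = 140


def estimate_dynamic_copy_cycles(num_index_elements, cycles_per_tensor_copy):
    """Hypothetical estimate: one tensor_copy plus index overhead per element.

    Assumes the ~140-cycle overhead applies to every index element.
    """
    if num_index_elements < 0 or cycles_per_tensor_copy < 0:
        raise ValueError("inputs must be non-negative")
    per_element = cycles_per_tensor_copy + DYNAMIC_INDEX_OVERHEAD_CYCLES
    return num_index_elements * per_element


# 64 index elements; 60 cycles per tensor_copy is a made-up placeholder.
total_cycles = estimate_dynamic_copy_cycles(64, 60)
```

Under these assumptions the fixed overhead dominates when each copy is small, so fewer, larger copies per index element would be cheaper overall.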

Parameters:
  • src – the source tile to copy; must be in SBUF or PSUM.

  • mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)

  • dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.

Example:

import neuronxcc.nki.typing as nt
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
...


##############################################################################################
# TensorCopyDynamicSrc example 1:
# - ``in_tensor`` on HBM is of shape [128 x 512]
# - ``in_tensor_sbuf`` on SBUF is of shape [128 x 512]
# - ``idx_tensor`` on HBM is of shape [1 x 64] (with values [4, 5, 6, 7, ...])
# - ``idx_tensor`` values are loaded into a SBUF tile, ``idx_tensor_sbuf``, from HBM
# - ``in_tensor_sbuf`` is copied to ``out_sbuf`` based on indices stored in ``idx_tensor_sbuf``
# - ``out_tensor`` of shape [128 x 64] is finally written to HBM
##############################################################################################
ix, iy = nl.mgrid[0:128, 0:1]

in_tensor_sbuf = nl.load(in_tensor)

# indices must be on SBUF
idx_tensor_sbuf: nt.tensor[1, 64] = nl.load(idx_tensor)

# write temporary output to SBUF
out_sbuf: nt.tensor[128, 64] = nl.ndarray([128, 64], dtype=in_tensor.dtype,
                                          buffer=nl.sbuf)

# in each iteration, a 1 x 1 offset is read from ``idx_tensor_sbuf``
# in our example, we select the dynamic offset along axis=1.
# ``idx_tensor`` is dynamically populated.
for b_idx in nl.affine_range(idx_tensor_sbuf.shape[1]):
  out_sbuf[ix, b_idx] = nisa.tensor_copy_dynamic_src(
      in_tensor_sbuf[ix, idx_tensor_sbuf[0, b_idx] + iy])
...
import neuronxcc.nki.typing as nt
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
...


###############################################################################################
# TensorCopyDynamicSrc example 2:
# - ``in_tensor`` on HBM is of shape [128 x 512 x 4]
# - ``in_tensor_sbuf`` on SBUF has shape [128 x 512 x 4]
# - ``idx_tensor`` on HBM is of shape [1 x 8] (with values [4, 5, 6, 7, ...])
# - ``idx_tensor`` values are loaded into a SBUF tile, ``idx_tensor_sbuf``, from HBM
# - ``in_tensor_sbuf`` is copied to ``out_sbuf`` based on indices stored in ``idx_tensor_sbuf``
# - ``out_tensor`` of shape [128 x 8 x 4] is finally written to HBM
###############################################################################################
ix, iy, iz = nl.mgrid[0:128, 0:1, 0:4]

in_tensor_sbuf = nl.load(in_tensor)

# indices must be on SBUF
idx_tensor_sbuf: nt.tensor[1, 8] = nl.load(idx_tensor)

# write temporary output to SBUF
out_sbuf: nt.tensor[128, 8, 4] = nl.ndarray([128, 8, 4], dtype=in_tensor.dtype,
                                            buffer=nl.sbuf)

# in each iteration, a 1 x 1 offset is read from ``idx_tensor_sbuf``
# in our example, we select the dynamic offset along axis=1.
# ``idx_tensor`` is dynamically populated.
for b_idx in nl.affine_range(idx_tensor_sbuf.shape[1]):
  out_sbuf[ix, b_idx, iz] = nisa.tensor_copy_dynamic_src(
      in_tensor_sbuf[ix, idx_tensor_sbuf[0, b_idx], iz])
...
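
To check the output of the 3D example on the host, a NumPy reference can be built with integer-array indexing. This is a sketch under the assumption that the kernel gathers along axis 1 using the offsets in row 0 of the index tensor; `gather_free_axis_ref` is a hypothetical helper name, and the random input stands in for real data.

```python
import numpy as np


def gather_free_axis_ref(in_tensor, idx_tensor):
    """Hypothetical reference: out[:, b, :] = in_tensor[:, idx_tensor[0, b], :]."""
    # Integer-array indexing on axis 1 gathers all offsets in one step.
    return in_tensor[:, idx_tensor[0], :]


rng = np.random.default_rng(0)
in_tensor = rng.standard_normal((128, 512, 4)).astype(np.float32)
idx_tensor = np.arange(4, 12, dtype=np.int32).reshape(1, 8)

expected = gather_free_axis_ref(in_tensor, idx_tensor)
# A kernel result of shape [128, 8, 4] could then be compared with
# np.allclose(kernel_out, expected).
```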
