This document is relevant for: Inf2, Trn1, Trn2

nki.isa.tensor_copy_predicated

nki.isa.tensor_copy_predicated(*, src, dst, predicate, mask=None, dtype=None, **kwargs)

Conditionally copy elements from the src tile to the dst tile in SBUF or PSUM, based on a predicate, using the Vector Engine.

This instruction provides low-level control over conditional data movement on NeuronCores, optimized for scenarios where only selective copying of elements is needed. Either src or predicate may be in PSUM, but not both simultaneously. Both src and predicate are permitted to be in SBUF.

Shape and data type constraints:

  1. src, dst, and predicate must occupy the same number of partitions and same number of elements per partition.

  2. predicate must be of type uint8, uint16, or uint32.

  3. src and dst must share the same data type.
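The constraints above, together with the rule that src and predicate may not both be in PSUM, can be sketched as a small pre-flight check. This is an illustration only: `TileSpec` and `check_copy_predicated` are hypothetical names and not part of NKI, and the dtype and buffer strings are assumed labels chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical tile description for illustration; this is NOT an NKI type.
@dataclass(frozen=True)
class TileSpec:
    shape: tuple   # (partitions, elements per partition)
    dtype: str     # assumed label, e.g. "float32" or "uint8"
    buffer: str    # assumed label: "sbuf" or "psum"

PREDICATE_DTYPES = {"uint8", "uint16", "uint32"}

def check_copy_predicated(src, dst, predicate):
    """Return the list of violated constraints (empty when all hold)."""
    problems = []
    # Constraint 1: same partitions and same elements per partition.
    if not (src.shape == dst.shape == predicate.shape):
        problems.append("src, dst, and predicate must have the same shape")
    # Constraint 2: predicate must be an unsigned integer type.
    if predicate.dtype not in PREDICATE_DTYPES:
        problems.append("predicate must be uint8, uint16, or uint32")
    # Constraint 3: src and dst share a data type.
    if src.dtype != dst.dtype:
        problems.append("src and dst must share the same data type")
    # Placement rule: src and predicate may not both be in PSUM.
    if src.buffer == "psum" and predicate.buffer == "psum":
        problems.append("src and predicate must not both be in PSUM")
    return problems

ok = check_copy_predicated(
    TileSpec((128, 512), "float32", "sbuf"),
    TileSpec((128, 512), "float32", "sbuf"),
    TileSpec((128, 512), "uint8", "psum"),
)
print(ok)  # []
```

A check like this only mirrors the documented rules; the compiler remains the authority on what is accepted.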

Behavior:

  • Where predicate is True: the corresponding elements from src are copied to the dst tile

  • Where predicate is False: the corresponding values in the dst tile are left unmodified
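The behavior above can be modeled in plain Python as an in-place update of dst. This is a reference sketch, not how the hardware executes the instruction. `copy_predicated_reference` is a hypothetical helper, tiles are represented as nested lists (one inner list per partition), and treating any nonzero predicate element as True is an assumption.

```python
def copy_predicated_reference(src, dst, predicate):
    """Update dst in place: take src[p][i] wherever predicate[p][i] is set."""
    for src_row, dst_row, pred_row in zip(src, dst, predicate):
        for i, flag in enumerate(pred_row):
            if flag:  # assumption: a nonzero predicate element counts as True
                dst_row[i] = src_row[i]
            # otherwise dst_row[i] keeps its previous value
    return dst

src = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
dst = [[0.0, 0.0, 0.0], [9.0, 9.0, 9.0]]
predicate = [[1, 0, 1], [0, 0, 1]]

copy_predicated_reference(src, dst, predicate)
print(dst)  # [[1.0, 0.0, 3.0], [9.0, 9.0, 6.0]]
```

Note that dst is both an input and an output: whatever it held before the call is the "on false" value.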

Estimated instruction cost:

Cost (Vector Engine Cycles)    Condition

max(MIN_II, N)                 If src is from SBUF and predicate is from PSUM, or the other way around

max(MIN_II, 2N)                If both src and predicate are in SBUF

  • N is the number of elements per partition in the src tile

  • MIN_II is the minimum instruction initiation interval for small input tiles. MIN_II is roughly 64 engine cycles.
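As a rough aid for reading the cost table, the two rows can be written as one function. This is a back-of-the-envelope sketch: `estimate_cycles` is a hypothetical helper, `MIN_II = 64` uses the approximate figure quoted above, and the result is an estimate rather than a measured cycle count.

```python
MIN_II = 64  # approximate minimum initiation interval, per the text above

def estimate_cycles(n, both_in_sbuf, min_ii=MIN_II):
    """Estimate Vector Engine cycles for n elements per partition.

    both_in_sbuf=True selects the max(MIN_II, 2N) row of the cost table;
    False selects the max(MIN_II, N) row (one operand in PSUM).
    """
    factor = 2 if both_in_sbuf else 1
    return max(min_ii, factor * n)

print(estimate_cycles(512, both_in_sbuf=False))  # 512
print(estimate_cycles(512, both_in_sbuf=True))   # 1024
print(estimate_cycles(16, both_in_sbuf=True))    # 64
```

The last call shows the floor: for small tiles the cost is dominated by MIN_II, not by the element count.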

Parameters:
  • src – The source tile to copy elements from when predicate is True

  • dst – The destination tile to copy elements to

  • predicate – A tile that determines which elements to copy

  • mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)

  • dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.

Example:

import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

##################################################################
# Example 1: Conditionally copies elements from the `on_true` tile to 
# SBUF/PSUM destination tile using Vector Engine, where copying occurs 
# only at positions where the predicate evaluates to True.
##################################################################

...
pre_tile: tensor[128, 512] = nl.load(predicate)
src_tile: tensor[128, 512] = nl.load(on_true_tensor)

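# Initialize dst_tile with the on_false values; these are kept wherever
# the predicate is False.
ix, iy = nl.mgrid[0:128, 0:512]
dst_tile: tensor[128, 512] = nl.zeros(shape=src_tile.shape, dtype=src_tile.dtype)
dst_tile[ix, iy] = nl.load(on_false_tensor)

# Overwrite dst_tile with src_tile only where pre_tile is True.
nisa.tensor_copy_predicated(src=src_tile, dst=dst_tile, predicate=pre_tile)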
