This document is relevant for: Inf2, Trn1, Trn2
nki.isa.tensor_copy_predicated
- nki.isa.tensor_copy_predicated(*, src, dst, predicate, mask=None, dtype=None, reverse_pred=False, **kwargs)
Conditionally copy elements from the src tile to the destination tile on SBUF / PSUM based on a predicate, using the Vector Engine.
This instruction provides low-level control over conditional data movement on NeuronCores. It is optimized for scenarios where only some elements need to be copied. Either src or predicate may be in PSUM, but not both at once. Both src and predicate may be in SBUF.
Shape and data type constraints:
src (if it is a tensor), dst, and predicate must occupy the same number of partitions and the same number of elements per partition. predicate must be of type uint8, uint16, or uint32. src and dst must share the same data type.
Behavior:
Where predicate is True: the corresponding elements from src are copied to the dst tile. If src is a scalar, the scalar is copied to those positions of the dst tile.
Where predicate is False: the corresponding values in the dst tile are left unmodified.
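The following pure-Python reference model pins down these semantics on small nested lists, with one inner list per partition. It is a sketch and not the NeuronCore implementation. `copy_predicated_reference` is a hypothetical name. The model assumes a nonzero predicate element counts as True, and it applies `reverse_pred` (described under Parameters) by inverting that test.

```python
# Hypothetical pure-Python model of the documented behavior of
# nki.isa.tensor_copy_predicated. Tiles are lists of rows (one row per
# partition). Assumption: a nonzero predicate element means True.
from typing import List, Union

Number = Union[int, float]
Tile = List[List[Number]]


def copy_predicated_reference(src: Union[Tile, Number], dst: Tile,
                              predicate: List[List[int]],
                              reverse_pred: bool = False) -> None:
    """Update dst in place: copy src where the predicate holds, else keep dst."""
    for p, row in enumerate(dst):
        for i in range(len(row)):
            take = predicate[p][i] != 0
            if reverse_pred:
                take = not take  # reverse_pred flips which positions are copied
            if take:
                # A scalar src is written to every selected position.
                row[i] = src if isinstance(src, (int, float)) else src[p][i]


if __name__ == "__main__":
    dst = [[0, 0, 0], [0, 0, 0]]
    copy_predicated_reference([[1, 2, 3], [4, 5, 6]], dst, [[1, 0, 1], [0, 1, 0]])
    print(dst)  # [[1, 0, 3], [0, 5, 0]]
```

Updating `dst` in place mirrors the fact that the instruction only writes the selected positions and leaves the rest of the destination untouched.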
Estimated instruction cost:
Cost (Vector Engine cycles) | Condition
max(MIN_II, N) | src is in SBUF and predicate is in PSUM, or the other way around
max(MIN_II, 2N) | both src and dst are in SBUF
N is the number of elements per partition in the src tile. MIN_II is the minimum instruction initiation interval for small input tiles, roughly 64 engine cycles.
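The cost table above can be written as a small function. `estimated_cycles` is a hypothetical helper. `MIN_II = 64` reuses the "roughly 64 engine cycles" figure from the text, so the result is only a rough guide and not an exact cycle count.

```python
# Hypothetical sketch of the estimated-cost table for tensor_copy_predicated.
# MIN_II is approximate ("roughly 64 engine cycles" per the documentation).
MIN_II = 64


def estimated_cycles(n_elems_per_partition: int, mixed_sbuf_psum: bool) -> int:
    """Estimate Vector Engine cycles.

    mixed_sbuf_psum=True:  one of src/predicate in SBUF, the other in PSUM (first row).
    mixed_sbuf_psum=False: the both-in-SBUF case (second row).
    """
    n = n_elems_per_partition
    return max(MIN_II, n) if mixed_sbuf_psum else max(MIN_II, 2 * n)
```

For a 128 x 512 tile, N = 512. The mixed SBUF/PSUM placement is then estimated at 512 cycles and the both-in-SBUF placement at 1024 cycles. Very small tiles are floored at MIN_II.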
- Parameters:
src – The source tile, or a number, to copy elements from where predicate is True
dst – The destination tile to copy elements to
predicate – A tile that determines which elements to copy
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
dtype – (optional) data type to cast the output to (see Supported Data Types for more information); if not specified, it defaults to the data type of the input tile.
reverse_pred – A boolean that reverses the effect of predicate, so that elements are copied where predicate is False.
Example:
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

##################################################################
# Example 1: Conditionally copies elements from the `on_true` tile to
# SBUF/PSUM destination tile using Vector Engine, where copying occurs
# only at positions where the predicate evaluates to True.
##################################################################
...
# Load the predicate and the source (on_true) values into SBUF.
pre_tile: tensor[128, 512] = nl.load(predicate)
src_tile: tensor[128, 512] = nl.load(on_true_tensor)

# Initialize the destination with the fallback (on_false) values;
# they are kept wherever the predicate is False.
ix, iy = nl.mgrid[0:128, 0:512]
dst_tile: tensor[128, 512] = nl.zeros(shape=src_tile.shape, dtype=src_tile.dtype)
dst_tile[ix, iy] = nl.load(on_false_tensor)

# Overwrite dst_tile with src_tile only where pre_tile is True.
nisa.tensor_copy_predicated(src=src_tile, dst=dst_tile, predicate=pre_tile)