This document is relevant for: Inf2, Trn1, Trn2
nki.isa.tensor_copy_predicated
- nki.isa.tensor_copy_predicated(*, src, dst, predicate, mask=None, dtype=None, **kwargs)
Conditionally copy elements from the src tile to the destination tile on SBUF / PSUM based on a predicate using Vector Engine.
This instruction provides low-level control over conditional data movement on NeuronCores, optimized for scenarios where only selective copying of elements is needed. Either src or predicate may be in PSUM, but not both simultaneously. Both src and predicate are permitted to be in SBUF.
Shape and data type constraints:
src, dst, and predicate must occupy the same number of partitions and the same number of elements per partition.
predicate must be of type uint8, uint16, or uint32.
src and dst must share the same data type.
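The constraints above can be sketched as a small pre-flight check. This is only an illustrative model in plain Python, not part of NKI: TileSpec, PREDICATE_DTYPES, and check_constraints are hypothetical names introduced here, and the buffer strings "sbuf" and "psum" are assumed labels rather than NKI identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileSpec:
    """Hypothetical description of a tile; not an NKI type."""
    partitions: int           # number of partitions the tile occupies
    elems_per_partition: int  # number of elements in each partition
    dtype: str                # e.g. "float32", "uint8"
    buffer: str               # assumed labels: "sbuf" or "psum"


PREDICATE_DTYPES = {"uint8", "uint16", "uint32"}


def check_constraints(src, dst, predicate):
    """Return the list of violated constraints (empty if all hold)."""
    problems = []
    shapes = {(t.partitions, t.elems_per_partition) for t in (src, dst, predicate)}
    if len(shapes) != 1:
        problems.append("shape mismatch")
    if predicate.dtype not in PREDICATE_DTYPES:
        problems.append("bad predicate dtype")
    if src.dtype != dst.dtype:
        problems.append("src/dst dtype mismatch")
    if src.buffer == "psum" and predicate.buffer == "psum":
        problems.append("src and predicate both in PSUM")
    return problems


ok = check_constraints(
    TileSpec(128, 512, "float32", "sbuf"),
    TileSpec(128, 512, "float32", "sbuf"),
    TileSpec(128, 512, "uint8", "psum"),
)
```

An empty result means the shape, data type, and placement rules stated above are all satisfied; the real compiler may enforce further conditions not modeled here.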
Behavior:
Where predicate is True: the corresponding elements from src are copied to the dst tile.
Where predicate is False: the corresponding values in the dst tile are left unmodified.
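The behavior described above can be modeled on the host with nested Python lists, one inner list per partition. This is a reference sketch of the semantics only, not how the Vector Engine executes the instruction. The function name copy_predicated_reference is hypothetical, and treating any nonzero predicate value as True is an assumption.

```python
def copy_predicated_reference(src, dst, predicate):
    """Update dst in place: take src[p][i] wherever predicate[p][i] is nonzero."""
    for src_row, dst_row, pred_row in zip(src, dst, predicate):
        for i, flag in enumerate(pred_row):
            if flag:  # assumption: nonzero predicate value means True
                dst_row[i] = src_row[i]
    return dst


src = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
dst = [[0.0, 0.0, 0.0], [9.0, 9.0, 9.0]]
predicate = [[1, 0, 1], [0, 0, 1]]

copy_predicated_reference(src, dst, predicate)
# dst is now [[1.0, 0.0, 3.0], [9.0, 9.0, 6.0]]
```

Positions where the predicate is 0 keep their previous dst values, which is why dst must already hold meaningful data before the instruction runs.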
Estimated instruction cost:
Cost (Vector Engine Cycles)    Condition
max(MIN_II, N)                 If src is from SBUF and predicate is from PSUM, or the other way around
max(MIN_II, 2N)                If both src and dst are in SBUF
N is the number of elements per partition in the src tile.
MIN_II is the minimum instruction initiation interval for small input tiles. MIN_II is roughly 64 engine cycles.
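As a rough illustration of the cost table above, the estimate can be computed as follows. This is a back-of-the-envelope sketch: estimate_cycles and its split_buffers flag are hypothetical names, and MIN_II = 64 is taken from the "roughly 64" figure in the text, so the results are approximations rather than measured cycle counts.

```python
MIN_II = 64  # approximate value from the text ("roughly 64 engine cycles")


def estimate_cycles(n, split_buffers):
    """Estimate Vector Engine cycles for n elements per partition.

    split_buffers=True models the first table row (one of src/predicate in
    SBUF, the other in PSUM); False models the second row (the 2N case).
    """
    factor = 1 if split_buffers else 2
    return max(MIN_II, factor * n)


full_split = estimate_cycles(512, split_buffers=True)   # max(64, 512)
full_sbuf = estimate_cycles(512, split_buffers=False)   # max(64, 1024)
tiny_tile = estimate_cycles(16, split_buffers=False)    # max(64, 32)
```

For small tiles the MIN_II floor dominates, so the estimate is the same in both rows until N (or 2N) exceeds it.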
- Parameters:
src – The source tile to copy elements from when predicate is True
dst – The destination tile to copy elements to
predicate – A tile that determines which elements to copy
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
dtype – (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.
Example:
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

##################################################################
# Example 1: Conditionally copies elements from the `on_true` tile to
# SBUF/PSUM destination tile using Vector Engine, where copying occurs
# only at positions where the predicate evaluates to True.
##################################################################

...
pre_tile: tensor[128, 512] = nl.load(predicate)
src_tile: tensor[128, 512] = nl.load(on_true_tensor)

ix, iy = nl.mgrid[0:128, 0:512]
dst_tile: tensor[128, 512] = nl.zeros(shape=src_tile.shape, dtype=src_tile.dtype)
dst_tile[ix, iy] = nl.load(on_false_tensor)

nisa.tensor_copy_predicated(src=src_tile, dst=dst_tile, predicate=pre_tile)