This document is relevant for: Inf2, Trn1, Trn2
nki.isa.memset
- nki.isa.memset(shape, value, dtype, *, mask=None, engine=0, **kwargs)
Initialize a tile filled with a compile-time constant value using Vector Engine. The shape of the tile is specified in the shape field and the initialized value in the value field. The memset instruction supports all valid NKI dtypes (see Supported Data Types).
- Parameters:
shape – the shape of the output tile; layout: (partition axis, free axis)
value – the constant value to initialize with
dtype – data type of the output tile (see Supported Data Types for more information)
mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)
engine – specify which engine to use for memset: nki.isa.vector_engine or nki.isa.gpsimd_engine; nki.isa.unknown_engine by default, which lets the compiler select the best engine for the given tile shape
- Returns:
a tile with shape shape whose elements are initialized to value.
Estimated instruction cost:
N is the number of elements per partition in the output tile, and MIN_II is the minimum instruction initiation interval for small input tiles. MIN_II is roughly 64 engine cycles.
If the initialized value is zero and the output data type is bfloat16/float16: max(MIN_II, N/2) Vector Engine cycles.
Otherwise: max(MIN_II, N) Vector Engine cycles.
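The cost rule above can be sketched as a small pure-Python helper for back-of-the-envelope estimates. This is only an illustration: the function name estimate_memset_cycles and the use of dtype names as strings are assumptions made here and are not part of the NKI API, and MIN_II is taken as exactly 64 even though the text only says "roughly 64".

```python
# Hypothetical helper, not part of neuronxcc: it restates the documented
# cost rule for nki.isa.memset in plain Python.

MIN_II = 64  # assumed value; the documentation says "roughly 64 engine cycles"

# Dtypes for which a zero fill is documented to cost N/2 instead of N.
HALF_COST_DTYPES = ("bfloat16", "float16")


def estimate_memset_cycles(n_per_partition, value, dtype_name):
    """Return the estimated Vector Engine cycles for one memset.

    n_per_partition: number of elements per partition in the output tile (N)
    value:           the constant the tile is filled with
    dtype_name:      output dtype given as a string, e.g. "float32"
    """
    if value == 0 and dtype_name in HALF_COST_DTYPES:
        work = n_per_partition / 2
    else:
        work = n_per_partition
    # Small tiles are bounded below by the minimum initiation interval.
    return max(MIN_II, work)


if __name__ == "__main__":
    # A (128, 128) float32 tile filled with 0.2 has N = 128.
    print(estimate_memset_cycles(128, 0.2, "float32"))
    # Zero-filling a (128, 512) bfloat16 tile has N = 512.
    print(estimate_memset_cycles(512, 0, "bfloat16"))
```

Under this rule, the partition dimension does not enter the estimate; only the number of elements per partition does, and tiles with few elements per partition are dominated by MIN_II.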
Example:
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
...
##################################################################
# Example 1: Initialize a float32 tile a of shape (128, 128)
# with a value of 0.2
##################################################################
a = nisa.memset(shape=(128, 128), value=0.2, dtype=nl.float32)