This document is relevant for: Inf2, Trn1, Trn2

nki.isa.memset

nki.isa.memset(shape, value, dtype, *, mask=None, engine=0, **kwargs)

Initialize a tile filled with a compile-time constant value using the Vector Engine. The shape parameter gives the shape of the tile and the value parameter gives the value it is initialized to. The memset instruction supports all valid NKI dtypes (see Supported Data Types).

Parameters:
  • shape – the shape of the output tile; layout: (partition axis, free axis)

  • value – the constant value to initialize with

  • dtype – data type of the output tile (see Supported Data Types for more information)

  • mask – (optional) a compile-time constant predicate that controls whether/how this instruction is executed (see NKI API Masking for details)

  • engine – (optional) the engine to use for memset: nki.isa.vector_engine or nki.isa.gpsimd_engine. Defaults to nki.isa.unknown_engine, which lets the compiler select the best engine for the given tile shape

Returns:

a tile with shape shape whose elements are initialized to value.

Estimated instruction cost:

Let N be the number of elements per partition in the output tile, and let MIN_II be the minimum instruction initiation interval for small input tiles. MIN_II is roughly 64 engine cycles.

  • If the initialized value is zero and output data type is bfloat16/float16, max(MIN_II, N/2) Vector Engine cycles;

  • Otherwise, max(MIN_II, N) Vector Engine cycles
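
The two cost rules above can be sketched as a small host-side helper. This is only an illustrative model of the formula as stated, not part of the NKI API: the function name estimate_memset_cycles, the use of dtype names as plain strings, and the choice to compute N as the product of all non-partition dimensions are assumptions made for this sketch. MIN_II is set to the approximate value of 64 given above.

```python
import math

# Approximate minimum initiation interval, in engine cycles (from the text above).
MIN_II = 64

# Data types for which a zero fill is estimated at N/2 cycles.
HALF_COST_DTYPES = ("bfloat16", "float16")


def estimate_memset_cycles(shape, value, dtype_name):
    """Hypothetical helper: estimate Vector Engine cycles for a memset.

    shape      -- (partition axis, free axis, ...) of the output tile
    value      -- the constant fill value
    dtype_name -- dtype given as a string, e.g. "float32" (an assumption
                  of this sketch; NKI itself takes dtype objects)
    """
    # N is the number of elements per partition, i.e. the product of the
    # free dimensions (everything after the partition axis).
    n = math.prod(shape[1:])

    if value == 0 and dtype_name in HALF_COST_DTYPES:
        return max(MIN_II, n / 2)
    return max(MIN_II, n)
```

Under this model, a (128, 128) float32 tile filled with 0.2 is estimated at 128 cycles, a (128, 512) bfloat16 tile filled with zero at 256 cycles, and any tile with fewer than 64 elements per partition at the MIN_II floor of 64 cycles.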

Example:

import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
...

##################################################################
# Example 1: Initialize a float32 tile a of shape (128, 128)
# with a value of 0.2
##################################################################
a = nisa.memset(shape=(128, 128), value=0.2, dtype=nl.float32)
