nki.isa.iota
- nki.isa.iota(dst, pattern, offset, channel_multiplier=0, name=None)
Generate a constant literal pattern into SBUF using GpSimd Engine.
The pattern is defined by an int32 offset, a tensor access pattern of up to 4D (pattern), and an int32 channel_multiplier. The pattern field is a list of lists in the form [[step_w, num_w], [step_z, num_z], [step_y, num_y], [step_x, num_x]]. When a pattern with fewer than four dimensions is provided, the NKI compiler automatically pads the remaining dimensions with a size of 1.
Given a 4D pattern (padded if needed), the instruction generates a stream of values using the following pseudo code:
    num_partitions = dst.shape[0]
    [[step_w, num_w], [step_z, num_z], [step_y, num_y], [step_x, num_x]] = pattern
    for channel_id in range(num_partitions):
        for w in range(num_w):
            for z in range(num_z):
                for y in range(num_y):
                    for x in range(num_x):
                        value = offset + (channel_id * channel_multiplier) + (w * step_w) + (z * step_z) + (y * step_y) + (x * step_x)
                        dst[channel_id, w, z, y, x] = value
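The pseudo code can be modeled on the host in plain Python to check what a given pattern produces. The sketch below is only an illustration: the helper name iota_stream is hypothetical and is not part of NKI, it returns the flat stream of values per partition rather than writing to an SBUF tile, and it uses unbounded Python integers, so it does not model 32-bit overflow. Padding is done with [0, 1] entries; the step chosen for a size-1 dimension does not matter because its index is always 0.

```python
from itertools import product


def iota_stream(num_partitions, pattern, offset, channel_multiplier=0):
    """Return, for each partition, the flat list of values the pseudo code generates.

    Hypothetical host-side model; not an NKI API.
    """
    if not 1 <= len(pattern) <= 4:
        raise ValueError("pattern must contain between 1 and 4 [step, num] pairs")

    # Pad to 4D with size-1 dimensions (assumed step 0; it never contributes).
    padded = [[0, 1]] * (4 - len(pattern)) + [list(pair) for pair in pattern]
    steps = [step for step, _ in padded]
    nums = [num for _, num in padded]

    rows = []
    for channel_id in range(num_partitions):
        base = offset + channel_id * channel_multiplier
        # product() iterates w outermost and x innermost, like the nested loops.
        rows.append([
            base + sum(index * step for index, step in zip(indices, steps))
            for indices in product(*(range(n) for n in nums))
        ])
    return rows


# 1D pattern: 4 consecutive values per partition, partitions spaced 10 apart.
print(iota_stream(2, [[1, 4]], offset=0, channel_multiplier=10))
# 2D pattern: 2 groups with step 4, each holding 3 values with step 1.
print(iota_stream(1, [[4, 2], [1, 3]], offset=0))
```

The first call yields [[0, 1, 2, 3], [10, 11, 12, 13]] and the second yields [[0, 1, 2, 4, 5, 6]].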
The above pseudo code assumes, for simplicity, that dst has the same size as the pattern in each of the dimensions w/z/y/x. However, the instruction allows any sizes in the free dimensions, as long as the number of elements per partition in dst matches the product num_w * num_z * num_y * num_x.
Memory types.
The output dst tile must be in SBUF.
Data types.
The generated values are computed in 32-bit integer arithmetic. The GpSimd Engine can cast these integer results to any valid NKI data type (see Supported Data Types for more information) before writing to the output tile. The output data type is determined by the dst tile's data type.
Layout.
The partition dimension determines the number of active channels for parallel pattern generation.
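As an illustration of how the partition (channel) index and channel_multiplier interact, the following sketch evaluates the documented formula for a 1D pattern. The helper iota_1d is hypothetical and runs on the host in plain Python; it is not an NKI call. The two parameter choices shown are examples, not recommended settings.

```python
def iota_1d(num_partitions, step, num, offset=0, channel_multiplier=0):
    """Evaluate offset + channel_id * channel_multiplier + x * step for a 1D pattern.

    Hypothetical host-side helper; not an NKI API.
    """
    return [
        [offset + channel_id * channel_multiplier + x * step for x in range(num)]
        for channel_id in range(num_partitions)
    ]


# step=0 with channel_multiplier=1: every element holds its partition ID.
partition_ids = iota_1d(3, step=0, num=4, channel_multiplier=1)

# step=1 with channel_multiplier=num: a running index across all partitions.
linear = iota_1d(3, step=1, num=4, channel_multiplier=4)

print(partition_ids)
print(linear)
```

Here partition_ids is [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]] and linear is [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]].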
Tile size.
The partition dimension size of dst must not exceed 128. The number of elements per partition of dst must not exceed the physical size of each SBUF partition. The total number of elements described by pattern (the product of its num values) must match the number of elements per partition in the dst tile.
- Parameters:
dst – the output tile in SBUF to store the generated pattern
pattern – a list of [step, num] pairs describing the strides and sizes of up to four dimensions
offset – an int32 offset value to be added to every generated value
channel_multiplier – an int32 multiplier to be applied to the channel (partition) ID
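The tile size rules stated above can be checked on the host before issuing the instruction. The following is a minimal sketch, assuming the dst shape is available as a tuple with the partition dimension first. The helper check_iota_shapes and the constant MAX_PARTITIONS are hypothetical names, not part of NKI. The SBUF capacity limit is not checked here because it depends on the target hardware.

```python
from math import prod

MAX_PARTITIONS = 128  # partition-dimension limit stated in the tile size rules


def check_iota_shapes(dst_shape, pattern):
    """Validate dst_shape against pattern and return the elements per partition.

    Hypothetical host-side helper; not an NKI API.
    """
    if not 1 <= len(pattern) <= 4:
        raise ValueError("pattern must contain between 1 and 4 [step, num] pairs")
    if dst_shape[0] > MAX_PARTITIONS:
        raise ValueError(f"partition size {dst_shape[0]} exceeds {MAX_PARTITIONS}")

    pattern_elems = prod(num for _, num in pattern)
    dst_elems = prod(dst_shape[1:])  # free dimensions may have any shape
    if pattern_elems != dst_elems:
        raise ValueError(
            f"pattern describes {pattern_elems} elements per partition, "
            f"but dst holds {dst_elems}"
        )
    return pattern_elems


# A (128, 2, 8) tile holds 16 elements per partition, matching a 2 x 8 pattern.
print(check_iota_shapes((128, 2, 8), [[8, 2], [1, 8]]))
```

A dst shape of (128, 16) would pass the same check, since only the element count per partition has to match the pattern.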