nki.isa.iota

nki.isa.iota(dst, pattern, offset, channel_multiplier=0, name=None)

Generate a constant literal pattern into SBUF using the GpSimd Engine.

The pattern is defined by an int32 offset, a tensor access pattern of up to four dimensions, and an int32 channel_multiplier. The pattern field is a list of lists in the form [[step_w, num_w], [step_z, num_z], [step_y, num_y], [step_x, num_x]]. When fewer than four dimensions are provided, the NKI compiler automatically pads the remaining dimensions with a size of 1.
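To illustrate the padding rule, here is a minimal sketch in plain Python. The helper name pad_pattern is hypothetical and not part of NKI. It assumes the missing dimensions are added at the front (outermost) as [0, 1]; the source does not say where the compiler places them, but because a padded dimension has num = 1 its step never contributes to a generated value, so the resulting stream is the same either way.

```python
def pad_pattern(pattern):
    """Pad a list of [step, num] pairs to four dimensions.

    Assumption: missing dimensions are prepended as [0, 1]. With num == 1
    the loop index is always 0, so the padded step has no effect.
    """
    if not 1 <= len(pattern) <= 4:
        raise ValueError("pattern must contain between 1 and 4 [step, num] pairs")
    missing = 4 - len(pattern)
    return [[0, 1] for _ in range(missing)] + [list(pair) for pair in pattern]


# A 2D pattern becomes a 4D pattern with two size-1 leading dimensions.
print(pad_pattern([[8, 2], [1, 4]]))  # [[0, 1], [0, 1], [8, 2], [1, 4]]
```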

Given a 4D pattern (padded if needed), the instruction generates a stream of values using the following pseudo code:

num_partitions = dst.shape[0]
[[step_w, num_w], [step_z, num_z], [step_y, num_y], [step_x, num_x]] = pattern

for channel_id in range(num_partitions):
    for w in range(num_w):
        for z in range(num_z):
            for y in range(num_y):
                for x in range(num_x):
                    value = (offset + (channel_id * channel_multiplier)
                             + (w * step_w) + (z * step_z) + (y * step_y) + (x * step_x))

                    dst[channel_id, w, z, y, x] = value

The pseudo code above assumes, for simplicity, that the free dimensions of dst have the same sizes as the pattern dimensions w/z/y/x. However, the instruction allows any shape in the free dimensions, as long as the number of elements per partition in dst equals the product num_w * num_z * num_y * num_x.
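The value stream described by the pseudo code can be reproduced on the host for checking expected results. The following is a minimal sketch in plain Python, not the hardware implementation: iota_reference is a hypothetical helper name, it returns one flat list per partition (matching the rule that only the element count per partition matters), and it does not model int32 overflow or the cast to the dst data type.

```python
from itertools import product


def iota_reference(num_partitions, pattern, offset, channel_multiplier=0):
    """Return the generated values as one flat list per partition."""
    steps = [step for step, _ in pattern]
    ranges = [range(num) for _, num in pattern]
    rows = []
    for channel_id in range(num_partitions):
        base = offset + channel_id * channel_multiplier
        # product() iterates with the last dimension varying fastest,
        # which matches the nesting order w, z, y, x in the pseudo code.
        rows.append([
            base + sum(i * s for i, s in zip(index, steps))
            for index in product(*ranges)
        ])
    return rows


rows = iota_reference(num_partitions=2, pattern=[[4, 2], [1, 3]],
                      offset=10, channel_multiplier=100)
for row in rows:
    print(row)
# [10, 11, 12, 14, 15, 16]
# [110, 111, 112, 114, 115, 116]
```

Patterns with fewer than four dimensions can be passed unpadded here, since padded dimensions of size 1 add nothing to any value.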

Memory types.

The output dst tile must be in SBUF.

Data types.

The generated values are computed in 32-bit integer arithmetic. The GpSimd Engine can cast these integer results to any valid NKI data type (see Supported Data Types for more information) before writing to the output tile. The output data type is determined by the dst tile’s data type.

Layout.

The partition dimension determines the number of active channels for parallel pattern generation.

Tile size.

The partition dimension size of dst must not exceed 128. The number of elements per partition of dst must not exceed the physical size of each SBUF partition. The total number of elements described by pattern (the product of its num values) must match the number of elements per partition in the dst tile.
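These size constraints can be checked on the host before issuing the instruction. The sketch below is illustrative only: check_iota_shapes is a hypothetical helper, not an NKI function, and it takes a plain shape tuple rather than a real tile. It does not check the physical SBUF partition capacity, because that limit depends on the dst data type and the target hardware.

```python
from math import prod

MAX_PARTITIONS = 128  # partition limit stated in the tile size rules


def check_iota_shapes(dst_shape, pattern):
    """Validate dst_shape against pattern; return elements per partition."""
    num_partitions, *free_dims = dst_shape
    if num_partitions > MAX_PARTITIONS:
        raise ValueError(
            f"partition dimension {num_partitions} exceeds {MAX_PARTITIONS}")
    elems_per_partition = prod(free_dims)
    pattern_elems = prod(num for _, num in pattern)
    if elems_per_partition != pattern_elems:
        raise ValueError(
            f"pattern describes {pattern_elems} elements but dst has "
            f"{elems_per_partition} per partition")
    return pattern_elems


# dst of shape (128, 8) matches a 2 x 4 pattern: 8 elements per partition.
print(check_iota_shapes((128, 8), [[4, 2], [1, 4]]))  # 8
```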

Parameters:
  • dst – the output tile in SBUF to store the generated pattern

  • pattern – a list of [step, num] pairs describing the strides and sizes of up to four dimensions

  • offset – an int32 offset value to be added to every generated value

  • channel_multiplier – an int32 multiplier to be applied to the channel (partition) ID