This document is relevant for: Inf2, Trn1, Trn2

nki.isa.sequence_bounds

nki.isa.sequence_bounds(*, segment_ids, dtype=None)

Compute the sequence bounds for a given set of segment IDs using the GpSIMD Engine.

Given a tile of segment IDs, this function identifies where each segment begins and ends. For each element, it returns a pair of values, [start_index, end_index], giving the boundaries of the segment that the element belongs to. As in the NumPy equivalent below, end_index is exclusive (one past the segment's last element). All segment IDs must be non-negative integers. Padding elements (segment ID 0) receive special boundary values: a start index of n and an end index of -1, where n is the length of segment_ids.

The output tile contains two values per input element, stored along an added dimension of size 2: index 0 holds the start indices and index 1 holds the end indices. The partition dimension must always be 1. For example, an input of shape (1, 512) produces an output of shape (1, 2, 512).

The input tile (segment_ids) must have data type np.float32 or np.int32. The output tile data type is set with the dtype parameter, which must also be np.float32 or np.int32. If dtype is not specified, the output uses the same data type as segment_ids.

NumPy equivalent:

import numpy as np


def compute_sequence_bounds(sequence):
  n = len(sequence)

  min_bounds = np.zeros(n, dtype=sequence.dtype)
  max_bounds = np.zeros(n, dtype=sequence.dtype)

  min_bound_pad = n
  max_bound_pad = -1

  min_bounds[0] = 0 if sequence[0] != 0 else min_bound_pad
  for i in range(1, n):
    if sequence[i] == 0:
      min_bounds[i] = min_bound_pad
    elif sequence[i] == sequence[i - 1]:
      min_bounds[i] = min_bounds[i - 1]
    else:
      min_bounds[i] = i

  max_bounds[-1] = n if sequence[-1] != 0 else max_bound_pad
  for i in range(n - 2, -1, -1):
    if sequence[i] == 0:
      max_bounds[i] = max_bound_pad
    elif sequence[i] == sequence[i + 1]:
      max_bounds[i] = max_bounds[i + 1]
    else:
      max_bounds[i] = i + 1

  return np.vstack((min_bounds, max_bounds))

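# reshaped_segment_ids is assumed here to be segment_ids viewed as a
# 2D array of shape (m, n); each row is processed independently.
b = (
  np.apply_along_axis(
    compute_sequence_bounds, axis=1, arr=reshaped_segment_ids
  )
  .reshape(m, 2, n)
  .astype(np.float32)
)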
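
The loop-based reference above can also be written run by run, which may be easier to check against small inputs. The following is a minimal NumPy sketch of the semantics described on this page, not the GpSIMD implementation; the helper name sequence_bounds_reference is hypothetical and is not part of NKI. It assumes a 2D input of shape (m, n) and returns an array of shape (m, 2, n).

```python
import numpy as np


def sequence_bounds_reference(segment_ids):
    """Hypothetical helper (not part of NKI): run-based sketch of the bounds."""
    segment_ids = np.asarray(segment_ids)
    m, n = segment_ids.shape
    out = np.empty((m, 2, n), dtype=segment_ids.dtype)
    for row in range(m):
        seq = segment_ids[row]
        # Indices where a new run of equal IDs begins (index 0 always does).
        run_starts = np.flatnonzero(np.r_[True, seq[1:] != seq[:-1]])
        # Exclusive end of each run: the next run's start, or n for the last run.
        run_ends = np.r_[run_starts[1:], n]
        run_lengths = run_ends - run_starts
        # Broadcast each run's bounds back onto every element of that run.
        starts = np.repeat(run_starts, run_lengths)
        ends = np.repeat(run_ends, run_lengths)
        # Padding elements (ID 0) get start = n and end = -1.
        pad = seq == 0
        out[row, 0] = np.where(pad, n, starts)
        out[row, 1] = np.where(pad, -1, ends)
    return out


ids = np.array([[0, 1, 1, 2, 2, 2, 0, 3, 3]], dtype=np.int32)
bounds = sequence_bounds_reference(ids)
print(bounds.shape)  # (1, 2, 9)
print(bounds[0, 0])  # start indices: [9 1 1 3 3 3 9 7 7]
print(bounds[0, 1])  # end indices:   [-1  3  3  6  6  6 -1  9  9]
```

The sketch keeps the input data type for the output, mirroring the default described above when dtype is not specified.
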
Parameters:
  • segment_ids – tile containing the segment IDs. Elements with ID=0 are treated as padding.

  • dtype – data type of the output (must be np.float32 or np.int32); defaults to the data type of segment_ids if not specified.

Returns:

tile containing the sequence bounds, with the start and end indices of each element held along an added dimension of size 2.

Example:

import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

######################################################################
# Example 1: Generate a tile of sequence boundaries for each element
######################################################################
# Input example
# segment_ids = np.array([[0, 1, 1, 2, 2, 2, 0, 3, 3]], dtype=np.int32)

# Expected output for this example:
# [[
#   [9, 1, 1, 3, 3, 3, 9, 7, 7]       # start index
#   [-1, 3, 3, 6, 6, 6, -1, 9, 9]     # end index
#   ]]
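# segment_ids is the kernel's input tensor, e.g. shape (1, 9) for the input above
m, n = segment_ids.shape

# Index grid covering the (m, 2, n) output tile
ix, iy, iz = nl.mgrid[0:m, 0:2, 0:n]

# Allocate the output tile in SBUF, load the input, and compute the bounds
out_tile = nl.ndarray([m, 2, n], dtype=segment_ids.dtype, buffer=nl.sbuf)
seq_tile = nl.load(segment_ids)
out_tile[ix, iy, iz] = nisa.sequence_bounds(segment_ids=seq_tile)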
