nki.isa.sequence_bounds

nki.isa.sequence_bounds(dst, segment_ids, name=None)

Compute the sequence bounds for a given set of segment IDs using the GpSIMD Engine.

Given a tile of segment IDs, this function identifies where each segment begins and ends. For each element, it produces a pair of values, [start_index, end_index], giving the boundaries of the segment that element belongs to. As the NumPy equivalent shows, the end index is exclusive: it is one past the last element of the segment. All segment IDs must be non-negative integers. Padding elements (segment ID 0) receive sentinel boundary values: a start index of n and an end index of -1, where n is the length of segment_ids.

The output tile contains two values per input element: the start index and the end index of that element's segment. The partition dimension must always be 1. For example, with input shape (1, 512), the output shape is (1, 2, 512): along the added dimension, index 0 holds the start indices and index 1 holds the end indices.

The input tile (segment_ids) must have data type np.float32 or np.int32. The output tile data type is specified using the dtype field (must be np.float32 or np.int32). If dtype is not specified, the output data type will be the same as the input data type of segment_ids.

NumPy equivalent:

import numpy as np

def compute_sequence_bounds(sequence):
  n = len(sequence)

  min_bounds = np.zeros(n, dtype=sequence.dtype)
  max_bounds = np.zeros(n, dtype=sequence.dtype)

  min_bound_pad = n
  max_bound_pad = -1

  min_bounds[0] = 0 if sequence[0] != 0 else min_bound_pad
  for i in range(1, n):
    if sequence[i] == 0:
      min_bounds[i] = min_bound_pad
    elif sequence[i] == sequence[i - 1]:
      min_bounds[i] = min_bounds[i - 1]
    else:
      min_bounds[i] = i

  max_bounds[-1] = n if sequence[-1] != 0 else max_bound_pad
  for i in range(n - 2, -1, -1):
    if sequence[i] == 0:
      max_bounds[i] = max_bound_pad
    elif sequence[i] == sequence[i + 1]:
      max_bounds[i] = max_bounds[i + 1]
    else:
      max_bounds[i] = i + 1

  return np.vstack((min_bounds, max_bounds))

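# reshaped_segment_ids is segment_ids viewed as a 2D array of shape (m, n):
# m rows, each holding n segment IDs.
b = (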
  np.apply_along_axis(
    compute_sequence_bounds, axis=1, arr=reshaped_segment_ids
  )
  .reshape(m, 2, n)
  .astype(np.float32)
)
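As a small worked example of the semantics described above, the sketch below models the bounds for a single (1, n) row in plain NumPy and prints the result for a short input. The helper name `sequence_bounds_model` is hypothetical and is not part of NKI. It runs on the host only, as an illustration of the expected values and output layout, not of how the instruction is invoked on device.

```python
import numpy as np

def sequence_bounds_model(segment_ids):
    """Hypothetical host-side model of the bounds for a (1, n) array of IDs."""
    ids = np.asarray(segment_ids)
    assert ids.ndim == 2 and ids.shape[0] == 1, "partition dimension must be 1"
    row = ids[0]
    n = row.shape[0]

    # Padding defaults: start index n, end index -1.
    starts = np.full(n, n, dtype=np.int32)
    ends = np.full(n, -1, dtype=np.int32)

    run_start = 0
    for i in range(1, n + 1):
        # A run of equal IDs closes at the end of the row or where the ID changes.
        if i == n or row[i] != row[run_start]:
            if row[run_start] != 0:  # ID 0 is padding and keeps the defaults
                starts[run_start:i] = run_start
                ends[run_start:i] = i  # exclusive end index
            run_start = i

    # Shape (1, 2, n): index 0 holds start indices, index 1 holds end indices.
    return np.stack((starts, ends))[np.newaxis, ...]

segment_ids = np.array([[1, 1, 2, 2, 2, 0, 0, 3]], dtype=np.int32)
bounds = sequence_bounds_model(segment_ids)

print(bounds.shape)            # (1, 2, 8)
print(bounds[0, 0].tolist())   # [0, 0, 2, 2, 2, 8, 8, 7]
print(bounds[0, 1].tolist())   # [2, 2, 5, 5, 5, -1, -1, 8]
```

The two padding elements at positions 5 and 6 receive start 8 (the row length) and end -1, while the final single-element segment with ID 3 spans [7, 8).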
Parameters:
  • dst – tile containing the sequence bounds.

  • segment_ids – tile containing the segment IDs. Elements with ID=0 are treated as padding.