This document is relevant for: Inf2, Trn1, Trn2
nki.isa.sequence_bounds
- nki.isa.sequence_bounds(*, segment_ids, dtype=None)
Compute the sequence bounds for a given set of segment IDs using the GpSIMD Engine.
Given a tile of segment IDs, this function identifies where each segment begins and ends. A segment is a run of consecutive elements that share the same ID. For each element, the function returns a pair of values, [start_index, end_index], giving the boundaries of the segment that element belongs to. As the NumPy equivalent shows, end_index is exclusive: it is one past the index of the segment's last element. All segment IDs must be non-negative integers. Padding elements (segment ID of zero) receive special boundary values: a start index of n and an end index of -1, where n is the length of segment_ids.
The output tile contains two values per input element: the start index (first column) and the end index (second column) of its segment. The partition dimension must always be 1. For example, with input shape (1, 512), the output shape becomes (1, 2, 512), where the additional dimension holds the start and end indices for each element.
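To make the boundary semantics concrete, the following is a minimal host-side sketch in plain Python. It is not the NKI implementation: `sequence_bounds_sketch` is a hypothetical helper introduced only for illustration, and it models a single row of segment IDs as a Python list rather than a tile.

```python
def sequence_bounds_sketch(segment_ids):
    """Return [starts, ends] for one row of segment IDs (hypothetical helper)."""
    n = len(segment_ids)
    starts = [n] * n   # padding default: start index = n
    ends = [-1] * n    # padding default: end index = -1
    i = 0
    while i < n:
        # Find the exclusive end of the run of equal IDs that begins at i.
        j = i + 1
        while j < n and segment_ids[j] == segment_ids[i]:
            j += 1
        if segment_ids[i] != 0:  # ID 0 marks padding; keep the defaults
            for k in range(i, j):
                starts[k] = i
                ends[k] = j
        i = j
    return [starts, ends]


row = [0, 1, 1, 2, 2, 2, 0, 3, 3]
# One input row of length 9 yields bounds of shape (1, 2, 9).
bounds = [sequence_bounds_sketch(row)]
print(bounds[0][0])  # start indices: [9, 1, 1, 3, 3, 3, 9, 7, 7]
print(bounds[0][1])  # end indices:   [-1, 3, 3, 6, 6, 6, -1, 9, 9]
```

Every element of a segment carries the same pair, so any single position is enough to recover the full extent of its segment.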
The input tile (segment_ids) must have data type np.float32 or np.int32. The output tile data type is specified using the dtype field (must be np.float32 or np.int32). If dtype is not specified, the output data type will be the same as the input data type of segment_ids.
NumPy equivalent:
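import numpy as np

def compute_sequence_bounds(sequence):
    n = len(sequence)
    min_bounds = np.zeros(n, dtype=sequence.dtype)
    max_bounds = np.zeros(n, dtype=sequence.dtype)
    min_bound_pad = n   # start index assigned to padding elements
    max_bound_pad = -1  # end index assigned to padding elements

    # Forward pass: propagate each segment's start index.
    min_bounds[0] = 0 if sequence[0] != 0 else min_bound_pad
    for i in range(1, n):
        if sequence[i] == 0:
            min_bounds[i] = min_bound_pad
        elif sequence[i] == sequence[i - 1]:
            min_bounds[i] = min_bounds[i - 1]
        else:
            min_bounds[i] = i

    # Backward pass: propagate each segment's exclusive end index.
    max_bounds[-1] = n if sequence[-1] != 0 else max_bound_pad
    for i in range(n - 2, -1, -1):
        if sequence[i] == 0:
            max_bounds[i] = max_bound_pad
        elif sequence[i] == sequence[i + 1]:
            max_bounds[i] = max_bounds[i + 1]
        else:
            max_bounds[i] = i + 1

    return np.vstack((min_bounds, max_bounds))

# Illustrative input (the same values as the example at the end of this page);
# reshaped_segment_ids is the segment-ID array with shape (m, n).
reshaped_segment_ids = np.array([[0, 1, 1, 2, 2, 2, 0, 3, 3]], dtype=np.int32)
m, n = reshaped_segment_ids.shape

b = (
    np.apply_along_axis(
        compute_sequence_bounds, axis=1, arr=reshaped_segment_ids
    )
    .reshape(m, 2, n)
    .astype(np.float32)
)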
- Parameters:
segment_ids – tile containing the segment IDs. Elements with ID=0 are treated as padding.
dtype – (optional) data type of the output (must be np.float32 or np.int32); if not specified, it defaults to the data type of segment_ids.
- Returns:
tile containing the sequence bounds, with shape (1, 2, n) for an input of shape (1, n).
Example:
import neuronxcc.nki.isa as nisa
import neuronxcc.nki.language as nl
from neuronxcc.nki.typing import tensor

######################################################################
# Example 1: Generate tile of boundaries of sequence for each element:
######################################################################
# Input example
# segment_ids = np.array([[0, 1, 1, 2, 2, 2, 0, 3, 3]], dtype=np.int32)

# Expected output for this example:
# [[
#   [9, 1, 1, 3, 3, 3, 9, 7, 7]    # start index
#   [-1, 3, 3, 6, 6, 6, -1, 9, 9]  # end index
# ]]

m, n = segment_ids.shape

ix, iy, iz = nl.mgrid[0:m, 0:2, 0:n]

out_tile = nl.ndarray([m, 2, n], dtype=segment_ids.dtype, buffer=nl.sbuf)
seq_tile = nl.load(segment_ids)
out_tile[ix, iy, iz] = nisa.sequence_bounds(segment_ids=seq_tile)
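As a rough illustration of how the returned bounds might be interpreted, the sketch below post-processes the expected output listed in the example above with NumPy on the host. It does not call NKI. The names `seg_len` and `same_segment` are hypothetical, and deriving a per-segment mask this way is an assumption about a possible use, not something this page prescribes.

```python
import numpy as np

# Expected bounds from the example above, shape (1, 2, 9).
bounds = np.array(
    [[[9, 1, 1, 3, 3, 3, 9, 7, 7],
      [-1, 3, 3, 6, 6, 6, -1, 9, 9]]],
    dtype=np.int32,
)
n = bounds.shape[-1]
start, end = bounds[0, 0], bounds[0, 1]

# Padding elements are those whose start index equals n (their end is -1).
is_padding = start == n

# Length of the segment each element belongs to; padding gets 0.
seg_len = np.where(is_padding, 0, end - start)

# same_segment[i, j] is True when position j lies in [start[i], end[i]).
# For padding rows, start = n and end = -1 make that range empty.
pos = np.arange(n)
same_segment = (start[:, None] <= pos[None, :]) & (pos[None, :] < end[:, None])

print(seg_len.tolist())  # [0, 2, 2, 3, 3, 3, 0, 2, 2]
```

Because the padding values are n and -1, the range test excludes padding elements without a separate branch.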