This document is relevant for: Trn2, Trn3
Foreach Elementwise Kernel API Reference#
Elementwise foreach kernels for tensor-scalar and tensor-tensor operations.
For example, add_scalar_kernel computes out = data + scalar using SPMD parallelization across cores.
Background#
The foreach elementwise kernels perform elementwise arithmetic (add, subtract, multiply, divide) between a tensor and a scalar or between pairs of tensors, along with the fused operations addcdiv, addcmul, and lerp and an elementwise square root. All kernels use SPMD parallelization across cores.
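To illustrate what SPMD parallelization over a flat tensor can look like, the sketch below splits numel elements into contiguous per-core ranges. It is a host-side illustration only: the helper name split_across_cores and the even-split policy are assumptions, not the partitioning scheme these kernels actually use.

```python
def split_across_cores(numel: int, num_cores: int) -> list[tuple[int, int]]:
    """Return one (start, stop) element range per core, covering [0, numel).

    Hypothetical helper: shows one common SPMD work split, where the
    first `numel % num_cores` cores each take one extra element.
    """
    base, extra = divmod(numel, num_cores)
    ranges = []
    start = 0
    for core_id in range(num_cores):
        size = base + (1 if core_id < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


ranges = split_across_cores(numel=10, num_cores=4)
print(ranges)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each core would then apply the same elementwise operation to its own range, which is why the kernels only need the flat element count numel rather than the original tensor shape.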
API Reference#
Source code for this kernel API can be found at: foreach_elementwise.py
add_scalar_kernel#
- nkilib.experimental.foreach.add_scalar_kernel(data: nl.ndarray, scalar_tensor: nl.ndarray, numel: int) → nl.ndarray#
Elementwise add scalar to tensor.
- Parameters:
data (nl.ndarray) – [N], Input tensor on HBM. Must have ndim >= 1.
scalar_tensor (nl.ndarray) – [P_MAX, 1], Scalar broadcast tensor on HBM.
numel (int) – Number of elements in data.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
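The scalar kernels take the scalar as a [P_MAX, 1] tensor rather than a Python number. The following host-side sketch shows one way to build such a broadcast tensor and what the result should equal. Plain Python lists stand in for HBM tensors, P_MAX = 128 is an assumption about the partition count, and make_scalar_tensor and add_scalar_reference are hypothetical helper names, not part of nkilib.

```python
P_MAX = 128  # assumption: partition count; this reference does not state the value


def make_scalar_tensor(scalar: float, p_max: int = P_MAX) -> list[list[float]]:
    """Build a [p_max, 1] nested list holding the same scalar in every row."""
    return [[float(scalar)] for _ in range(p_max)]


def add_scalar_reference(data, scalar_tensor, numel):
    """Host-side reference for out = data + scalar on a flat [N] input."""
    assert len(data) == numel, "numel must match the number of elements in data"
    scalar = scalar_tensor[0][0]  # every row carries the same value
    return [x + scalar for x in data]


data = [1.0, 2.0, 3.0]
scalar_tensor = make_scalar_tensor(0.5)
out = add_scalar_reference(data, scalar_tensor, numel=len(data))
print(out)  # [1.5, 2.5, 3.5]
```

A reference like this is mainly useful for checking kernel output in tests. The sub, mul, and div scalar kernels take the same arguments, so the same pattern applies with the operator swapped.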
sub_scalar_kernel#
- nkilib.experimental.foreach.sub_scalar_kernel(data: nl.ndarray, scalar_tensor: nl.ndarray, numel: int) → nl.ndarray#
Elementwise subtract scalar from tensor.
- Parameters:
data (nl.ndarray) – [N], Input tensor on HBM. Must have ndim >= 1.
scalar_tensor (nl.ndarray) – [P_MAX, 1], Scalar broadcast tensor on HBM.
numel (int) – Number of elements in data.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
mul_scalar_kernel#
- nkilib.experimental.foreach.mul_scalar_kernel(data: nl.ndarray, scalar_tensor: nl.ndarray, numel: int) → nl.ndarray#
Elementwise multiply tensor by scalar.
- Parameters:
data (nl.ndarray) – [N], Input tensor on HBM. Must have ndim >= 1.
scalar_tensor (nl.ndarray) – [P_MAX, 1], Scalar broadcast tensor on HBM.
numel (int) – Number of elements in data.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
div_scalar_kernel#
- nkilib.experimental.foreach.div_scalar_kernel(data: nl.ndarray, scalar_tensor: nl.ndarray, numel: int) → nl.ndarray#
Elementwise divide tensor by scalar.
- Parameters:
data (nl.ndarray) – [N], Input tensor on HBM. Must have ndim >= 1.
scalar_tensor (nl.ndarray) – [P_MAX, 1], Scalar broadcast tensor on HBM.
numel (int) – Number of elements in data.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
add_tensor_kernel#
- nkilib.experimental.foreach.add_tensor_kernel(data1: nl.ndarray, data2: nl.ndarray, alpha_tensor: nl.ndarray, numel: int) → nl.ndarray#
Elementwise add tensors with alpha scaling.
- Parameters:
data1 (nl.ndarray) – [N], First input tensor on HBM. Must have ndim >= 1.
data2 (nl.ndarray) – [N], Second input tensor on HBM.
alpha_tensor (nl.ndarray) – [P_MAX, 1], Alpha scalar broadcast tensor on HBM.
numel (int) – Number of elements in data1.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
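This reference does not spell out how alpha enters the sum. The sketch below assumes the convention used by torch.add, where the second operand is scaled: out = data1 + alpha * data2. Treat that formula as an assumption to verify against foreach_elementwise.py; add_tensor_reference is a hypothetical host-side helper, and alpha is passed as a plain float instead of a [P_MAX, 1] tensor for brevity.

```python
def add_tensor_reference(data1, data2, alpha, numel):
    """Host-side reference, ASSUMING out = data1 + alpha * data2."""
    assert len(data1) == numel and len(data2) == numel
    return [a + alpha * b for a, b in zip(data1, data2)]


out = add_tensor_reference([1.0, 2.0], [10.0, 20.0], alpha=0.5, numel=2)
print(out)  # [6.0, 12.0]
```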
sub_tensor_kernel#
- nkilib.experimental.foreach.sub_tensor_kernel(data1: nl.ndarray, data2: nl.ndarray, alpha_tensor: nl.ndarray, numel: int) → nl.ndarray#
Elementwise subtract tensors with alpha scaling.
- Parameters:
data1 (nl.ndarray) – [N], First input tensor on HBM. Must have ndim >= 1.
data2 (nl.ndarray) – [N], Second input tensor on HBM.
alpha_tensor (nl.ndarray) – [P_MAX, 1], Alpha scalar broadcast tensor on HBM.
numel (int) – Number of elements in data1.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
mul_tensor_kernel#
- nkilib.experimental.foreach.mul_tensor_kernel(data1: nl.ndarray, data2: nl.ndarray, numel: int) → nl.ndarray#
Elementwise multiply tensors.
- Parameters:
data1 (nl.ndarray) – [N], First input tensor on HBM. Must have ndim >= 1.
data2 (nl.ndarray) – [N], Second input tensor on HBM.
numel (int) – Number of elements in data1.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
div_tensor_kernel#
- nkilib.experimental.foreach.div_tensor_kernel(data1: nl.ndarray, data2: nl.ndarray, numel: int) → nl.ndarray#
Elementwise divide tensors.
- Parameters:
data1 (nl.ndarray) – [N], First input tensor on HBM. Must have ndim >= 1.
data2 (nl.ndarray) – [N], Second input tensor on HBM.
numel (int) – Number of elements in data1.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
addcdiv_kernel#
- nkilib.experimental.foreach.addcdiv_kernel(data: nl.ndarray, data1: nl.ndarray, data2: nl.ndarray, value_tensor: nl.ndarray, numel: int) → nl.ndarray#
Elementwise addcdiv: data + value * (data1 / data2).
- Parameters:
data (nl.ndarray) – [N], Base input tensor on HBM.
data1 (nl.ndarray) – [N], Numerator tensor on HBM.
data2 (nl.ndarray) – [N], Denominator tensor on HBM.
value_tensor (nl.ndarray) – [P_MAX, 1], Scalar value broadcast tensor on HBM.
numel (int) – Number of elements in data.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
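As a worked instance of the addcdiv formula, the host-side sketch below evaluates data + value * (data1 / data2) on small inputs. The helper addcdiv_reference is hypothetical, plain lists stand in for HBM tensors, and value is passed as a float rather than a [P_MAX, 1] tensor.

```python
def addcdiv_reference(data, data1, data2, value, numel):
    """Host-side reference for out = data + value * (data1 / data2)."""
    assert len(data) == len(data1) == len(data2) == numel
    return [d + value * (n / m) for d, n, m in zip(data, data1, data2)]


out = addcdiv_reference(
    data=[1.0, 2.0], data1=[4.0, 9.0], data2=[2.0, 3.0], value=0.5, numel=2
)
print(out)  # [2.0, 3.5]
```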
addcmul_kernel#
- nkilib.experimental.foreach.addcmul_kernel(data: nl.ndarray, data1: nl.ndarray, data2: nl.ndarray, value_tensor: nl.ndarray, numel: int) → nl.ndarray#
Elementwise addcmul: data + value * (data1 * data2).
- Parameters:
data (nl.ndarray) – [N], Base input tensor on HBM.
data1 (nl.ndarray) – [N], First multiplicand tensor on HBM.
data2 (nl.ndarray) – [N], Second multiplicand tensor on HBM.
value_tensor (nl.ndarray) – [P_MAX, 1], Scalar value broadcast tensor on HBM.
numel (int) – Number of elements in data.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
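The addcmul formula can be checked the same way on the host. This is a minimal sketch under the same assumptions: addcmul_reference is a hypothetical helper, lists stand in for HBM tensors, and value is a plain float.

```python
def addcmul_reference(data, data1, data2, value, numel):
    """Host-side reference for out = data + value * (data1 * data2)."""
    assert len(data) == len(data1) == len(data2) == numel
    return [d + value * (a * b) for d, a, b in zip(data, data1, data2)]


out = addcmul_reference(
    data=[1.0, 2.0], data1=[2.0, 3.0], data2=[4.0, 5.0], value=0.5, numel=2
)
print(out)  # [5.0, 9.5]
```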
lerp_kernel#
- nkilib.experimental.foreach.lerp_kernel(data: nl.ndarray, end: nl.ndarray, weight_tensor: nl.ndarray, numel: int) → nl.ndarray#
Elementwise linear interpolation: data + weight * (end - data).
- Parameters:
data (nl.ndarray) – [N], Start tensor on HBM.
end (nl.ndarray) – [N], End tensor on HBM.
weight_tensor (nl.ndarray) – [P_MAX, 1], Interpolation weight broadcast tensor on HBM.
numel (int) – Number of elements in data.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
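The interpolation formula returns data when weight is 0 and end when weight is 1. The host-side sketch below demonstrates this with a hypothetical helper lerp_reference, using lists in place of HBM tensors and a float in place of the [P_MAX, 1] weight tensor.

```python
def lerp_reference(data, end, weight, numel):
    """Host-side reference for out = data + weight * (end - data)."""
    assert len(data) == len(end) == numel
    return [s + weight * (e - s) for s, e in zip(data, end)]


start = [0.0, 10.0]
stop = [10.0, 20.0]
quarter = lerp_reference(start, stop, weight=0.25, numel=2)
print(quarter)  # [2.5, 12.5]

# The endpoints fall out of the formula directly.
at_start = lerp_reference(start, stop, weight=0.0, numel=2)  # equals start
at_end = lerp_reference(start, stop, weight=1.0, numel=2)    # equals stop
```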
sqrt_kernel#
- nkilib.experimental.foreach.sqrt_kernel(data: nl.ndarray, numel: int) → nl.ndarray#
Elementwise square root.
- Parameters:
data (nl.ndarray) – [N], Input tensor on HBM. Elements must be non-negative.
numel (int) – Number of elements in data.
- Returns:
[N], Output tensor on HBM.
- Return type:
nl.ndarray
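Because sqrt_kernel requires non-negative elements, it can help to validate inputs on the host before launching. The sketch below is one way to do that. sqrt_reference is a hypothetical helper, and raising ValueError is a choice made for this example: this reference does not say how the kernel itself behaves on negative input.

```python
import math


def sqrt_reference(data, numel):
    """Host-side reference for elementwise sqrt with a non-negativity check."""
    assert len(data) == numel
    if any(x < 0 for x in data):
        # Assumption: reject early instead of relying on kernel behavior.
        raise ValueError("sqrt_kernel expects non-negative elements")
    return [math.sqrt(x) for x in data]


out = sqrt_reference([0.0, 4.0, 9.0], numel=3)
print(out)  # [0.0, 2.0, 3.0]
```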