This document is relevant for: Trn2, Trn3
Foreach Norm Kernel API Reference#
Compute L2 norm (Euclidean norm) of input tensor.
Computes sqrt(sum(x^2)) using SPMD parallelization across 2 cores with fused activation-reduce and sendrecv-based cross-core reduction.
Background#
The l2_norm_kernel computes the L2 norm (Euclidean norm) of an input tensor, sqrt(sum(x^2)). It uses SPMD parallelization across 2 cores, with a fused activation-reduce on each core and a sendrecv-based cross-core reduction.
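As a rough host-side illustration of the split-and-reduce arithmetic described above, the following pure-Python sketch models the computation. It is not the NKI kernel and does not use any nkilib API. The function name `l2_norm_two_core_reference`, the `num_cores` parameter, and the contiguous sharding scheme are assumptions made for illustration; the actual kernel's data layout may differ.

```python
import math
from typing import Sequence


def l2_norm_two_core_reference(data: Sequence[float], num_cores: int = 2) -> float:
    """Host-side model of a split L2 norm (hypothetical helper, not the NKI kernel)."""
    n = len(data)
    # Ceil-divide the elements across the simulated cores (assumed sharding).
    chunk = (n + num_cores - 1) // num_cores
    partial_sums = []
    for core in range(num_cores):
        shard = data[core * chunk:(core + 1) * chunk]
        # Per-core step: square each element and reduce to one partial sum.
        partial_sums.append(math.fsum(v * v for v in shard))
    # Cross-core step: combine the partial sums, then take a single sqrt.
    return math.sqrt(math.fsum(partial_sums))


norm = l2_norm_two_core_reference([3.0, 4.0, 0.0, 12.0])
```

Taking the square root only once, after the partial sums of squares are combined, is what makes the split correct: summing per-core square roots would not give the L2 norm.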
API Reference#
Source code for this kernel API can be found at: foreach_norm.py
l2_norm_kernel#
l1_norm_kernel#
linf_norm_kernel#
- nkilib.experimental.foreach.linf_norm_kernel(data: nl.ndarray, numel: int) -> nl.ndarray#
Compute Linf norm (max norm) of input tensor.
- Parameters:
data (nl.ndarray) – [N], Input tensor on HBM.
numel (int) – Number of elements in data.
- Returns:
[1, 1], Linf norm scalar on HBM.
- Return type:
nl.ndarray
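When validating the kernel's output, a plain-Python reference for the Linf norm can be useful. The sketch below is an assumption-laden illustration, not part of nkilib: the name `linf_norm_reference` is hypothetical, it runs on the host, and it returns a Python float rather than a [1, 1] tensor on HBM. It mirrors the documented parameters by taking `data` and `numel`, and rejects a `numel` that does not match the length of `data`.

```python
from typing import Sequence


def linf_norm_reference(data: Sequence[float], numel: int) -> float:
    """Host-side reference for the Linf norm, max(|x_i|) (hypothetical helper)."""
    # numel is documented as the number of elements in data, so check that here.
    if numel != len(data):
        raise ValueError("numel must equal the number of elements in data")
    # The max norm is the largest absolute value; an empty input yields 0.0.
    return max((abs(v) for v in data), default=0.0)


expected = linf_norm_reference([1.0, -7.5, 3.0], numel=3)
```

The single value in the kernel's [1, 1] result would then be compared against this scalar, within a tolerance suited to the tensor's dtype.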