This document is relevant for: Trn2, Trn3

Foreach Norm Kernel API Reference

Compute the L2 (Euclidean), L1 (Manhattan), and Linf (max) norm of an input tensor.

The L2 kernel computes sqrt(sum(x^2)) using SPMD parallelization across 2 cores, with a fused activation-reduce and a sendrecv-based cross-core reduction.
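The two-core split works because the sum of squares is additive over any partition of the elements, so each core can reduce its own share independently. Assuming the N elements are divided into two disjoint index sets S_0 and S_1, one per core (this page does not describe how the kernel actually partitions the data):

```latex
\|x\|_2 = \sqrt{\sum_{i=1}^{N} x_i^2}
        = \sqrt{\underbrace{\sum_{i \in S_0} x_i^2}_{\text{core } 0}
              + \underbrace{\sum_{i \in S_1} x_i^2}_{\text{core } 1}},
\qquad S_0 \cup S_1 = \{1, \dots, N\}, \quad S_0 \cap S_1 = \emptyset
```

The square root is applied once, to the combined sum, and never to the per-core partial sums.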

Background

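l2_norm_kernel computes the L2 (Euclidean) norm of an input tensor. It distributes the work across cores with SPMD parallelization, uses a fused activation-reduce to square and sum elements in one fused step, and then combines the per-core partial sums with a cross-core reduction.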
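The scheme can be modelled on the host in plain Python. This is a minimal sketch, not NKI code: `simulate_two_core_l2_norm` is a hypothetical helper, and both the contiguous half-and-half split and the symmetric send/recv exchange (each core ends up holding the total) are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def simulate_two_core_l2_norm(data: Sequence[float]) -> Tuple[float, float]:
    """Host-side simulation of a 2-core SPMD L2 norm (hypothetical, not NKI code).

    Returns the norm as seen by each simulated core after the exchange.
    """
    # Assumed partitioning: core 0 takes the first half, core 1 the rest.
    half = (len(data) + 1) // 2
    shards = [data[:half], data[half:]]

    # Phase 1: each core squares its elements and sums them; on hardware this
    # square-then-sum is the role of the fused activation-reduce.
    partial = [sum(v * v for v in shard) for shard in shards]

    # Phase 2 (assumed symmetric exchange): each core sends its partial sum to
    # its peer, receives the peer's partial sum, and adds the two.
    received = [partial[1], partial[0]]
    totals = [partial[c] + received[c] for c in range(2)]

    # Phase 3: both cores now hold the full sum of squares; take the root once.
    return math.sqrt(totals[0]), math.sqrt(totals[1])
```

For example, `simulate_two_core_l2_norm([3.0, 4.0, 0.0, 12.0])` produces per-core partial sums of 25.0 and 144.0, and both simulated cores finish with 13.0.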

API Reference

Source code for these kernels can be found in: foreach_norm.py

l2_norm_kernel

nkilib.experimental.foreach.l2_norm_kernel(data: nl.ndarray, numel: int) -> nl.ndarray

Compute L2 norm (Euclidean norm) of input tensor.

Parameters:
  • data (nl.ndarray) – [N], Input tensor on HBM.

  • numel (int) – Number of elements in data.

Returns:

[1, 1], L2 norm scalar on HBM.

Return type:

nl.ndarray
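When testing l2_norm_kernel, it helps to have an independent reference value with the same [1, 1] shape. The sketch below is plain Python; `l2_norm_reference` is a hypothetical helper, not part of nkilib, and the `numel` check simply enforces the documented meaning of that parameter.

```python
import math
from typing import List, Sequence


def l2_norm_reference(data: Sequence[float], numel: int) -> List[List[float]]:
    """Plain-Python reference value for l2_norm_kernel (hypothetical test helper)."""
    # numel is documented as the number of elements in data, so check it.
    if numel != len(data):
        raise ValueError(f"numel={numel} does not match len(data)={len(data)}")
    # math.fsum returns a correctly rounded sum, so the reference does not
    # depend on the order in which the squares are accumulated.
    sum_of_squares = math.fsum(v * v for v in data)
    # Nest the scalar to mirror the documented [1, 1] output shape.
    return [[math.sqrt(sum_of_squares)]]
```

Since the kernel accumulates in a different order (and possibly in a lower-precision dtype), compare its single output element against `l2_norm_reference(...)[0][0]` with a tolerance, for example `math.isclose(a, b, rel_tol=1e-3)`, rather than exact equality; the tolerance shown is only an illustrative choice.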

l1_norm_kernel

nkilib.experimental.foreach.l1_norm_kernel(data: nl.ndarray, numel: int) -> nl.ndarray

Compute L1 norm (Manhattan norm) of input tensor.

Parameters:
  • data (nl.ndarray) – [N], Input tensor on HBM.

  • numel (int) – Number of elements in data.

Returns:

[1, 1], L1 norm scalar on HBM.

Return type:

nl.ndarray
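The L1 norm differs from the L2 norm only in the per-element transform (absolute value instead of squaring) and in having no final square root. A matching reference sketch, again in plain Python with a hypothetical helper name rather than NKI code:

```python
import math
from typing import List, Sequence


def l1_norm_reference(data: Sequence[float], numel: int) -> List[List[float]]:
    """Plain-Python reference value for l1_norm_kernel (hypothetical test helper)."""
    if numel != len(data):
        raise ValueError(f"numel={numel} does not match len(data)={len(data)}")
    # L1 (Manhattan) norm: sum of absolute values, with no square root at the end.
    return [[math.fsum(abs(v) for v in data)]]
```

For example, `l1_norm_reference([1.0, -2.0, 3.5], 3)` returns `[[6.5]]`.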

linf_norm_kernel

nkilib.experimental.foreach.linf_norm_kernel(data: nl.ndarray, numel: int) -> nl.ndarray

Compute Linf norm (max norm) of input tensor.

Parameters:
  • data (nl.ndarray) – [N], Input tensor on HBM.

  • numel (int) – Number of elements in data.

Returns:

[1, 1], Linf norm scalar on HBM.

Return type:

nl.ndarray
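For the Linf norm the reduction operator is max rather than addition, which also changes how partial results combine. The sketch below gives a reference value and, separately, shows that two per-shard maxima combined with max give the same answer. Both helper names are hypothetical, and the two-shard split is purely illustrative; this page does not say how linf_norm_kernel distributes its work.

```python
from typing import List, Sequence


def linf_norm_reference(data: Sequence[float], numel: int) -> List[List[float]]:
    """Plain-Python reference value for linf_norm_kernel (hypothetical test helper)."""
    if numel != len(data):
        raise ValueError(f"numel={numel} does not match len(data)={len(data)}")
    # Largest absolute value; an empty input is treated as having norm 0.0.
    return [[max((abs(v) for v in data), default=0.0)]]


def linf_norm_two_shards(data: Sequence[float]) -> float:
    """Same value computed from two per-shard maxima (illustrative split only)."""
    half = (len(data) + 1) // 2
    partial = [max((abs(v) for v in shard), default=0.0)
               for shard in (data[:half], data[half:])]
    # Partial maxima combine with max, and no final transform is needed.
    return max(partial)
```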
