This document is relevant for: Trn2, Trn3

Conv3D Kernel API Reference#

3D Convolution using tensor engine with K-replication strategy and W-contiguous tiling.

Implements a 3D convolution operation (x_in * filters + bias) optimized for NeuronCore using a K-replication strategy for filter loading and W-contiguous tiling for output computation. Supports configurable stride, padding, dilation, optional bias, optional activation function, and LNC sharding. Intended Usage Range: B: 1-128 C_in: 3-1280, C_out: 3-2048 D: 1-1024, H: 1-1024, W: 1-1024 K_d: 1-64, K_h: 1-64, K_w: 1-64 Stride: 1-64 per dimension Dilation: 1-64 per dimension

Background#

The conv3d kernel implements 3D convolution using the tensor engine with a K-replication strategy for filter loading and W-contiguous tiling for output computation. It supports configurable stride, padding, dilation, optional bias, activation fusion, and LNC sharding.

API Reference#

Source code for this kernel API can be found at: conv3d.py

conv3d#

nkilib.experimental.conv.conv3d(x_in: nl.ndarray, filters: nl.ndarray, bias: Optional[nl.ndarray] = None, stride: tuple[int, int, int] = (1, 1, 1), padding: tuple[int, int, int, int, int, int] = (0, 0, 0, 0, 0, 0), dilation: tuple[int, int, int] = (1, 1, 1), activation_fn: Optional[ActFnType] = None, lnc_shard: bool = False) → nl.ndarray#

3D Convolution using tensor engine with K-replication strategy and W-contiguous tiling.

Parameters:

x_in (nl.ndarray) – [B, C_in, D, H, W], Input tensor on HBM.
filters (nl.ndarray) – [K_d, K_h, K_w, C_in, C_out], Filter weights on HBM.
bias (Optional[nl.ndarray]) – [C_out], Optional bias tensor on HBM.
stride (tuple[int, int, int]) – (stride_d, stride_h, stride_w), Convolution strides.
padding (tuple[int, int, int, int, int, int]) – (pad_d_left, pad_d_right, pad_h_top, pad_h_bottom, pad_w_left, pad_w_right), Padding for each spatial dimension.
dilation (tuple[int, int, int]) – (dilation_d, dilation_h, dilation_w), Dilation factors.
activation_fn (Optional[ActFnType]) – Optional activation function to apply after conv.
lnc_shard (bool) – Enable LNC sharding across neuron cores.

Returns:

[B, C_out, D_out, H_out, W_out], Output tensor on HBM.

Return type:

nl.ndarray

Dimensions:

B: Batch size
C_in: Number of input channels
C_out: Number of output channels
D: Input depth
H: Input height
W: Input width
K_d: Filter depth
K_h: Filter height
K_w: Filter width
D_out: Output depth = (D + pad_d_left + pad_d_right - dilation_d * (K_d - 1) - 1) // stride_d + 1
H_out: Output height = (H + pad_h_top + pad_h_bottom - dilation_h * (K_h - 1) - 1) // stride_h + 1

This document is relevant for: Trn2, Trn3

Conv3D Kernel API Reference

Contents

Conv3D Kernel API Reference#

Background#

API Reference#

conv3d#