This document is relevant for: Trn2, Trn3

Conv3D Kernel API Reference#

3D Convolution using tensor engine with K-replication strategy and W-contiguous tiling.

Implements a 3D convolution operation (x_in * filters + bias) optimized for NeuronCore using a K-replication strategy for filter loading and W-contiguous tiling for output computation. Supports configurable stride, padding, dilation, optional bias, optional activation function, and LNC sharding.

Intended Usage Range:

  • B: 1-128

  • C_in: 3-1280, C_out: 3-2048

  • D: 1-1024, H: 1-1024, W: 1-1024

  • K_d: 1-64, K_h: 1-64, K_w: 1-64

  • Stride: 1-64 per dimension

  • Dilation: 1-64 per dimension
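The intended usage range can be checked before launching the kernel. The sketch below is a hypothetical helper (`out_of_range` and `INTENDED_RANGES` are not part of nkilib) and assumes the listed bounds are inclusive, which the source does not state explicitly.

```python
# Hypothetical pre-flight check; not part of nkilib.
# Assumption: the documented bounds are inclusive on both ends.
INTENDED_RANGES = {
    "B": (1, 128),
    "C_in": (3, 1280),
    "C_out": (3, 2048),
    "D": (1, 1024), "H": (1, 1024), "W": (1, 1024),
    "K_d": (1, 64), "K_h": (1, 64), "K_w": (1, 64),
    "stride": (1, 64),
    "dilation": (1, 64),
}


def out_of_range(x_shape, filter_shape, stride=(1, 1, 1), dilation=(1, 1, 1)):
    """Return the names of all dimensions outside the intended usage range.

    x_shape:      (B, C_in, D, H, W)
    filter_shape: (K_d, K_h, K_w, C_in, C_out)
    """
    B, C_in, D, H, W = x_shape
    K_d, K_h, K_w, _, C_out = filter_shape

    # (name, value, range key) triples, in a fixed reporting order.
    checks = [
        ("B", B, "B"), ("C_in", C_in, "C_in"), ("C_out", C_out, "C_out"),
        ("D", D, "D"), ("H", H, "H"), ("W", W, "W"),
        ("K_d", K_d, "K_d"), ("K_h", K_h, "K_h"), ("K_w", K_w, "K_w"),
    ]
    for axis, s, dl in zip("dhw", stride, dilation):
        checks.append((f"stride_{axis}", s, "stride"))
        checks.append((f"dilation_{axis}", dl, "dilation"))

    bad = []
    for name, value, key in checks:
        lo, hi = INTENDED_RANGES[key]
        if not lo <= value <= hi:
            bad.append(name)
    return bad
```

An empty list means every dimension falls inside the documented range. A non-empty list names the offending dimensions, which may still work but lie outside what the kernel is intended for.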

Background#

The conv3d kernel implements 3D convolution using the tensor engine with a K-replication strategy for filter loading and W-contiguous tiling for output computation. It supports configurable stride, padding, dilation, optional bias, activation fusion, and LNC sharding.

API Reference#

Source code for this kernel API can be found at: conv3d.py

conv3d#

nkilib.experimental.conv.conv3d(x_in: nl.ndarray, filters: nl.ndarray, bias: Optional[nl.ndarray] = None, stride: tuple[int, int, int] = (1, 1, 1), padding: tuple[int, int, int, int, int, int] = (0, 0, 0, 0, 0, 0), dilation: tuple[int, int, int] = (1, 1, 1), activation_fn: Optional[ActFnType] = None, lnc_shard: bool = False) → nl.ndarray#

3D Convolution using tensor engine with K-replication strategy and W-contiguous tiling.

Parameters:
  • x_in (nl.ndarray) – [B, C_in, D, H, W], Input tensor on HBM.

  • filters (nl.ndarray) – [K_d, K_h, K_w, C_in, C_out], Filter weights on HBM.

  • bias (Optional[nl.ndarray]) – [C_out], Optional bias tensor on HBM.

  • stride (tuple[int, int, int]) – (stride_d, stride_h, stride_w), Convolution strides.

  • padding (tuple[int, int, int, int, int, int]) – (pad_d_left, pad_d_right, pad_h_top, pad_h_bottom, pad_w_left, pad_w_right), Padding for each spatial dimension.

  • dilation (tuple[int, int, int]) – (dilation_d, dilation_h, dilation_w), Dilation factors.

  • activation_fn (Optional[ActFnType]) – Optional activation function applied to the convolution output.

  • lnc_shard (bool) – Enable LNC sharding across NeuronCores.

Returns:

[B, C_out, D_out, H_out, W_out], Output tensor on HBM.

Return type:

nl.ndarray
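To make the tensor layouts and parameter semantics concrete, here is a naive pure-Python reference over nested lists. It is a sketch for checking small cases by hand, not the kernel's implementation. It assumes zero padding and cross-correlation (no kernel flip), as is conventional in deep learning frameworks. The source states neither. The function name `conv3d_reference` is hypothetical, and the activation function and LNC sharding are left out.

```python
from itertools import product


def conv3d_reference(x, filters, bias=None, stride=(1, 1, 1),
                     padding=(0, 0, 0, 0, 0, 0), dilation=(1, 1, 1)):
    """Naive 3D convolution over nested lists (reference sketch only).

    x:       [B][C_in][D][H][W]
    filters: [K_d][K_h][K_w][C_in][C_out]
    bias:    [C_out] or None
    Assumptions: padded positions contribute zero; the kernel is not flipped.
    """
    B, C_in = len(x), len(x[0])
    D, H, W = len(x[0][0]), len(x[0][0][0]), len(x[0][0][0][0])
    K_d, K_h, K_w = len(filters), len(filters[0]), len(filters[0][0])
    C_out = len(filters[0][0][0][0])
    s_d, s_h, s_w = stride
    dl_d, dl_h, dl_w = dilation
    p_d0, p_d1, p_h0, p_h1, p_w0, p_w1 = padding

    def out_size(n, p0, p1, k, s, dl):
        # Same formula as D_out / H_out in the Dimensions list.
        return (n + p0 + p1 - dl * (k - 1) - 1) // s + 1

    D_out = out_size(D, p_d0, p_d1, K_d, s_d, dl_d)
    H_out = out_size(H, p_h0, p_h1, K_h, s_h, dl_h)
    W_out = out_size(W, p_w0, p_w1, K_w, s_w, dl_w)

    # Output layout: [B][C_out][D_out][H_out][W_out]
    out = [[[[[0.0] * W_out for _ in range(H_out)] for _ in range(D_out)]
            for _ in range(C_out)] for _ in range(B)]

    for b, co, od, oh, ow in product(range(B), range(C_out), range(D_out),
                                     range(H_out), range(W_out)):
        acc = bias[co] if bias is not None else 0.0
        for kd, kh, kw in product(range(K_d), range(K_h), range(K_w)):
            # Map the output position and filter tap back to input coordinates.
            d = od * s_d - p_d0 + kd * dl_d
            h = oh * s_h - p_h0 + kh * dl_h
            w = ow * s_w - p_w0 + kw * dl_w
            if 0 <= d < D and 0 <= h < H and 0 <= w < W:
                for ci in range(C_in):
                    acc += x[b][ci][d][h][w] * filters[kd][kh][kw][ci][co]
        out[b][co][od][oh][ow] = acc
    return out
```

For example, a 2x2x2 input of ones convolved with a 2x2x2 filter of ones (one input channel, one output channel) yields a single output element equal to 8.0, or 9.0 with a bias of 1.0.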

Dimensions:

  • B: Batch size

  • C_in: Number of input channels

  • C_out: Number of output channels

  • D: Input depth

  • H: Input height

  • W: Input width

  • K_d: Filter depth

  • K_h: Filter height

  • K_w: Filter width

  • D_out: Output depth = (D + pad_d_left + pad_d_right - dilation_d * (K_d - 1) - 1) // stride_d + 1

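  • H_out: Output height = (H + pad_h_top + pad_h_bottom - dilation_h * (K_h - 1) - 1) // stride_h + 1

  • W_out: Output width = (W + pad_w_left + pad_w_right - dilation_w * (K_w - 1) - 1) // stride_w + 1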
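The output-size formulas can be evaluated ahead of time to size the result tensor. The helper below is a minimal sketch (`conv3d_output_shape` is a hypothetical name, not part of nkilib). It applies the D_out and H_out formulas as given and assumes W_out follows the same pattern along the width axis.

```python
def conv3d_output_shape(x_shape, filter_shape, stride=(1, 1, 1),
                        padding=(0, 0, 0, 0, 0, 0), dilation=(1, 1, 1)):
    """Compute (B, C_out, D_out, H_out, W_out) from the documented formulas.

    x_shape:      (B, C_in, D, H, W)
    filter_shape: (K_d, K_h, K_w, C_in, C_out)
    padding:      (pad_d_left, pad_d_right, pad_h_top, pad_h_bottom,
                   pad_w_left, pad_w_right)
    """
    B, C_in, D, H, W = x_shape
    K_d, K_h, K_w, filter_c_in, C_out = filter_shape
    if C_in != filter_c_in:
        raise ValueError(f"C_in mismatch: input has {C_in}, filters have {filter_c_in}")

    spatial = []
    for axis, (n, k) in enumerate(zip((D, H, W), (K_d, K_h, K_w))):
        pad_lo, pad_hi = padding[2 * axis], padding[2 * axis + 1]
        # out = (n + pad_lo + pad_hi - dilation * (k - 1) - 1) // stride + 1
        out = (n + pad_lo + pad_hi - dilation[axis] * (k - 1) - 1) // stride[axis] + 1
        if out <= 0:
            raise ValueError(f"non-positive output size on spatial axis {axis}")
        spatial.append(out)
    return (B, C_out, *spatial)
```

With a 3x3x3 filter, padding of 1 on every side, and unit stride and dilation, the spatial dimensions are preserved: an input of shape (2, 3, 16, 32, 32) with 8 output channels gives (2, 8, 16, 32, 32).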
