NKI Library Supported Kernel Reference

The NKI Library provides pre-built reference kernels that you can use directly in model development with the AWS Neuron SDK and NKI. This page lists the available kernels. Each one exposes the classes, functions, and default parameters you use to integrate it into your models.

Source code for these kernel APIs can be found at: aws-neuron/nki-library

Core Kernels

Normalization and Quantization Kernels

RMSNorm-Quant

Performs optional RMS normalization followed by quantization to fp8.
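The plain-Python sketch below illustrates the math only. It is not the NKI Library API, and the function names are hypothetical. It assumes per-row absmax scaling and the FP8 E4M3 format, whose largest finite value is 448; the kernel's actual scaling granularity and FP8 variant are not stated here. Scaled values are clamped to the representable range but are not rounded to the FP8 grid.

```python
import math

# Assumption: target format is OCP FP8 E4M3, whose largest finite magnitude is 448.
FP8_E4M3_MAX = 448.0


def rms_norm(row, weight=None, eps=1e-6):
    """Divide `row` by its root mean square, then apply an optional per-feature weight."""
    rms = math.sqrt(sum(v * v for v in row) / len(row) + eps)
    normed = [v / rms for v in row]
    if weight is not None:
        normed = [n * w for n, w in zip(normed, weight)]
    return normed


def quantize_row_fp8(row):
    """Assumed per-row absmax scaling into the E4M3 range (FP8 rounding omitted)."""
    absmax = max(abs(v) for v in row)
    scale = absmax / FP8_E4M3_MAX if absmax > 0 else 1.0
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row]
    return q, scale  # dequantize with q[i] * scale


def rmsnorm_quant(x, weight=None, apply_norm=True):
    """Optionally RMS-normalize each row of x, then quantize it; returns (q, scale) per row."""
    results = []
    for row in x:
        r = rms_norm(row, weight) if apply_norm else row
        results.append(quantize_row_fp8(r))
    return results
```

Returning the scale alongside the quantized values lets a downstream consumer dequantize, which is why the sketch keeps it per row.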

QKV Projection Kernels

QKV	Performs Query-Key-Value projection with optional normalization and RoPE fusion.
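The following is a plain-Python sketch of a fused QKV projection. It assumes a single weight matrix whose columns hold the Q, K, and V projections in that order. This layout is hypothetical, and the names are illustrative rather than the NKI Library API. RoPE fusion is left out; in a fused kernel it would be applied to the Q and K outputs.

```python
import math


def matmul(a, b):
    """Multiply a (m x k) by b (k x n) using nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rms_norm_rows(x, eps=1e-6):
    """Optional pre-projection normalization: divide each row by its RMS."""
    out = []
    for row in x:
        rms = math.sqrt(sum(v * v for v in row) / len(row) + eps)
        out.append([v / rms for v in row])
    return out


def qkv_projection(x, w_qkv, q_dim, k_dim, v_dim, normalize=False):
    """Project x with one fused weight, then split the columns into Q, K, V.

    Assumption (hypothetical layout): w_qkv has shape [hidden, q_dim + k_dim + v_dim]
    with the Q, K, and V columns stored in that order.
    """
    assert len(w_qkv[0]) == q_dim + k_dim + v_dim
    h = rms_norm_rows(x) if normalize else x
    fused = matmul(h, w_qkv)  # one matmul instead of three
    q = [row[:q_dim] for row in fused]
    k = [row[q_dim:q_dim + k_dim] for row in fused]
    v = [row[q_dim + k_dim:] for row in fused]
    return q, k, v
```

Computing Q, K, and V with one matmul against the concatenated weight, instead of three separate matmuls, is the usual reason to fuse this projection.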

Attention Kernels

Attention CTE	Implements attention optimized for Context Encoding (prefill) use cases.
Attention TKG	Implements attention optimized for Token Generation (decode) use cases with small active sequence lengths.
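This single-head, plain-Python reference shows how the two phases differ. It covers the math only and is not the NKI Library API. During prefill, every query position attends causally over the prompt. During decode, one new query attends over all cached keys and values. The sketch assumes a causal mask and omits batching, multiple heads, and KV-cache layout. The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q_row, keys, values, scale):
    """One query row attends over the given key/value rows."""
    scores = [scale * sum(a * b for a, b in zip(q_row, k)) for k in keys]
    probs = softmax(scores)
    dim = len(values[0])
    return [sum(p * v[j] for p, v in zip(probs, values)) for j in range(dim)]


def attention_prefill(q, k, v):
    """Context encoding (assumed causal): position i attends to positions 0..i."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [attend(q[i], k[:i + 1], v[:i + 1], scale) for i in range(len(q))]


def attention_decode(q_new, k_cache, v_cache):
    """Token generation: the new token's query attends to the whole KV cache."""
    scale = 1.0 / math.sqrt(len(q_new))
    return attend(q_new, k_cache, v_cache, scale)
```

For the last position, the decode result matches the last row of the prefill result, since both compute the same causal attention.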

Rotary Position Embedding (RoPE) Kernels

RoPE	Applies Rotary Position Embedding to input embeddings with flexible layout support.
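To illustrate what "layout" can mean for RoPE, the plain-Python sketch below rotates pairs of features by position-dependent angles. It shows two common pairings: rotate-half, which pairs feature i with feature i + dim/2, and interleaved, which pairs features 2i and 2i+1. The layouts the kernel actually supports, the base of 10000, and the function names are assumptions and not the NKI Library API.

```python
import math


def rope_angles(pos, dim, base=10000.0):
    """Rotation angle for each of the dim // 2 frequency pairs at position `pos`."""
    return [pos * base ** (-2.0 * i / dim) for i in range(dim // 2)]


def rope(x, pos, layout="half", base=10000.0):
    """Rotate one head vector `x` (even length) by position-dependent angles.

    Assumed layouts (illustrative, not a statement of what the kernel supports):
      "half":        pair x[i] with x[i + dim // 2]
      "interleaved": pair x[2 * i] with x[2 * i + 1]
    """
    if layout not in ("half", "interleaved"):
        raise ValueError(f"unknown layout: {layout}")
    dim = len(x)
    half = dim // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, dim, base)):
        a, b = (i, i + half) if layout == "half" else (2 * i, 2 * i + 1)
        c, s = math.cos(theta), math.sin(theta)
        # 2-D rotation of the pair (x[a], x[b]) by angle theta.
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out
```

Because each pair is rotated by an angle proportional to position, the dot product of a rotated query and a rotated key depends only on their relative position. This holds for either layout.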

Multi-Layer Perceptron (MLP) Kernels

MLP	Implements Multi-Layer Perceptron with optional normalization fusion and quantization support.
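The plain-Python sketch below assumes a gated SiLU MLP (SwiGLU-style, as in many LLaMA-family models) with optional RMS normalization in front. This covers the math for a single token only. The gating form, activation, and names are assumptions rather than the NKI Library API, and quantization is omitted.

```python
import math


def matvec(x, w):
    """x (length k) times w (k x n)."""
    return [sum(x[p] * w[p][j] for p in range(len(w))) for j in range(len(w[0]))]


def silu(v):
    """v * sigmoid(v), computed so that large |v| cannot overflow math.exp."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def rms_norm(x, eps=1e-6):
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def gated_mlp(x, w_gate, w_up, w_down, normalize=False):
    """Assumed SwiGLU form: down(silu(x @ w_gate) * (x @ w_up)) for one token x."""
    h = rms_norm(x) if normalize else x  # optional fused normalization
    gate = [silu(g) for g in matvec(h, w_gate)]
    up = matvec(h, w_up)
    return matvec([g * u for g, u in zip(gate, up)], w_down)
```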

Output Projection Kernels

Output Projection CTE	Computes output projection optimized for Context Encoding use cases.
Output Projection TKG	Computes output projection optimized for Token Generation use cases.
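The output projection concatenates the per-head attention outputs and multiplies them by one weight matrix to return to the model's hidden size. The plain-Python sketch below assumes an attention output laid out as [heads][seq][d_head] and a weight of shape [heads * d_head][hidden]. Both layouts and the function names are hypothetical rather than the NKI Library API.

```python
def merge_heads(attn_out):
    """Assumed layout: [heads][seq][d_head] -> [seq][heads * d_head], heads in order."""
    n_heads, seq = len(attn_out), len(attn_out[0])
    return [[v for h in range(n_heads) for v in attn_out[h][t]] for t in range(seq)]


def output_projection(attn_out, w_o):
    """Project concatenated head outputs back to the hidden size with w_o."""
    merged = merge_heads(attn_out)
    assert len(merged[0]) == len(w_o), "w_o must have heads * d_head rows"
    return [[sum(row[p] * w_o[p][j] for p in range(len(w_o))) for j in range(len(w_o[0]))]
            for row in merged]
```

The math is identical for both variants. Per the table above, they differ only in the use case each is optimized for: many tokens per call during context encoding, and few during token generation.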

Mixture of Experts (MoE) Kernels

Router Top-K	Computes router logits, applies activation functions, and performs top-K selection for MoE models.
MoE CTE	Implements Mixture of Experts MLP operations optimized for Context Encoding use cases.
MoE TKG	Implements Mixture of Experts MLP operations optimized for Token Generation use cases.
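The router step described in the table above can be sketched in plain Python for one token. The steps are: compute logits against a router weight, apply an activation, keep the top-K experts, and optionally renormalize their weights. The choice of softmax or sigmoid, the renormalization flag, and the tie-breaking rule are assumptions for illustration, not the NKI Library API.

```python
import math


def _sigmoid(v):
    """Overflow-safe logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def router_top_k(x, w_router, k, activation="softmax", normalize_top_k=True):
    """Return (expert_ids, weights) for one token.

    x: hidden vector (length d); w_router: d x n_experts.
    Assumption: activation is "softmax" or "sigmoid"; ties go to the lower expert id.
    """
    n_experts = len(w_router[0])
    logits = [sum(x[p] * w_router[p][e] for p in range(len(x))) for e in range(n_experts)]
    if activation == "softmax":
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        scores = [e / total for e in exps]
    elif activation == "sigmoid":
        scores = [_sigmoid(l) for l in logits]
    else:
        raise ValueError(f"unknown activation: {activation}")
    top = sorted(range(n_experts), key=lambda e: (-scores[e], e))[:k]
    weights = [scores[e] for e in top]
    if normalize_top_k:
        s = sum(weights)
        weights = [w / s for w in weights]
    return top, weights
```

The MoE CTE and TKG kernels then run each token through its selected experts and combine the outputs using these weights.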

Cumulative Sum Kernels

Cumsum

Computes the cumulative sum along the last dimension using optimized tiling.
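The plain-Python sketch below illustrates a tiled cumulative sum. It is the general technique, not the kernel's actual tiling, and the tile size and names are hypothetical. Each tile gets a local prefix sum that starts from zero, then a running carry (the total of all earlier tiles in the row) is added to every element.

```python
def cumsum_last_dim(x, tile_size=4):
    """Cumulative sum along the last dimension of a 2-D list, one tile at a time.

    tile_size is a hypothetical parameter chosen for illustration.
    """
    out = []
    for row in x:
        result, carry = [], 0.0
        for start in range(0, len(row), tile_size):
            tile = row[start:start + tile_size]
            # Local scan: depends only on this tile's elements.
            local, acc = [], 0.0
            for v in tile:
                acc += v
                local.append(acc)
            # Offset the local scan by the total of all previous tiles.
            result.extend(carry + s for s in local)
            carry += acc
        out.append(result)
    return out
```

The local scans do not depend on one another, so they can be computed independently; only the carry has to be propagated in order from tile to tile.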

Experimental Kernels

Note

Experimental kernels are under active development and their APIs may change in future releases.

Attention Kernels

Attention Block TKG

Fused attention block for Token Generation that keeps all intermediate tensors in SBUF to minimize HBM traffic.

Convolution Kernels

Depthwise Conv1D

Implements depthwise 1D convolution using an implicit GEMM algorithm.
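The implicit GEMM idea can be sketched in plain Python. An explicit im2col approach would materialize a patch matrix per channel. Implicit GEMM instead computes each patch element from an index map at the moment the matrix product needs it. This sketch assumes no padding and a per-channel filter of shape [channels][kernel_size], and its names are hypothetical rather than the NKI Library API.

```python
def depthwise_conv1d_implicit_gemm(x, w, stride=1):
    """Depthwise 1-D convolution (assumed: no padding) via an implicit-GEMM formulation.

    x: [channels][length]; w: [channels][kernel_size].
    Conceptually, channel c multiplies a virtual im2col matrix
    A_c[t][k] = x[c][t * stride + k] by its filter w[c]. The matrix is never
    built; each element is read from x through the index map when needed.
    """
    channels, length = len(x), len(x[0])
    ksize = len(w[0])
    out_len = (length - ksize) // stride + 1

    def a_elem(c, t, k):
        # Index map of the virtual im2col matrix (the "implicit" part).
        return x[c][t * stride + k]

    return [[sum(a_elem(c, t, k) * w[c][k] for k in range(ksize)) for t in range(out_len)]
            for c in range(channels)]
```

Because a depthwise convolution never mixes channels, each channel's GEMM reduces to a matrix-vector product against its own filter.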

Loss Kernels

Cross Entropy

Computes memory-efficient forward and backward passes for cross-entropy loss using an online log-sum-exp algorithm.
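The plain-Python sketch below shows the online log-sum-exp technique for a single row of logits. It is not the NKI Library API; the chunk size and function names are hypothetical. Only two scalars, a running max and a running rescaled sum, carry across chunks, so the full softmax never has to be held at once. The loss is then log-sum-exp minus the target logit, and its gradient with respect to the logits is softmax minus the one-hot target.

```python
import math


def online_logsumexp(logits, chunk_size=4):
    """log(sum(exp(logits))) computed one chunk at a time (chunk_size is hypothetical).

    Keeps only a running max `m` and a running sum `s` of exp(logit - m);
    when a chunk raises the max, the old sum is rescaled by exp(m_old - m_new).
    """
    m, s = -math.inf, 0.0
    for start in range(0, len(logits), chunk_size):
        chunk = logits[start:start + chunk_size]
        m_new = max(m, max(chunk))
        s = s * math.exp(m - m_new) + sum(math.exp(v - m_new) for v in chunk)
        m = m_new
    return m + math.log(s)


def cross_entropy_forward_backward(logits, target, chunk_size=4):
    """Loss = logsumexp(logits) - logits[target]; dLoss/dlogits = softmax - one_hot."""
    lse = online_logsumexp(logits, chunk_size)
    loss = lse - logits[target]
    grad = [math.exp(v - lse) for v in logits]  # softmax, recomputed from the saved lse
    grad[target] -= 1.0
    return loss, grad
```

Subtracting the running max keeps every exponent at or below zero, so large logits such as 1000.0 do not overflow.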

MoE Backward Kernels

Blockwise MM Backward

Computes the backward pass for blockwise matrix multiplication in Mixture of Experts layers.
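The plain-Python sketch below assumes that tokens have already been grouped into blocks and that each block is routed to one expert with a single weight matrix. The real layer's projections, block layout, and the names here are hypothetical, not the NKI Library API. For a forward pass Y_b = X_b W_e, the gradients are dX_b = dY_b W_e^T for each block, and dW_e is the sum of X_b^T dY_b over all blocks routed to expert e.

```python
def matmul(a, b):
    """Multiply a (m x k) by b (k x n) using nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def blockwise_mm_backward(x_blocks, dy_blocks, block_expert, weights):
    """Backward of Y_b = X_b @ W[e(b)] for every token block b (hypothetical layout).

    x_blocks[b]: [block_size][d_in]; dy_blocks[b]: [block_size][d_out];
    block_expert[b]: expert id for block b; weights[e]: [d_in][d_out].
    Returns (dx_blocks, dw) with dw[e] summed over all blocks routed to e.
    """
    dw = [[[0.0] * len(w[0]) for _ in w] for w in weights]
    dx_blocks = []
    for xb, dyb, e in zip(x_blocks, dy_blocks, block_expert):
        dx_blocks.append(matmul(dyb, transpose(weights[e])))  # dX_b = dY_b @ W_e^T
        g = matmul(transpose(xb), dyb)                        # X_b^T @ dY_b
        for i, row in enumerate(g):
            for j, v in enumerate(row):
                dw[e][i][j] += v                              # accumulate per expert
    return dx_blocks, dw
```

Accumulating into dw[e] is needed because several blocks can be routed to the same expert and all of them contribute to that expert's weight gradient.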
