NKI Tutorials

This section provides hands-on tutorials for the Neuron Kernel Interface (NKI), demonstrating how to write custom kernels for AWS Trainium and Inferentia instances. These tutorials cover fundamental operations, advanced techniques, and distributed computing patterns using NKI.

The full source code for these tutorials is also available in the nki-samples repository on GitHub.

Basic Operations

Matrix Multiplication

Learn the fundamentals of implementing matrix multiplication kernels in NKI

2D Transpose

Implement efficient 2D matrix transpose operations using NKI

Average Pooling 2D

Create custom 2D average pooling kernels for computer vision workloads

Normalization Techniques

Layer Normalization

Implement layer normalization kernels for transformer models

RMS Normalization

Build RMS normalization kernels for modern neural network architectures

Advanced Kernels

Fused Self-Attention

Implement fused self-attention kernels

Fused Mamba

Implement fused Mamba state space model kernels

Distributed Computing

SPMD Tensor Addition

Single Program, Multiple Data (SPMD) tensor addition across multiple cores

Multi-Core SPMD Addition

Advanced SPMD tensor operations across multiple NeuronCores