NKI Tutorials
This section provides hands-on tutorials for the Neuron Kernel Interface (NKI), demonstrating how to write custom kernels for AWS Trainium and Inferentia instances. These tutorials cover fundamental operations, advanced techniques, and distributed computing patterns using NKI.
The full source code for these tutorials is also available in the nki-samples repository on GitHub.
Basic Operations
Learn the fundamentals of implementing matrix multiplication kernels in NKI
Implement efficient 2D matrix transpose operations using NKI
Create custom 2D average pooling kernels for computer vision workloads
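When working through the pooling tutorial, it can help to have a host-side reference to compare kernel output against. The following is a minimal sketch in plain Python, not NKI code and not taken from the tutorial: the function name `avg_pool_2d` and the assumption of non-overlapping square windows (stride equal to pool size) are illustrative choices.

```python
def avg_pool_2d(image, pool):
    """Average non-overlapping pool x pool windows of a 2D list of numbers.

    Assumption: stride equals the pool size and both image dimensions
    are divisible by it. This is a host-side reference, not an NKI kernel.
    """
    height, width = len(image), len(image[0])
    if height % pool or width % pool:
        raise ValueError("image dimensions must be divisible by pool")

    out = []
    for r in range(0, height, pool):
        out_row = []
        for c in range(0, width, pool):
            # Gather every element of the current window, then average.
            window = [image[r + i][c + j]
                      for i in range(pool) for j in range(pool)]
            out_row.append(sum(window) / (pool * pool))
        out.append(out_row)
    return out
```

A 4x4 input pooled with `pool=2` yields a 2x2 output in which each element is the mean of one 2x2 block.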
Normalization Techniques
Implement layer normalization kernels for transformer models
Build RMS normalization kernels for modern neural network architectures
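For orientation, the two normalizations differ mainly in whether the mean is subtracted. The sketch below is a plain-Python reference for a single row, assuming no learned scale or shift parameters and an illustrative `eps` of 1e-5; it is not NKI code, and the function names are hypothetical.

```python
import math


def layer_norm(row, eps=1e-5):
    """Normalize one row to zero mean and (approximately) unit variance.

    Assumption: no learned scale/shift; eps value is illustrative.
    """
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in row]


def rms_norm(row, eps=1e-5):
    """Scale one row by the reciprocal of its root mean square.

    Unlike layer_norm, the mean is not subtracted.
    Assumption: no learned gain; eps value is illustrative.
    """
    mean_sq = sum(x * x for x in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms for x in row]
```

A reference like this can be used to check a kernel's output row by row within a floating-point tolerance.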
Advanced Kernels
Implement fused self-attention mechanisms
Implement fused Mamba state space model kernels
Distributed Computing
Single Program, Multiple Data (SPMD) tensor addition across multiple cores
Advanced SPMD tensor operations across multiple NeuronCores
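The SPMD idea behind these tutorials can be illustrated without any hardware: every program instance runs the same function and uses its own ID to select the slice of data it owns. The sketch below emulates this sequentially in plain Python. It is not NKI code; `add_tile`, `spmd_add`, and `tile_rows` are hypothetical names, and on real hardware the instances would run on separate cores rather than in a loop.

```python
def add_tile(a, b, out, program_id, tile_rows):
    """Work done by one program instance: add its own block of rows."""
    start = program_id * tile_rows
    for r in range(start, start + tile_rows):
        out[r] = [x + y for x, y in zip(a[r], b[r])]


def spmd_add(a, b, tile_rows):
    """Emulate an SPMD launch by running every program ID in sequence.

    Assumption: the row count is divisible by tile_rows, so each
    instance owns exactly one full tile.
    """
    if len(a) != len(b) or len(a) % tile_rows:
        raise ValueError("inputs must match and divide evenly into tiles")
    out = [None] * len(a)
    num_programs = len(a) // tile_rows
    for program_id in range(num_programs):
        add_tile(a, b, out, program_id, tile_rows)
    return out
```

Because each instance writes a disjoint block of rows, the instances do not need to coordinate, which is what makes this pattern straightforward to spread across cores.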