Neuron Kernel Interface (NKI) Documentation
NKI Beta Versions
NKI is currently in beta, with Beta 2 as the current version. Read more about NKI beta versions.
The Neuron Kernel Interface (NKI) is a Python-embedded Domain Specific Language (DSL) that gives developers direct access to Neuron’s Instruction Set Architecture (NISA). NKI combines the ease of programming offered by tile-level operations with full access to NISA within a familiar Pythonic programming environment. It provides the flexibility to implement architecture-specific optimizations rapidly, at a speed that is difficult to achieve in higher-level DSLs and frameworks. This has enabled developers to achieve optimal performance across a wide spectrum of machine learning models on Trainium, including Transformers, Mixture-of-Experts, State Space Models, and more.
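To make the tile-level programming model concrete, below is a minimal sketch of an element-wise addition kernel. It assumes the Beta 2 Python API names (nki.jit, nl.load, nl.store, and nl.ndarray from the neuronxcc.nki package); consult the NKI API reference for the exact signatures in your Neuron SDK release.

```python
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl

@nki.jit
def add_kernel(a_input, b_input):
    # Allocate the kernel output in device memory (HBM).
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype,
                          buffer=nl.shared_hbm)
    # Load both operands from HBM into on-chip SBUF tiles.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    # The addition executes as a single tile-level operation on-chip.
    c_tile = a_tile + b_tile
    # Write the result tile back to HBM.
    nl.store(c_output, value=c_tile)
    return c_output
```

Once decorated with nki.jit, such a kernel can be called like a regular Python function on framework tensors, with the Neuron compiler lowering the tile operations to NISA instructions.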
In addition to directly exposing NISA, NKI provides easy-to-use APIs for controlling instruction scheduling, memory management across the memory hierarchy, software pipelining, and other optimization techniques. These APIs are carefully designed to keep kernel code simple while giving developers more control and flexibility, enabling fine-grained tuning that works in concert with the optimizations performed by the compiler.
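As a sketch of that kind of explicit control over the memory hierarchy, the loop below stages a large tensor through SBUF one 128-partition tile at a time. The names nl.affine_range, nl.arange, and nl.tile_size.pmax are assumed from the Beta 2 language module and may differ in your release.

```python
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl

@nki.jit
def scale_kernel(in_tensor):
    # Kernel output lives in HBM; tiles are staged through on-chip SBUF.
    out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,
                            buffer=nl.shared_hbm)
    rows, cols = in_tensor.shape
    # SBUF tiles span at most nl.tile_size.pmax (128) partitions, so the
    # input is processed one row block at a time (this sketch assumes cols
    # fits within a single tile's free dimension).
    i_r = nl.arange(nl.tile_size.pmax)[:, None]
    i_c = nl.arange(cols)[None, :]
    # affine_range marks iterations as independent, which lets the compiler
    # pipeline the load / compute / store stages across iterations.
    for i in nl.affine_range(rows // nl.tile_size.pmax):
        tile = nl.load(in_tensor[i * nl.tile_size.pmax + i_r, i_c])
        nl.store(out_tensor[i * nl.tile_size.pmax + i_r, i_c],
                 value=tile * 2.0)
    return out_tensor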
NKI currently supports multiple NeuronDevice generations:
Trainium/Inferentia2, available on AWS trn1, trn1n, and inf2 instances
Trainium2, available on AWS trn2 instances and UltraServers
Trainium3, available on AWS trn3 instances and UltraServers
Explore the comprehensive guides below to learn how to implement and optimize your kernels for AWS Neuron accelerators: