This document is relevant for: Trn2, Trn3

NKI FAQ#

When should I use NKI?#

NKI lets you write custom operators that program directly against the Neuron ISA. There are two common reasons to use NKI:

  • Performance optimization: When the Neuron Compiler’s general-purpose optimizations don’t fully exploit the hardware for your specific workload, NKI lets you write hand-tuned operators that maximize compute and memory throughput. For example, the NKI Library provides optimized kernels for attention, MLP, RMSNorm with quantization, and collective communication that outperform compiler-generated equivalents.

  • Novel operators and architectures: NKI enables you to implement operators that are not yet supported by the Neuron Compiler, letting you self-serve new deep learning architectures and custom operations without waiting for compiler support.

Which AWS chips does NKI support?#

NKI supports Trainium2 and Trainium3 chips, available on Trn2 and Trn3 instances.

Which compute engines are supported?#

The following AWS Trainium and Inferentia compute engines are supported: Tensor Engine, Vector Engine, Scalar Engine, and GpSimd Engine. For more details, see the NeuronDevice Architecture Guide, and refer to nki.isa APIs to identify which engines are utilized for each instruction.

How do I launch a NKI kernel onto a logical NeuronCore on Trainium2 or Trainium3?#

A logical NeuronCore (LNC) can consist of multiple physical NeuronCores. In the current Neuron release, an LNC on Trainium2 or Trainium3 can have up to two physical NeuronCores.

For more details on NeuronCore configurations, see Logical NeuronCore configurations.

In NKI, you can launch a kernel onto both physical NeuronCores within a logical NeuronCore by specifying a launch grid, for example kernel[2] for LNC=2. Each physical core then receives a different nl.program_id(0) value (0 or 1), which the kernel can use to select its share of the work.
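To illustrate the SPMD semantics conceptually, the sketch below mimics in plain Python (no Neuron SDK required) how each program instance uses its program ID to pick a disjoint slice of the data. The function and variable names here are hypothetical illustrations, not NKI APIs:

```python
# Conceptual model of an LNC=2 launch: two program instances, each with a
# distinct program_id, partition the rows of the input between them.
# This is plain Python for illustration only -- in a real NKI kernel the
# ID comes from nl.program_id(0) and the data lives in on-chip memory.

def split_rows(data, program_id, num_programs=2):
    """Return the slice of `data` this program instance should process."""
    rows_per_program = len(data) // num_programs
    start = program_id * rows_per_program
    return data[start:start + rows_per_program]

data = list(range(8))
half0 = split_rows(data, program_id=0)  # what core 0 would process: [0, 1, 2, 3]
half1 = split_rows(data, program_id=1)  # what core 1 would process: [4, 5, 6, 7]
```

Together the two instances cover the whole input without overlap, which is the pattern an LNC=2 kernel typically follows.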

What ML frameworks support NKI kernels?#

NKI is integrated with the PyTorch and JAX frameworks. For more details, see NKI Kernel as a Framework Custom Operator.

What Neuron software does not currently support NKI?#

NKI does not currently support integration with Neuron Custom C++ Operators, Transformers NeuronX, or Neuron Collective Communication.

Where can I find NKI sample kernels?#

NKI provides two open source repositories with kernel examples:

  • NKI Library — Production-ready, optimized kernels for common operations (matrix multiplication, attention, normalization, quantization, etc.) that you can use directly in your models. See the NKI Library documentation for API reference and design specifications.

  • NKI Samples — Reference and tutorial kernels that demonstrate NKI programming patterns and concepts. These are designed for learning and experimentation rather than production use.

For step-by-step guides on writing NKI kernels, see the NKI tutorials.

What should I do if I have trouble resolving a kernel compilation error?#

Refer to the NKI samples GitHub issues for guidance on resolving common NKI compilation errors. If you encounter a Neuron Compiler error you cannot understand or resolve, search the existing issues and open a new issue if no similar one exists.

How can I debug numerical issues in NKI kernels?#

We encourage NKI programmers to build kernels incrementally and verify the output of small operators one at a time. NKI also provides a CPU simulation mode that supports printing intermediate tensor values to the console. See nki.simulate for a code example.
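The incremental approach above amounts to checking each small operator against a trusted host-side reference before composing it into a larger kernel. The sketch below shows that pattern in plain stdlib Python; the function names are hypothetical, and in practice the output under test would come from the kernel (for example, run in simulation) rather than the reference itself:

```python
import math

# Pattern for verifying one small operator at a time: compute a golden
# result with a simple, trusted host implementation, then compare the
# kernel's output against it element-wise within a tolerance.

def reference_rmsnorm(x, eps=1e-6):
    """Simple host-side RMSNorm reference (no learnable scale)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def check_close(out, golden, tol=1e-5):
    """Element-wise comparison within an absolute tolerance."""
    return len(out) == len(golden) and all(
        abs(a - b) <= tol for a, b in zip(out, golden)
    )

x = [1.0, 2.0, 3.0]
golden = reference_rmsnorm(x)
# In a real workflow, `out` would be the kernel's output tensor;
# mismatches localize the bug to this one operator.
```

Once each operator passes such a check, numerical issues in the composed kernel are much easier to attribute to a specific stage.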

How can I optimize my NKI kernel?#

To learn how to optimize your NKI kernel, see the NKI Performance Optimizations.

Does NKI support the entire Neuron instruction set?#

Not yet. Neuron will iteratively add support for the Neuron instruction set by adding more nki.isa (Instruction Set Architecture) APIs in upcoming Neuron releases.

Will NKI APIs guarantee backwards compatibility?#

The NKI APIs follow the Neuron Software Maintenance policy for Neuron APIs. For more information, see the SDK Maintenance Policy.
