AWS Neuron Documentation#

AWS Neuron is the software development kit for deep learning and generative AI on AWS Inferentia and AWS Trainium instances. Neuron supports multiple development paths: serving large language models with vLLM, training and inference with PyTorch and JAX, authoring custom kernels with NKI, and direct use of the Neuron Graph Compiler and Runtime.

Current release: Neuron 2.31.0

Released July 7, 2026. Select this card for the details!

Who Neuron is for#

ML engineers deploying production models — Deploy prepared Neuron Deep Learning AMIs (DLAMIs) and Deep Learning Containers (DLCs) on Amazon EC2 Trainium and Inferentia instances. Start with the DLAMI setup guide or the DLC quickstart.
- Serving LLMs — Use vLLM on Neuron to serve open-source LLMs with minimal code changes. Start with the online serving quickstart or offline serving quickstart.
ML researchers and model developers — Use native PyTorch on Trainium with eager mode, torch.compile, and standard distributed APIs. Start with Native PyTorch on Neuron.
Performance engineers optimizing kernels — Use NKI to write custom kernels with direct NeuronCore access, or pick from the NKI Library’s pre-optimized kernels. Start with the NKI quickstart and NKI Library.

Start here#

Pick the task that matches what you want to do.

Get started with a Neuron DLAMI and PyTorch

Launch a Trainium or Inferentia EC2 instance with a pre-configured Neuron Deep Learning AMI (DLAMI) and PyTorch. The DLAMI bundles the Neuron SDK, framework virtual environments (PyTorch, JAX, vLLM), and the system tools — no manual install required. See Install PyTorch via Deep Learning AMI and get started!

Serve a large language model

Run LLM inference on Trainium and Inferentia with vLLM on Neuron. Supports OpenAI-compatible APIs, continuous batching, and speculative decoding. See the offline or online serving quickstart.

Train a model with PyTorch

Use native PyTorch on Trainium (TorchNeuron) with eager mode, torch.compile, and the standard distributed APIs (FSDP, DTensor, DDP). Existing PyTorch code runs with minimal changes; primarily swap cuda for neuron on your tensors.

Write custom NKI kernels

Program NeuronCores directly with NKI when you need finer control than framework-level compilation provides. NKI offers tile-level programming with Python and NumPy-like syntax, and ships with a library of pre-optimized kernels (attention, MoE, and others).

Neuron SDK Organization#

The Neuron SDK includes:

Frameworks — Native PyTorch on Trainium (TorchNeuron), PyTorch NeuronX (torch-neuronx), and JAX NeuronX.
Serving integrations — vLLM on Neuron V1 (via the vllm-neuron plugin) and the earlier vLLM integration through NxD Inference, both for OpenAI-compatible LLM serving.
NeuronX Distributed (NxD) libraries — PyTorch libraries for distributed training and inference, including NxD Training, NxD Inference, and NxD Core.
Neuron Kernel Interface (NKI) — Python programming interface for custom kernels on NeuronCores, plus the NKI Library of pre-optimized kernels.
Neuron Graph Compiler (neuronx-cc) — Compiles model graphs and NKI kernels into Neuron Executable File Format (NEFF) files.
Neuron Runtime — Loads NEFFs and executes them on NeuronCores, handling device allocation, memory management, and collective communications.
Developer tools — Neuron Explorer and the Neuron system tools for profiling and debugging across every component.

Frameworks and serving

Write training and inference code with PyTorch or JAX. Serve LLMs with vLLM on Neuron.

NKI — Neuron Kernel Interface

Programming interface for custom kernels on NeuronCores. Used by the modern framework and serving integrations. Ships with a library of pre-optimized kernels.

NeuronX Distributed (NxD) libraries

PyTorch libraries for distributed training and inference on Neuron. Provide reference model implementations, sharding strategies (tensor, expert, context, pipeline parallelism), and distributed checkpointing. NxD Inference integrates selected NKI kernels for performance-critical operations.

Neuron Graph Compiler and Runtime

The compiler (neuronx-cc) transforms model graphs into NEFF files. The runtime loads NEFFs and executes them on NeuronCores, handling device allocation, memory management, and collective communications. Both framework graphs and NKI kernels compile to NEFF.

Deployment and Tools Support#

Neuron Explorer

Profiling and optimization tool with support for framework, NKI, compiler, and runtime workloads. Covers every Neuron SDK component area.

Neuron Agentic Development

Open-source AI agents and skills for NKI kernel authoring, debugging, profiling, and analysis. Runs inside Claude Code, Kiro, and other agentic IDEs.

Deploy on AWS

Pre-configured DLAMIs and DLCs for EC2, EKS, ECS, SageMaker, and ParallelCluster.

Learn more#

What is AWS Neuron?

Background on Inferentia, Trainium, and the Neuron SDK.

Release notes

Component-by-component release notes for every Neuron SDK version.

Open source and contribute

Public GitHub repositories, contribution guidelines, and source for TorchNeuron, NKI Library, NKI Samples, vLLM Neuron, and Neuron Agentic Development.

News and blogs

Feature announcements, technical deep dives, and customer stories.

FAQ and troubleshooting

Common questions and solutions for Neuron SDK issues.

Archived documentation

Reference material for MXNet Neuron, TensorFlow Neuron, torch-neuron (Inf1), and other legacy components.

AWS Neuron Documentation

Contents