AWS Neuron Documentation#

AWS Neuron is the software development kit for deep learning and generative AI on AWS Inferentia and AWS Trainium instances. Neuron supports multiple development paths: serving large language models with vLLM, training and inference with PyTorch and JAX, authoring custom kernels with NKI, and direct use of the Neuron Graph Compiler and Runtime.

Released May 1, 2026. Select this card for the details!

Current release: Neuron 2.29.1

Patch to Neuron 2.29.0 to address two Neuron Explorer issues.


Who Neuron is for#


Start here#

Pick the task that matches what you want to do.

Get started with a Neuron DLAMI and PyTorch

Launch a Trainium or Inferentia EC2 instance with a pre-configured Neuron Deep Learning AMI (DLAMI) and PyTorch. The DLAMI bundles the Neuron SDK, framework virtual environments (PyTorch, JAX, vLLM), and the system tools — no manual install required. See Install PyTorch via Deep Learning AMI and get started!

Serve a large language model

Run LLM inference on Trainium and Inferentia with vLLM on Neuron. Supports OpenAI-compatible APIs, continuous batching, and speculative decoding. See the offline or online serving quickstart.

Train a model with PyTorch

Use native PyTorch on Trainium (TorchNeuron) with eager mode, torch.compile, and the standard distributed APIs (FSDP, DTensor, DDP). Existing PyTorch code runs with minimal changes; primarily swap cuda for neuron on your tensors.

Write custom NKI kernels

Program NeuronCores directly with NKI when you need finer control than framework-level compilation provides. NKI offers tile-level programming with Python and NumPy-like syntax, and ships with a library of pre-optimized kernels (attention, MoE, and others).


Neuron SDK Organization#

The Neuron SDK includes:

  • Frameworks — Native PyTorch on Trainium (TorchNeuron), PyTorch NeuronX (torch-neuronx), and JAX NeuronX.

  • Serving integrations — vLLM on Neuron V1 (via the vllm-neuron plugin) and the earlier vLLM integration through NxD Inference, both for OpenAI-compatible LLM serving.

  • NeuronX Distributed (NxD) libraries — PyTorch libraries for distributed training and inference, including NxD Training, NxD Inference, and NxD Core.

  • Neuron Kernel Interface (NKI) — Python programming interface for custom kernels on NeuronCores, plus the NKI Library of pre-optimized kernels.

  • Neuron Graph Compiler (neuronx-cc) — Compiles model graphs and NKI kernels into Neuron Executable File Format (NEFF) files.

  • Neuron Runtime — Loads NEFFs and executes them on NeuronCores, handling device allocation, memory management, and collective communications.

  • Developer tools — Neuron Explorer and the Neuron system tools for profiling and debugging across every component.

Frameworks and serving

Write training and inference code with PyTorch or JAX. Serve LLMs with vLLM on Neuron.

NKI — Neuron Kernel Interface

Programming interface for custom kernels on NeuronCores. Used by the modern framework and serving integrations. Ships with a library of pre-optimized kernels.

NeuronX Distributed (NxD) libraries

PyTorch libraries for distributed training and inference on Neuron. Provide reference model implementations, sharding strategies (tensor, expert, context, pipeline parallelism), and distributed checkpointing. NxD Inference integrates selected NKI kernels for performance-critical operations.

Neuron Graph Compiler and Runtime

The compiler (neuronx-cc) transforms model graphs into NEFF files. The runtime loads NEFFs and executes them on NeuronCores, handling device allocation, memory management, and collective communications. Both framework graphs and NKI kernels compile to NEFF.


Deployment and Tools Support#

Neuron Explorer

Profiling and optimization tool with support for framework, NKI, compiler, and runtime workloads. Covers every Neuron SDK component area.

Neuron Agentic Development

Open-source AI agents and skills for NKI kernel authoring, debugging, profiling, and analysis. Runs inside Claude Code, Kiro, and other agentic IDEs.

Deploy on AWS

Pre-configured DLAMIs and DLCs for EC2, EKS, ECS, SageMaker, and ParallelCluster.


Learn more#

What is AWS Neuron?

Background on Inferentia, Trainium, and the Neuron SDK.

Release notes

Component-by-component release notes for every Neuron SDK version.

Open source and contribute

Public GitHub repositories, contribution guidelines, and source for TorchNeuron, NKI Library, NKI Samples, vLLM Neuron, and Neuron Agentic Development.

News and blogs

Feature announcements, technical deep dives, and customer stories.

FAQ and troubleshooting

Common questions and solutions for Neuron SDK issues.

Archived documentation

Reference material for MXNet Neuron, TensorFlow Neuron, torch-neuron (Inf1), and other legacy components.

AWS and the AWS logo are trademarks of Amazon Web Services, Inc. or its affiliates. All rights reserved.