AWS Neuron Documentation#
AWS Neuron is the software development kit for deep learning and generative AI on AWS Inferentia and AWS Trainium instances. Neuron supports multiple development paths: serving large language models with vLLM, training and inference with PyTorch and JAX, authoring custom kernels with NKI, and direct use of the Neuron Graph Compiler and Runtime.
Who Neuron is for#
ML engineers deploying production models — Deploy prepared Neuron Deep Learning AMIs (DLAMIs) and Deep Learning Containers (DLCs) on Amazon EC2 Trainium and Inferentia instances. Start with the DLAMI setup guide or the DLC quickstart.
Serving LLMs — Use vLLM on Neuron to serve open-source LLMs with minimal code changes. Start with the online serving quickstart or offline serving quickstart.
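The "minimal code changes" claim holds because serving on Neuron uses vLLM's standard offline API. The sketch below uses that documented API; the model name and tensor-parallel size are illustrative, and on a Trainium or Inferentia instance with the Neuron plugin installed, vLLM dispatches execution to NeuronCores rather than GPUs.

```python
# Offline LLM serving sketch with vLLM's standard API. On a Neuron
# instance with the vllm-neuron plugin installed, this runs on
# NeuronCores; the model and settings below are illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any supported open-source LLM
    tensor_parallel_size=2,  # shard across accelerator cores (illustrative)
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is AWS Trainium?"], params)
for out in outputs:
    print(out.outputs[0].text)
```

The same code runs unchanged on GPU hosts, which is what lets existing vLLM deployments move to Neuron instances with little modification.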
ML researchers and model developers — Use native PyTorch on Trainium with eager mode, torch.compile, and standard distributed APIs. Start with Native PyTorch on Neuron.
Performance engineers optimizing kernels — Use NKI to write custom kernels with direct NeuronCore access, or pick from the NKI Library's pre-optimized kernels. Start with the NKI quickstart and NKI Library.
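For a sense of what NKI code looks like, here is a minimal elementwise-add kernel sketch. It assumes the NKI Python API shipped in the Neuron SDK (`neuronxcc.nki`); it compiles and executes only on a NeuronCore, so treat it as an illustration of the programming model rather than a tested snippet.

```python
# Minimal NKI kernel sketch: elementwise add of two tiles.
# Requires the Neuron SDK and a NeuronCore to actually run.
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl

@nki.jit
def tensor_add(a, b):
    # Allocate the kernel output in device HBM.
    out = nl.ndarray(a.shape, dtype=a.dtype, buffer=nl.shared_hbm)
    # Load input tiles from HBM into on-chip memory, compute, store back.
    x = nl.load(a)
    y = nl.load(b)
    nl.store(out, value=x + y)
    return out
```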
Start here#
Pick the task that matches what you want to do.
Launch a Trainium or Inferentia EC2 instance with a pre-configured Neuron Deep Learning AMI (DLAMI) and PyTorch. The DLAMI bundles the Neuron SDK, framework virtual environments (PyTorch, JAX, vLLM), and the system tools — no manual install required. See Install PyTorch via Deep Learning AMI and get started!
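Because the DLAMI ships with the SDK and framework virtual environments pre-installed, a first session typically just activates a venv and verifies the toolchain. The commands below are a sketch of that flow; the venv path varies by DLAMI release, so check `/opt` on your instance for the exact name.

```shell
# Hypothetical first session on a Neuron DLAMI instance.
# The venv path is illustrative; actual names vary by DLAMI release.
source /opt/aws_neuronx_venv_pytorch/bin/activate

# Confirm the Neuron compiler and the PyTorch integration are present.
neuronx-cc --version
python -c "import torch_neuronx; print(torch_neuronx.__version__)"
```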
Neuron SDK Organization#
The Neuron SDK includes:
Frameworks — Native PyTorch on Trainium (TorchNeuron), PyTorch NeuronX (torch-neuronx), and JAX NeuronX.
Serving integrations — vLLM on Neuron V1 (via the vllm-neuron plugin) and the earlier vLLM integration through NxD Inference, both for OpenAI-compatible LLM serving.
NeuronX Distributed (NxD) libraries — PyTorch libraries for distributed training and inference, including NxD Training, NxD Inference, and NxD Core.
Neuron Kernel Interface (NKI) — Python programming interface for custom kernels on NeuronCores, plus the NKI Library of pre-optimized kernels.
Neuron Graph Compiler (neuronx-cc) — Compiles model graphs and NKI kernels into Neuron Executable File Format (NEFF) files.
Neuron Runtime — Loads NEFFs and executes them on NeuronCores, handling device allocation, memory management, and collective communications.
Developer tools — Neuron Explorer and the Neuron system tools for profiling and debugging across every component.
Frameworks and serving
Write training and inference code with PyTorch or JAX. Serve LLMs with vLLM on Neuron.
NKI — Neuron Kernel Interface
Programming interface for custom kernels on NeuronCores. Used by the modern framework and serving integrations. Ships with a library of pre-optimized kernels.
NeuronX Distributed (NxD) libraries
PyTorch libraries for distributed training and inference on Neuron. Provide reference model implementations, sharding strategies (tensor, expert, context, pipeline parallelism), and distributed checkpointing. NxD Inference integrates selected NKI kernels for performance-critical operations.
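To make the sharding strategies concrete, here is a sketch of tensor parallelism in the NxD style: dense layers replaced with column- and row-parallel variants. The class names follow NxD Core's parallel-layers module, but exact constructor arguments may differ between releases, so verify against the NxD API reference.

```python
# Tensor-parallel MLP sketch in the NxD style. Layer class names follow
# neuronx_distributed's parallel_layers module; constructor arguments
# are illustrative and may differ by release.
import torch.nn as nn
from neuronx_distributed.parallel_layers.layers import (
    ColumnParallelLinear,
    RowParallelLinear,
)

class ParallelMLP(nn.Module):
    def __init__(self, hidden: int, intermediate: int):
        super().__init__()
        # First projection is sharded by output columns across cores,
        # so each core holds a slice of the intermediate activations.
        self.up = ColumnParallelLinear(hidden, intermediate, gather_output=False)
        # Second projection is sharded by input rows; an all-reduce
        # recombines the partial sums into the full output.
        self.down = RowParallelLinear(intermediate, hidden, input_is_parallel=True)
```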
Neuron Graph Compiler and Runtime
The compiler (neuronx-cc) transforms model graphs into NEFF files. The runtime loads NEFFs and executes them on NeuronCores, handling device allocation, memory management, and collective communications. Both framework graphs and NKI kernels compile to NEFF.
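The compiler can also be invoked directly from the command line. The sketch below follows the general shape of the neuronx-cc CLI; the input file name is illustrative, and flag spellings should be checked against the compiler reference for your SDK version.

```shell
# Hedged sketch of a direct compiler invocation: lower an XLA/HLO
# graph to a NEFF for a Trainium target. File name is illustrative.
neuronx-cc compile model.hlo \
    --framework XLA \
    --target trn1 \
    --output model.neff
```

In everyday use the frameworks invoke the compiler for you; direct invocation is mainly useful for inspecting or caching compilation artifacts.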
Deployment and Tools Support#
Learn more#
AWS and the AWS logo are trademarks of Amazon Web Services, Inc. or its affiliates. All rights reserved.