AWS Neuron Documentation

AWS Neuron is a software stack that enables high-performance deep learning and generative AI workloads on AWS Inferentia and AWS Trainium instances. Neuron provides a complete machine learning development experience with compiler optimization, runtime efficiency, and comprehensive tooling.

For more details, see What is AWS Neuron? and What’s New in AWS Neuron?
For the latest release notes, see AWS Neuron Release Notes. The current release is version 2.28.1, released on March 13, 2026.

Looking to dive into Neuron development? Follow these links:

Learn about Neuron’s support for native PyTorch
Get started with vLLM for offline or online inference model serving
Implement and run your first NKI kernel
Optimize model performance with Neuron Explorer
Launch an Inf/Trn instance on Amazon EC2
Deploy a Deep Learning Container (DLC)

Learn more about AWS Neuron

Select a card below to read more about these features:

Native PyTorch

Learn about native PyTorch support in AWS Neuron.

vLLM on Neuron

High-performance inference serving for large language models with OpenAI-compatible APIs on Trainium and Inferentia.

Developer Tools

Profile and monitor your models as you develop, build, test, and deploy them with Neuron’s developer tools.

Neuron Kernel Interface

Low-level programming interface for custom kernel development on Trainium and Inferentia with direct hardware access.

Other Neuron features:

Orchestration and Deployment on AWS EC2 and EKS

Configure and run AWS Deep Learning AMIs (DLAMIs) and Deep Learning Containers (DLCs) to test and deploy your models on Amazon EC2 and Amazon EKS.

AWS Neuron Open Source

Interested in contributing to Neuron source code and samples? Review this documentation to learn about our public GitHub repos and how to contribute to them.

AWS Neuron-supported ML frameworks

NeuronX Distributed (NxD) libraries

Workloads

Runtime & Collectives

Compilers

Legacy Documentation and Samples
