This document is relevant for: Inf1, Inf2, Trn1, Trn2

What’s New#

Neuron 2.21.1 (01/14/2025)#

The Neuron 2.21.1 release pins the Transformers NeuronX dependency to transformers<4.48 and fixes DMA abort errors on Trn2.

Additionally, this release brings NxD Core and NxD Training improvements, including fixes for sequence parallel support in quantized models and a new flag for dtype control in Llama3/3.1 70B configurations. See NxD Training Release Notes (neuronx-distributed-training) for details.

The NxD Inference update includes minor bug fixes for sampling parameters. See NxD Inference Release Notes.

Neuron-supported DLAMIs and DLCs have been updated to the Neuron 2.21.1 SDK. Users should be aware of an incompatibility between TensorFlow-Neuron 2.10 (Inf1) and Neuron Runtime 2.21 in DLAMIs, which will be addressed in the next minor release. See Neuron DLAMI Release Notes.

The Neuron Compiler includes bug fixes and performance enhancements specifically targeting the Trn2 platform.

Neuron 2.21.0 (12/20/2024)#

What’s New#

Overview: Neuron 2.21.0 introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and Trn2 UltraServer (Preview). The release adds new capabilities for both training and inference of large-scale models. It introduces NxD Inference (beta), a PyTorch-based library for model deployment; Neuron Profiler 2.0 (beta); PyTorch 2.5 support across the Neuron SDK; and Logical NeuronCore Configuration (LNC) for optimizing NeuronCore allocation. The release enables Llama 3.1 405B model inference on a single trn2.48xlarge instance.

NxD Inference: NxD Inference (beta) is a new PyTorch-based inference library for deploying large-scale models on AWS Inferentia and Trainium instances. It enables PyTorch model onboarding with minimal code changes and integrates with vLLM. NxDI supports various model architectures, including Llama versions for text processing (Llama 2, Llama 3, Llama 3.1, Llama 3.2, and Llama 3.3), Llama 3.2 multimodal for multimodal tasks, and Mixture-of-Experts (MoE) model architectures including Mixtral and DBRX. The library supports quantization methods, includes dynamic sampling, and is compatible with HuggingFace checkpoints and generate() API. NxDI also supports distributed strategies including tensor parallelism and incorporates speculative decoding techniques (Draft model and EAGLE). The release includes Llama 3.1 405B model sample and Llama 3.3 70B model sample for inference on a single trn2.48xlarge instance.

For more information, see the NxD Inference documentation and the NxD Inference GitHub repository: aws-neuron/neuronx-distributed-inference

Transformers NeuronX (TNx): This release introduces several new features, including flash decoding support for speculative decoding and on-device generation in speculative decoding flows. It adds EAGLE speculative decoding with greedy and lossless sampling, as well as support for CPU compilation and sharded model saving. Performance improvements include optimized MLP and QKV for Llama models with sequence parallel norm, and control over concurrent compilation workers.
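The draft-model speculative decoding mentioned for NxD Inference and Transformers NeuronX can be illustrated with a toy sketch. This is not the implementation used by either library, and it does not cover EAGLE. The names `greedy_decode`, `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are plain functions from a token prefix to the next greedy token. The sketch only shows why greedy verification is lossless: the target model's token is always the one kept, so the output matches target-only greedy decoding.

```python
from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]  # maps a token prefix to the next greedy token


def greedy_decode(target: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: ask the target model for one token at a time."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       n_new: int, k: int = 4) -> List[Token]:
    """Toy greedy speculative decoding with a draft model."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position in order. A real system
        #    would batch these checks into one forward pass; here they are
        #    sequential for clarity.
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)   # the target's token is always the one kept
            if expected != tok:    # first mismatch: drop the rest of the draft
                break
    return seq


def toy_target(seq: List[Token]) -> Token:
    """Hypothetical deterministic 'large' model."""
    return (sum(seq) * 31 + len(seq)) % 50


def toy_draft(seq: List[Token]) -> Token:
    """Hypothetical 'small' model that disagrees on every third prefix length."""
    guess = toy_target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 50
```

In this sketch the speedup would come from step 2 being a single batched pass on real hardware; the acceptance rate of the draft determines how many tokens each pass yields.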

Training Highlights: In this release, NxD Training adds support for HuggingFace Llama3/3.1 70B on Trn2 instances, introduces DPO support for post-training model alignment, and adds support for Mixture-of-Experts (MoE) models including Mixtral 8x7B. The release includes improved checkpoint conversion capabilities and supports MoE with Tensor, Sequence, Pipeline, and Expert parallelism.

ML Frameworks: Neuron 2.21.0 adds support for PyTorch 2.5 and JAX 0.4.35.

Note

The CVEs CVE-2024-31583 and CVE-2024-31580 affect PyTorch versions 2.1 and earlier. Based on Amazon’s analysis, executing models on Trainium and Inferentia is not exposed to either of these vulnerabilities. We recommend upgrading to the new version of Torch-NeuronX by following the Neuron setup instructions.

Logical NeuronCore Configuration (LNC): This release introduces LNC for Trainium2 instances, optimizing NeuronCore allocation for ML applications. LNC offers two configurations: default (LNC=2) combining two physical cores, and alternative (LNC=1) mapping each physical core individually. This feature allows users to efficiently manage resources for large-scale model training and deployment through runtime variables and compiler flags.
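As a rough illustration of the two LNC settings described above, the sketch below computes how many logical NeuronCores result from a given number of physical cores. The function name `logical_core_count` and the physical core count of 8 are assumptions made for the example; they are not taken from the Neuron runtime or compiler.

```python
def logical_core_count(physical_cores: int, lnc: int = 2) -> int:
    """Number of logical NeuronCores for a given LNC setting.

    LNC=2 (the default) groups two physical cores into one logical core;
    LNC=1 exposes every physical core as its own logical core.
    """
    if lnc not in (1, 2):
        raise ValueError("only LNC=1 and LNC=2 are described in this release")
    if physical_cores % lnc:
        raise ValueError("physical core count must be divisible by the LNC value")
    return physical_cores // lnc


# Hypothetical device with 8 physical cores (illustrative value only).
PHYSICAL_CORES = 8
DEFAULT_LOGICAL = logical_core_count(PHYSICAL_CORES)             # LNC=2
PER_CORE_LOGICAL = logical_core_count(PHYSICAL_CORES, lnc=1)     # LNC=1
```

Under LNC=2 an application sees half as many cores, each backed by twice the physical resources, which is the trade-off to weigh when choosing a parallelism degree.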

Neuron Profiler 2.0: The new profiler provides system and device-level profiling, timeline annotations, container integration, and support for distributed workloads. It also includes trace export for Perfetto visualization, integration with the JAX and PyTorch profilers, and support for Logical NeuronCore Configuration (LNC).

Neuron Kernel Interface (NKI): NKI now supports Trainium2, including Logical NeuronCore Configuration (LNC), adds SPMD capabilities for multi-core operations, and adds new modules and APIs, including support for the float8_e5m2 datatype.

Deep Learning AMIs (DLAMIs): This release expands support for JAX 0.4 within the Multi Framework DLAMI. It also introduces NxD Training, NxD Inference, and NxD Core with PyTorch 2.5 support. Additionally, a new Single Framework DLAMI for TensorFlow 2.10 on Ubuntu 22 is now available.

Deep Learning Containers (DLCs): This release introduces new DLCs for JAX 0.4 training and PyTorch 2.5.1 inference and training. All DLCs have been updated to Ubuntu 22, and the pytorch-inference-neuronx DLC now supports both NxD Inference and TNx libraries.

Documentation: Documentation updates include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and Trn2 UltraServer.

Software Maintenance: This release includes the following announcements:

  • Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release

  • Announcing end of support for Neuron DET tool starting next release

  • PyTorch Neuron versions 1.9 and 1.10 no longer supported

  • Announcing end of support for PyTorch 2.1 for Trn1, Trn2 and Inf2 starting next release

  • Announcing end of support for PyTorch 1.13 for Trn1 and Inf2 starting next release

  • Announcing end of support for Python 3.8 in future releases

  • Announcing end of support for Ubuntu20 DLCs and DLAMIs

Amazon Q: Use Q Developer as your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.

More release content can be found in the table below and in each component's release notes.

Each entry below gives the What’s New item, its details where listed, and the supported instances.

Known Issues and Limitations
  Instances: Trn1/Trn1n, Inf2, Inf1

Transformers NeuronX (transformers-neuronx) for Inference
  Instances: Inf2, Trn1/Trn1n, Trn2

NxD Core (neuronx-distributed)
  Training:
  Instances: Trn1/Trn1n, Trn2

NxD Inference (neuronx-distributed-inference)
  Instances: Inf2, Trn1/Trn1n, Trn2

NxD Training (neuronx-distributed-training)
  • Added support for HuggingFace Llama3/3.1 70B with Trn2 instances
  • Added support for Mixtral 8x7B Megatron and HuggingFace models
  • Added support for custom pipeline parallel cuts in HuggingFace Llama3
  • Added support for DPO post-training model alignment
  • See more at NxD Training Release Notes (neuronx-distributed-training)
  Instances: Trn1/Trn1n, Trn2

PyTorch NeuronX (torch-neuronx)
  Instances: Trn1/Trn1n, Inf2, Trn2

NeuronX Nemo Megatron for Training
  Instances: Trn1/Trn1n, Inf2

Neuron Compiler (neuronx-cc)
  Instances: Trn1/Trn1n, Inf2, Trn2

Neuron Kernel Interface (NKI)
  • Added api/nki.compiler module with Allocation Control and Kernel decorators
  • Added new nki.isa APIs. See api/nki.isa
  • Added new nki.language APIs. See api/nki.language
  • Added new kernels (allocated_fused_self_attn_for_SD_small_head_size, allocated_fused_rms_norm_qkv). See api/nki.kernels
  • See more at Neuron Kernel Interface (NKI) release notes
  Instances: Trn1/Trn1n, Inf2

Neuron Deep Learning AMIs (DLAMIs)
  • Added support for Trainium2 chips within the Neuron Multi Framework DLAMI.
  • Added support for JAX 0.4 to Neuron Multi Framework DLAMI.
  • Added NxD Training (NxDT), NxD Inference (NxDI) and NxD Core PyTorch 2.5 support within the Neuron Multi Framework DLAMI.
  • See more at Neuron DLAMI User Guide
  Instances: Inf1, Inf2, Trn1/Trn1n

Neuron Deep Learning Containers (DLCs)
  • Added new pytorch-inference-neuronx 2.5.1 and pytorch-training-neuronx 2.5.1 DLCs
  • Added new jax-training-neuronx 0.4 Training DLC
  • See more at Neuron DLC Release Notes
  Instances: Inf1, Inf2, Trn1/Trn1n

Neuron Tools
  Instances: Inf1, Inf2, Trn1/Trn1n, Trn2

Neuron Runtime
  • Added runtime support to fail in case of out-of-bound memory access when DGE is enabled.
  • Added support for 4-rank replica group on adjacent Neuron cores on TRN1/TRN1N
  • See more at Neuron Runtime Release Notes
  Instances: Inf1, Inf2, Trn1/Trn1n, Trn2

Release Announcements
  Instances: Inf1, Inf2, Trn1/Trn1n

Documentation Updates
  Instances: Inf1, Inf2, Trn1/Trn1n, Trn2

Minor enhancements and bug fixes.
  • See components-rn
  Instances: Trn1/Trn1n, Inf2, Inf1, Trn2

Release Artifacts
  Instances: Trn1/Trn1n, Inf2, Inf1, Trn2

2.21.0 Known Issues and Limitations#

  • See component release notes below for any additional known issues.

Neuron 2.21.0 Beta (12/03/2024)#

Note

This release (Neuron 2.21 Beta) was only tested with Trn2 instances. The next release (Neuron 2.21) will support all instances (Inf1, Inf2, Trn1, and Trn2).

For access to this release (Neuron 2.21 Beta), please contact your account manager.

This release (Neuron 2.21 beta) introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and Trn2 UltraServer. The release showcases Llama 3.1 405B model inference using NxD Inference on a single trn2.48xlarge instance, and FUJI 70B model training using the AXLearn library across eight trn2.48xlarge instances.

NxD Inference, a new PyTorch-based library for deploying large language models and multi-modality models, is introduced in this release. It integrates with vLLM and enables PyTorch model onboarding with minimal code changes. The release also adds support for AXLearn training for JAX models.

The new Neuron Profiler 2.0 introduced in this release offers system and device-level profiling, timeline annotations, and container integration. The profiler supports distributed workloads and provides trace export capabilities for Perfetto visualization.

The documentation has been updated to include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and Trn2 UltraServer.

Use Q Developer as your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.

Note

For the latest release that supports Trn1, Inf2 and Inf1 instances, please see Neuron Release 2.20.2

Release Artifacts#

Trn2 packages#

List of packages in Neuron 2.21.1:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.23.135.0 
Driver                              aws-neuronx-dkms-2.19.64.0 
CustomOps Library                   aws-neuronx-gpsimd-customop-lib-0.13.16.0 
CustomOps Tools                     aws-neuronx-gpsimd-tools-0.13.2.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.23.45.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.23.45.0 
OCI                                 aws-neuronx-oci-hook-2.6.36.0 
General                             aws-neuronx-runtime-discovery-2.9 
Runtime Library                     aws-neuronx-runtime-lib-2.23.112.0 
System Tools                        aws-neuronx-tools-2.20.204.0 
Compiler                            neuronx-cc-2.16.372.0 
Compiler                            neuronx-cc-stubs-2.16.372.0 
Neuron Distributed                  neuronx_distributed-0.10.1 
Neuron Distributed Inference        neuronx_distributed_inference-0.1.1 
PyTorch                             torch-neuronx-2.1.2.2.4.0 
PyTorch                             torch-neuronx-2.5.1.2.4.0 
PyTorch                             torch_xla-2.1.6
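Each entry in these package tables joins a package name and its version with a hyphen. A minimal sketch, assuming the version is everything after the last hyphen and starts with a digit, splits an entry into its two parts. The `Package` type and `parse_package` function are hypothetical helpers, not part of any Neuron tool.

```python
from typing import NamedTuple


class Package(NamedTuple):
    name: str
    version: str


def parse_package(entry: str) -> Package:
    """Split a table entry such as 'neuronx-cc-2.16.372.0' into name and version."""
    # rpartition splits at the LAST hyphen, so hyphens inside the name survive.
    name, sep, version = entry.strip().rpartition("-")
    if not sep or not name or not version[:1].isdigit():
        raise ValueError(f"not a name-version entry: {entry!r}")
    return Package(name, version)


compiler = parse_package("neuronx-cc-2.16.372.0")
collectives = parse_package("aws-neuronx-collectives-2.23.135.0 ")
```

Splitting at the last hyphen rather than the first matters because most package names in the tables contain hyphens themselves.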

Trn1 packages#

List of packages in Neuron 2.21.1:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.23.135.0 
Driver                              aws-neuronx-dkms-2.19.64.0 
CustomOps Library                   aws-neuronx-gpsimd-customop-lib-0.13.16.0 
CustomOps Tools                     aws-neuronx-gpsimd-tools-0.13.2.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.23.45.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.23.45.0 
OCI                                 aws-neuronx-oci-hook-2.6.36.0 
General                             aws-neuronx-runtime-discovery-2.9 
Runtime Library                     aws-neuronx-runtime-lib-2.23.112.0 
System Tools                        aws-neuronx-tools-2.20.204.0 
Jax                                 jax_neuronx-0.1.2 
Framework                           libneuronxla-2.1.714.0 
Framework                           libneuronxla-0.5.3396 
Compiler                            neuronx-cc-2.16.372.0 
Compiler                            neuronx-cc-stubs-2.16.372.0 
Neuron Distributed                  neuronx_distributed-0.10.1 
Neuron Distributed Training         neuronx_distributed_training-1.1.1 
TensorBoard                         tensorboard-plugin-neuronx-2.6.52.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.12.2.0 
TensorFlow                          tensorflow-neuronx-2.10.1.2.1.0 
TensorFlow                          tensorflow-neuronx-2.8.4.2.1.0 
TensorFlow                          tensorflow-neuronx-2.9.3.2.1.0 
PyTorch                             torch-neuronx-1.13.1.1.17.0 
PyTorch                             torch-neuronx-2.1.2.2.4.0 
PyTorch                             torch-neuronx-2.5.1.2.4.0 
PyTorch                             torch_xla-1.13.1+torchneurong 
PyTorch                             torch_xla-2.1.6 
Transformers Neuron                 transformers-neuronx-0.13.380

Inf2 packages#

List of packages in Neuron 2.21.1:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.23.135.0 
Driver                              aws-neuronx-dkms-2.19.64.0 
CustomOps Library                   aws-neuronx-gpsimd-customop-lib-0.13.16.0 
CustomOps Tools                     aws-neuronx-gpsimd-tools-0.13.2.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.23.45.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.23.45.0 
OCI                                 aws-neuronx-oci-hook-2.6.36.0 
General                             aws-neuronx-runtime-discovery-2.9 
Runtime Library                     aws-neuronx-runtime-lib-2.23.112.0 
System Tools                        aws-neuronx-tools-2.20.204.0 
Jax                                 jax_neuronx-0.1.2 
Framework                           libneuronxla-2.1.714.0 
Framework                           libneuronxla-0.5.3396 
Compiler                            neuronx-cc-2.16.372.0 
Compiler                            neuronx-cc-stubs-2.16.372.0 
Neuron Distributed                  neuronx_distributed-0.10.1 
Neuron Distributed Inference        neuronx_distributed_inference-0.1.1 
TensorBoard                         tensorboard-plugin-neuronx-2.6.52.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.12.2.0 
TensorFlow                          tensorflow-neuronx-2.10.1.2.1.0 
TensorFlow                          tensorflow-neuronx-2.8.4.2.1.0 
TensorFlow                          tensorflow-neuronx-2.9.3.2.1.0 
PyTorch                             torch-neuronx-1.13.1.1.17.0 
PyTorch                             torch-neuronx-2.1.2.2.4.0 
PyTorch                             torch-neuronx-2.5.1.2.4.0 
PyTorch                             torch_xla-1.13.1+torchneurong 
PyTorch                             torch_xla-2.1.6 
Transformers Neuron                 transformers-neuronx-0.13.380

Inf1 packages#

List of packages in Neuron 2.21.1:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.12.35.0 
Driver                              aws-neuronx-dkms-2.19.64.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.23.45.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.23.45.0 
OCI                                 aws-neuronx-oci-hook-2.6.36.0 
Runtime Library                     aws-neuronx-runtime-lib-2.12.23.0 
System Tools                        aws-neuronx-tools-2.20.204.0 
Compiler                            dmlc_nnvm-1.19.6.0 
Compiler                            dmlc_topi-1.19.6.0 
Compiler                            dmlc_tvm-1.19.6.0 
Compiler                            inferentia_hwm-1.17.6.0 
MXNet                               mx_neuron-1.8.0.2.4.147.0 
MXNet                               mxnet_neuron-1.5.1.1.10.0.0 
Compiler                            neuron-cc-1.24.0.0 
Perf Tools                          neuronperf-1.8.93.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.12.2.0 
TensorFlow                          tensorflow-neuron-2.10.1.2.12.2.0 
TensorFlow                          tensorflow-neuron-2.8.4.2.12.2.0 
TensorFlow                          tensorflow-neuron-2.9.3.2.12.2.0 
PyTorch                             torch-neuron-1.10.2.2.11.13.0 
PyTorch                             torch-neuron-1.11.0.2.11.13.0 
PyTorch                             torch-neuron-1.12.1.2.11.13.0 
PyTorch                             torch-neuron-1.13.1.2.11.13.0

Supported Python Versions for Inf1 packages#

List of packages in Neuron 2.21.1:

Package                                        Supported Python Versions              
dmlc_nnvm-1.19.6.0                                3.8, 3.9, 3.10 
dmlc_topi-1.19.6.0                                3.8, 3.9, 3.10 
dmlc_tvm-1.19.6.0                                 3.8, 3.9, 3.10 
inferentia_hwm-1.17.6.0                           3.8, 3.9, 3.10 
mx_neuron-1.8.0.2.4.147.0                         3.8, 3.9, 3.10 
mxnet_neuron-1.5.1.1.10.0.0                       3.8, 3.9, 3.10 
neuron-cc-1.24.0.0                                3.8, 3.9, 3.10 
neuronperf-1.8.93.0                               3.8, 3.9, 3.10 
tensorflow-neuron-2.10.1.2.12.2.0                 3.8, 3.9, 3.10 
tensorflow-neuron-2.8.4.2.12.2.0                  3.8, 3.9, 3.10 
tensorflow-neuron-2.9.3.2.12.2.0                  3.8, 3.9, 3.10 
torch-neuron-1.10.2.2.11.13.0                     3.8, 3.9, 3.10 
torch-neuron-1.11.0.2.11.13.0                     3.8, 3.9, 3.10 
torch-neuron-1.12.1.2.11.13.0                     3.8, 3.9, 3.10 
torch-neuron-1.13.1.2.11.13.0                     3.8, 3.9, 3.10

Supported Python Versions for Inf2/Trn1/Trn2 packages#

List of packages in Neuron 2.21.1:

Package                                        Supported Python Versions              
aws-neuronx-runtime-discovery-2.9                 3.8, 3.9, 3.10, 3.11 
jax_neuronx-0.1.2                                 3.9 
libneuronxla-2.1.714.0                            3.8, 3.9, 3.10, 3.11 
libneuronxla-0.5.3396                             3.8, 3.9, 3.10 
neuronx-cc-2.16.372.0                             3.8, 3.9, 3.10, 3.11 
neuronx-cc-stubs-2.16.372.0                       3.8, 3.9, 3.10 
neuronx_distributed-0.10.1                        3.8, 3.9, 3.10, 3.11 
neuronx_distributed_inference-0.1.1               3.8, 3.9, 3.10, 3.11 
tensorflow-neuronx-2.10.1.2.1.0                   3.8, 3.9, 3.10 
tensorflow-neuronx-2.8.4.2.1.0                    3.8, 3.9, 3.10 
tensorflow-neuronx-2.9.3.2.1.0                    3.8, 3.9, 3.10 
torch-neuronx-1.13.1.1.17.0                       3.8, 3.9, 3.10, 3.11 
torch-neuronx-2.1.2.2.4.0                         3.8, 3.9, 3.10, 3.11 
torch-neuronx-2.5.1.2.4.0                         3.8, 3.9, 3.10, 3.11 
torch_xla-1.13.1+torchneurong                     3.8, 3.9, 3.10, 3.11 
torch_xla-2.1.6                                   3.8, 3.9, 3.10, 3.11 
transformers-neuronx-0.13.380                     3.8, 3.9, 3.10, 3.11
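The table above can be turned into a quick compatibility check before installing a package. The sketch below copies three rows from the table into a dictionary and tests an interpreter version against them. `SUPPORTED_PYTHONS` and `supports` are hypothetical names chosen for this example, and only a subset of the rows is included.

```python
import sys
from typing import Optional, Tuple

# A few rows copied from the table above: package -> supported (major, minor) pairs.
SUPPORTED_PYTHONS = {
    "jax_neuronx-0.1.2": {(3, 9)},
    "neuronx-cc-2.16.372.0": {(3, 8), (3, 9), (3, 10), (3, 11)},
    "tensorflow-neuronx-2.10.1.2.1.0": {(3, 8), (3, 9), (3, 10)},
}


def supports(package: str, python: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if the package lists the given Python version as supported.

    When no version is passed, the running interpreter's version is used.
    """
    if python is None:
        python = sys.version_info[:2]
    return tuple(python) in SUPPORTED_PYTHONS[package]
```

For example, `supports("jax_neuronx-0.1.2", (3, 10))` evaluates to `False`, since the table lists only Python 3.9 for that package.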

Supported Numpy Versions#

Neuron supports NumPy versions >= 1.21.6 and <= 1.22.2
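The range is inclusive at both ends. A minimal sketch of the check, assuming plain dotted numeric version strings (no pre-release or local suffixes), is shown below; `parse_version` and `numpy_supported` are hypothetical helpers.

```python
from typing import Tuple

NUMPY_MIN = (1, 21, 6)
NUMPY_MAX = (1, 22, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.22.0' into (1, 22, 0); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def numpy_supported(version: str) -> bool:
    """True if the version lies within the inclusive supported range."""
    return NUMPY_MIN <= parse_version(version) <= NUMPY_MAX
```

Comparing integer tuples avoids the string-comparison pitfall where "1.9.0" would sort after "1.21.6".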

Supported HuggingFace Transformers Versions#

Package                                        Supported HuggingFace Transformers Versions
torch-neuronx                                     < 4.35 or >= 4.37.2
transformers-neuronx                              >= 4.36.0
neuronx-distributed - Llama model class           4.31
neuronx-distributed - GPT NeoX model class        4.26
neuronx-distributed - Bert model class            4.26
nemo-megatron                                     4.31.0

Previous Releases#
