This document is relevant for: Inf1, Inf2, Trn1, Trn1n

What’s New

Neuron 2.20.1 (10/25/2024)

The Neuron 2.20.1 release fixes a Neuron Persistent Cache issue introduced in the 2.20 release. In 2.20, loading a previously compiled Neuron Executable File Format (NEFF) file from a different path or Python environment than the one used for the initial Neuron SDK installation and NEFF compilation caused a cache miss. This release resolves the cache miss: NEFFs now load correctly regardless of the path or Python environment used to install the Neuron SDK, as long as they were compiled with the same Neuron SDK version.
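How the persistent cache builds its lookup keys is not described here. As a purely hypothetical illustration (the function `neff_cache_key` and its inputs are assumptions, not Neuron's implementation), the sketch below shows why a key that mixes in an installation path misses whenever the SDK is installed somewhere else, while a key derived only from the graph, SDK version, and compiler flags stays stable across environments.

```python
from __future__ import annotations

import hashlib


def neff_cache_key(graph: bytes, sdk_version: str, compiler_flags: list[str],
                   install_path: str | None = None) -> str:
    """Hypothetical cache key for a compiled NEFF (illustration only).

    Passing install_path models a path-dependent key: the same graph compiled
    by the same SDK version then hashes differently in each environment.
    """
    digest = hashlib.sha256()
    digest.update(graph)                          # serialized graph being compiled
    digest.update(b"\0" + sdk_version.encode())   # SDK/compiler version
    for flag in compiler_flags:                   # flags, in the order given
        digest.update(b"\0" + flag.encode())
    if install_path is not None:                  # the path-dependent component
        digest.update(b"\0path:" + install_path.encode())
    return digest.hexdigest()
```

With `install_path` omitted, two environments that share a graph, SDK version, and flags compute the same key and can reuse one cached NEFF; with it supplied, every new installation path is a cache miss.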

This release also fixes excessive lock wait times during neuron_parallel_compile graph extraction for large-cluster training. See the PyTorch Neuron (torch-neuronx) release notes and the Neuron XLA pluggable device (libneuronxla) release notes.

Additionally, Neuron 2.20.1 introduces a new Multi-Framework DLAMI for Amazon Linux 2023 (AL2023) that customers can use to get started with the latest Neuron SDK on the multiple frameworks that Neuron supports. See the Neuron DLAMI Release Notes.

The Neuron 2.20.1 Training DLC is also updated to pre-install the necessary dependencies and to support the NxD Training library out of the box. See the Neuron DLC Release Notes.

Neuron 2.20.0 (09/16/2024)

What’s New

Overview: The Neuron 2.20 release introduces usability improvements and new capabilities across training and inference workloads. A key highlight is the introduction of the Neuron Kernel Interface (NKI), in beta. NKI, pronounced ‘Nicky’, enables developers to build optimized custom compute kernels for Trainium and Inferentia. This release also introduces NxD Training (beta), a PyTorch-based library for efficient distributed training with a user-friendly interface compatible with NeMo, as well as support for the JAX framework (beta).

Neuron 2.20 also adds inference support for the Pixart-alpha and Pixart-sigma Diffusion Transformer (DiT) models, and for Llama 3.1 8B, 70B, and 405B models with context lengths of up to 128K.

Neuron Kernel Interface: NKI is a programming interface that enables developers to build optimized custom compute kernels on Trainium and Inferentia. NKI empowers developers to enhance deep learning models with new capabilities, performance optimizations, and scientific innovation. It integrates natively with PyTorch and JAX, providing a Python-based programming environment with Triton-like syntax and tile-level semantics that offers developers a familiar programming experience. All of our NKI work is shared as open source, enabling community developers to collaborate, use these kernels in their projects, improve existing kernels, and contribute new NKI kernels. The kernels introduced in this release include the Optimized Flash Attention NKI kernel (flash_attention), an NKI kernel with an optimized implementation of the Mamba model architecture (mamba_nki_kernels), and the Optimized Stable Diffusion Attention kernel (fused_sd_attention_small_head), in addition to NKI kernel samples for average_pool2d, rmsnorm, tensor_addition, layernorm, transpose_2d, and matrix_multiplication.

For more information, see the NKI section and the NKI samples GitHub repository: aws-neuron/nki-samples

NxD Training (NxDT): NxDT is a PyTorch-based library that provides a user-friendly distributed training experience through a YAML configuration file compatible with NeMo, allowing users to easily set up their training workflows. At the same time, NxDT remains flexible: users can choose between the YAML configuration file, the PyTorch Lightning Trainer, or a custom training script written with NxD Core. The library supports PyTorch model classes, including Hugging Face and Megatron-LM models. Additionally, it leverages NeMo’s data engineering and data science modules to enable end-to-end training workflows on NxDT, and it provides compatibility with NeMo that requires only minimal changes to the YAML configuration file for models already supported in NxDT. Furthermore, the functionality of the Neuron NeMo Megatron (NNM) library is now part of NxDT, providing a smooth migration path from NNM to NxDT.

For more information, see NxD Training (beta) and the NxD Training GitHub repository: aws-neuron/neuronx-distributed-training

Training Highlights: This release adds support for Llama 3.1 8B and 70B model training with sequence lengths of up to 32K (beta). It also adds support for torch.autocast() for native PyTorch mixed precision, and for PEFT LoRA model training.

Inference Highlights: Neuron 2.20 adds support for Llama 3.1 models (405B, 70B, and 8B variants) and introduces new features such as on-device top-p sampling for improved performance, support for context lengths of up to 128K through Flash Decoding, and multi-node inference for large models like Llama-3.1-405B. This release also improves model loading in Transformers NeuronX for models like Llama-3 by loading pre-sharded or pre-transformed weights, and adds support for Diffusion Transformer (DiT) models such as Pixart-alpha and Pixart-sigma.
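Top-p (nucleus) sampling is a standard decoding technique: keep the smallest set of highest-probability tokens whose cumulative probability reaches p, then sample only from that set. The pure-Python sketch below shows what it computes on one probability vector. It is a generic illustration, not the Transformers NeuronX on-device implementation, and `top_p_sample` is a hypothetical name.

```python
import random


def top_p_sample(probs: list, p: float, rng: random.Random) -> int:
    """Return a token index drawn by top-p (nucleus) sampling."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Token indices ordered from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # smallest prefix whose mass reaches p
            break
    # random.choices treats weights as relative, which renormalizes the nucleus.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]
```

Running this selection on the device avoids transferring the full probability vector back to the host for every generated token, which is the kind of saving an on-device implementation targets.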

Compiler: This release adds Neuron Compiler support for the RMSNorm and RMSNormDx operators and improves the performance of the sort operator.
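RMSNorm scales each element of a vector by the inverse root mean square of the vector and then by a learned per-element weight, y = x / sqrt(mean(x²) + eps) · weight. Unlike LayerNorm, it does not subtract the mean. A plain-Python reference sketch of that formula follows; it is not the compiler's lowering, and the default `eps` of 1e-6 is an assumption, since models choose their own value.

```python
import math


def rms_norm(x: list, weight: list, eps: float = 1e-6) -> list:
    """Reference RMSNorm over one vector: y = x / sqrt(mean(x**2) + eps) * weight."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

With unit weights and `eps=0`, the output always has a root mean square of exactly 1, which makes a convenient sanity check against an accelerated implementation.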

System Tools: This release adds NKI profiling support to the Neuron Profiler and introduces improvements to the Neuron Profiler UI.

Neuron Driver: This release adds support for the Rocky Linux 9.0 operating system.

Neuron Containers: This release introduces the Neuron Helm Chart, which helps streamline the deployment of AWS Neuron components on Amazon EKS. See the Neuron Helm Chart GitHub repository: aws-neuron/neuron-helm-charts. Additionally, this release adds ECS support for the “Neuron Node Problem Detector and Recovery” artifact. See Neuron Problem Detector And Recovery.

Neuron DLAMIs and DLCs: This release adds the NxDT package to several Neuron DLAMIs (Multi-Framework Neuron DLAMI, PyTorch 1.13 Neuron DLAMI, and PyTorch 2.1 Neuron DLAMI) and includes NxDT in the PyTorch 1.13 Training Neuron DLC and the PyTorch 2.1 Training Neuron DLC.

Software Maintenance Policy: This release also updates the Neuron SDK software maintenance policy. For more information, see Neuron Software Maintenance policy.

More release content can be found in the table below and in each component’s release notes.

What’s New                                                 Details                                       Instances
Known Issues and Limitations                                                                             Trn1/Trn1n, Inf2, Inf1
Transformers NeuronX (transformers-neuronx) for Inference                                                Inf2, Trn1/Trn1n
NxD Core (neuronx-distributed)                             Training:                                     Trn1/Trn1n
                                                             • Support for LoRA finetuning
                                                             • Support for mixed precision enhancements
                                                           Inference:
NxD Training (neuronx-distributed-training)                                                              Trn1/Trn1n
PyTorch NeuronX (torch-neuronx)                                                                          Trn1/Trn1n, Inf2
NeuronX Nemo Megatron for Training                                                                       Trn1/Trn1n, Inf2
Neuron Compiler (neuronx-cc)                                                                             Trn1/Trn1n, Inf2
Neuron Kernel Interface (NKI)                                                                            Trn1/Trn1n, Inf2
Neuron Deep Learning AMIs (DLAMIs)                                                                       Inf1, Inf2, Trn1/Trn1n
Neuron Deep Learning Containers (DLCs)                                                                   Inf1, Inf2, Trn1/Trn1n
Neuron Tools                                                                                             Inf1, Inf2, Trn1/Trn1n
Neuron Runtime                                                                                           Inf1, Inf2, Trn1/Trn1n
Release Announcements                                                                                    Inf1, Inf2, Trn1/Trn1n
Documentation Updates                                                                                    Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes                                                                         Trn1/Trn1n, Inf2, Inf1
Release Artifacts                                                                                        Trn1/Trn1n, Inf2, Inf1

2.20.0 Known Issues and Limitations

  • There are known issues when using the on_device_generation flag in the Transformers NeuronX config for Llama models. Customers who encounter an issue are advised not to use the flag. For details, see the Transformers Neuron (transformers-neuronx) release notes.

  • See the component release notes below for any additional known issues.

Neuron Components Release Notes

Inf1, Trn1/Trn1n and Inf2 common packages

Component                                 Instance/s              Package/s
Neuron Runtime                            Trn1/Trn1n, Inf1, Inf2  • Trn1/Trn1n: aws-neuronx-runtime-lib (.deb, .rpm)
                                                                  • Inf1: Runtime is linked into the ML framework packages
Neuron Runtime Driver                     Trn1/Trn1n, Inf1, Inf2  • aws-neuronx-dkms (.deb, .rpm)
Neuron System Tools                       Trn1/Trn1n, Inf1, Inf2  • aws-neuronx-tools (.deb, .rpm)
Containers                                Trn1/Trn1n, Inf1, Inf2  • aws-neuronx-k8-plugin (.deb, .rpm)
                                                                  • aws-neuronx-k8-scheduler (.deb, .rpm)
                                                                  • aws-neuronx-oci-hook (.deb, .rpm)
NeuronPerf (Inference only)               Trn1/Trn1n, Inf1, Inf2  • neuronperf (.whl)
TensorFlow Model Server Neuron            Trn1/Trn1n, Inf1, Inf2  • tensorflow-model-server-neuronx (.deb, .rpm)

Trn1/Trn1n and Inf2 only packages

Component                                 Instance/s              Package/s
PyTorch Neuron                            Trn1/Trn1n, Inf2        • torch-neuronx (.whl)
TensorFlow Neuron                         Trn1/Trn1n, Inf2        • tensorflow-neuronx (.whl)
Neuron Compiler (Trn1/Trn1n, Inf2 only)   Trn1/Trn1n, Inf2        • neuronx-cc (.whl)
Neuron Kernel Interface (NKI) Compiler    Trn1/Trn1n, Inf2        • Supported within neuronx-cc (.whl)
(Trn1/Trn1n, Inf2 only)
Collective Communication Library          Trn1/Trn1n, Inf2        • aws-neuronx-collectives (.deb, .rpm)
Neuron Custom C++ Operators               Trn1/Trn1n, Inf2        • aws-neuronx-gpsimd-customop (.deb, .rpm)
                                                                  • aws-neuronx-gpsimd-tools (.deb, .rpm)
Transformers Neuron                       Trn1/Trn1n, Inf2        • transformers-neuronx (.whl)
NxD Training                              Trn1/Trn1n, Inf2        • neuronx-distributed-training (.whl)
NxD Core                                  Trn1/Trn1n, Inf2        • neuronx-distributed (.whl)
AWS Neuron Reference for NeMo Megatron    Trn1/Trn1n

Inf1 only packages

Component                                 Instance/s              Package/s
PyTorch Neuron                            Inf1                    • torch-neuron (.whl)
TensorFlow Neuron                         Inf1                    • tensorflow-neuron (.whl)
Apache MXNet                              Inf1                    • mx_neuron (.whl)
Neuron Compiler (Inf1 only)               Inf1                    • neuron-cc (.whl)

Release Artifacts

Trn1 packages

List of packages in Neuron 2.20.1:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.22.26.0 
Driver                              aws-neuronx-dkms-2.18.12.0 
CustomOps Library                   aws-neuronx-gpsimd-customop-lib-0.12.2.0
CustomOps Tools                     aws-neuronx-gpsimd-tools-0.12.1.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.22.4.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.22.4.0 
OCI                                 aws-neuronx-oci-hook-2.5.3.0 
General                             aws-neuronx-runtime-discovery-2.9 
Runtime Library                     aws-neuronx-runtime-lib-2.22.14.0 
System Tools                        aws-neuronx-tools-2.19.0.0 
Jax                                 jax_neuronx-0.1.1 
Framework                           libneuronxla-2.0.4986.0 
Framework                           libneuronxla-0.5.2978 
Compiler                            neuronx-cc-2.15.141.0 
Compiler                            neuronx-cc-stubs-2.15.141.0 
Neuron Distributed                  neuronx_distributed-0.9.0 
Neuron Distributed Training         neuronx_distributed_training-1.0.0 
TensorBoard                         tensorboard-plugin-neuronx-2.6.63.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.12.2.0 
TensorFlow                          tensorflow-neuronx-2.10.1.2.1.0 
TensorFlow                          tensorflow-neuronx-2.8.4.2.1.0 
TensorFlow                          tensorflow-neuronx-2.9.3.2.1.0 
PyTorch                             torch-neuronx-1.13.1.1.16.0 
PyTorch                             torch-neuronx-2.1.2.2.3.1 
PyTorch                             torch_xla-1.13.1+torchneurong 
PyTorch                             torch_xla-2.1.4 
Transformers Neuron                 transformers-neuronx-0.12.313

Inf2 packages

List of packages in Neuron 2.20.1:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.22.26.0 
Driver                              aws-neuronx-dkms-2.18.12.0 
CustomOps Library                   aws-neuronx-gpsimd-customop-lib-0.12.2.0
CustomOps Tools                     aws-neuronx-gpsimd-tools-0.12.1.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.22.4.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.22.4.0 
OCI                                 aws-neuronx-oci-hook-2.5.3.0 
General                             aws-neuronx-runtime-discovery-2.9 
Runtime Library                     aws-neuronx-runtime-lib-2.22.14.0 
System Tools                        aws-neuronx-tools-2.19.0.0 
Jax                                 jax_neuronx-0.1.1 
Framework                           libneuronxla-2.0.4986.0 
Framework                           libneuronxla-0.5.2978 
Compiler                            neuronx-cc-2.15.141.0 
Compiler                            neuronx-cc-stubs-2.15.141.0 
Neuron Distributed                  neuronx_distributed-0.9.0 
TensorBoard                         tensorboard-plugin-neuronx-2.6.63.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.12.2.0 
TensorFlow                          tensorflow-neuronx-2.10.1.2.1.0 
TensorFlow                          tensorflow-neuronx-2.8.4.2.1.0 
TensorFlow                          tensorflow-neuronx-2.9.3.2.1.0 
PyTorch                             torch-neuronx-1.13.1.1.16.0 
PyTorch                             torch-neuronx-2.1.2.2.3.1 
PyTorch                             torch_xla-1.13.1+torchneurong 
PyTorch                             torch_xla-2.1.4 
Transformers Neuron                 transformers-neuronx-0.12.313

Inf1 packages

List of packages in Neuron 2.20.1:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.12.35.0 
Driver                              aws-neuronx-dkms-2.18.12.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.22.4.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.22.4.0 
OCI                                 aws-neuronx-oci-hook-2.5.3.0 
Runtime Library                     aws-neuronx-runtime-lib-2.12.23.0 
System Tools                        aws-neuronx-tools-2.19.0.0 
Compiler                            dmlc_nnvm-1.19.6.0 
Compiler                            dmlc_topi-1.19.6.0 
Compiler                            dmlc_tvm-1.19.6.0 
Compiler                            inferentia_hwm-1.17.6.0 
MXNet                               mx_neuron-1.8.0.2.4.147.0 
MXNet                               mxnet_neuron-1.5.1.1.10.0.0 
Compiler                            neuron-cc-1.24.0.0 
Perf Tools                          neuronperf-1.8.93.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.12.2.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.12.2.0 
TensorFlow                          tensorflow-neuron-2.10.1.2.12.2.0 
TensorFlow                          tensorflow-neuron-2.8.4.2.12.2.0 
TensorFlow                          tensorflow-neuron-2.9.3.2.12.2.0 
PyTorch                             torch-neuron-1.10.2.2.11.7.0 
PyTorch                             torch-neuron-1.11.0.2.11.7.0 
PyTorch                             torch-neuron-1.12.1.2.11.7.0 
PyTorch                             torch-neuron-1.13.1.2.11.7.0 
PyTorch                             torch-neuron-1.9.1.2.11.7.0

Supported Python Versions for Inf1 packages

List of packages in Neuron 2.20.1:

Package                                        Supported Python Versions              
dmlc_nnvm-1.19.6.0                                3.8, 3.9, 3.10 
dmlc_topi-1.19.6.0                                3.8, 3.9, 3.10 
dmlc_tvm-1.19.6.0                                 3.8, 3.9, 3.10 
inferentia_hwm-1.17.6.0                           3.8, 3.9, 3.10 
mx_neuron-1.8.0.2.4.147.0                         3.8, 3.9, 3.10 
mxnet_neuron-1.5.1.1.10.0.0                       3.8, 3.9, 3.10 
neuron-cc-1.24.0.0                                3.8, 3.9, 3.10 
neuronperf-1.8.93.0                               3.8, 3.9, 3.10 
tensorflow-neuron-2.10.1.2.12.2.0                 3.8, 3.9, 3.10 
tensorflow-neuron-2.8.4.2.12.2.0                  3.8, 3.9, 3.10 
tensorflow-neuron-2.9.3.2.12.2.0                  3.8, 3.9, 3.10 
torch-neuron-1.10.2.2.11.7.0                      3.8, 3.9, 3.10 
torch-neuron-1.11.0.2.11.7.0                      3.8, 3.9, 3.10 
torch-neuron-1.12.1.2.11.7.0                      3.8, 3.9, 3.10 
torch-neuron-1.13.1.2.11.7.0                      3.8, 3.9, 3.10 
torch-neuron-1.9.1.2.11.7.0                       3.8, 3.9, 3.10

Supported Python Versions for Inf2/Trn1 packages

List of packages in Neuron 2.20.1:

Package                                        Supported Python Versions              
aws-neuronx-runtime-discovery-2.9                 3.8, 3.9, 3.10, 3.11 
jax_neuronx-0.1.1                                 3.9 
libneuronxla-2.0.4986.0                           3.8, 3.9, 3.10, 3.11 
libneuronxla-0.5.2978                             3.8, 3.9, 3.10 
neuronx-cc-2.15.141.0                             3.8, 3.9, 3.10, 3.11 
neuronx-cc-stubs-2.15.141.0                       3.8, 3.9, 3.10 
neuronx_distributed-0.9.0                         3.8, 3.9, 3.10, 3.11 
tensorflow-neuronx-2.10.1.2.1.0                   3.8, 3.9, 3.10 
tensorflow-neuronx-2.8.4.2.1.0                    3.8, 3.9, 3.10 
tensorflow-neuronx-2.9.3.2.1.0                    3.8, 3.9, 3.10 
torch-neuronx-1.13.1.1.16.0                       3.8, 3.9, 3.10, 3.11 
torch-neuronx-2.1.2.2.3.1                         3.8, 3.9, 3.10, 3.11 
torch_xla-1.13.1+torchneurong                     3.8, 3.9, 3.10, 3.11 
torch_xla-2.1.4                                   3.8, 3.9, 3.10, 3.11 
transformers-neuronx-0.12.313                     3.8, 3.9, 3.10, 3.11
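Because support differs by package (jax_neuronx-0.1.1 lists only Python 3.9, and the TensorFlow packages stop at 3.10), it can help to check an environment plan against the table before installing. A minimal sketch, assuming you transcribe the rows you need into a dictionary; `SUPPORTED_PYTHON` and `unsupported_packages` are hypothetical names, not part of the Neuron SDK.

```python
import sys

# Supported Python (major, minor) versions per package, transcribed from a few
# rows of the Inf2/Trn1 table above; add the rows your environment needs.
SUPPORTED_PYTHON = {
    "jax_neuronx-0.1.1": {(3, 9)},
    "libneuronxla-0.5.2978": {(3, 8), (3, 9), (3, 10)},
    "neuronx-cc-2.15.141.0": {(3, 8), (3, 9), (3, 10), (3, 11)},
    "tensorflow-neuronx-2.10.1.2.1.0": {(3, 8), (3, 9), (3, 10)},
    "torch-neuronx-2.1.2.2.3.1": {(3, 8), (3, 9), (3, 10), (3, 11)},
    "transformers-neuronx-0.12.313": {(3, 8), (3, 9), (3, 10), (3, 11)},
}


def unsupported_packages(packages, python=None):
    """Return the packages that do not list `python` (default: this interpreter).

    Packages missing from SUPPORTED_PYTHON are reported as unsupported.
    """
    if python is None:
        python = tuple(sys.version_info[:2])
    return [pkg for pkg in packages
            if tuple(python) not in SUPPORTED_PYTHON.get(pkg, set())]
```

For example, planning a Python 3.10 environment with both jax_neuronx-0.1.1 and torch-neuronx-2.1.2.2.3.1 flags only the JAX package.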

Supported NumPy Versions

Neuron supports NumPy versions >= 1.21.6 and <= 1.22.2.

Supported HuggingFace Transformers Versions

Package                                        Supported HuggingFace Transformers Versions
torch-neuronx                                  < 4.35 or >= 4.37.2
transformers-neuronx                           >= 4.36.0
neuronx-distributed - Llama model class        4.31
neuronx-distributed - GPT NeoX model class     4.26
neuronx-distributed - Bert model class         4.26
nemo-megatron                                  4.31.0
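No version can be both below 4.35 and at or above 4.37.2, so the torch-neuronx entry is read here as two allowed ranges that exclude 4.35.0 through 4.37.1; this reading is an interpretation of the table. A minimal sketch of that check, with hypothetical helper names and no handling of pre-release or local version suffixes:

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain release string like '4.37.2' into (4, 37, 2).

    Suffixes such as '.dev0' or 'rc1' are not handled in this sketch.
    """
    parts = tuple(int(piece) for piece in text.split("."))
    # Pad to three components so '4.35' compares equal to '4.35.0'.
    return parts + (0,) * (3 - len(parts))


def torch_neuronx_accepts_transformers(version: str) -> bool:
    """True if `version` is below 4.35 or at least 4.37.2."""
    v = parse_version(version)
    return v < (4, 35, 0) or v >= (4, 37, 2)
```

A production check would more robustly use `packaging.version.Version`, which understands pre-release and local version suffixes.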

Supported Protobuf Versions

Package                                        Supported Protobuf Versions
neuronx-cc                                     > 3
torch-neuronx                                  >= 3.20
torch-neuron                                   < 3.20
transformers-neuronx                           >= 3.20
neuronx-distributed                            >= 3.20
tensorflow-neuronx                             < 3.20
tensorflow-neuron                              < 3.20
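The ranges in this table do not all overlap: torch-neuronx needs protobuf >= 3.20, while tensorflow-neuronx needs < 3.20. A minimal sketch that intersects the ranges for a set of packages, assuming the constraints are exactly those in the table and reading neuronx-cc's "> 3" as "greater than 3.0.0"; `PROTOBUF_CONSTRAINTS` and `protobuf_compatible` are hypothetical names.

```python
from __future__ import annotations

# (lower, lower_inclusive, upper, upper_inclusive); None means unbounded.
# Transcribed from the Supported Protobuf Versions table; neuronx-cc's "> 3"
# is read here as "greater than 3.0.0".
PROTOBUF_CONSTRAINTS = {
    "neuronx-cc": ((3, 0, 0), False, None, False),
    "torch-neuronx": ((3, 20, 0), True, None, False),
    "torch-neuron": (None, False, (3, 20, 0), False),
    "transformers-neuronx": ((3, 20, 0), True, None, False),
    "neuronx-distributed": ((3, 20, 0), True, None, False),
    "tensorflow-neuronx": (None, False, (3, 20, 0), False),
    "tensorflow-neuron": (None, False, (3, 20, 0), False),
}


def protobuf_compatible(packages: list[str]) -> bool:
    """Return True if the packages' protobuf ranges overlap (treated as continuous)."""
    lo, lo_incl, hi, hi_incl = None, False, None, False
    for pkg in packages:
        p_lo, p_lo_incl, p_hi, p_hi_incl = PROTOBUF_CONSTRAINTS[pkg]
        # Keep the tightest lower bound; on a tie, an exclusive bound is tighter.
        if p_lo is not None and (lo is None or p_lo > lo or (p_lo == lo and not p_lo_incl)):
            lo, lo_incl = p_lo, p_lo_incl
        # Keep the tightest upper bound, with the same tie rule.
        if p_hi is not None and (hi is None or p_hi < hi or (p_hi == hi and not p_hi_incl)):
            hi, hi_incl = p_hi, p_hi_incl
    if lo is None or hi is None:
        return True  # at least one side is unbounded
    return lo < hi or (lo == hi and lo_incl and hi_incl)
```

Under these assumptions, torch-neuronx with transformers-neuronx and neuronx-distributed resolves to protobuf >= 3.20, while any pairing of torch-neuronx with tensorflow-neuronx has no common version and calls for separate environments.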

Supported Linux Kernel Versions

The Neuron Driver (aws-neuronx-dkms) supports Linux kernel versions >= 5.10.
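A minimal sketch for checking a host against this requirement, assuming the kernel release string begins with "major.minor" as `uname -r` output does on common distributions; `kernel_supported` is a hypothetical helper, not a Neuron tool.

```python
import re


def kernel_supported(release: str, minimum: tuple = (5, 10)) -> bool:
    """Return True if a kernel release string is at least `minimum` (major, minor).

    Only the leading "major.minor" is compared; the rest of the string, such as
    a distribution suffix, is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release string: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)
```

On a Linux host, `kernel_supported(platform.release())` checks the running kernel, since `platform.release()` returns the same string as `uname -r`. Comparing integer tuples rather than strings matters here: as strings, "5.4" would sort after "5.10".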

Previous Releases
