This document is relevant for: Inf1, Inf2, Trn1, Trn1n

What’s New#

Neuron 2.19.0 (07/03/2024)#

What’s New#

The Neuron 2.19 release adds Llama 3 training support and introduces Flash Attention kernel support to enable LLM training and inference for large sequence lengths. Neuron 2.19 also introduces new features and performance improvements for LLM training, improves LLM inference performance for the Llama 3 model by up to 20%, and adds tools for monitoring, problem detection, and recovery in Kubernetes (EKS) environments, improving efficiency and reliability.

Training highlights: LLM training with NeuronX Distributed (NxD) now supports Flash Attention, which enables training with sequence lengths of 8K and longer. Neuron 2.19 adds support for Llama 3 model training. This release also adds support for interleaved pipeline parallelism, which reduces idle time (bubble size) and improves training efficiency and resource utilization for large cluster sizes.
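
To illustrate why interleaving shrinks the pipeline bubble, here is a minimal sketch based on the estimate commonly cited for interleaved schedules: with p pipeline stages, m microbatches, and v model chunks per device, the ratio of idle (bubble) time to ideal compute time is roughly (p - 1) / (v * m). The function name and the example configuration are illustrative assumptions and not part of the NxD API, and the schedule NxD implements may differ in its details.

```python
def bubble_fraction(num_stages: int, num_microbatches: int, chunks_per_device: int = 1) -> float:
    """Estimated ratio of pipeline idle time to ideal compute time.

    Uses the approximation (p - 1) / (v * m); chunks_per_device=1
    corresponds to a non-interleaved schedule.
    """
    if num_stages < 1 or num_microbatches < 1 or chunks_per_device < 1:
        raise ValueError("all arguments must be positive integers")
    return (num_stages - 1) / (chunks_per_device * num_microbatches)


# Hypothetical configuration: 8 pipeline stages and 32 microbatches.
plain = bubble_fraction(8, 32)            # non-interleaved schedule
interleaved = bubble_fraction(8, 32, 4)   # 4 model chunks per device (assumed)

print(f"non-interleaved bubble: {plain:.4f}")
print(f"interleaved bubble:     {interleaved:.4f}")
```

Under this estimate the bubble shrinks in proportion to the number of chunks per device, typically at the cost of more communication between stages.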

Inference highlights: Flash Attention kernel support in the Transformers NeuronX library enables LLM inference for context lengths of up to 32k. This release also adds [Beta] support for continuous batching with mistralai/Mistral-7B-v0.2 in Transformers NeuronX.
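
A rough calculation shows why long contexts are difficult without a fused attention kernel. Standard attention materializes a score matrix of seq_len x seq_len for each head, while Flash Attention computes attention in tiles and avoids storing that full matrix. The sketch below only does the arithmetic for the naive case. The 2-byte element size (BF16) and the helper name are assumptions for illustration, and the code does not model how the Neuron kernel is tiled.

```python
def naive_score_matrix_bytes(seq_len: int, bytes_per_element: int = 2) -> int:
    """Bytes needed to hold one head's full seq_len x seq_len attention score matrix."""
    return seq_len * seq_len * bytes_per_element


# Sequence lengths mentioned in these release notes: 8K and 32K tokens.
for seq_len in (8 * 1024, 32 * 1024):
    mib = naive_score_matrix_bytes(seq_len) / 2**20
    print(f"seq_len={seq_len}: {mib:.0f} MiB per head per sequence")
```

Because the cost is quadratic in sequence length, going from 8K to 32K multiplies the naive score-matrix footprint by 16.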

Tools and Neuron DLAMI/DLC highlights: This release introduces the new Neuron Node Problem Detector and Recovery plugin for EKS-supported Kubernetes environments, a tool that monitors the health of Neuron instances and triggers automatic node replacement upon detecting an unrecoverable error. Neuron 2.19 also introduces the new Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes, and adds monitoring support with Prometheus and Grafana. This release also introduces new PyTorch 2.1 and PyTorch 1.13 single-framework DLAMIs for Ubuntu 22. Neuron DLAMIs and Neuron DLCs are also updated to support this release (Neuron 2.19).

More release content can be found in the table below and in each component's release notes.

What’s New

Details

Instances

Known Issues and Limitations

Trn1/Trn1n, Inf2, Inf1

Transformers NeuronX (transformers-neuronx) for Inference

Inf2, Trn1/Trn1n

NeuronX Distributed (neuronx-distributed) for Training

Trn1/Trn1n

NeuronX Distributed (neuronx-distributed) for Inference

Inf2, Trn1/Trn1n

PyTorch NeuronX (torch-neuronx)

  • Support for FP32 master weights and BF16 all-gather during Zero1 training to enhance training efficiency.

  • Support for adding custom SILU activation functions by configuring the NEURON_CUSTOM_SILU variable.

  • See more at PyTorch Neuron (torch-neuronx) release notes

Trn1/Trn1n, Inf2

NeuronX Nemo Megatron for Training

Trn1/Trn1n, Inf2

Neuron Compiler (neuronx-cc)

Trn1/Trn1n, Inf2

Neuron DLAMI and DLC

Inf1, Inf2, Trn1/Trn1n

Neuron Tools

  • Support for the new Neuron Node Problem Detector and Recovery plugin in EKS-supported Kubernetes environments, which monitors the health of Neuron instances and triggers automatic node replacement upon detecting an unrecoverable error. See configuration and tutorial.

  • Support for the new Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes. Supports monitoring with Prometheus and Grafana. See tutorial.

  • Support for the Neuron scheduler extension to enforce allocation of contiguous Neuron Devices for pods based on the Neuron instance type. See tutorial.

  • Neuron Profiler bug fixes and UI updates, including improvements to the visualization of collective operations and to the consistency of the displayed information.

  • Added memory usage metrics and device count information to neuron-monitor.

  • See more at Neuron System Tools

Inf1, Inf2, Trn1/Trn1n

Neuron Runtime

  • Support for dynamic Direct Memory Access (DMA), which reduces memory usage at runtime.

  • Runtime enhancements that improve collectives performance.

  • See more at Neuron Runtime Release Notes

Inf1, Inf2, Trn1/Trn1n

Other Documentation Updates

Inf1, Inf2, Trn1/Trn1n

Minor enhancements and bug fixes.

Trn1/Trn1n, Inf2, Inf1

Release Artifacts

Trn1/Trn1n, Inf2, Inf1

2.19.0 Known Issues and Limitations#

  • There are known issues when using the on_device_generation flag in the Transformers NeuronX config for Llama models. Customers are advised not to use the flag if they encounter an issue. See more at Transformers Neuron (transformers-neuronx) release notes.

  • See component release notes below for any additional known issues.

Neuron Components Release Notes#

Inf1, Trn1/Trn1n and Inf2 common packages#

Component                         Instance/s                Package/s                                                  Details
Neuron Runtime                    Trn1/Trn1n, Inf1, Inf2    Trn1/Trn1n: aws-neuronx-runtime-lib (.deb, .rpm)
                                                            Inf1: Runtime is linked into the ML frameworks packages
Neuron Runtime Driver             Trn1/Trn1n, Inf1, Inf2    aws-neuronx-dkms (.deb, .rpm)
Neuron System Tools               Trn1/Trn1n, Inf1, Inf2    aws-neuronx-tools (.deb, .rpm)
Neuron DLAMI                      Trn1/Trn1n, Inf1, Inf2
Neuron DLC                        Trn1/Trn1n, Inf1, Inf2
Containers                        Trn1/Trn1n, Inf1, Inf2    aws-neuronx-k8-plugin (.deb, .rpm)
                                                            aws-neuronx-k8-scheduler (.deb, .rpm)
                                                            aws-neuronx-oci-hook (.deb, .rpm)
NeuronPerf (Inference only)       Trn1/Trn1n, Inf1, Inf2    neuronperf (.whl)
TensorFlow Model Server Neuron    Trn1/Trn1n, Inf1, Inf2    tensorflow-model-server-neuronx (.deb, .rpm)
Neuron Documentation              Trn1/Trn1n, Inf1, Inf2

Trn1/Trn1n and Inf2 only packages#

Component                                  Instance/s          Package/s                                       Details
PyTorch Neuron                             Trn1/Trn1n, Inf2    torch-neuronx (.whl)
TensorFlow Neuron                          Trn1/Trn1n, Inf2    tensorflow-neuronx (.whl)
Neuron Compiler (Trn1/Trn1n, Inf2 only)    Trn1/Trn1n, Inf2    neuronx-cc (.whl)
Collective Communication Library           Trn1/Trn1n, Inf2    aws-neuronx-collectives (.deb, .rpm)
Neuron Custom C++ Operators                Trn1/Trn1n, Inf2    aws-neuronx-gpsimd-customop-lib (.deb, .rpm)
                                                               aws-neuronx-gpsimd-tools (.deb, .rpm)
Transformers Neuron                        Trn1/Trn1n, Inf2    transformers-neuronx (.whl)
Neuron Distributed                         Trn1/Trn1n, Inf2    neuronx-distributed (.whl)
AWS Neuron Reference for NeMo Megatron     Trn1/Trn1n

Note

In upcoming releases, aws-neuronx-tools and aws-neuronx-runtime-lib will add support for Inf1.

Release Artifacts#

Trn1 packages#

List of packages in Neuron 2.19.0:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.21.46.0 
Driver                              aws-neuronx-dkms-2.17.17.0 
Custom C++ Operators                aws-neuronx-gpsimd-customop-lib-0.11.4.0
CustomOps Tools                     aws-neuronx-gpsimd-tools-0.11.3.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.21.14.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.21.14.0 
OCI                                 aws-neuronx-oci-hook-2.4.4.0 
General                             aws-neuronx-runtime-discovery-2.9 
Runtime Library                     aws-neuronx-runtime-lib-2.21.41.0 
System Tools                        aws-neuronx-tools-2.18.3.0 
Framework                           libneuronxla-2.0.2335 
Framework                           libneuronxla-0.5.1795 
Compiler                            neuronx-cc-2.14.213.0 
Neuron Distributed                  neuronx_distributed-0.8.0 
TensorBoard                         tensorboard-plugin-neuronx-2.6.63.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.11.4.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.11.4.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.11.4.0 
TensorFlow                          tensorflow-neuronx-2.10.1.2.1.0 
TensorFlow                          tensorflow-neuronx-2.8.4.2.1.0 
TensorFlow                          tensorflow-neuronx-2.9.3.2.1.0 
PyTorch                             torch-neuronx-1.13.1.1.15.0 
PyTorch                             torch-neuronx-2.1.2.2.2.0 
PyTorch                             torch_xla-1.13.1+torchneuronf 
PyTorch                             torch_xla-2.1.3 
Transformers Neuron                 transformers-neuronx-0.11.351
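
When pinning an environment to this release, it can help to split the artifact names above into a package name and a version. The following is a minimal sketch that assumes the version starts at the first hyphen followed by a digit, which holds for every entry in this table. The helper name is hypothetical. The table mixes OS packages (.deb, .rpm) and Python wheels, so the `name==version` form printed here is only meaningful for the wheels.

```python
import re
from typing import Tuple

# The name is everything before the first "-<digit>"; the rest is the version.
_ARTIFACT = re.compile(r"^(?P<name>.+?)-(?P<version>\d.*)$")


def split_artifact(artifact: str) -> Tuple[str, str]:
    """Split e.g. 'neuronx-cc-2.14.213.0' into ('neuronx-cc', '2.14.213.0')."""
    match = _ARTIFACT.match(artifact.strip())
    if match is None:
        raise ValueError(f"cannot find a version in {artifact!r}")
    return match.group("name"), match.group("version")


# A few Python wheels copied from the table above.
wheels = [
    "neuronx-cc-2.14.213.0",
    "neuronx_distributed-0.8.0",
    "torch-neuronx-2.1.2.2.2.0",
    "transformers-neuronx-0.11.351",
]

for wheel in wheels:
    name, version = split_artifact(wheel)
    print(f"{name}=={version}")
```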

Inf2 packages#

List of packages in Neuron 2.19.0:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.21.46.0 
Driver                              aws-neuronx-dkms-2.17.17.0 
Custom C++ Operators                aws-neuronx-gpsimd-customop-lib-0.11.4.0
CustomOps Tools                     aws-neuronx-gpsimd-tools-0.11.3.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.21.14.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.21.14.0 
OCI                                 aws-neuronx-oci-hook-2.4.4.0 
General                             aws-neuronx-runtime-discovery-2.9 
Runtime Library                     aws-neuronx-runtime-lib-2.21.41.0 
System Tools                        aws-neuronx-tools-2.18.3.0 
Framework                           libneuronxla-2.0.2335 
Framework                           libneuronxla-0.5.1795 
Compiler                            neuronx-cc-2.14.213.0 
Neuron Distributed                  neuronx_distributed-0.8.0 
TensorBoard                         tensorboard-plugin-neuronx-2.6.63.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.11.4.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.11.4.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.11.4.0 
TensorFlow                          tensorflow-neuronx-2.10.1.2.1.0 
TensorFlow                          tensorflow-neuronx-2.8.4.2.1.0 
TensorFlow                          tensorflow-neuronx-2.9.3.2.1.0 
PyTorch                             torch-neuronx-1.13.1.1.15.0 
PyTorch                             torch-neuronx-2.1.2.2.2.0 
PyTorch                             torch_xla-1.13.1+torchneuronf 
PyTorch                             torch_xla-2.1.3 
Transformers Neuron                 transformers-neuronx-0.11.351

Inf1 packages#

List of packages in Neuron 2.19.0:

Component                           Package                                           
Driver                              aws-neuronx-dkms-2.17.17.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.21.14.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.21.14.0 
OCI                                 aws-neuronx-oci-hook-2.4.4.0 
System Tools                        aws-neuronx-tools-2.18.3.0 
Compiler                            dmlc_nnvm-1.19.1.0 
Compiler                            dmlc_topi-1.19.1.0 
Compiler                            dmlc_tvm-1.19.1.0 
Compiler                            inferentia_hwm-1.17.1.0 
MXNet                               mx_neuron-1.8.0.2.4.147.0 
MXNet                               mxnet_neuron-1.5.1.1.10.0.0 
Compiler                            neuron-cc-1.23.5.0 
Perf Tools                          neuronperf-1.8.93.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.11.4.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.11.4.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.11.4.0 
TensorFlow                          tensorflow-neuron-2.10.1.2.11.4.0 
TensorFlow                          tensorflow-neuron-2.7.4.2.11.4.0 
TensorFlow                          tensorflow-neuron-2.8.4.2.11.4.0 
TensorFlow                          tensorflow-neuron-2.9.3.2.11.4.0 
PyTorch                             torch-neuron-1.10.2.2.10.12.0 
PyTorch                             torch-neuron-1.11.0.2.10.12.0 
PyTorch                             torch-neuron-1.12.1.2.10.12.0 
PyTorch                             torch-neuron-1.13.1.2.10.12.0 
PyTorch                             torch-neuron-1.9.1.2.10.12.0

Supported Python Versions for Inf1 packages#

List of packages in Neuron 2.19.0:

Package                                        Supported Python Versions              
dmlc_nnvm-1.19.1.0                                3.8, 3.9, 3.10 
dmlc_topi-1.19.1.0                                3.8, 3.9, 3.10 
dmlc_tvm-1.19.1.0                                 3.8, 3.9, 3.10 
inferentia_hwm-1.17.1.0                           3.8, 3.9, 3.10 
mx_neuron-1.8.0.2.4.147.0                         3.8, 3.9, 3.10 
mxnet_neuron-1.5.1.1.10.0.0                       3.8, 3.9, 3.10 
neuron-cc-1.23.5.0                                3.8, 3.9, 3.10 
neuronperf-1.8.93.0                               3.8, 3.9, 3.10 
tensorflow-neuron-2.10.1.2.11.4.0                 3.8, 3.9, 3.10 
tensorflow-neuron-2.7.4.2.11.4.0                  3.8, 3.9, 3.10 
tensorflow-neuron-2.8.4.2.11.4.0                  3.8, 3.9, 3.10 
tensorflow-neuron-2.9.3.2.11.4.0                  3.8, 3.9, 3.10 
torch-neuron-1.10.2.2.10.12.0                     3.8, 3.9, 3.10 
torch-neuron-1.11.0.2.10.12.0                     3.8, 3.9, 3.10 
torch-neuron-1.12.1.2.10.12.0                     3.8, 3.9, 3.10 
torch-neuron-1.13.1.2.10.12.0                     3.8, 3.9, 3.10 
torch-neuron-1.9.1.2.10.12.0                      3.8, 3.9, 3.10

Supported Python Versions for Inf2/Trn1 packages#

List of packages in Neuron 2.19.0:

Package                                        Supported Python Versions              
aws-neuronx-runtime-discovery-2.9                 3.8, 3.9, 3.10 
libneuronxla-2.0.2335                             3.8, 3.9, 3.10 
libneuronxla-0.5.1795                             3.8, 3.9, 3.10 
neuronx-cc-2.14.213.0                             3.8, 3.9, 3.10 
neuronx_distributed-0.8.0                         3.8, 3.9, 3.10 
tensorflow-neuronx-2.10.1.2.1.0                   3.8, 3.9, 3.10 
tensorflow-neuronx-2.8.4.2.1.0                    3.8, 3.9, 3.10 
tensorflow-neuronx-2.9.3.2.1.0                    3.8, 3.9, 3.10 
torch-neuronx-1.13.1.1.15.0                       3.8, 3.9, 3.10 
torch-neuronx-2.1.2.2.2.0                         3.8, 3.9, 3.10 
torch_xla-1.13.1+torchneuronf                     3.8, 3.9, 3.10 
torch_xla-2.1.3                                   3.8, 3.9, 3.10 
transformers-neuronx-0.11.351                     3.8, 3.9, 3.10

Supported Numpy Versions#

Neuron supports NumPy versions >= 1.21.6 and <= 1.22.2.
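
A quick way to confirm that an environment falls inside this range is to compare the installed version against the two bounds. The sketch below is an illustration only: the helper names are hypothetical, and the comparison uses just the numeric major.minor.patch components, ignoring any pre-release or local suffix.

```python
import re
from typing import Tuple

# Bounds taken from the supported range stated above.
NUMPY_MIN = (1, 21, 6)
NUMPY_MAX = (1, 22, 2)


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'X.Y' or 'X.Y.Z' of a version string into three ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def numpy_version_supported(version: str) -> bool:
    """True if NUMPY_MIN <= version <= NUMPY_MAX."""
    return NUMPY_MIN <= release_tuple(version) <= NUMPY_MAX


for candidate in ("1.21.5", "1.21.6", "1.22.2", "1.23.0"):
    print(candidate, numpy_version_supported(candidate))
```

In a live environment the argument would typically be `numpy.__version__`.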

Supported HuggingFace Transformers Versions#

Package                                        Supported HuggingFace Transformers Versions
torch-neuronx                                  < 4.35 or >= 4.37.2
transformers-neuronx                           >= 4.36.0
neuronx-distributed - Llama model class        4.31
neuronx-distributed - GPT NeoX model class     4.26
neuronx-distributed - Bert model class         4.26
nemo-megatron                                  4.31.0
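
The torch-neuronx row lists two ranges that no single version can satisfy at once, so it is best read as a union: anything below 4.35, or 4.37.2 and later. The sketch below encodes that reading for two rows of the table. The dictionary, the function names, and the assumption of purely numeric dotted versions are illustrative and not part of any Neuron package.

```python
from typing import Callable, Dict, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn '4.37.2' into (4, 37, 2); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


# Constraints transcribed from the table above. The torch-neuronx row is
# interpreted as a union of two ranges (an assumption about its intent).
SUPPORTED: Dict[str, Callable[[Version], bool]] = {
    "torch-neuronx": lambda v: v < (4, 35) or v >= (4, 37, 2),
    "transformers-neuronx": lambda v: v >= (4, 36),
}


def transformers_supported(package: str, version: str) -> bool:
    """Check a HuggingFace Transformers version against one row of the table."""
    return SUPPORTED[package](parse(version))


for version in ("4.34.1", "4.36.2", "4.37.2"):
    print(version, transformers_supported("torch-neuronx", version))
```

Under this reading, versions from 4.35.0 up to but not including 4.37.2 fall into the gap for torch-neuronx, even though some of them satisfy the transformers-neuronx row.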