This document is relevant for: Inf1, Inf2, Trn1, Trn1n

What’s New#

Neuron 2.18.1 (04/10/2024)#

The Neuron 2.18.1 release introduces continuous batching (beta) and Neuron vLLM integration (beta) in the Transformers NeuronX library, both of which improve LLM inference throughput. This release also fixes hang issues related to Triton Inference Server and updates the Neuron DLAMIs and DLCs for this release (2.18.1). For more details, see the Transformers Neuron (transformers-neuronx) release notes and the Neuron Compiler (neuronx-cc) release notes.
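
Continuous batching schedules work per decoding step: when a sequence finishes, a waiting request takes its slot right away, instead of every slot idling until the longest sequence in a static batch completes. The toy simulation below is a generic sketch of that scheduling idea under simplifying assumptions (one token per request per step, output lengths known up front, every request generates at least one token). It is not the Transformers NeuronX or vLLM API, and the function names and request lengths are hypothetical.

```python
from collections import deque
from typing import List


def static_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Decode steps when each fixed batch runs until its longest request finishes."""
    steps = 0
    for start in range(0, len(lengths), batch_size):
        steps += max(lengths[start:start + batch_size])
    return steps


def continuous_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Decode steps when free slots are refilled from the queue after every step."""
    queue = deque(lengths)
    active: List[int] = []  # tokens still to generate for each in-flight request
    steps = 0
    while queue or active:
        # Admit waiting requests into any free slots before the next step.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        # One decoding step produces one token for every active request.
        active = [remaining - 1 for remaining in active]
        steps += 1
        # Finished requests leave the batch immediately, freeing their slots.
        active = [remaining for remaining in active if remaining > 0]
    return steps


if __name__ == "__main__":
    lengths = [8, 2, 2, 2, 8, 2]  # hypothetical output lengths, in tokens
    print(static_batching_steps(lengths, batch_size=2))      # 18
    print(continuous_batching_steps(lengths, batch_size=2))  # 14
```

In this toy run the short requests no longer wait behind the long ones, so the same 24 tokens finish in fewer steps; real gains depend on the workload and on the serving stack.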

Neuron 2.18.0 (04/01/2024)#

What’s New#

The Neuron 2.18 release introduces stable support (out of beta) for PyTorch 2.1, adds new features and performance improvements for LLM training and inference, and updates Neuron DLAMIs and Neuron DLCs to support this release (Neuron 2.18).

Training highlights: NeuronX Distributed (NxD) improves the LLM training user experience by introducing asynchronous checkpointing. This release also adds auto-partitioning support for pipeline parallelism in NxD and introduces pipeline parallelism in the PyTorch Lightning Trainer (beta).
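
Asynchronous checkpointing shortens the time training is blocked on a save: the state is snapshotted synchronously, and the slow write to storage happens in the background while training continues. The sketch below shows only that general pattern with a thread and a JSON file. It is not the NxD checkpoint API; the `AsyncCheckpointer` class is hypothetical, and a real implementation would copy device tensors to host memory rather than deep-copy a Python dict.

```python
import copy
import json
import os
import tempfile
import threading


class AsyncCheckpointer:
    """Hypothetical helper: snapshot state now, write it on a background thread."""

    def __init__(self):
        self._thread = None

    def save(self, state, path):
        # Keep at most one write in flight: wait for the previous save first.
        self.wait()
        # Snapshot synchronously so later training steps cannot change what is written.
        snapshot = copy.deepcopy(state)
        self._thread = threading.Thread(target=self._write, args=(snapshot, path))
        self._thread.start()

    @staticmethod
    def _write(snapshot, path):
        # Write to a temporary file and rename it, so readers never see a partial file.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(snapshot, f)
        os.replace(tmp_path, path)

    def wait(self):
        # Block until the outstanding write (if any) has finished.
        if self._thread is not None:
            self._thread.join()
            self._thread = None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        checkpointer = AsyncCheckpointer()
        state = {"step": 0, "weights": [0.0, 0.0]}
        for step in range(1, 4):
            state["step"] = step  # stand-in for a training step
            checkpointer.save(state, path)  # returns before the write completes
        checkpointer.wait()  # ensure the last write finished before reading
        with open(path) as f:
            print(json.load(f)["step"])  # 3
```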

Inference highlights: Speculative decoding support (beta) in the Transformers NeuronX (TNx) library improves LLM inference throughput and time per output token (TPOT) by up to 25% (for LLMs such as Llama-2-70B). TNx also improves weight loading performance by adding support for the safetensors checkpoint format. Bucketing for inference in PyTorch NeuronX and NeuronX Distributed is improved by a new auto-bucketing feature. This release also adds new samples for Mixtral-8x7B-v0.1 and mistralai/Mistral-7B-Instruct-v0.2 in TNx.
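
Speculative decoding pairs the large target model with a small draft model: the draft proposes several tokens cheaply, and the target verifies them, keeping the longest agreeing prefix plus one token of its own. The sketch below shows the greedy variant with toy integer "models"; it is a generic illustration, not the TNx API, and `speculative_step`, `generate`, `toy_target` and `toy_draft` are hypothetical. In this greedy form the output is identical to decoding with the target model alone; the gain comes from needing fewer sequential target passes.

```python
from typing import Callable, List

# A "model" here maps the tokens so far to its greedy (argmax) next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, tokens: List[int], k: int) -> List[int]:
    """One round of greedy speculative decoding; returns the newly accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    context = list(tokens)
    for _ in range(k):
        token = draft(context)
        proposal.append(token)
        context.append(token)

    # 2. The target checks every proposed position. This loop calls it once per
    #    position for clarity; on an accelerator all k positions are scored in one
    #    batched forward pass, which is where the speedup comes from.
    accepted: List[int] = []
    context = list(tokens)
    for token in proposal:
        expected = target(context)
        if expected != token:
            # First disagreement: keep the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        context.append(token)

    # 3. All k proposals matched, so the target's next prediction is a bonus token.
    accepted.append(target(context))
    return accepted


def generate(target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_new:
        tokens.extend(speculative_step(target, draft, tokens, k))
    return tokens[: len(prompt) + n_new]


def toy_target(tokens: List[int]) -> int:
    # Hypothetical target model: the next token is (last + 1) mod 10.
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    # Hypothetical draft model: agrees with the target except right after token 6.
    return 0 if tokens[-1] == 6 else (tokens[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [5], k=4))  # [6, 7]
    print(generate(toy_target, toy_draft, [0], n_new=12))
```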

Neuron DLAMI and Neuron DLC support highlights: This release introduces a new Multi Framework DLAMI for Ubuntu 22 that customers can use to get started easily with the latest Neuron SDK on the multiple frameworks that Neuron supports, as well as SSM parameter support for DLAMIs that automates retrieval of the latest DLAMI ID in cloud automation flows. This release also adds new Neuron Training and Inference Deep Learning Containers (DLCs) for PyTorch 2.1, a new dedicated GitHub repository to host Neuron container Dockerfiles, and a public Neuron container registry to host Neuron container images.
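
With SSM parameter support, an automation flow can look up the latest Neuron DLAMI ID instead of hard-coding an AMI ID. The sketch below wraps the AWS Systems Manager GetParameter call as exposed by boto3's `get_parameter`. The parameter name is an assumed example; confirm the exact path for your framework, OS, and region in the Neuron DLAMI User Guide. The helper name is hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
# ASSUMPTION: example parameter name; check the Neuron DLAMI User Guide for the
# exact path for your DLAMI flavor (framework, OS) before relying on it.
DLAMI_PARAMETER = "/aws/service/neuron/dlami/multi-framework/ubuntu-22.04/latest/image_id"


def latest_dlami_id(ssm_client, parameter_name: str = DLAMI_PARAMETER) -> str:
    """Return the AMI ID stored in a public SSM parameter.

    `ssm_client` is expected to behave like a boto3 SSM client.
    """
    response = ssm_client.get_parameter(Name=parameter_name)
    # GetParameter responses have the shape {"Parameter": {"Name": ..., "Value": ...}, ...}.
    return response["Parameter"]["Value"]
```

For example, `latest_dlami_id(boto3.client("ssm", region_name="us-east-1"))` returns an AMI ID that can be passed to an instance launch call. Public SSM parameters are regional, so query the region you launch in.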

More release content can be found in the table below and in each component's release notes.

What’s New                                                  Instances
Transformers NeuronX (transformers-neuronx) for Inference   Inf2, Trn1/Trn1n
NeuronX Distributed (neuronx-distributed) for Training      Trn1/Trn1n
NeuronX Distributed (neuronx-distributed) for Inference     Inf2, Trn1/Trn1n
PyTorch NeuronX (torch-neuronx)                             Trn1/Trn1n, Inf2
NeuronX Nemo Megatron for Training                          Trn1/Trn1n, Inf2
Neuron Compiler (neuronx-cc)                                Trn1/Trn1n, Inf2
Neuron DLAMI and DLC                                        Inf1, Inf2, Trn1/Trn1n
Other Documentation Updates                                 Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes                            Trn1/Trn1n, Inf2, Inf1
Known Issues and Limitations                                Trn1/Trn1n, Inf2, Inf1
Release Artifacts                                           Trn1/Trn1n, Inf2, Inf1

Neuron DLAMI and DLC details:

  • New Neuron Multi Framework Deep Learning AMI (DLAMI) for Ubuntu 22 with separate virtual environments for PyTorch 2.1, PyTorch 1.13, Transformers NeuronX and TensorFlow 2.10. See the setup guide and the Neuron DLAMI User Guide.

  • The Neuron Multi Framework DLAMI is now the default Neuron AMI in the Quick Start AMI list when launching Neuron instances for Ubuntu through the AWS console. See the setup guide.

  • Neuron DLAMIs for PyTorch 1.13 and TensorFlow 2.10 are updated with Neuron SDK 2.18 for both Ubuntu 20 and AL2. See the Neuron DLAMI User Guide.

  • SSM parameter support for Neuron DLAMIs to find the DLAMI ID for the latest Neuron SDK release. See the Neuron DLAMI User Guide.

  • New Neuron Deep Learning Containers (DLCs) for PyTorch 2.1 Inference and Training. See Deploy Containers with Neuron.

  • PyTorch 1.13 Inference and Training DLCs are updated with the latest Neuron SDK 2.18 and now also come with the NeuronX Distributed library pre-installed. See Deploy Containers with Neuron.

  • Neuron DLCs are now hosted both in the public Neuron ECR and as private images. Private images are only needed when using SageMaker. See Deploy Containers with Neuron.

  • New Neuron GitHub repository to host Dockerfiles for Neuron DLCs. See the neuron deep learning containers GitHub repo.

2.18.0 Known Issues and Limitations#

  • For PyTorch 2.1 (NeuronX), slow convergence for LLaMA-2 70B training when using the Zero Redundancy Optimizer (ZeRO-1) can be resolved by removing all compiler flags.

  • For PyTorch 2.1 (NeuronX), torch-xla 2.1 is incompatible with the default glibc on Amazon Linux 2 (AL2). Users are advised to migrate to Amazon Linux 2023, Ubuntu 22, or Ubuntu 20.

  • See component release notes below for any additional known issues.
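
Provisioning scripts that install the PyTorch 2.1 (NeuronX) packages may want to fail fast on Amazon Linux 2. A minimal sketch, assuming the host exposes the standard `/etc/os-release` file; the helper names are hypothetical.

```python
import os
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        info[key] = value.strip().strip("\"'")
    return info


def is_amazon_linux_2(info: Dict[str, str]) -> bool:
    # Amazon Linux 2 reports ID=amzn, VERSION_ID=2; Amazon Linux 2023 reports VERSION_ID=2023.
    return info.get("ID") == "amzn" and info.get("VERSION_ID") == "2"


if __name__ == "__main__":
    path = "/etc/os-release"
    if os.path.exists(path):
        with open(path) as f:
            if is_amazon_linux_2(parse_os_release(f.read())):
                print("Amazon Linux 2 detected: use Amazon Linux 2023, Ubuntu 22 or Ubuntu 20 for PyTorch 2.1")
```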

Neuron Components Release Notes#

Inf1, Trn1/Trn1n and Inf2 common packages#

Component                        Instance/s              Package/s
Neuron Runtime                   Trn1/Trn1n, Inf1, Inf2  Trn1/Trn1n: aws-neuronx-runtime-lib (.deb, .rpm)
                                                         Inf1: Runtime is linked into the ML frameworks packages
Neuron Runtime Driver            Trn1/Trn1n, Inf1, Inf2  aws-neuronx-dkms (.deb, .rpm)
Neuron System Tools              Trn1/Trn1n, Inf1, Inf2  aws-neuronx-tools (.deb, .rpm)
Containers                       Trn1/Trn1n, Inf1, Inf2  aws-neuronx-k8-plugin (.deb, .rpm)
                                                         aws-neuronx-k8-scheduler (.deb, .rpm)
                                                         aws-neuronx-oci-hooks (.deb, .rpm)
NeuronPerf (Inference only)      Trn1/Trn1n, Inf1, Inf2  neuronperf (.whl)
TensorFlow Model Server Neuron   Trn1/Trn1n, Inf1, Inf2  tensorflow-model-server-neuronx (.deb, .rpm)
Neuron Documentation             Trn1/Trn1n, Inf1, Inf2

Trn1/Trn1n and Inf2 only packages#

Component                                 Instance/s         Package/s
PyTorch Neuron                            Trn1/Trn1n, Inf2   torch-neuronx (.whl)
TensorFlow Neuron                         Trn1/Trn1n, Inf2   tensorflow-neuronx (.whl)
Neuron Compiler (Trn1/Trn1n, Inf2 only)   Trn1/Trn1n, Inf2   neuronx-cc (.whl)
Collective Communication Library          Trn1/Trn1n, Inf2   aws-neuronx-collectives (.deb, .rpm)
Neuron Custom C++ Operators               Trn1/Trn1n, Inf2   aws-neuronx-gpsimd-customop (.deb, .rpm)
                                                             aws-neuronx-gpsimd-tools (.deb, .rpm)
Transformers Neuron                       Trn1/Trn1n, Inf2   transformers-neuronx (.whl)
Neuron Distributed                        Trn1/Trn1n, Inf2   neuronx-distributed (.whl)
AWS Neuron Reference for NeMo Megatron    Trn1/Trn1n

Note

In upcoming releases, aws-neuronx-tools and aws-neuronx-runtime-lib will add support for Inf1.

Release Artifacts#

Trn1 packages#

List of packages in Neuron 2.18.1:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.20.22.0 
Driver                              aws-neuronx-dkms-2.16.7.0 
CustomOps Library                   aws-neuronx-gpsimd-customop-lib-0.9.1.0 
CustomOps Tools                     aws-neuronx-gpsimd-tools-0.9.0.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.20.13.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.20.13.0 
OCI                                 aws-neuronx-oci-hook-2.3.0.0 
General                             aws-neuronx-runtime-discovery-2.9 
Runtime Library                     aws-neuronx-runtime-lib-2.20.22.0 
System Tools                        aws-neuronx-tools-2.17.1.0 
Framework                           libneuronxla-2.0.965 
Framework                           libneuronxla-0.5.971 
Compiler                            neuronx-cc-2.13.68.0 
Neuron Distributed                  neuronx_distributed-0.7.0 
TensorBoard                         tensorboard-plugin-neuronx-2.6.7.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.10.19.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.7.4.2.10.19.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.10.19.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.10.19.0 
TensorFlow                          tensorflow-neuronx-2.10.1.2.1.0 
TensorFlow                          tensorflow-neuronx-2.7.4.2.1.0 
TensorFlow                          tensorflow-neuronx-2.8.4.2.1.0 
TensorFlow                          tensorflow-neuronx-2.9.3.2.1.0 
PyTorch                             torch-neuronx-1.13.1.1.14.0 
PyTorch                             torch-neuronx-2.1.2.2.1.0 
PyTorch                             torch_xla-1.13.1+torchneurone 
PyTorch                             torch_xla-2.1.2 
Transformers Neuron                 transformers-neuronx-0.10.0.360
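
Each entry in these lists is a package name and a version joined by the final hyphen. To pin Python wheels (for example neuronx-cc, torch-neuronx, or transformers-neuronx) to exactly the versions of this release, a minimal sketch like the one below turns entries into pip requirement specifiers. It assumes the last hyphen always separates name from version, which holds for every entry listed here; the helper names are hypothetical, and system packages such as aws-neuronx-dkms are installed with the OS package manager (.deb/.rpm), not pip.

```python
from typing import List, Tuple


def split_artifact(artifact: str) -> Tuple[str, str]:
    """Split 'name-version' at the last hyphen, e.g. 'neuronx-cc-2.13.68.0'."""
    name, sep, version = artifact.rpartition("-")
    if not sep or not name or not version:
        raise ValueError(f"not a name-version string: {artifact!r}")
    return name, version


def pip_requirements(artifacts: List[str]) -> List[str]:
    """Turn wheel artifact names into exact pins of the form 'name==version'."""
    return ["{}=={}".format(*split_artifact(a)) for a in artifacts]


if __name__ == "__main__":
    wheels = [
        "neuronx-cc-2.13.68.0",
        "torch-neuronx-2.1.2.2.1.0",
        "torch_xla-2.1.2",
        "transformers-neuronx-0.10.0.360",
    ]
    # Prints one pin per line, e.g. "neuronx-cc==2.13.68.0".
    print("\n".join(pip_requirements(wheels)))
```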

Inf2 packages#

List of packages in Neuron 2.18.1:

Component                           Package                                           
Collective Communication Library    aws-neuronx-collectives-2.20.22.0 
Driver                              aws-neuronx-dkms-2.16.7.0 
CustomOps Library                   aws-neuronx-gpsimd-customop-lib-0.9.1.0 
CustomOps Tools                     aws-neuronx-gpsimd-tools-0.9.0.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.20.13.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.20.13.0 
OCI                                 aws-neuronx-oci-hook-2.3.0.0 
General                             aws-neuronx-runtime-discovery-2.9 
Runtime Library                     aws-neuronx-runtime-lib-2.20.22.0 
System Tools                        aws-neuronx-tools-2.17.1.0 
Framework                           libneuronxla-2.0.965 
Framework                           libneuronxla-0.5.971 
Compiler                            neuronx-cc-2.13.68.0 
Neuron Distributed                  neuronx_distributed-0.7.0 
TensorBoard                         tensorboard-plugin-neuronx-2.6.7.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.10.19.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.7.4.2.10.19.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.10.19.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.10.19.0 
TensorFlow                          tensorflow-neuronx-2.10.1.2.1.0 
TensorFlow                          tensorflow-neuronx-2.8.4.2.1.0 
TensorFlow                          tensorflow-neuronx-2.9.3.2.1.0 
PyTorch                             torch-neuronx-1.13.1.1.14.0 
PyTorch                             torch-neuronx-2.1.2.2.1.0 
PyTorch                             torch_xla-1.13.1+torchneurone 
PyTorch                             torch_xla-2.1.2 
Transformers Neuron                 transformers-neuronx-0.10.0.360

Inf1 packages#

List of packages in Neuron 2.18.1:

Component                           Package                                           
Driver                              aws-neuronx-dkms-2.16.7.0 
Kubernetes Plugin                   aws-neuronx-k8-plugin-2.20.13.0 
Kubernetes Scheduler                aws-neuronx-k8-scheduler-2.20.13.0 
OCI                                 aws-neuronx-oci-hook-2.3.0.0 
System Tools                        aws-neuronx-tools-2.17.1.0 
Compiler                            dmlc_nnvm-1.19.0.0 
Compiler                            dmlc_topi-1.19.0.0 
Compiler                            dmlc_tvm-1.19.0.0 
Compiler                            inferentia_hwm-1.17.0.0 
MXNet                               mx_neuron-1.8.0.2.4.50.0 
MXNet                               mxnet_neuron-1.5.1.1.10.0.0 
Compiler                            neuron-cc-1.22.0.0 
Perf Tools                          neuronperf-1.8.55.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.10.1.2.10.19.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.7.4.2.10.19.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.8.4.2.10.19.0 
TensorFlow Model Server             tensorflow-model-server-neuronx-2.9.3.2.10.19.0 
TensorFlow                          tensorflow-neuron-2.10.1.2.10.19.0 
TensorFlow                          tensorflow-neuron-2.7.4.2.10.19.0 
TensorFlow                          tensorflow-neuron-2.8.4.2.10.19.0 
TensorFlow                          tensorflow-neuron-2.9.3.2.10.19.0 
PyTorch                             torch-neuron-1.10.2.2.9.74.0 
PyTorch                             torch-neuron-1.11.0.2.9.74.0 
PyTorch                             torch-neuron-1.12.1.2.9.74.0 
PyTorch                             torch-neuron-1.13.1.2.9.74.0 
PyTorch                             torch-neuron-1.9.1.2.9.74.0

Supported Python Versions for Inf1 packages#

List of packages in Neuron 2.18.1:

Package                                        Supported Python Versions              
dmlc_nnvm-1.19.0.0                                3.8, 3.9, 3.10 
dmlc_topi-1.19.0.0                                3.8, 3.9, 3.10 
dmlc_tvm-1.19.0.0                                 3.8, 3.9, 3.10 
inferentia_hwm-1.17.0.0                           3.8, 3.9, 3.10 
mx_neuron-1.8.0.2.4.50.0                          3.8, 3.9, 3.10 
mxnet_neuron-1.5.1.1.10.0.0                       3.8, 3.9, 3.10 
neuron-cc-1.22.0.0                                3.8, 3.9, 3.10 
neuronperf-1.8.55.0                               3.8, 3.9, 3.10 
tensorflow-neuron-2.10.1.2.10.19.0                3.8, 3.9, 3.10 
tensorflow-neuron-2.7.4.2.10.19.0                 3.8, 3.9, 3.10 
tensorflow-neuron-2.8.4.2.10.19.0                 3.8, 3.9, 3.10 
tensorflow-neuron-2.9.3.2.10.19.0                 3.8, 3.9, 3.10 
torch-neuron-1.10.2.2.9.74.0                      3.8, 3.9, 3.10 
torch-neuron-1.11.0.2.9.74.0                      3.8, 3.9, 3.10 
torch-neuron-1.12.1.2.9.74.0                      3.8, 3.9, 3.10 
torch-neuron-1.13.1.2.9.74.0                      3.8, 3.9, 3.10 
torch-neuron-1.9.1.2.9.74.0                       3.8, 3.9, 3.10

Supported Python Versions for Inf2/Trn1 packages#

List of packages in Neuron 2.18.1:

Package                                        Supported Python Versions              
aws-neuronx-runtime-discovery-2.9                 3.8, 3.9, 3.10 
libneuronxla-2.0.965                              3.8, 3.9, 3.10 
libneuronxla-0.5.971                              3.8, 3.9, 3.10 
neuronx-cc-2.13.68.0                              3.8, 3.9, 3.10 
neuronx_distributed-0.7.0                         3.8, 3.9, 3.10 
tensorflow-neuronx-2.10.1.2.1.0                   3.8, 3.9, 3.10 
tensorflow-neuronx-2.8.4.2.1.0                    3.8, 3.9, 3.10 
tensorflow-neuronx-2.9.3.2.1.0                    3.8, 3.9, 3.10 
torch-neuronx-1.13.1.1.14.0                       3.8, 3.9, 3.10 
torch-neuronx-2.1.2.2.1.0                         3.8, 3.9, 3.10 
torch_xla-1.13.1+torchneurone                     3.8, 3.9, 3.10 
torch_xla-2.1.2                                   3.8, 3.9, 3.10 
transformers-neuronx-0.10.0.360                   3.8, 3.9, 3.10

Supported Numpy Versions#

Neuron supports NumPy versions >= 1.21.6 and <= 1.22.2.
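
A setup script can check this range before running a workload. The sketch below compares only the numeric release part of the version string (pre-release and local suffixes are ignored), which is a simplifying assumption; the helper names are hypothetical.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Numeric release part of a version, padded to three fields: '1.22' -> (1, 22, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def numpy_version_supported(version: str) -> bool:
    # Inclusive range from these release notes: >= 1.21.6 and <= 1.22.2.
    return (1, 21, 6) <= version_tuple(version) <= (1, 22, 2)
```

For example, `numpy_version_supported(numpy.__version__)` reports whether the installed NumPy falls inside the supported range.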

Supported HuggingFace Transformers Versions#

Package                                       Supported HuggingFace Transformers Versions
torch-neuronx                                 < 4.35 or >= 4.37.2
transformers-neuronx                          >= 4.36.0
neuronx-distributed - Llama model class       4.31
neuronx-distributed - GPT NeoX model class    4.26
neuronx-distributed - Bert model class        4.26
nemo-megatron                                 4.31.0
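
The two open-ended constraints can be checked programmatically. This sketch reads the torch-neuronx entry as "below 4.35, or 4.37.2 and later" (so 4.35.0 through 4.37.1 are excluded), compares only the numeric release part of the version, and uses hypothetical helper names.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Numeric release part of a version, padded to three fields: '4.35' -> (4, 35, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def torch_neuronx_supports(transformers_version: str) -> bool:
    # Below 4.35, or 4.37.2 and later.
    v = version_tuple(transformers_version)
    return v < (4, 35, 0) or v >= (4, 37, 2)


def transformers_neuronx_supports(transformers_version: str) -> bool:
    # 4.36.0 and later.
    return version_tuple(transformers_version) >= (4, 36, 0)
```

For example, `torch_neuronx_supports(transformers.__version__)` can guard a torch-neuronx workflow before any model is loaded.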