What’s New
This document is relevant for: Inf1, Inf2, Trn1, Trn1n
Neuron 2.18.2 (04/25/2024)#
Patch release with minor Neuron Compiler bug fixes and enhancements. See more in the Neuron Compiler (neuronx-cc) release notes.
Neuron 2.18.1 (04/10/2024)#
Neuron 2.18.1 introduces continuous batching (beta) and Neuron vLLM integration (beta) support in the Transformers NeuronX library to improve LLM inference throughput. This release also fixes hang issues related to Triton Inference Server and updates the Neuron DLAMIs and DLCs to this release (2.18.1). See more in the Transformers Neuron (transformers-neuronx) release notes and the Neuron Compiler (neuronx-cc) release notes.
Neuron 2.18.0 (04/01/2024)#
What’s New#
Neuron 2.18 release introduces stable support (out of beta) for PyTorch 2.1, introduces new features and performance improvements to LLM training and inference, and updates Neuron DLAMIs and Neuron DLCs to support this release (Neuron 2.18).
Training highlights: Asynchronous checkpointing improves the LLM training user experience with NeuronX Distributed (NxD). This release also adds support for auto-partitioning in NxD pipeline parallelism and introduces pipeline parallelism in the PyTorch Lightning Trainer (beta).
Inference highlights: Speculative decoding support (beta) in the TNx library improves LLM inference throughput and output token latency (TPOT) by up to 25% (for LLMs such as Llama-2-70B). TNx also improves weight loading performance by adding support for the SafeTensor checkpoint format. Inference using bucketing in PyTorch NeuronX and NeuronX Distributed is improved by a new auto-bucketing feature.
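For readers new to bucketing, the general idea can be sketched in a few lines: a model is compiled for a small set of fixed input sizes, and each request is routed to the smallest size that fits, with padding making up the difference. The sketch below is a generic illustration only. The function names `make_buckets`, `pick_bucket`, and `pad_to_bucket` are hypothetical and are not part of the PyTorch NeuronX or NeuronX Distributed APIs, and the power-of-two spacing is an assumption rather than what auto-bucketing necessarily chooses.

```python
from bisect import bisect_left
from typing import List, Sequence


def make_buckets(min_len: int, max_len: int) -> List[int]:
    """Return doubling bucket sizes from min_len up to max_len (inclusive)."""
    if min_len < 1 or max_len < min_len:
        raise ValueError("expected 1 <= min_len <= max_len")
    buckets = []
    size = min_len
    while size < max_len:
        buckets.append(size)
        size *= 2
    buckets.append(max_len)  # always include the largest supported length
    return buckets


def pick_bucket(buckets: Sequence[int], seq_len: int) -> int:
    """Return the smallest bucket that can hold seq_len tokens."""
    idx = bisect_left(buckets, seq_len)  # buckets must be sorted ascending
    if idx == len(buckets):
        raise ValueError(f"sequence length {seq_len} exceeds largest bucket")
    return buckets[idx]


def pad_to_bucket(token_ids: Sequence[int], bucket: int, pad_id: int = 0) -> List[int]:
    """Right-pad token_ids with pad_id until it is exactly bucket long."""
    return list(token_ids) + [pad_id] * (bucket - len(token_ids))
```

With buckets built by `make_buckets(128, 1024)`, a 200-token input would be routed to the 256 bucket and padded with 56 pad tokens.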
This release also adds new samples for Mixtral-8x7B-v0.1 and mistralai/Mistral-7B-Instruct-v0.2 in TNx.
Neuron DLAMI and Neuron DLC support highlights: This release introduces a new Multi Framework DLAMI for Ubuntu 22 that customers can use to get started with the latest Neuron SDK on the multiple frameworks that Neuron supports. It also adds SSM parameter support for DLAMIs, which automates retrieval of the latest DLAMI ID in cloud automation flows. In addition, this release adds support for new Neuron Training and Inference Deep Learning Containers (DLCs) for PyTorch 2.1, a new dedicated GitHub repository to host Neuron container Dockerfiles, and a public Neuron container registry to host Neuron container images.
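As an illustration of how SSM parameter support might be used in an automation flow, the sketch below reads an image ID through the SSM `get_parameter` call. The parameter path in `PARAM_NAME` is an assumption and should be confirmed against the Neuron DLAMI documentation; the helper names `latest_dlami_id` and `make_ssm_client` are hypothetical. The client is passed in as an argument so the lookup logic can be exercised without AWS credentials.

```python
from typing import Any

# Assumed parameter path for the Ubuntu 22 Multi Framework DLAMI.
# Verify it against the Neuron DLAMI documentation before relying on it.
PARAM_NAME = "/aws/service/neuron/dlami/multi-framework/ubuntu-22.04/latest/image_id"


def latest_dlami_id(ssm_client: Any, name: str = PARAM_NAME) -> str:
    """Return the AMI ID stored in the given SSM parameter."""
    response = ssm_client.get_parameter(Name=name)
    return response["Parameter"]["Value"]


def make_ssm_client(region: str) -> Any:
    """Create a real SSM client; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the lookup helper has no hard dependency

    return boto3.client("ssm", region_name=region)
```

In a deployment script this could be called as `latest_dlami_id(make_ssm_client("us-east-1"))`, and the result passed to whatever launches the instance.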
More release content can be found in the table below and in each component's release notes.
What’s New | Details | Instances |
---|---|---|
Transformers NeuronX (transformers-neuronx) for Inference | | Inf2, Trn1/Trn1n |
NeuronX Distributed (neuronx-distributed) for Training | | Trn1/Trn1n |
NeuronX Distributed (neuronx-distributed) for Inference | | Inf2, Trn1/Trn1n |
PyTorch NeuronX (torch-neuronx) | | Trn1/Trn1n, Inf2 |
NeuronX Nemo Megatron for Training | | Trn1/Trn1n, Inf2 |
Neuron Compiler (neuronx-cc) | | Trn1/Trn1n, Inf2 |
Neuron DLAMI and DLC | | Inf1, Inf2, Trn1/Trn1n |
Other Documentation Updates | | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1 |
Known Issues and Limitations | | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | | Trn1/Trn1n, Inf2, Inf1 |
2.18.0 Known Issues and Limitations#
For PyTorch 2.1 (NeuronX), slow convergence for LLaMA-2 70B training when using Zero Redundancy Optimizer (ZeRO1) can be resolved by removing all compiler flags.
For PyTorch 2.1 (NeuronX), torch-xla 2.1 is incompatible with the default glibc on AL2. Users are advised to migrate to Amazon Linux 2023, Ubuntu 22, or Ubuntu 20.
See component release notes below for any additional known issues.
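A setup script could guard against the AL2 limitation above by inspecting `/etc/os-release` before installing PyTorch 2.1 (NeuronX). The following is a minimal sketch; it assumes that Amazon Linux 2 reports `ID="amzn"` with `VERSION_ID="2"` and that Amazon Linux 2023 reports `VERSION_ID="2023"`. The helper names are hypothetical.

```python
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines in os-release format into a dictionary."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_amazon_linux_2(text: str) -> bool:
    """Return True when the os-release text describes Amazon Linux 2."""
    fields = parse_os_release(text)
    # Assumption: AL2 uses ID=amzn and VERSION_ID=2; AL2023 uses VERSION_ID=2023.
    return fields.get("ID") == "amzn" and fields.get("VERSION_ID") == "2"
```

On a live host the input would come from `open("/etc/os-release").read()`, and a `True` result would be the signal to migrate before installing torch-xla 2.1.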
Neuron Components Release Notes#
Inf1, Trn1/Trn1n and Inf2 common packages#
Component | Instance/s | Package/s | Details |
---|---|---|---|
Neuron Runtime | Trn1/Trn1n, Inf1, Inf2 | | |
Neuron Runtime Driver | Trn1/Trn1n, Inf1, Inf2 | | |
Neuron System Tools | Trn1/Trn1n, Inf1, Inf2 | | |
Containers | Trn1/Trn1n, Inf1, Inf2 | | |
NeuronPerf (Inference only) | Trn1/Trn1n, Inf1, Inf2 | | |
TensorFlow Model Server Neuron | Trn1/Trn1n, Inf1, Inf2 | | |
Neuron Documentation | Trn1/Trn1n, Inf1, Inf2 | | |
Trn1/Trn1n and Inf2 only packages#
Component | Instance/s | Package/s | Details |
---|---|---|---|
PyTorch Neuron | Trn1/Trn1n, Inf2 | | |
TensorFlow Neuron | Trn1/Trn1n, Inf2 | | |
Neuron Compiler (Trn1/Trn1n, Inf2 only) | Trn1/Trn1n, Inf2 | | |
Collective Communication library | Trn1/Trn1n, Inf2 | | |
Neuron Custom C++ Operators | Trn1/Trn1n, Inf2 | | |
Transformers Neuron | Trn1/Trn1n, Inf2 | | |
Neuron Distributed | Trn1/Trn1n, Inf2 | | |
AWS Neuron Reference for NeMo Megatron | Trn1/Trn1n | | |
Note
In upcoming releases, aws-neuronx-tools and aws-neuronx-runtime-lib will add support for Inf1.
Inf1 only packages#
Component | Instance/s | Package/s | Details |
---|---|---|---|
PyTorch Neuron | Inf1 | | |
TensorFlow Neuron | Inf1 | | |
Apache MXNet | Inf1 | | |
Neuron Compiler (Inf1 only) | Inf1 | | |
Release Artifacts#
Trn1 packages#
List of packages in Neuron 2.18.2:
Component Package
Collective Communication Library aws-neuronx-collectives-2.20.22.0
Driver aws-neuronx-dkms-2.16.7.0
CustomOps Library aws-neuronx-gpsimd-customop-lib-0.9.1.0
CustomOps Tools aws-neuronx-gpsimd-tools-0.9.0.0
Kubernetes Plugin aws-neuronx-k8-plugin-2.20.13.0
Kubernetes Scheduler aws-neuronx-k8-scheduler-2.20.13.0
OCI aws-neuronx-oci-hook-2.3.0.0
General aws-neuronx-runtime-discovery-2.9
Runtime Library aws-neuronx-runtime-lib-2.20.22.0
System Tools aws-neuronx-tools-2.17.1.0
Framework libneuronxla-2.0.965
Framework libneuronxla-0.5.971
Compiler neuronx-cc-2.13.72.0
Neuron Distributed neuronx_distributed-0.7.0
TensorBoard tensorboard-plugin-neuronx-2.6.7.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.10.1.2.10.19.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.7.4.2.10.19.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.8.4.2.10.19.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.9.3.2.10.19.0
TensorFlow tensorflow-neuronx-2.10.1.2.1.0
TensorFlow tensorflow-neuronx-2.7.4.2.1.0
TensorFlow tensorflow-neuronx-2.8.4.2.1.0
TensorFlow tensorflow-neuronx-2.9.3.2.1.0
PyTorch torch-neuronx-1.13.1.1.14.0
PyTorch torch-neuronx-2.1.2.2.1.0
PyTorch torch_xla-1.13.1+torchneurone
PyTorch torch_xla-2.1.2
Transformers Neuron transformers-neuronx-0.10.0.360
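To confirm that an environment matches this list, the installed versions can be compared against the pins with the standard library's `importlib.metadata`. The sketch below checks only two packages copied from the list above; `EXPECTED` and `check_versions` are hypothetical names. It also assumes that an installed wheel may report a local build suffix after a `+`, which is ignored in the comparison.

```python
from importlib import metadata
from typing import Callable, Dict

# A few pins copied from the Neuron 2.18.2 Trn1 package list; extend as needed.
EXPECTED = {
    "neuronx-cc": "2.13.72.0",
    "transformers-neuronx": "0.10.0.360",
}


def check_versions(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Compare installed package versions with the expected pins."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        # Assumption: ignore any local build suffix such as "+abc123".
        public = installed.split("+", 1)[0]
        if public == wanted:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {wanted}"
    return report
```

Calling `check_versions(EXPECTED)` returns one status string per package, which is easy to print or to fail a CI job on.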
Inf2 packages#
List of packages in Neuron 2.18.2:
Component Package
Collective Communication Library aws-neuronx-collectives-2.20.22.0
Driver aws-neuronx-dkms-2.16.7.0
CustomOps Library aws-neuronx-gpsimd-customop-lib-0.9.1.0
CustomOps Tools aws-neuronx-gpsimd-tools-0.9.0.0
Kubernetes Plugin aws-neuronx-k8-plugin-2.20.13.0
Kubernetes Scheduler aws-neuronx-k8-scheduler-2.20.13.0
OCI aws-neuronx-oci-hook-2.3.0.0
General aws-neuronx-runtime-discovery-2.9
Runtime Library aws-neuronx-runtime-lib-2.20.22.0
System Tools aws-neuronx-tools-2.17.1.0
Framework libneuronxla-2.0.965
Framework libneuronxla-0.5.971
Compiler neuronx-cc-2.13.72.0
Neuron Distributed neuronx_distributed-0.7.0
TensorBoard tensorboard-plugin-neuronx-2.6.7.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.10.1.2.10.19.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.7.4.2.10.19.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.8.4.2.10.19.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.9.3.2.10.19.0
TensorFlow tensorflow-neuronx-2.10.1.2.1.0
TensorFlow tensorflow-neuronx-2.8.4.2.1.0
TensorFlow tensorflow-neuronx-2.9.3.2.1.0
PyTorch torch-neuronx-1.13.1.1.14.0
PyTorch torch-neuronx-2.1.2.2.1.0
PyTorch torch_xla-1.13.1+torchneurone
PyTorch torch_xla-2.1.2
Transformers Neuron transformers-neuronx-0.10.0.360
Inf1 packages#
List of packages in Neuron 2.18.2:
Component Package
Driver aws-neuronx-dkms-2.16.7.0
Kubernetes Plugin aws-neuronx-k8-plugin-2.20.13.0
Kubernetes Scheduler aws-neuronx-k8-scheduler-2.20.13.0
OCI aws-neuronx-oci-hook-2.3.0.0
System Tools aws-neuronx-tools-2.17.1.0
Compiler dmlc_nnvm-1.19.0.0
Compiler dmlc_topi-1.19.0.0
Compiler dmlc_tvm-1.19.0.0
Compiler inferentia_hwm-1.17.0.0
MXNet mx_neuron-1.8.0.2.4.50.0
MXNet mxnet_neuron-1.5.1.1.10.0.0
Compiler neuron-cc-1.22.0.0
Perf Tools neuronperf-1.8.55.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.10.1.2.10.19.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.7.4.2.10.19.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.8.4.2.10.19.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.9.3.2.10.19.0
TensorFlow tensorflow-neuron-2.10.1.2.10.19.0
TensorFlow tensorflow-neuron-2.7.4.2.10.19.0
TensorFlow tensorflow-neuron-2.8.4.2.10.19.0
TensorFlow tensorflow-neuron-2.9.3.2.10.19.0
PyTorch torch-neuron-1.10.2.2.9.74.0
PyTorch torch-neuron-1.11.0.2.9.74.0
PyTorch torch-neuron-1.12.1.2.9.74.0
PyTorch torch-neuron-1.13.1.2.9.74.0
PyTorch torch-neuron-1.9.1.2.9.74.0
Supported Python Versions for Inf1 packages#
List of packages in Neuron 2.18.2:
Package Supported Python Versions
dmlc_nnvm-1.19.0.0 3.8, 3.9, 3.10
dmlc_topi-1.19.0.0 3.8, 3.9, 3.10
dmlc_tvm-1.19.0.0 3.8, 3.9, 3.10
inferentia_hwm-1.17.0.0 3.8, 3.9, 3.10
mx_neuron-1.8.0.2.4.50.0 3.8, 3.9, 3.10
mxnet_neuron-1.5.1.1.10.0.0 3.8, 3.9, 3.10
neuron-cc-1.22.0.0 3.8, 3.9, 3.10
neuronperf-1.8.55.0 3.8, 3.9, 3.10
tensorflow-neuron-2.10.1.2.10.19.0 3.8, 3.9, 3.10
tensorflow-neuron-2.7.4.2.10.19.0 3.8, 3.9, 3.10
tensorflow-neuron-2.8.4.2.10.19.0 3.8, 3.9, 3.10
tensorflow-neuron-2.9.3.2.10.19.0 3.8, 3.9, 3.10
torch-neuron-1.10.2.2.9.74.0 3.8, 3.9, 3.10
torch-neuron-1.11.0.2.9.74.0 3.8, 3.9, 3.10
torch-neuron-1.12.1.2.9.74.0 3.8, 3.9, 3.10
torch-neuron-1.13.1.2.9.74.0 3.8, 3.9, 3.10
torch-neuron-1.9.1.2.9.74.0 3.8, 3.9, 3.10
Supported Python Versions for Inf2/Trn1 packages#
List of packages in Neuron 2.18.2:
Package Supported Python Versions
aws-neuronx-runtime-discovery-2.9 3.8, 3.9, 3.10
libneuronxla-2.0.965 3.8, 3.9, 3.10
libneuronxla-0.5.971 3.8, 3.9, 3.10
neuronx-cc-2.13.72.0 3.8, 3.9, 3.10
neuronx_distributed-0.7.0 3.8, 3.9, 3.10
tensorflow-neuronx-2.10.1.2.1.0 3.8, 3.9, 3.10
tensorflow-neuronx-2.8.4.2.1.0 3.8, 3.9, 3.10
tensorflow-neuronx-2.9.3.2.1.0 3.8, 3.9, 3.10
torch-neuronx-1.13.1.1.14.0 3.8, 3.9, 3.10
torch-neuronx-2.1.2.2.1.0 3.8, 3.9, 3.10
torch_xla-1.13.1+torchneurone 3.8, 3.9, 3.10
torch_xla-2.1.2 3.8, 3.9, 3.10
transformers-neuronx-0.10.0.360 3.8, 3.9, 3.10
Supported NumPy Versions#
Neuron supports NumPy versions >= 1.21.6 and <= 1.22.2.
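The range above can be checked programmatically before installing Neuron packages. This is a minimal sketch using only the standard library; `version_tuple` and `numpy_supported` are hypothetical helper names, and pre-release suffixes are simply ignored.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, int, int]:
    """Turn '1.22.2' into (1, 22, 2); suffixes such as 'rc1' are ignored."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)  # treat '1.22' as '1.22.0'
    return (numbers[0], numbers[1], numbers[2])


def numpy_supported(version: str) -> bool:
    """Return True when version lies in the range 1.21.6 to 1.22.2 inclusive."""
    return (1, 21, 6) <= version_tuple(version) <= (1, 22, 2)
```

In practice the argument would be `numpy.__version__` from the target environment.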
Supported HuggingFace Transformers Versions#
Package | Supported HuggingFace Transformers Versions |
---|---|
torch-neuronx | < 4.35 or >= 4.37.2 |
transformers-neuronx | >= 4.36.0 |
neuronx-distributed - Llama model class | 4.31 |
neuronx-distributed - GPT NeoX model class | 4.26 |
neuronx-distributed - Bert model class | 4.26 |
nemo-megatron | 4.31.0 |
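The first two rows of this table can be expressed as simple predicates. The sketch below reads the torch-neuronx entry as two separate supported ranges, that is, versions below 4.35 and versions from 4.37.2 onward, with everything in between excluded; that reading is an assumption. The function names are hypothetical.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, int, int]:
    """Turn '4.37.2' into (4, 37, 2); missing parts default to 0."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return (numbers[0], numbers[1], numbers[2])


def ok_for_torch_neuronx(transformers_version: str) -> bool:
    """Supported when below 4.35 or at least 4.37.2 (assumed reading)."""
    v = version_tuple(transformers_version)
    return v < (4, 35, 0) or v >= (4, 37, 2)


def ok_for_transformers_neuronx(transformers_version: str) -> bool:
    """Supported when at least 4.36.0."""
    return version_tuple(transformers_version) >= (4, 36, 0)
```

Under this reading, an environment that uses both packages needs a Transformers version of 4.37.2 or later, since that is the only range both predicates accept.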