This document is relevant for: Inf1, Inf2, Trn1, Trn1n
What’s New#
Neuron 2.19.1 (07/19/2024)#
This release (Neuron 2.19.1) addresses an issue with the Neuron Persistent Cache that was introduced in the previous release, Neuron 2.19. The issue resulted in a cache-miss scenario when attempting to load a previously compiled Neuron Executable File Format (NEFF) from a different path or Python environment than the one used for the initial Neuron SDK installation and NEFF compilation. This release resolves the cache-miss problem, ensuring that NEFFs can be loaded correctly regardless of the path or Python environment used to install the Neuron SDK, as long as they were compiled using the same Neuron SDK version.
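To illustrate the kind of cache miss described above, here is a toy sketch. It is not Neuron's actual cache-key derivation, which this note does not describe; the `cache_key` function, its inputs, and the paths are all hypothetical. It only shows why a key that depends on the install path misses across environments, while a key that depends only on the model and SDK version does not.

```python
import hashlib


def cache_key(model_hash: str, sdk_version: str, install_path: str = "") -> str:
    """Toy cache key (hypothetical): a short hash of the joined inputs.

    Passing install_path models a key that depends on the environment.
    """
    material = "|".join([model_hash, sdk_version, install_path])
    return hashlib.sha256(material.encode()).hexdigest()[:16]


# Same model and SDK version, but different install paths: the keys differ,
# so a lookup from the second environment would miss.
key_env_a = cache_key("abc123", "2.19.0", "/opt/venv_a")
key_env_b = cache_key("abc123", "2.19.0", "/home/user/venv_b")
print(key_env_a == key_env_b)

# A key built only from the model and the SDK version is identical
# in every environment, so the lookup would hit.
stable_a = cache_key("abc123", "2.19.1")
stable_b = cache_key("abc123", "2.19.1")
print(stable_a == stable_b)
```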
Neuron 2.19.0 (07/03/2024)#
What’s New#
The Neuron 2.19 release adds Llama 3 training support and introduces Flash Attention kernel support to enable LLM training and inference for large sequence lengths. Neuron 2.19 also introduces new features and performance improvements for LLM training, improves LLM inference performance for the Llama 3 model by up to 20%, and adds tools for monitoring, problem detection, and recovery in Kubernetes (EKS) environments, improving efficiency and reliability.
Training highlights: The LLM training user experience with NeuronX Distributed (NxD) is improved by Flash Attention support, which enables training with longer sequence lengths (>= 8K). Neuron 2.19 adds support for Llama 3 model training. This release also adds support for interleaved pipeline parallelism to reduce idle time (bubble size) and improve training efficiency and resource utilization for large cluster sizes.
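As a rough illustration of why interleaving shrinks the bubble, the sketch below uses the commonly cited estimate from the Megatron-LM pipeline analysis: with p pipeline stages and m microbatches, the ratio of idle time to ideal compute time is about (p - 1) / m, and assigning v model chunks to each device divides that by v. This is a general approximation, not a statement about how NxD schedules work, and the function name and parameters are made up for this example.

```python
def bubble_fraction(pipeline_stages: int, microbatches: int,
                    chunks_per_device: int = 1) -> float:
    """Approximate ratio of pipeline idle (bubble) time to ideal compute time.

    Uses the estimate (p - 1) / (v * m); chunks_per_device = 1 corresponds
    to a non-interleaved schedule.
    """
    if min(pipeline_stages, microbatches, chunks_per_device) < 1:
        raise ValueError("all arguments must be positive integers")
    return (pipeline_stages - 1) / (chunks_per_device * microbatches)


# 8 pipeline stages, 32 microbatches per step.
standard = bubble_fraction(8, 32)                           # (8 - 1) / 32
interleaved = bubble_fraction(8, 32, chunks_per_device=4)   # (8 - 1) / (4 * 32)
print(f"standard: {standard:.4f}, interleaved: {interleaved:.4f}")
```

Under this estimate, the bubble shrinks in proportion to the number of chunks per device, at the cost of more communication between stages.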
Inference highlights: Flash Attention kernel support in the Transformers NeuronX library enables LLM inference for context lengths of up to 32k. This release also adds [Beta] support for continuous batching with mistralai/Mistral-7B-v0.2 in Transformers NeuronX.
Tools and Neuron DLAMI/DLC highlights: This release introduces the new Neuron Node Problem Detector and Recovery plugin for EKS-supported Kubernetes environments, a tool that monitors the health of Neuron instances and triggers automatic node replacement upon detecting an unrecoverable error. Neuron 2.19 introduces the new Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes, and adds monitoring support with Prometheus and Grafana. This release also introduces new PyTorch 2.1 and PyTorch 1.13 single-framework DLAMIs for Ubuntu 22. Neuron DLAMIs and Neuron DLCs are also updated to support this release (Neuron 2.19).
More release content can be found in the table below and in each component's release notes.
| What’s New | Details | Instances |
|---|---|---|
| Known Issues and Limitations | | Trn1/Trn1n, Inf2, Inf1 |
| Transformers NeuronX (transformers-neuronx) for Inference | | Inf2, Trn1/Trn1n |
| NeuronX Distributed (neuronx-distributed) for Training | | Trn1/Trn1n |
| NeuronX Distributed (neuronx-distributed) for Inference | | Inf2, Trn1/Trn1n |
| PyTorch NeuronX (torch-neuronx) | | Trn1/Trn1n, Inf2 |
| NeuronX Nemo Megatron for Training | | Trn1/Trn1n, Inf2 |
| Neuron Compiler (neuronx-cc) | | Trn1/Trn1n, Inf2 |
| Neuron DLAMI and DLC | | Inf1, Inf2, Trn1/Trn1n |
| Neuron Tools | | Inf1, Inf2, Trn1/Trn1n |
| Neuron Runtime | | Inf1, Inf2, Trn1/Trn1n |
| Other Documentation Updates | | Inf1, Inf2, Trn1/Trn1n |
| Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1 |
| Release Artifacts | | Trn1/Trn1n, Inf2, Inf1 |
2.19.0 Known Issues and Limitations#
There are known issues when using the on_device_generation flag in the Transformers NeuronX config for Llama models. Customers who encounter an issue are advised not to use the flag. See the Transformers Neuron (transformers-neuronx) release notes for more information.
See the component release notes below for any additional known issues.
Neuron Components Release Notes#
Inf1, Trn1/Trn1n and Inf2 common packages#
| Component | Instance/s | Package/s | Details |
|---|---|---|---|
| Neuron Runtime | Trn1/Trn1n, Inf1, Inf2 | | |
| Neuron Runtime Driver | Trn1/Trn1n, Inf1, Inf2 | | |
| Neuron System Tools | Trn1/Trn1n, Inf1, Inf2 | | |
| Neuron DLAMI | Trn1/Trn1n, Inf1, Inf2 | | |
| Neuron DLC | Trn1/Trn1n, Inf1, Inf2 | | |
| Containers | Trn1/Trn1n, Inf1, Inf2 | | |
| NeuronPerf (Inference only) | Trn1/Trn1n, Inf1, Inf2 | | |
| TensorFlow Model Server Neuron | Trn1/Trn1n, Inf1, Inf2 | | |
| Neuron Documentation | Trn1/Trn1n, Inf1, Inf2 | | |
Trn1/Trn1n and Inf2 only packages#
| Component | Instance/s | Package/s | Details |
|---|---|---|---|
| PyTorch Neuron | Trn1/Trn1n, Inf2 | | |
| TensorFlow Neuron | Trn1/Trn1n, Inf2 | | |
| Neuron Compiler (Trn1/Trn1n, Inf2 only) | Trn1/Trn1n, Inf2 | | |
| Collective Communication library | Trn1/Trn1n, Inf2 | | |
| Neuron Custom C++ Operators | Trn1/Trn1n, Inf2 | | |
| Transformers Neuron | Trn1/Trn1n, Inf2 | | |
| Neuron Distributed | Trn1/Trn1n, Inf2 | | |
| AWS Neuron Reference for NeMo Megatron | Trn1/Trn1n | | |
Note
In upcoming releases, aws-neuronx-tools and aws-neuronx-runtime-lib will add support for Inf1.
Inf1 only packages#
| Component | Instance/s | Package/s | Details |
|---|---|---|---|
| PyTorch Neuron | Inf1 | | |
| TensorFlow Neuron | Inf1 | | |
| Apache MXNet | Inf1 | | |
| Neuron Compiler (Inf1 only) | Inf1 | | |
Release Artifacts#
Trn1 packages#
List of packages in Neuron 2.19.1:
| Component | Package |
|---|---|
| Collective Communication Library | aws-neuronx-collectives-2.21.46.0 |
| Driver | aws-neuronx-dkms-2.17.17.0 |
| | aws-neuronx-gpsimd-customop-lib-0.11.4.0 |
| CustomOps Tools | aws-neuronx-gpsimd-tools-0.11.3.0 |
| Kubernetes Plugin | aws-neuronx-k8-plugin-2.21.14.0 |
| Kubernetes Scheduler | aws-neuronx-k8-scheduler-2.21.14.0 |
| OCI | aws-neuronx-oci-hook-2.4.4.0 |
| General | aws-neuronx-runtime-discovery-2.9 |
| Runtime Library | aws-neuronx-runtime-lib-2.21.41.0 |
| System Tools | aws-neuronx-tools-2.18.3.0 |
| Framework | libneuronxla-2.0.2335 |
| Framework | libneuronxla-0.5.1795 |
| Compiler | neuronx-cc-2.14.227.0 |
| Neuron Distributed | neuronx_distributed-0.8.0 |
| TensorBoard | tensorboard-plugin-neuronx-2.6.63.0 |
| TensorFlow Model Server | tensorflow-model-server-neuronx-2.10.1.2.11.4.0 |
| TensorFlow Model Server | tensorflow-model-server-neuronx-2.8.4.2.11.4.0 |
| TensorFlow Model Server | tensorflow-model-server-neuronx-2.9.3.2.11.4.0 |
| TensorFlow | tensorflow-neuronx-2.10.1.2.1.0 |
| TensorFlow | tensorflow-neuronx-2.8.4.2.1.0 |
| TensorFlow | tensorflow-neuronx-2.9.3.2.1.0 |
| PyTorch | torch-neuronx-1.13.1.1.15.0 |
| PyTorch | torch-neuronx-2.1.2.2.2.0 |
| PyTorch | torch_xla-1.13.1+torchneuronf |
| PyTorch | torch_xla-2.1.3 |
| Transformers Neuron | transformers-neuronx-0.11.351 |
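The artifact names above fuse the package name and its version with a hyphen. As a convenience sketch, assuming the version is always the final hyphen-separated field (which holds for every entry in these lists), the Python packages can be turned into pip-style pins as shown below. The `to_pip_pin` helper is hypothetical and not part of the Neuron SDK; the aws-neuronx-* entries are system packages and are left out.

```python
def to_pip_pin(artifact: str) -> str:
    """Split 'name-version' on the last hyphen and return 'name==version'."""
    name, _, version = artifact.rpartition("-")
    if not name or not version[:1].isdigit():
        raise ValueError(f"cannot split artifact name: {artifact!r}")
    return f"{name}=={version}"


# A few Python packages copied from the Neuron 2.19.1 list above.
artifacts = [
    "neuronx-cc-2.14.227.0",
    "torch-neuronx-2.1.2.2.2.0",
    "neuronx_distributed-0.8.0",
    "transformers-neuronx-0.11.351",
]

pins = [to_pip_pin(a) for a in artifacts]
print("\n".join(pins))
```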
Inf2 packages#
List of packages in Neuron 2.19.1:
| Component | Package |
|---|---|
| Collective Communication Library | aws-neuronx-collectives-2.21.46.0 |
| Driver | aws-neuronx-dkms-2.17.17.0 |
| | aws-neuronx-gpsimd-customop-lib-0.11.4.0 |
| CustomOps Tools | aws-neuronx-gpsimd-tools-0.11.3.0 |
| Kubernetes Plugin | aws-neuronx-k8-plugin-2.21.14.0 |
| Kubernetes Scheduler | aws-neuronx-k8-scheduler-2.21.14.0 |
| OCI | aws-neuronx-oci-hook-2.4.4.0 |
| General | aws-neuronx-runtime-discovery-2.9 |
| Runtime Library | aws-neuronx-runtime-lib-2.21.41.0 |
| System Tools | aws-neuronx-tools-2.18.3.0 |
| Framework | libneuronxla-2.0.2335 |
| Framework | libneuronxla-0.5.1795 |
| Compiler | neuronx-cc-2.14.227.0 |
| Neuron Distributed | neuronx_distributed-0.8.0 |
| TensorBoard | tensorboard-plugin-neuronx-2.6.63.0 |
| TensorFlow Model Server | tensorflow-model-server-neuronx-2.10.1.2.11.4.0 |
| TensorFlow Model Server | tensorflow-model-server-neuronx-2.8.4.2.11.4.0 |
| TensorFlow Model Server | tensorflow-model-server-neuronx-2.9.3.2.11.4.0 |
| TensorFlow | tensorflow-neuronx-2.10.1.2.1.0 |
| TensorFlow | tensorflow-neuronx-2.8.4.2.1.0 |
| TensorFlow | tensorflow-neuronx-2.9.3.2.1.0 |
| PyTorch | torch-neuronx-1.13.1.1.15.0 |
| PyTorch | torch-neuronx-2.1.2.2.2.0 |
| PyTorch | torch_xla-1.13.1+torchneuronf |
| PyTorch | torch_xla-2.1.3 |
| Transformers Neuron | transformers-neuronx-0.11.351 |
Inf1 packages#
List of packages in Neuron 2.19.0:
| Component | Package |
|---|---|
| Driver | aws-neuronx-dkms-2.17.17.0 |
| Kubernetes Plugin | aws-neuronx-k8-plugin-2.21.14.0 |
| Kubernetes Scheduler | aws-neuronx-k8-scheduler-2.21.14.0 |
| OCI | aws-neuronx-oci-hook-2.4.4.0 |
| System Tools | aws-neuronx-tools-2.18.3.0 |
| Compiler | dmlc_nnvm-1.19.1.0 |
| Compiler | dmlc_topi-1.19.1.0 |
| Compiler | dmlc_tvm-1.19.1.0 |
| Compiler | inferentia_hwm-1.17.1.0 |
| MXNet | mx_neuron-1.8.0.2.4.147.0 |
| MXNet | mxnet_neuron-1.5.1.1.10.0.0 |
| Compiler | neuron-cc-1.23.5.0 |
| Perf Tools | neuronperf-1.8.93.0 |
| TensorFlow Model Server | tensorflow-model-server-neuronx-2.10.1.2.11.4.0 |
| TensorFlow Model Server | tensorflow-model-server-neuronx-2.8.4.2.11.4.0 |
| TensorFlow Model Server | tensorflow-model-server-neuronx-2.9.3.2.11.4.0 |
| TensorFlow | tensorflow-neuron-2.10.1.2.11.4.0 |
| TensorFlow | tensorflow-neuron-2.7.4.2.11.4.0 |
| TensorFlow | tensorflow-neuron-2.8.4.2.11.4.0 |
| TensorFlow | tensorflow-neuron-2.9.3.2.11.4.0 |
| PyTorch | torch-neuron-1.10.2.2.10.12.0 |
| PyTorch | torch-neuron-1.11.0.2.10.12.0 |
| PyTorch | torch-neuron-1.12.1.2.10.12.0 |
| PyTorch | torch-neuron-1.13.1.2.10.12.0 |
| PyTorch | torch-neuron-1.9.1.2.10.12.0 |
Supported Python Versions for Inf1 packages#
List of packages in Neuron 2.19.0:
| Package | Supported Python Versions |
|---|---|
| dmlc_nnvm-1.19.1.0 | 3.8, 3.9, 3.10 |
| dmlc_topi-1.19.1.0 | 3.8, 3.9, 3.10 |
| dmlc_tvm-1.19.1.0 | 3.8, 3.9, 3.10 |
| inferentia_hwm-1.17.1.0 | 3.8, 3.9, 3.10 |
| mx_neuron-1.8.0.2.4.147.0 | 3.8, 3.9, 3.10 |
| mxnet_neuron-1.5.1.1.10.0.0 | 3.8, 3.9, 3.10 |
| neuron-cc-1.23.5.0 | 3.8, 3.9, 3.10 |
| neuronperf-1.8.93.0 | 3.8, 3.9, 3.10 |
| tensorflow-neuron-2.10.1.2.11.4.0 | 3.8, 3.9, 3.10 |
| tensorflow-neuron-2.7.4.2.11.4.0 | 3.8, 3.9, 3.10 |
| tensorflow-neuron-2.8.4.2.11.4.0 | 3.8, 3.9, 3.10 |
| tensorflow-neuron-2.9.3.2.11.4.0 | 3.8, 3.9, 3.10 |
| torch-neuron-1.10.2.2.10.12.0 | 3.8, 3.9, 3.10 |
| torch-neuron-1.11.0.2.10.12.0 | 3.8, 3.9, 3.10 |
| torch-neuron-1.12.1.2.10.12.0 | 3.8, 3.9, 3.10 |
| torch-neuron-1.13.1.2.10.12.0 | 3.8, 3.9, 3.10 |
| torch-neuron-1.9.1.2.10.12.0 | 3.8, 3.9, 3.10 |
Supported Python Versions for Inf2/Trn1 packages#
List of packages in Neuron 2.19.0:
| Package | Supported Python Versions |
|---|---|
| aws-neuronx-runtime-discovery-2.9 | 3.8, 3.9, 3.10 |
| libneuronxla-2.0.2335 | 3.8, 3.9, 3.10 |
| libneuronxla-0.5.1795 | 3.8, 3.9, 3.10 |
| neuronx-cc-2.14.213.0 | 3.8, 3.9, 3.10 |
| neuronx_distributed-0.8.0 | 3.8, 3.9, 3.10 |
| tensorflow-neuronx-2.10.1.2.1.0 | 3.8, 3.9, 3.10 |
| tensorflow-neuronx-2.8.4.2.1.0 | 3.8, 3.9, 3.10 |
| tensorflow-neuronx-2.9.3.2.1.0 | 3.8, 3.9, 3.10 |
| torch-neuronx-1.13.1.1.15.0 | 3.8, 3.9, 3.10 |
| torch-neuronx-2.1.2.2.2.0 | 3.8, 3.9, 3.10 |
| torch_xla-1.13.1+torchneuronf | 3.8, 3.9, 3.10 |
| torch_xla-2.1.3 | 3.8, 3.9, 3.10 |
| transformers-neuronx-0.11.351 | 3.8, 3.9, 3.10 |
Supported NumPy Versions#
Neuron supports NumPy versions >= 1.21.6 and <= 1.22.2.
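A quick way to check an environment against this range is sketched below. The helper names are hypothetical, and the parser only looks at the leading numeric release fields, so pre-release suffixes are ignored.

```python
import re

# Inclusive bounds taken from the supported range stated above.
NUMPY_MIN = (1, 21, 6)
NUMPY_MAX = (1, 22, 2)


def parse_release(version: str) -> tuple:
    """Turn '1.22.2' into (1, 22, 2); a missing patch field counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(group or 0) for group in match.groups())


def numpy_supported(version: str) -> bool:
    """Return True when the version falls inside the supported range."""
    return NUMPY_MIN <= parse_release(version) <= NUMPY_MAX


for candidate in ("1.21.5", "1.21.6", "1.22.2", "1.22.3"):
    print(candidate, numpy_supported(candidate))
```

In a real environment, the installed version string would come from `numpy.__version__`.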
Supported HuggingFace Transformers Versions#
| Package | Supported HuggingFace Transformers Versions |
|---|---|
| torch-neuronx | < 4.35 or >= 4.37.2 |
| transformers-neuronx | >= 4.36.0 |
| neuronx-distributed - Llama model class | 4.31 |
| neuronx-distributed - GPT NeoX model class | 4.26 |
| neuronx-distributed - Bert model class | 4.26 |
| nemo-megatron | 4.31.0 |
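The range-style rows of this table can be expressed as small predicates, as in the sketch below. It reads the torch-neuronx row as two disjoint ranges (below 4.35, or 4.37.2 and above), since no single version can satisfy both bounds at once. The names are hypothetical, and only plain release strings such as 4.36.0 are handled.

```python
def release(version: str) -> tuple:
    """'4.36.0' -> (4, 36, 0); missing fields are padded with zeros."""
    parts = [int(p) for p in version.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


# Range rules for the two packages that list a range rather than a single version.
RULES = {
    "torch-neuronx": lambda v: v < (4, 35, 0) or v >= (4, 37, 2),
    "transformers-neuronx": lambda v: v >= (4, 36, 0),
}


def transformers_supported(package: str, version: str) -> bool:
    """Check a HuggingFace Transformers version against a package's rule."""
    return RULES[package](release(version))


print(transformers_supported("torch-neuronx", "4.36.0"))
print(transformers_supported("transformers-neuronx", "4.36.0"))
```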