This document is relevant for: Inf1, Inf2, Trn1, Trn2
What’s New#
Neuron 2.21.1 (01/14/2025)#
Neuron 2.21.1 release pins Transformers NeuronX dependency to transformers<4.48 and fixes DMA abort errors on Trn2.
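As a quick sanity check of the transformers pin mentioned above, the snippet below (a minimal sketch using standard Python packaging utilities, not a Neuron API) verifies that the installed transformers version satisfies the <4.48 constraint:

```python
# Minimal sketch: confirm the installed transformers version respects the
# transformers<4.48 pin described above. Uses only standard packaging tools.
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("transformers"))
if installed >= Version("4.48"):
    raise RuntimeError(
        f"transformers {installed} is too new for Neuron 2.21.1; install transformers<4.48"
    )
print(f"transformers {installed} satisfies the <4.48 pin")
```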
Additionally, this release brings NxD Core and NxD Training improvements, including fixes for sequence parallel support in quantized models and a new flag for dtype control in Llama3/3.1 70B configurations. See NxD Training Release Notes (neuronx-distributed-training) for details.
NxD Inference update includes minor bug fixes for sampling parameters. See NxD Inference Release Notes.
Neuron supported DLAMIs and DLCs have been updated to the Neuron 2.21.1 SDK. Users should be aware of an incompatibility between TensorFlow Neuron 2.10 (Inf1) and Neuron Runtime 2.21 in DLAMIs, which will be addressed in the next minor release. See Neuron DLAMI Release Notes.
The Neuron Compiler includes bug fixes and performance enhancements specifically targeting the Trn2 platform.
Neuron 2.21.0 (12/20/2024)#
What’s New#
Overview: Neuron 2.21.0 introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and the Trn2 UltraServer (Preview). The release adds new capabilities for both training and inference of large-scale models. It introduces NxD Inference (beta), a PyTorch-based library for model deployment; Neuron Profiler 2.0 (beta); PyTorch 2.5 support across the Neuron SDK; and Logical NeuronCore Configuration (LNC) for optimizing NeuronCore allocation. The release also enables Llama 3.1 405B model inference on a single trn2.48xlarge instance.
NxD Inference: NxD Inference (beta) is a new PyTorch-based inference library for deploying large-scale models on AWS Inferentia and Trainium instances. It enables PyTorch model onboarding with minimal code changes and integrates with vLLM. NxD Inference supports various model architectures, including Llama versions for text processing (Llama 2, Llama 3, Llama 3.1, Llama 3.2, and Llama 3.3), Llama 3.2 multimodal for multimodal tasks, and Mixture-of-Experts (MoE) model architectures including Mixtral and DBRX. The library supports quantization methods, includes dynamic sampling, and is compatible with HuggingFace checkpoints and the generate() API. NxD Inference also supports distributed strategies including tensor parallelism and incorporates speculative decoding techniques (Draft model and EAGLE). The release includes Llama 3.1 405B and Llama 3.3 70B model samples for inference on a single trn2.48xlarge instance.
For more information, see NxD Inference documentation and check the NxD Inference Github repository: aws-neuron/neuronx-distributed-inference
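To illustrate the vLLM integration described above, here is a minimal, hedged sketch of offline generation through vLLM; the checkpoint path, tensor_parallel_size, and device setting are illustrative assumptions rather than the exact NxD Inference configuration, which is covered in the NxD Inference documentation:

```python
# Illustrative sketch only: serve a HuggingFace-style Llama checkpoint through
# vLLM on Neuron. The checkpoint path, parallelism degree, and device flag are
# assumptions; consult the NxD Inference documentation for the supported setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/llama-checkpoint",  # assumed local HuggingFace-format checkpoint
    tensor_parallel_size=32,            # assumed value; depends on instance and LNC setting
    max_model_len=2048,
    device="neuron",                    # assumed device selector for the Neuron backend
)

outputs = llm.generate(
    ["Summarize what NxD Inference provides in one sentence."],
    SamplingParams(temperature=0.7, top_p=0.9, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```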
Transformers NeuronX (TNx): This release introduces several new features, including flash decoding support for speculative decoding and on-device generation in speculative decoding flows. It adds EAGLE speculative decoding with greedy and lossless sampling, as well as support for CPU compilation and sharded model saving. Performance improvements include optimized MLP and QKV layers for Llama models with sequence-parallel norm, and control over the number of concurrent compilation workers.
Training Highlights: NxD Training in this release adds support for HuggingFace Llama3/3.1 70B on Trn2 instances, introduces DPO support for post-training model alignment, and adds support for Mixture-of-Experts (MoE) models including Mixtral 8x7B. The release includes improved checkpoint conversion capabilities and supports MoE with Tensor, Sequence, Pipeline, and Expert parallelism.
ML Frameworks: Neuron 2.21.0 adds support for PyTorch 2.5 and JAX 0.4.35.
Note
The CVEs CVE-2024-31583 and CVE-2024-31580 affect PyTorch versions 2.1 and earlier. Based on Amazon’s analysis, executing models on Trainium and Inferentia is not exposed to either of these vulnerabilities. We recommend upgrading to the new version of Torch-NeuronX by following the Neuron setup instructions.
Logical NeuronCore Configuration (LNC): This release introduces LNC for Trainium2 instances, optimizing NeuronCore allocation for ML applications. LNC offers two configurations: default (LNC=2) combining two physical cores, and alternative (LNC=1) mapping each physical core individually. This feature allows users to efficiently manage resources for large-scale model training and deployment through runtime variables and compiler flags.
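A rough sketch of how a workload might opt into one LNC setting is shown below; the environment-variable name NEURON_LOGICAL_NC_CONFIG and the --logical-nc-config compiler flag are assumptions based on the description above, so confirm the exact names in the LNC documentation:

```python
# Sketch (names are assumptions): select a Logical NeuronCore configuration
# before compiling/launching a model. LNC=2 combines two physical cores per
# logical core (default on Trn2); LNC=1 maps each physical core individually.
import os

LNC = 1  # or 2 for the default configuration

os.environ["NEURON_LOGICAL_NC_CONFIG"] = str(LNC)             # assumed runtime variable
os.environ["NEURON_CC_FLAGS"] = f"--logical-nc-config={LNC}"  # assumed compiler flag

# ...then compile and run the model as usual; the runtime and compiler pick up these settings.
```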
Neuron Profiler 2.0: The new profiler provides system and device-level profiling, timeline annotations, container integration, and support for distributed workloads. It includes trace export capabilities for Perfetto visualization and integration with JAX and PyTorch profilers, and support for Logical NeuronCore Configuration (LNC).
Neuron Kernel Interface (NKI): NKI now supports Trainium2 including Logical NeuronCore Configuration (LNC), adds SPMD capabilities for multi-core operations, and includes new modules and APIs including support for float8_e5m2 datatype.
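To give a feel for what an NKI kernel looks like, below is a small element-wise add written against the NKI getting-started pattern; the module paths and call signatures should be treated as a best-effort sketch, with the NKI documentation as the authoritative reference:

```python
# Best-effort sketch of a tiny NKI kernel (element-wise add). Module names and
# signatures follow the NKI getting-started examples as understood here and
# may differ slightly from the current API.
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl


@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Load both operands from device memory (HBM) into on-chip tiles.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)

    # Element-wise addition on the tiles.
    c_tile = a_tile + b_tile

    # Allocate the output tensor in HBM and store the result back.
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)
    nl.store(c_output, value=c_tile)
    return c_output
```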
Deep Learning AMIs (DLAMIs): This release expands support for JAX 0.4 within the Multi Framework DLAMI. It also introduces NxD Training, NxD Inference, and NxD Core with PyTorch 2.5 support. Additionally, a new Single Framework DLAMI for TensorFlow 2.10 on Ubuntu 22 is now available.
Deep Learning Containers (DLCs): This release introduces new DLCs for JAX 0.4 training and PyTorch 2.5.1 inference and training. All DLCs have been updated to Ubuntu 22, and the pytorch-inference-neuronx DLC now supports both NxD Inference and TNx libraries.
Documentation: Documentation updates include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and Trn2 UltraServer.
Software Maintenance: This release includes the following announcements:
Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release
Announcing end of support for Neuron DET tool starting next release
PyTorch Neuron versions 1.9 and 1.10 no longer supported
Announcing end of support for PyTorch 2.1 for Trn1, Trn2 and Inf2 starting next release
Announcing end of support for PyTorch 1.13 for Trn1 and Inf2 starting next release
Announcing end of support for Python 3.8 in future releases
Announcing end of support for Ubuntu20 DLCs and DLAMIs
Amazon Q: Use Q Developer as your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.
More release content can be found in the table below and in each component's release notes.
What’s New | Instances
---|---
Known Issues and Limitations | Trn1/Trn1n, Inf2, Inf1
Transformers NeuronX (transformers-neuronx) for Inference | Inf2, Trn1/Trn1n, Trn2
NxD Core (neuronx-distributed) | Trn1/Trn1n, Trn2
NxD Inference (neuronx-distributed-inference) | Inf2, Trn1/Trn1n, Trn2
NxD Training (neuronx-distributed-training) | Trn1/Trn1n, Trn2
PyTorch NeuronX (torch-neuronx) | Trn1/Trn1n, Inf2, Trn2
NeuronX Nemo Megatron for Training | Trn1/Trn1n, Inf2
Neuron Compiler (neuronx-cc) | Trn1/Trn1n, Inf2, Trn2
Neuron Kernel Interface (NKI) | Trn1/Trn1n, Inf2
Neuron Deep Learning AMIs (DLAMIs) | Inf1, Inf2, Trn1/Trn1n
Neuron Deep Learning Containers (DLCs) | Inf1, Inf2, Trn1/Trn1n
Neuron Tools | Inf1, Inf2, Trn1/Trn1n, Trn2
Neuron Runtime | Inf1, Inf2, Trn1/Trn1n, Trn2
Release Announcements | Inf1, Inf2, Trn1/Trn1n
Documentation Updates | Inf1, Inf2, Trn1/Trn1n, Trn2
Minor enhancements and bug fixes | Trn1/Trn1n, Inf2, Inf1, Trn2
Release Artifacts | Trn1/Trn1n, Inf2, Inf1, Trn2
2.21.0 Known Issues and Limitations#
See component release notes below for any additional known issues.
Neuron 2.21.0 Beta (12/03/2024)#
Note
This release (Neuron 2.21 Beta) was only tested with Trn2 instances. The next release (Neuron 2.21) will support all instances (Inf1, Inf2, Trn1, and Trn2).
For access to this release (Neuron 2.21 Beta), please contact your account manager.
This release (Neuron 2.21 beta) introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and Trn2 UltraServer. The release showcases Llama 3.1 405B model inference using NxD Inference on a single trn2.48xlarge instance, and FUJI 70B model training using the AXLearn library across eight trn2.48xlarge instances.
NxD Inference, a new PyTorch-based library for deploying large language models and multi-modality models, is introduced in this release. It integrates with vLLM and enables PyTorch model onboarding with minimal code changes. The release also adds support for AXLearn training for JAX models.
The new Neuron Profiler 2.0 introduced in this release offers system and device-level profiling, timeline annotations, and container integration. The profiler supports distributed workloads and provides trace export capabilities for Perfetto visualization.
The documentation has been updated to include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and Trn2 UltraServer.
Use Q Developer as your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.
Note
For the latest release that supports Trn1, Inf2 and Inf1 instances, please see Neuron Release 2.20.2
Release Artifacts#
Trn2 packages#
List of packages in Neuron 2.21.1:
Component | Package
---|---
Collective Communication Library | aws-neuronx-collectives-2.23.135.0
Driver | aws-neuronx-dkms-2.19.64.0
CustomOps Library | aws-neuronx-gpsimd-customop-lib-0.13.16.0
CustomOps Tools | aws-neuronx-gpsimd-tools-0.13.2.0
Kubernetes Plugin | aws-neuronx-k8-plugin-2.23.45.0
Kubernetes Scheduler | aws-neuronx-k8-scheduler-2.23.45.0
OCI | aws-neuronx-oci-hook-2.6.36.0
General | aws-neuronx-runtime-discovery-2.9
Runtime Library | aws-neuronx-runtime-lib-2.23.112.0
System Tools | aws-neuronx-tools-2.20.204.0
Compiler | neuronx-cc-2.16.372.0
Compiler | neuronx-cc-stubs-2.16.372.0
Neuron Distributed | neuronx_distributed-0.10.1
Neuron Distributed Inference | neuronx_distributed_inference-0.1.1
PyTorch | torch-neuronx-2.1.2.2.4.0
PyTorch | torch-neuronx-2.5.1.2.4.0
PyTorch | torch_xla-2.1.6
Trn1 packages#
List of packages in Neuron 2.21.1:
Component | Package
---|---
Collective Communication Library | aws-neuronx-collectives-2.23.135.0
Driver | aws-neuronx-dkms-2.19.64.0
CustomOps Library | aws-neuronx-gpsimd-customop-lib-0.13.16.0
CustomOps Tools | aws-neuronx-gpsimd-tools-0.13.2.0
Kubernetes Plugin | aws-neuronx-k8-plugin-2.23.45.0
Kubernetes Scheduler | aws-neuronx-k8-scheduler-2.23.45.0
OCI | aws-neuronx-oci-hook-2.6.36.0
General | aws-neuronx-runtime-discovery-2.9
Runtime Library | aws-neuronx-runtime-lib-2.23.112.0
System Tools | aws-neuronx-tools-2.20.204.0
Jax | jax_neuronx-0.1.2
Framework | libneuronxla-2.1.714.0
Framework | libneuronxla-0.5.3396
Compiler | neuronx-cc-2.16.372.0
Compiler | neuronx-cc-stubs-2.16.372.0
Neuron Distributed | neuronx_distributed-0.10.1
Neuron Distributed Training | neuronx_distributed_training-1.1.1
TensorBoard | tensorboard-plugin-neuronx-2.6.52.0
TensorFlow Model Server | tensorflow-model-server-neuronx-2.10.1.2.12.2.0
TensorFlow Model Server | tensorflow-model-server-neuronx-2.9.3.2.12.2.0
TensorFlow Model Server | tensorflow-model-server-neuronx-2.8.4.2.12.2.0
TensorFlow | tensorflow-neuronx-2.10.1.2.1.0
TensorFlow | tensorflow-neuronx-2.8.4.2.1.0
TensorFlow | tensorflow-neuronx-2.9.3.2.1.0
PyTorch | torch-neuronx-1.13.1.1.17.0
PyTorch | torch-neuronx-2.1.2.2.4.0
PyTorch | torch-neuronx-2.5.1.2.4.0
PyTorch | torch_xla-1.13.1+torchneurong
PyTorch | torch_xla-2.1.6
Transformers Neuron | transformers-neuronx-0.13.380
Inf2 packages#
List of packages in Neuron 2.21.1:
Component | Package
---|---
Collective Communication Library | aws-neuronx-collectives-2.23.135.0
Driver | aws-neuronx-dkms-2.19.64.0
CustomOps Library | aws-neuronx-gpsimd-customop-lib-0.13.16.0
CustomOps Tools | aws-neuronx-gpsimd-tools-0.13.2.0
Kubernetes Plugin | aws-neuronx-k8-plugin-2.23.45.0
Kubernetes Scheduler | aws-neuronx-k8-scheduler-2.23.45.0
OCI | aws-neuronx-oci-hook-2.6.36.0
General | aws-neuronx-runtime-discovery-2.9
Runtime Library | aws-neuronx-runtime-lib-2.23.112.0
System Tools | aws-neuronx-tools-2.20.204.0
Jax | jax_neuronx-0.1.2
Framework | libneuronxla-2.1.714.0
Framework | libneuronxla-0.5.3396
Compiler | neuronx-cc-2.16.372.0
Compiler | neuronx-cc-stubs-2.16.372.0
Neuron Distributed | neuronx_distributed-0.10.1
Neuron Distributed Inference | neuronx_distributed_inference-0.1.1
TensorBoard | tensorboard-plugin-neuronx-2.6.52.0
TensorFlow Model Server | tensorflow-model-server-neuronx-2.10.1.2.12.2.0
TensorFlow Model Server | tensorflow-model-server-neuronx-2.9.3.2.12.2.0
TensorFlow Model Server | tensorflow-model-server-neuronx-2.8.4.2.12.2.0
TensorFlow | tensorflow-neuronx-2.10.1.2.1.0
TensorFlow | tensorflow-neuronx-2.8.4.2.1.0
TensorFlow | tensorflow-neuronx-2.9.3.2.1.0
PyTorch | torch-neuronx-1.13.1.1.17.0
PyTorch | torch-neuronx-2.1.2.2.4.0
PyTorch | torch-neuronx-2.5.1.2.4.0
PyTorch | torch_xla-1.13.1+torchneurong
PyTorch | torch_xla-2.1.6
Transformers Neuron | transformers-neuronx-0.13.380
Inf1 packages#
List of packages in Neuron 2.21.1:
Component | Package
---|---
Collective Communication Library | aws-neuronx-collectives-2.12.35.0
Driver | aws-neuronx-dkms-2.19.64.0
Kubernetes Plugin | aws-neuronx-k8-plugin-2.23.45.0
Kubernetes Scheduler | aws-neuronx-k8-scheduler-2.23.45.0
OCI | aws-neuronx-oci-hook-2.6.36.0
Runtime Library | aws-neuronx-runtime-lib-2.12.23.0
System Tools | aws-neuronx-tools-2.20.204.0
Compiler | dmlc_nnvm-1.19.6.0
Compiler | dmlc_topi-1.19.6.0
Compiler | dmlc_tvm-1.19.6.0
Compiler | inferentia_hwm-1.17.6.0
MXNet | mx_neuron-1.8.0.2.4.147.0
MXNet | mxnet_neuron-1.5.1.1.10.0.0
Compiler | neuron-cc-1.24.0.0
Perf Tools | neuronperf-1.8.93.0
TensorFlow Model Server | tensorflow-model-server-neuronx-2.10.1.2.12.2.0
TensorFlow Model Server | tensorflow-model-server-neuronx-2.9.3.2.12.2.0
TensorFlow Model Server | tensorflow-model-server-neuronx-2.8.4.2.12.2.0
TensorFlow | tensorflow-neuron-2.10.1.2.12.2.0
TensorFlow | tensorflow-neuron-2.8.4.2.12.2.0
TensorFlow | tensorflow-neuron-2.9.3.2.12.2.0
PyTorch | torch-neuron-1.10.2.2.11.13.0
PyTorch | torch-neuron-1.11.0.2.11.13.0
PyTorch | torch-neuron-1.12.1.2.11.13.0
PyTorch | torch-neuron-1.13.1.2.11.13.0
Supported Python Versions for Inf1 packages#
List of packages in Neuron 2.21.1:
Package | Supported Python Versions
---|---
dmlc_nnvm-1.19.6.0 | 3.8, 3.9, 3.10
dmlc_topi-1.19.6.0 | 3.8, 3.9, 3.10
dmlc_tvm-1.19.6.0 | 3.8, 3.9, 3.10
inferentia_hwm-1.17.6.0 | 3.8, 3.9, 3.10
mx_neuron-1.8.0.2.4.147.0 | 3.8, 3.9, 3.10
mxnet_neuron-1.5.1.1.10.0.0 | 3.8, 3.9, 3.10
neuron-cc-1.24.0.0 | 3.8, 3.9, 3.10
neuronperf-1.8.93.0 | 3.8, 3.9, 3.10
tensorflow-neuron-2.10.1.2.12.2.0 | 3.8, 3.9, 3.10
tensorflow-neuron-2.8.4.2.12.2.0 | 3.8, 3.9, 3.10
tensorflow-neuron-2.9.3.2.12.2.0 | 3.8, 3.9, 3.10
torch-neuron-1.10.2.2.11.13.0 | 3.8, 3.9, 3.10
torch-neuron-1.11.0.2.11.13.0 | 3.8, 3.9, 3.10
torch-neuron-1.12.1.2.11.13.0 | 3.8, 3.9, 3.10
torch-neuron-1.13.1.2.11.13.0 | 3.8, 3.9, 3.10
Supported Python Versions for Inf2/Trn1/Trn2 packages#
List of packages in Neuron 2.21.1:
Package | Supported Python Versions
---|---
aws-neuronx-runtime-discovery-2.9 | 3.8, 3.9, 3.10, 3.11
jax_neuronx-0.1.2 | 3.9
libneuronxla-2.1.714.0 | 3.8, 3.9, 3.10, 3.11
libneuronxla-0.5.3396 | 3.8, 3.9, 3.10
neuronx-cc-2.16.372.0 | 3.8, 3.9, 3.10, 3.11
neuronx-cc-stubs-2.16.372.0 | 3.8, 3.9, 3.10
neuronx_distributed-0.10.1 | 3.8, 3.9, 3.10, 3.11
neuronx_distributed_inference-0.1.1 | 3.8, 3.9, 3.10, 3.11
tensorflow-neuronx-2.10.1.2.1.0 | 3.8, 3.9, 3.10
tensorflow-neuronx-2.8.4.2.1.0 | 3.8, 3.9, 3.10
tensorflow-neuronx-2.9.3.2.1.0 | 3.8, 3.9, 3.10
torch-neuronx-1.13.1.1.17.0 | 3.8, 3.9, 3.10, 3.11
torch-neuronx-2.1.2.2.4.0 | 3.8, 3.9, 3.10, 3.11
torch-neuronx-2.5.1.2.4.0 | 3.8, 3.9, 3.10, 3.11
torch_xla-1.13.1+torchneurong | 3.8, 3.9, 3.10, 3.11
torch_xla-2.1.6 | 3.8, 3.9, 3.10, 3.11
transformers-neuronx-0.13.380 | 3.8, 3.9, 3.10, 3.11
Supported Numpy Versions#
Neuron supports NumPy versions >= 1.21.6 and <= 1.22.2.
Supported HuggingFace Transformers Versions#
Package | Supported HuggingFace Transformers Versions
---|---
torch-neuronx | < 4.35 and >= 4.37.2
transformers-neuronx | >= 4.36.0
neuronx-distributed - Llama model class | 4.31
neuronx-distributed - GPT NeoX model class | 4.26
neuronx-distributed - Bert model class | 4.26
nemo-megatron | 4.31.0
Previous Releases#
This document is relevant for: Inf1, Inf2, Trn1, Trn2