This document is relevant for: Inf1, Inf2, Trn1, Trn1n
What’s New#
Neuron 2.20.1 (10/25/2024)#
Neuron 2.20.1 addresses an issue with the Neuron Persistent Cache that was introduced in the 2.20 release. In 2.20, the issue caused a cache miss when attempting to load a previously compiled Neuron Executable File Format (NEFF) from a different path or Python environment than the one used for the initial Neuron SDK installation and NEFF compilation. This release resolves the cache-miss problem, so NEFFs load correctly regardless of the path or Python environment used to install the Neuron SDK, as long as they were compiled using the same Neuron SDK version.
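To illustrate why a path-dependent cache key produces the kind of cache miss described above, here is a minimal sketch. It is not the Neuron Persistent Cache implementation: the key layout and the names `cache_key_with_path` and `cache_key_portable` are hypothetical, and the graph string is a placeholder. The sketch only shows the general idea that a key built from the compiled graph and the SDK version stays stable across install locations.

```python
import hashlib


def _digest(*parts: str) -> str:
    """Hash the given string parts into a short hex key."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()[:16]


def cache_key_with_path(graph: str, sdk_version: str, install_path: str) -> str:
    # Hypothetical path-dependent key: moving the install changes the key,
    # so a previously compiled artifact is no longer found (cache miss).
    return _digest(graph, sdk_version, install_path)


def cache_key_portable(graph: str, sdk_version: str) -> str:
    # Hypothetical portable key: depends only on the graph and SDK version.
    return _digest(graph, sdk_version)


graph = "placeholder-graph-contents"  # stands in for the real compiler input
key_a = cache_key_with_path(graph, "2.20.1", "/opt/venv_a")
key_b = cache_key_with_path(graph, "2.20.1", "/home/user/venv_b")
print(key_a == key_b)  # False: same graph, different path, different key
print(cache_key_portable(graph, "2.20.1") == cache_key_portable(graph, "2.20.1"))  # True
```

Under this model, the portable key still changes when the SDK version changes, which matches the condition that NEFFs must have been compiled with the same Neuron SDK version.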
This release also addresses the excessive lock wait time issue during neuron_parallel_compile graph extraction for large cluster training. See PyTorch Neuron (torch-neuronx) release notes and Neuron XLA pluggable device (libneuronxla) release notes.
Additionally, Neuron 2.20.1 introduces a new Multi-Framework DLAMI for Amazon Linux 2023 (AL2023) that customers can use to get started easily with the latest Neuron SDK on the multiple frameworks that Neuron supports. See Neuron DLAMI Release Notes.
The Neuron 2.20.1 Training DLC is also updated to pre-install the necessary dependencies and support the NxD Training library out of the box. See Neuron DLC Release Notes.
Neuron 2.20.0 (09/16/2024)#
What’s New#
Overview: The Neuron 2.20 release introduces usability improvements and new capabilities across training and inference workloads. A key highlight is the introduction of the Neuron Kernel Interface (beta). NKI, pronounced ‘Nicky’, enables developers to build optimized custom compute kernels for Trainium and Inferentia. Additionally, this release introduces NxD Training (beta), a PyTorch-based library enabling efficient distributed training, with a user-friendly interface compatible with NeMo. This release also introduces support for the JAX framework (beta).
Neuron 2.20 also adds inference support for Pixart-alpha and Pixart-sigma Diffusion-Transformers (DiT) models, and adds inference support for Llama 3.1 8B, 70B, and 405B models with up to 128K context length.
Neuron Kernel Interface: NKI is a programming interface enabling developers to build optimized custom compute kernels on top of Trainium and Inferentia. NKI empowers developers to enhance deep learning models with new capabilities, performance optimizations, and scientific innovation. It natively integrates with PyTorch and JAX, providing a Python-based programming environment with Triton-like syntax and tile-level semantics, offering a familiar programming experience for developers.
All of our NKI work is shared as open source, enabling community developers to collaborate and use these kernels in their projects, improve existing kernels, and contribute new NKI kernels. The kernels we are introducing include an optimized Flash Attention NKI kernel (flash_attention), an NKI kernel with an optimized implementation of the Mamba model architecture (mamba_nki_kernels), and an optimized Stable Diffusion attention kernel (fused_sd_attention_small_head). In addition, there are NKI kernel samples for average_pool2d, rmsnorm, tensor_addition, layernorm, transpose_2d, and matrix_multiplication.
For more information, see the NKI section and check the NKI samples GitHub repository: aws-neuron/nki-samples.
NxD Training (NxDT): NxDT is a PyTorch-based library that provides a user-friendly distributed training experience through a YAML configuration file compatible with NeMo, allowing users to easily set up their training workflows. At the same time, NxDT maintains flexibility, enabling users to choose between using the YAML configuration file, using the PyTorch Lightning Trainer, or writing their own custom training script with NxD Core. The library supports PyTorch model classes including Hugging Face and Megatron-LM. Additionally, it leverages NeMo’s data engineering and data science modules, enabling end-to-end training workflows on NxDT and providing compatibility with NeMo through minimal changes to the YAML configuration file for models that are already supported in NxDT. Furthermore, the functionality of the Neuron NeMo Megatron (NNM) library is now part of NxDT, ensuring a smooth migration path from NNM to NxDT.
For more information, see NxD Training (beta) and check the NxD Training GitHub repository: aws-neuron/neuronx-distributed-training.
Training Highlights: This release adds support for Llama 3.1 8B and 70B model training up to 32K sequence length (beta). It also adds support for PEFT LoRA model training and for torch.autocast(), which provides native PyTorch mixed precision.
Inference Highlights: Neuron 2.20 adds support for Llama 3.1 models (405B, 70B, and 8B variants) and introduces new features such as on-device top-p sampling for improved performance, support for up to 128K context length through Flash Decoding, and multi-node inference for large models like Llama-3.1-405B. Furthermore, this release improves model loading in Transformers NeuronX for models like Llama-3 by loading pre-sharded or pre-transformed weights, and adds support for Diffusion-Transformers (DiT) models such as Pixart-alpha and Pixart-sigma.
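For readers unfamiliar with top-p (nucleus) sampling, the following is a minimal host-side sketch of what the technique computes. It is plain Python for illustration only and does not show how Neuron performs the sampling on device; the function names `softmax`, `top_p_filter`, and `sample_top_p` are assumptions made for this example.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches top_p, and renormalize within that set."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return {token_id: p / cumulative for token_id, p in kept}


def sample_top_p(logits, top_p, rng=random):
    """Draw one token id from the top-p nucleus of the distribution."""
    nucleus = top_p_filter(softmax(logits), top_p)
    token_ids = list(nucleus)
    weights = [nucleus[t] for t in token_ids]
    return rng.choices(token_ids, weights=weights, k=1)[0]


# With probabilities 0.5, 0.3, 0.15, 0.05 and top_p=0.7, only the two most
# likely tokens (ids 0 and 1) remain candidates.
print(sorted(top_p_filter([0.5, 0.3, 0.15, 0.05], 0.7)))  # [0, 1]
```

Sampling of this kind needs the full probability distribution over the vocabulary at every generated token, which is presumably why performing it on the accelerator rather than on the host can help performance.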
Compiler: This release introduces Neuron Compiler support for RMSNorm and RMSNormDx operators, along with enhanced performance for the sort operator.
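As a reminder of what the RMSNorm operator computes, here is a small reference sketch in plain Python. It uses the commonly published RMSNorm definition (scale each element by the reciprocal root mean square of the vector, then by a learned weight); the function name `rms_norm` and the default `eps` value are assumptions for this example, and the sketch says nothing about how the Neuron Compiler lowers the operator.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over a single vector.

    y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


# With unit weights and eps=0, the output has a mean square of 1.
out = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
print(sum(v * v for v in out) / len(out))
```

Unlike LayerNorm, this formulation does not subtract the mean, so it needs one reduction (the mean of squares) instead of two.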
System Tools: This release enables NKI profiling support in the Neuron Profiler and introduces improvements to the Neuron Profiler UI.
Neuron Driver: This release adds support for the Rocky Linux 9.0 operating system.
Neuron Containers: This release introduces the Neuron Helm Chart, which helps streamline the deployment of AWS Neuron components on Amazon EKS. See the Neuron Helm Chart GitHub repository: aws-neuron/neuron-helm-charts. Additionally, this release adds ECS support for the “Neuron Node Problem Detector and Recovery” artifact. See Neuron Problem Detector And Recovery.
Neuron DLAMIs and DLCs: This release includes the addition of the NxDT package to various Neuron DLAMIs (Multi-Framework Neuron DLAMI, PyTorch 1.13 Neuron DLAMI, and PyTorch 2.1 Neuron DLAMI) and the inclusion of NxDT in the PyTorch 1.13 Training Neuron DLC and PyTorch 2.1 Training Neuron DLC.
Software Maintenance Policy: This release also updates the Neuron SDK software maintenance policy. For more information, see Neuron Software Maintenance policy.
More release content can be found in the table below and in each component’s release notes.
What’s New | Details | Instances |
---|---|---|
Known Issues and Limitations | | Trn1/Trn1n, Inf2, Inf1 |
Transformers NeuronX (transformers-neuronx) for Inference | | Inf2, Trn1/Trn1n |
NxD Core (neuronx-distributed) | Training; Inference | Trn1/Trn1n |
NxD Training (neuronx-distributed-training) | | Trn1/Trn1n |
PyTorch NeuronX (torch-neuronx) | | Trn1/Trn1n, Inf2 |
NeuronX NeMo Megatron for Training | | Trn1/Trn1n, Inf2 |
Neuron Compiler (neuronx-cc) | | Trn1/Trn1n, Inf2 |
Neuron Kernel Interface (NKI) | | Trn1/Trn1n, Inf2 |
Neuron Deep Learning AMIs (DLAMIs) | | Inf1, Inf2, Trn1/Trn1n |
Neuron Deep Learning Containers (DLCs) | | Inf1, Inf2, Trn1/Trn1n |
Neuron Tools | | Inf1, Inf2, Trn1/Trn1n |
Neuron Runtime | | Inf1, Inf2, Trn1/Trn1n |
Release Announcements | | Inf1, Inf2, Trn1/Trn1n |
Documentation Updates | | Inf1, Inf2, Trn1/Trn1n |
Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1 |
Release Artifacts | | Trn1/Trn1n, Inf2, Inf1 |
2.20.0 Known Issues and Limitations#
Known issues when using the on_device_generation flag in the Transformers NeuronX config for Llama models. Customers are advised not to use the flag when they see an issue. See more at Transformers Neuron (transformers-neuronx) release notes.
See component release notes below for any additional known issues.
Neuron Components Release Notes#
Inf1, Trn1/Trn1n and Inf2 common packages#
Component | Instance/s | Package/s | Details |
---|---|---|---|
Neuron Runtime | Trn1/Trn1n, Inf1, Inf2 | | |
Neuron Runtime Driver | Trn1/Trn1n, Inf1, Inf2 | | |
Neuron System Tools | Trn1/Trn1n, Inf1, Inf2 | | |
Containers | Trn1/Trn1n, Inf1, Inf2 | | |
NeuronPerf (Inference only) | Trn1/Trn1n, Inf1, Inf2 | | |
TensorFlow Model Server Neuron | Trn1/Trn1n, Inf1, Inf2 | | |
Trn1/Trn1n and Inf2 only packages#
Component | Instance/s | Package/s | Details |
---|---|---|---|
PyTorch Neuron | Trn1/Trn1n, Inf2 | | |
TensorFlow Neuron | Trn1/Trn1n, Inf2 | | |
Neuron Compiler (Trn1/Trn1n, Inf2 only) | Trn1/Trn1n, Inf2 | | |
Neuron Kernel Interface (NKI) Compiler (Trn1/Trn1n, Inf2 only) | Trn1/Trn1n, Inf2 | | |
Collective Communication library | Trn1/Trn1n, Inf2 | | |
Neuron Custom C++ Operators | Trn1/Trn1n, Inf2 | | |
Transformers Neuron | Trn1/Trn1n, Inf2 | | |
NxD Training | Trn1/Trn1n, Inf2 | | |
NxD Core | Trn1/Trn1n, Inf2 | | |
AWS Neuron Reference for NeMo Megatron | Trn1/Trn1n | | |
Inf1 only packages#
Component | Instance/s | Package/s | Details |
---|---|---|---|
PyTorch Neuron | Inf1 | | |
TensorFlow Neuron | Inf1 | | |
Apache MXNet | Inf1 | | |
Neuron Compiler (Inf1 only) | Inf1 | | |
Release Artifacts#
Trn1 packages#
List of packages in Neuron 2.20.1:
Component Package
Collective Communication Library aws-neuronx-collectives-2.22.26.0
Driver aws-neuronx-dkms-2.18.12.0
CustomOps Library aws-neuronx-gpsimd-customop-lib-0.12.2.0
CustomOps Tools aws-neuronx-gpsimd-tools-0.12.1.0
Kubernetes Plugin aws-neuronx-k8-plugin-2.22.4.0
Kubernetes Scheduler aws-neuronx-k8-scheduler-2.22.4.0
OCI aws-neuronx-oci-hook-2.5.3.0
General aws-neuronx-runtime-discovery-2.9
Runtime Library aws-neuronx-runtime-lib-2.22.14.0
System Tools aws-neuronx-tools-2.19.0.0
Jax jax_neuronx-0.1.1
Framework libneuronxla-2.0.4986.0
Framework libneuronxla-0.5.2978
Compiler neuronx-cc-2.15.141.0
Compiler neuronx-cc-stubs-2.15.141.0
Neuron Distributed neuronx_distributed-0.9.0
Neuron Distributed Training neuronx_distributed_training-1.0.0
TensorBoard tensorboard-plugin-neuronx-2.6.63.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.10.1.2.12.2.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.9.3.2.12.2.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.8.4.2.12.2.0
TensorFlow tensorflow-neuronx-2.10.1.2.1.0
TensorFlow tensorflow-neuronx-2.8.4.2.1.0
TensorFlow tensorflow-neuronx-2.9.3.2.1.0
PyTorch torch-neuronx-1.13.1.1.16.0
PyTorch torch-neuronx-2.1.2.2.3.1
PyTorch torch_xla-1.13.1+torchneurong
PyTorch torch_xla-2.1.4
Transformers Neuron transformers-neuronx-0.12.313
Inf2 packages#
List of packages in Neuron 2.20.1:
Component Package
Collective Communication Library aws-neuronx-collectives-2.22.26.0
Driver aws-neuronx-dkms-2.18.12.0
CustomOps Library aws-neuronx-gpsimd-customop-lib-0.12.2.0
CustomOps Tools aws-neuronx-gpsimd-tools-0.12.1.0
Kubernetes Plugin aws-neuronx-k8-plugin-2.22.4.0
Kubernetes Scheduler aws-neuronx-k8-scheduler-2.22.4.0
OCI aws-neuronx-oci-hook-2.5.3.0
General aws-neuronx-runtime-discovery-2.9
Runtime Library aws-neuronx-runtime-lib-2.22.14.0
System Tools aws-neuronx-tools-2.19.0.0
Jax jax_neuronx-0.1.1
Framework libneuronxla-2.0.4986.0
Framework libneuronxla-0.5.2978
Compiler neuronx-cc-2.15.141.0
Compiler neuronx-cc-stubs-2.15.141.0
Neuron Distributed neuronx_distributed-0.9.0
TensorBoard tensorboard-plugin-neuronx-2.6.63.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.10.1.2.12.2.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.9.3.2.12.2.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.8.4.2.12.2.0
TensorFlow tensorflow-neuronx-2.10.1.2.1.0
TensorFlow tensorflow-neuronx-2.8.4.2.1.0
TensorFlow tensorflow-neuronx-2.9.3.2.1.0
PyTorch torch-neuronx-1.13.1.1.16.0
PyTorch torch-neuronx-2.1.2.2.3.1
PyTorch torch_xla-1.13.1+torchneurong
PyTorch torch_xla-2.1.4
Transformers Neuron transformers-neuronx-0.12.313
Inf1 packages#
List of packages in Neuron 2.20.1:
Component Package
Collective Communication Library aws-neuronx-collectives-2.12.35.0
Driver aws-neuronx-dkms-2.18.12.0
Kubernetes Plugin aws-neuronx-k8-plugin-2.22.4.0
Kubernetes Scheduler aws-neuronx-k8-scheduler-2.22.4.0
OCI aws-neuronx-oci-hook-2.5.3.0
Runtime Library aws-neuronx-runtime-lib-2.12.23.0
System Tools aws-neuronx-tools-2.19.0.0
Compiler dmlc_nnvm-1.19.6.0
Compiler dmlc_topi-1.19.6.0
Compiler dmlc_tvm-1.19.6.0
Compiler inferentia_hwm-1.17.6.0
MXNet mx_neuron-1.8.0.2.4.147.0
MXNet mxnet_neuron-1.5.1.1.10.0.0
Compiler neuron-cc-1.24.0.0
Perf Tools neuronperf-1.8.93.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.10.1.2.12.2.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.9.3.2.12.2.0
TensorFlow Model Server tensorflow-model-server-neuronx-2.8.4.2.12.2.0
TensorFlow tensorflow-neuron-2.10.1.2.12.2.0
TensorFlow tensorflow-neuron-2.8.4.2.12.2.0
TensorFlow tensorflow-neuron-2.9.3.2.12.2.0
PyTorch torch-neuron-1.10.2.2.11.7.0
PyTorch torch-neuron-1.11.0.2.11.7.0
PyTorch torch-neuron-1.12.1.2.11.7.0
PyTorch torch-neuron-1.13.1.2.11.7.0
PyTorch torch-neuron-1.9.1.2.11.7.0
Supported Python Versions for Inf1 packages#
List of packages in Neuron 2.20.1:
Package Supported Python Versions
dmlc_nnvm-1.19.6.0 3.8, 3.9, 3.10
dmlc_topi-1.19.6.0 3.8, 3.9, 3.10
dmlc_tvm-1.19.6.0 3.8, 3.9, 3.10
inferentia_hwm-1.17.6.0 3.8, 3.9, 3.10
mx_neuron-1.8.0.2.4.147.0 3.8, 3.9, 3.10
mxnet_neuron-1.5.1.1.10.0.0 3.8, 3.9, 3.10
neuron-cc-1.24.0.0 3.8, 3.9, 3.10
neuronperf-1.8.93.0 3.8, 3.9, 3.10
tensorflow-neuron-2.10.1.2.12.2.0 3.8, 3.9, 3.10
tensorflow-neuron-2.8.4.2.12.2.0 3.8, 3.9, 3.10
tensorflow-neuron-2.9.3.2.12.2.0 3.8, 3.9, 3.10
torch-neuron-1.10.2.2.11.7.0 3.8, 3.9, 3.10
torch-neuron-1.11.0.2.11.7.0 3.8, 3.9, 3.10
torch-neuron-1.12.1.2.11.7.0 3.8, 3.9, 3.10
torch-neuron-1.13.1.2.11.7.0 3.8, 3.9, 3.10
torch-neuron-1.9.1.2.11.7.0 3.8, 3.9, 3.10
Supported Python Versions for Inf2/Trn1 packages#
List of packages in Neuron 2.20.1:
Package Supported Python Versions
aws-neuronx-runtime-discovery-2.9 3.8, 3.9, 3.10, 3.11
jax_neuronx-0.1.1 3.9
libneuronxla-2.0.4986.0 3.8, 3.9, 3.10, 3.11
libneuronxla-0.5.2978 3.8, 3.9, 3.10
neuronx-cc-2.15.141.0 3.8, 3.9, 3.10, 3.11
neuronx-cc-stubs-2.15.141.0 3.8, 3.9, 3.10
neuronx_distributed-0.9.0 3.8, 3.9, 3.10, 3.11
tensorflow-neuronx-2.10.1.2.1.0 3.8, 3.9, 3.10
tensorflow-neuronx-2.8.4.2.1.0 3.8, 3.9, 3.10
tensorflow-neuronx-2.9.3.2.1.0 3.8, 3.9, 3.10
torch-neuronx-1.13.1.1.16.0 3.8, 3.9, 3.10, 3.11
torch-neuronx-2.1.2.2.3.1 3.8, 3.9, 3.10, 3.11
torch_xla-1.13.1+torchneurong 3.8, 3.9, 3.10, 3.11
torch_xla-2.1.4 3.8, 3.9, 3.10, 3.11
transformers-neuronx-0.12.313 3.8, 3.9, 3.10, 3.11
Supported NumPy Versions#
Neuron supports NumPy versions >= 1.21.6 and <= 1.22.2.
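The range above can be checked programmatically before installing or running a workload. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `numpy_version_supported` are assumptions for this example, and the parser handles only plain dotted integer versions such as 1.22.2 (no pre-release or local suffixes).

```python
def parse_version(text, width=3):
    """Turn '1.22.2' into (1, 22, 2), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


# Bounds taken from the supported range stated above.
NUMPY_MIN = parse_version("1.21.6")
NUMPY_MAX = parse_version("1.22.2")


def numpy_version_supported(version):
    """Return True if the version lies in the closed range [1.21.6, 1.22.2]."""
    return NUMPY_MIN <= parse_version(version) <= NUMPY_MAX


print(numpy_version_supported("1.22.0"))  # True
print(numpy_version_supported("1.23.5"))  # False
```

In an environment where NumPy is installed, the same check could be applied to `numpy.__version__`, provided that string is a plain dotted version.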
Supported HuggingFace Transformers Versions#
Package | Supported HuggingFace Transformers Versions |
---|---|
torch-neuronx | < 4.35 or >= 4.37.2 |
transformers-neuronx | >= 4.36.0 |
neuronx-distributed - Llama model class | 4.31 |
neuronx-distributed - GPT NeoX model class | 4.26 |
neuronx-distributed - Bert model class | 4.26 |
nemo-megatron | 4.31.0 |
Supported Protobuf Versions#
Package | Supported Protobuf Versions |
---|---|
neuronx-cc | > 3 |
torch-neuronx | >= 3.20 |
torch-neuron | < 3.20 |
transformers-neuronx | >= 3.20 |
neuronx-distributed | >= 3.20 |
tensorflow-neuronx | < 3.20 |
tensorflow-neuron | < 3.20 |
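One practical consequence of the table above is that packages requiring >= 3.20 and packages requiring < 3.20 cannot be satisfied by a single installed protobuf version. The sketch below encodes the table and lists which packages a given protobuf version satisfies. The helper names are assumptions for this example, and the parser handles only plain dotted integer versions.

```python
import operator

# Comparison operators used in the table above.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt}

# Constraints copied from the table above: package -> (operator, bound).
PROTOBUF_CONSTRAINTS = {
    "neuronx-cc": (">", "3"),
    "torch-neuronx": (">=", "3.20"),
    "torch-neuron": ("<", "3.20"),
    "transformers-neuronx": (">=", "3.20"),
    "neuronx-distributed": (">=", "3.20"),
    "tensorflow-neuronx": ("<", "3.20"),
    "tensorflow-neuron": ("<", "3.20"),
}


def parse_version(text, width=3):
    """Turn '3.20' into (3, 20, 0), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def packages_compatible_with(protobuf_version):
    """Return the sorted package names whose constraint the version meets."""
    version = parse_version(protobuf_version)
    return sorted(
        name
        for name, (op, bound) in PROTOBUF_CONSTRAINTS.items()
        if OPS[op](version, parse_version(bound))
    )


print(packages_compatible_with("3.19.6"))
print(packages_compatible_with("3.20.3"))
```

Because no version appears in both result lists for packages such as torch-neuronx and tensorflow-neuronx, keeping them in separate Python environments is one way to satisfy both constraints.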
Supported Linux Kernel Versions#
Neuron Driver (aws-neuronx-dkms) supports Linux kernel versions >= 5.10.
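A quick way to check the requirement above is to compare the major and minor numbers of the running kernel's release string against 5.10. The sketch below is a minimal example; the function name `kernel_supported` is an assumption, the sample release strings are illustrative only, and the check covers the version number alone, not any other driver prerequisite.

```python
import re

# Minimum kernel version stated above, as (major, minor).
MIN_KERNEL = (5, 10)


def kernel_supported(release):
    """Check a kernel release string (as printed by `uname -r`) against 5.10.

    Only the leading 'major.minor' numbers are compared.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release string: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= MIN_KERNEL


# Illustrative release strings, not taken from any specific AMI.
print(kernel_supported("5.10.0-1234-example"))  # True
print(kernel_supported("5.4.0-1045-example"))   # False
```

On a Linux host, the standard library call `platform.release()` returns the same string as `uname -r` and can be passed to this function directly.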
Previous Releases#