This document is relevant for: Inf1, Inf2, Trn1, Trn1n

Previous Release Notes (Neuron 2.x)#

Neuron 2.19.1 (07/19/2024)#

This release (Neuron 2.19.1) addresses an issue with the Neuron Persistent Cache that was introduced in the previous release, Neuron 2.19. The issue caused a cache miss when attempting to load a previously compiled Neuron Executable File Format (NEFF) file from a different path or Python environment than the one used for the initial Neuron SDK installation and NEFF compilation. This release resolves the cache-miss problem, ensuring that NEFFs load correctly regardless of the path or Python environment used to install the Neuron SDK, as long as they were compiled with the same Neuron SDK version.

Neuron 2.19.0 (07/03/2024)#

What’s New#

The Neuron 2.19 release adds Llama 3 training support and introduces Flash Attention kernel support to enable LLM training and inference for large sequence lengths. Neuron 2.19 also introduces new features and performance improvements for LLM training, improves LLM inference performance for the Llama 3 model by up to 20%, and adds tools for monitoring, problem detection, and recovery in Kubernetes (EKS) environments, improving efficiency and reliability.

Training highlights: the LLM training user experience with NeuronX Distributed (NxD) is improved by Flash Attention support, which enables training with sequence lengths of 8K and longer. Neuron 2.19 adds support for Llama 3 model training. This release also adds support for interleaved pipeline parallelism to reduce idle time (bubble size) and enhance training efficiency and resource utilization at large cluster sizes.

Inference highlights: Flash Attention kernel support in the Transformers NeuronX library enables LLM inference for context lengths of up to 32k. This release also adds [Beta] support for continuous batching with mistralai/Mistral-7B-v0.2 in Transformers NeuronX.

Tools and Neuron DLAMI/DLC highlights: this release introduces the new Neuron Node Problem Detector and Recovery plugin for EKS-supported Kubernetes environments, a tool that monitors the health of Neuron instances and triggers automatic node replacement upon detecting an unrecoverable error. Neuron 2.19 introduces the new Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes, and adds monitoring support with Prometheus and Grafana. This release also introduces new PyTorch 2.1 and PyTorch 1.13 single-framework DLAMIs for Ubuntu 22. Neuron DLAMIs and Neuron DLCs are also updated to support this release (Neuron 2.19).

More release content can be found in the table below and in each component's release notes.

What's New - Instances

Known Issues and Limitations - Trn1/Trn1n, Inf2, Inf1

Transformers NeuronX (transformers-neuronx) for Inference - Inf2, Trn1/Trn1n

NeuronX Distributed (neuronx-distributed) for Training - Trn1/Trn1n

NeuronX Distributed (neuronx-distributed) for Inference - Inf2, Trn1/Trn1n

PyTorch NeuronX (torch-neuronx) - Trn1/Trn1n, Inf2

  • Support for FP32 master weights and BF16 all-gather during ZeRO-1 training to enhance training efficiency.

  • Support for custom SiLU activation functions, enabled by configuring the NEURON_CUSTOM_SILU environment variable (see the sketch below).

  • See more at PyTorch Neuron (torch-neuronx) release notes.
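As a rough illustration of the activation override above, the sketch below sets NEURON_CUSTOM_SILU before compilation is triggered. The variable name comes from this release note; the value "1" and the surrounding trace call are assumptions, so consult the torch-neuronx release notes for the officially supported values.

```python
import os

# Assumption: "1" enables the custom SiLU path; check the torch-neuronx
# release notes for the supported values of this variable.
os.environ["NEURON_CUSTOM_SILU"] = "1"

import torch
import torch_neuronx

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.SiLU()).eval()
# The compiler reads the environment when the trace/compile step runs.
traced = torch_neuronx.trace(model, torch.rand(1, 8))
```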

NeuronX Nemo Megatron for Training - Trn1/Trn1n, Inf2

Neuron Compiler (neuronx-cc) - Trn1/Trn1n, Inf2

Neuron DLAMI and DLC - Inf1, Inf2, Trn1/Trn1n

Neuron Tools - Inf1, Inf2, Trn1/Trn1n

  • Support for the new Neuron Node Problem Detector and Recovery plugin in EKS-supported Kubernetes environments, which monitors the health of Neuron instances and triggers automatic node replacement upon detecting an unrecoverable error. See configuration and tutorial.

  • Support for the new Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes, including monitoring with Prometheus and Grafana. See tutorial.

  • Support for the Neuron scheduler extension to enforce allocation of contiguous Neuron devices for pods based on the Neuron instance type. See tutorial.

  • Neuron Profiler bug fixes and UI updates, including improvements to the visualization of collective operations and to the consistency of the information displayed.

  • Added memory usage metrics and device count information to neuron-monitor.

  • See more at Neuron System Tools.

Neuron Runtime - Inf1, Inf2, Trn1/Trn1n

  • Support for dynamic Direct Memory Access (DMA), which reduces memory usage during runtime.

  • Runtime enhancements that improve collectives performance.

  • See more at Neuron Runtime Release Notes.

Other Documentation Updates - Inf1, Inf2, Trn1/Trn1n

Minor enhancements and bug fixes - Trn1/Trn1n, Inf2, Inf1

Release Artifacts - Trn1/Trn1n, Inf2, Inf1

2.19.0 Known Issues and Limitations#

  • There are known issues when using the on_device_generation flag in the Transformers NeuronX configuration for Llama models. Customers who encounter an issue are advised not to use the flag (see the sketch below). See more at Transformers Neuron (transformers-neuronx) release notes.

  • See component release notes below for any additional known issues.
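A minimal sketch of the suggested workaround follows: simply omit on_device_generation from the Transformers NeuronX config. The class and argument names here follow the transformers-neuronx samples and may differ across versions (an assumption), and the checkpoint directory is hypothetical.

```python
from transformers_neuronx.config import NeuronConfig
from transformers_neuronx.llama.model import LlamaForSampling

# Workaround: build the config without on_device_generation=... so the
# affected on-device sampling path is not exercised.
neuron_config = NeuronConfig()

model = LlamaForSampling.from_pretrained(
    "llama-3-8b-split",      # hypothetical local checkpoint directory
    neuron_config=neuron_config,
    batch_size=1,
    tp_degree=8,
    amp="f16",
)
model.to_neuron()  # compile and load onto NeuronCores
```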

Neuron Components Release Notes#

Inf1, Trn1/Trn1n and Inf2 common packages#

Component - Instance/s - Package/s

Neuron Runtime - Trn1/Trn1n, Inf1, Inf2

  • Trn1/Trn1n: aws-neuronx-runtime-lib (.deb, .rpm)

  • Inf1: Runtime is linked into the ML framework packages

Neuron Runtime Driver - Trn1/Trn1n, Inf1, Inf2

  • aws-neuronx-dkms (.deb, .rpm)

Neuron System Tools - Trn1/Trn1n, Inf1, Inf2

  • aws-neuronx-tools (.deb, .rpm)

Neuron DLAMI - Trn1/Trn1n, Inf1, Inf2

Neuron DLC - Trn1/Trn1n, Inf1, Inf2

Containers - Trn1/Trn1n, Inf1, Inf2

  • aws-neuronx-k8-plugin (.deb, .rpm)

  • aws-neuronx-k8-scheduler (.deb, .rpm)

  • aws-neuronx-oci-hooks (.deb, .rpm)

NeuronPerf (Inference only) - Trn1/Trn1n, Inf1, Inf2

  • neuronperf (.whl)

TensorFlow Model Server Neuron - Trn1/Trn1n, Inf1, Inf2

  • tensorflow-model-server-neuronx (.deb, .rpm)

Neuron Documentation - Trn1/Trn1n, Inf1, Inf2

Trn1/Trn1n and Inf2 only packages#

Component - Instance/s - Package/s

PyTorch Neuron - Trn1/Trn1n, Inf2

  • torch-neuronx (.whl)

TensorFlow Neuron - Trn1/Trn1n, Inf2

  • tensorflow-neuronx (.whl)

Neuron Compiler (Trn1/Trn1n, Inf2 only) - Trn1/Trn1n, Inf2

  • neuronx-cc (.whl)

Collective Communication Library - Trn1/Trn1n, Inf2

  • aws-neuronx-collective (.deb, .rpm)

Neuron Custom C++ Operators - Trn1/Trn1n, Inf2

  • aws-neuronx-gpsimd-customop (.deb, .rpm)

  • aws-neuronx-gpsimd-tools (.deb, .rpm)

Transformers Neuron - Trn1/Trn1n, Inf2

  • transformers-neuronx (.whl)

Neuron Distributed - Trn1/Trn1n, Inf2

  • neuronx-distributed (.whl)

AWS Neuron Reference for NeMo Megatron - Trn1/Trn1n

Note

In upcoming releases, aws-neuronx-tools and aws-neuronx-runtime-lib will add support for Inf1.

Inf1 only packages#

Component - Instance/s - Package/s

PyTorch Neuron - Inf1

  • torch-neuron (.whl)

TensorFlow Neuron - Inf1

  • tensorflow-neuron (.whl)

Apache MXNet - Inf1

  • mx_neuron (.whl)

Neuron Compiler (Inf1 only) - Inf1

  • neuron-cc (.whl)

Neuron 2.18.2 (04/25/2024)#

Patch release with minor Neuron Compiler bug fixes and enhancements. See more in Neuron Compiler (neuronx-cc) release notes

Neuron 2.18.1 (04/10/2024)#

The Neuron 2.18.1 release introduces continuous batching (beta) and Neuron vLLM integration (beta) support in the Transformers NeuronX library, improving LLM inference throughput. This release also fixes hang issues related to Triton Inference Server and updates Neuron DLAMIs and DLCs to this release (2.18.1). See more in Transformers Neuron (transformers-neuronx) release notes and Neuron Compiler (neuronx-cc) release notes.

Neuron 2.18.0 (04/01/2024)#

What’s New#

Neuron 2.18 release introduces stable support (out of beta) for PyTorch 2.1, introduces new features and performance improvements to LLM training and inference, and updates Neuron DLAMIs and Neuron DLCs to support this release (Neuron 2.18).

Training highlights: the LLM training user experience with NeuronX Distributed (NxD) is improved by the introduction of asynchronous checkpointing. This release also adds support for auto-partitioning pipeline parallelism in NxD and introduces pipeline parallelism in PyTorch Lightning Trainer (beta).

Inference highlights: Speculative Decoding support (beta) in the Transformers NeuronX (TNx) library improves LLM inference throughput and output token latency (TPOT) by up to 25% (for LLMs such as Llama-2-70B). TNx also improves weight-loading performance by adding support for the SafeTensors checkpoint format. Inference using bucketing in PyTorch NeuronX and NeuronX Distributed is improved by introducing an auto-bucketing feature. This release also adds new samples for Mixtral-8x7B-v0.1 and mistralai/Mistral-7B-Instruct-v0.2 in TNx.

Neuron DLAMI and Neuron DLC support highlights: this release introduces a new multi-framework DLAMI for Ubuntu 22 that customers can use to easily get started with the latest Neuron SDK on the multiple frameworks Neuron supports, as well as SSM parameter support for DLAMIs to automate the retrieval of the latest DLAMI ID in cloud automation flows. It also adds new Neuron training and inference Deep Learning Containers (DLCs) for PyTorch 2.1, a new dedicated GitHub repository to host Neuron container Dockerfiles, and a public Neuron container registry to host Neuron container images.

More release content can be found in the table below and each component release notes.

What's New - Instances

Transformers NeuronX (transformers-neuronx) for Inference - Inf2, Trn1/Trn1n

NeuronX Distributed (neuronx-distributed) for Training - Trn1/Trn1n

NeuronX Distributed (neuronx-distributed) for Inference - Inf2, Trn1/Trn1n

PyTorch NeuronX (torch-neuronx) - Trn1/Trn1n, Inf2

NeuronX Nemo Megatron for Training - Trn1/Trn1n, Inf2

Neuron Compiler (neuronx-cc) - Trn1/Trn1n, Inf2

Neuron DLAMI and DLC - Inf1, Inf2, Trn1/Trn1n

  • New Neuron Multi-Framework Deep Learning AMI (DLAMI) for Ubuntu 22 with separate virtual environments for PyTorch 2.1, PyTorch 1.13, Transformers NeuronX, and TensorFlow 2.10. See the setup guide and Neuron DLAMI User Guide.

  • The Neuron Multi-Framework DLAMI is now the default Neuron AMI in the QuickStart AMI list when launching Neuron instances for Ubuntu through the AWS console. See the setup guide.

  • Neuron DLAMIs for PyTorch 1.13 and TensorFlow 2.10 are updated with the 2.18 Neuron SDK for both Ubuntu 20 and AL2. See Neuron DLAMI User Guide.

  • SSM parameter support for Neuron DLAMIs to find the DLAMI ID with the latest Neuron SDK release (see the sketch below). See Neuron DLAMI User Guide.

  • New Neuron Deep Learning Containers (DLCs) for PyTorch 2.1 inference and training. See Neuron Containers.

  • PyTorch 1.13 inference and training DLCs are updated with the latest 2.18 Neuron SDK and now also come with the NeuronX Distributed library pre-installed. See Neuron Containers.

  • Neuron DLCs are now hosted both in the public Neuron ECR and as private images; private images are only needed when used with SageMaker. See Neuron Containers.

  • New Neuron GitHub repository to host Dockerfiles for Neuron DLCs. See the neuron deep learning containers github repo.
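For the SSM parameter bullet above, the sketch below retrieves the latest DLAMI ID with boto3. The get_parameter call is standard boto3, but the exact parameter path is an assumption modeled on the Neuron DLAMI User Guide and may differ.

```python
import boto3

ssm = boto3.client("ssm", region_name="us-west-2")

# Assumed parameter path; see the Neuron DLAMI User Guide for the real names.
name = "/aws/service/neuron/dlami/multi-framework/ubuntu-22.04/latest/image_id"
image_id = ssm.get_parameter(Name=name)["Parameter"]["Value"]
print(image_id)  # e.g. ami-0123456789abcdef0, usable in EC2 launch automation
```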

Other Documentation Updates - Inf1, Inf2, Trn1/Trn1n

Minor enhancements and bug fixes - Trn1/Trn1n, Inf2, Inf1

Known Issues and Limitations - Trn1/Trn1n, Inf2, Inf1

Release Artifacts - Trn1/Trn1n, Inf2, Inf1

2.18.0 Known Issues and Limitations#

  • For PyTorch 2.1 (NeuronX), slow convergence for Llama-2-70B training when using the Zero Redundancy Optimizer (ZeRO-1) can be resolved by removing all compiler flags.

  • For PyTorch 2.1 (NeuronX), torch-xla 2.1 is incompatible with the default glibc on AL2. Users are advised to migrate to the Amazon Linux 2023, Ubuntu 22, or Ubuntu 20 operating systems.

  • See component release notes below for any additional known issues.

Neuron 2.17.0 (02/13/2024)#

What’s New#

The Neuron 2.17 release improves the performance of small collective communication operations (smaller than 16 MB) by up to 30%, which improves large language model (LLM) inference performance by up to 10%. This release also includes improvements in the Neuron Profiler and other minor enhancements and bug fixes.

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.

Neuron 2.16.1 (01/18/2024)#

Patch release with compiler bug fixes and updates to the Neuron Device Plugin and Neuron Kubernetes Scheduler.

Neuron 2.16.0 (12/21/2023)#

What’s New#

Neuron 2.16 adds support for Llama-2-70B training and inference, upgrades to PyTorch 2.1 (beta), and adds new support for PyTorch Lightning Trainer (beta), along with performance improvements and Amazon Linux 2023 support.

Training highlights: LLM training performance with the NeuronX Distributed library is improved by up to 15%. The LLM training user experience is improved by introducing support for PyTorch Lightning Trainer (beta) and a new model/optimizer wrapper that minimizes the changes needed to partition models using NeuronX Distributed primitives.

Inference highlights: PyTorch inference now allows dynamically swapping different fine-tuned weights into an already loaded model, and overall LLM inference throughput and latency are improved with Transformers NeuronX. Two new reference model samples are added for Llama-2-70b and Mistral-7b inference.

User experience highlights: this release introduces a new tool, Neuron Distributed Event Tracing (NDET), which improves debuggability, and adds support for profiling collective communication operators in the Neuron Profiler tool.

More release content can be found in the table below and each component release notes.

What's New - Instances

Transformers NeuronX (transformers-neuronx) for Inference - Inf2, Trn1/Trn1n

NeuronX Distributed (neuronx-distributed) for Training - Trn1/Trn1n

NeuronX Distributed (neuronx-distributed) for Inference - Inf2, Trn1/Trn1n

PyTorch NeuronX (torch-neuronx) - Trn1/Trn1n, Inf2

Neuron Tools - Inf1, Inf2, Trn1/Trn1n

Documentation Updates - Inf1, Inf2, Trn1/Trn1n

Minor enhancements and bug fixes - Trn1/Trn1n, Inf2, Inf1

Known Issues and Limitations - Trn1/Trn1n, Inf2, Inf1

Release Artifacts - Trn1/Trn1n, Inf2, Inf1

2.16.0 Known Issues and Limitations#

  • We recommend running multi-node training jobs on AL2023 using Amazon EKS. AWS ParallelCluster currently does not support AL2023.

  • There are known compiler issues impacting inference accuracy for certain model configurations of Llama-2-13b when amp = fp16 is used. If this issue is observed, amp = fp32 should be used as a workaround (see the sketch after this list). This issue will be addressed in future Neuron releases.

  • Execution time reported by the neuron-profile tool is sometimes inaccurate due to a bug in how the time is captured. The bug will be addressed in upcoming Neuron releases.

  • See component release notes below for any additional known issues.
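A hedged sketch of the fp32 workaround named above: pass amp="f32" when instantiating the affected Llama-2-13b model in Transformers NeuronX. The class and argument names follow the transformers-neuronx samples (an assumption), and the checkpoint directory is hypothetical.

```python
from transformers_neuronx.llama.model import LlamaForSampling

model = LlamaForSampling.from_pretrained(
    "llama-2-13b-split",  # hypothetical local checkpoint directory
    tp_degree=8,
    amp="f32",            # work around the fp16 accuracy issue
)
model.to_neuron()
```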

Neuron 2.15.2 (11/17/2023)#

Patch release that fixes compiler issues related to performance when training with the neuronx-nemo-megatron library.

Neuron 2.15.1 (11/09/2023)#

Patch release to fix execution overhead issues in the Neuron Runtime that were inadvertently introduced in the 2.15 release.

Neuron 2.15.0 (10/26/2023)#

What’s New#

This release adds support for PyTorch 2.0 (beta), increases performance for both training and inference workloads, and adds the ability to train models like Llama-2-70B using neuronx-distributed. With this release, we are also adding pipeline parallelism support to neuronx-distributed, enabling full 3D parallelism to easily scale training to large model sizes. Neuron 2.15 also introduces support for training resnet50, milesial/Pytorch-UNet, and deepmind/vision-perceiver-conv models using torch-neuronx, as well as new sample code for flan-t5-xl model inference using neuronx-distributed, in addition to other performance optimizations, minor enhancements, and bug fixes.

What's New - Instances

Neuron Distributed (neuronx-distributed) for Training - Trn1/Trn1n

Neuron Distributed (neuronx-distributed) for Inference - Inf2, Trn1/Trn1n

Transformers Neuron (transformers-neuronx) for Inference - Inf2, Trn1/Trn1n

PyTorch Neuron (torch-neuronx) - Trn1/Trn1n, Inf2

AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) - Trn1/Trn1n

Neuron Compiler (neuronx-cc) - Inf2, Trn1/Trn1n

Neuron Tools - Inf1, Inf2, Trn1/Trn1n

  • The alltoall collective communication operation for intra-node use (within the instance), previously released in Neuron Collectives v2.15.13, was added as a testable operation in nccom-test. See the NCCOM-TEST User Guide.

  • See more at Neuron System Tools.

Documentation Updates - Inf1, Inf2, Trn1/Trn1n

Minor enhancements and bug fixes - Trn1/Trn1n, Inf2, Inf1

Release Artifacts - Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.

Neuron 2.14.1 (09/26/2023)#

This is a patch release that fixes compiler issues in certain configurations of Llama and Llama-2 model inference using transformers-neuronx.

Note

There is still a known compiler issue for inference of some configurations of Llama and Llama-2 models that will be addressed in a future Neuron release. Customers are advised to use the --optlevel 1 (or -O1) compiler flag to mitigate this known compiler issue (see the sketch below).

See the Neuron Compiler CLI Reference Guide (neuronx-cc) for usage of the --optlevel 1 compiler flag. Please see more on the compiler fix and known issues in Neuron Compiler (neuronx-cc) release notes and Transformers Neuron (transformers-neuronx) release notes.
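One way to apply the flag from a framework workflow is through the NEURON_CC_FLAGS environment variable, which the Neuron frameworks forward to the compiler. A minimal sketch follows; set the variable before compilation is triggered.

```python
import os

# Mitigation: compile at optimization level 1 ("-O1" is the short form).
os.environ["NEURON_CC_FLAGS"] = "--optlevel 1"

# ...then trace/compile the model as usual, e.g. with torch_neuronx.trace(...).
```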

Neuron 2.14.0 (09/15/2023)#

What’s New#

This release introduces support for Llama-2-7B model training and T5-3B model inference using neuronx-distributed. It also adds support for Llama-2-13B model training using neuronx-nemo-megatron, and for Stable Diffusion XL (Refiner and Base) model inference using torch-neuronx. This release also introduces other new features, performance optimizations, minor enhancements, and bug fixes. This release introduces the following:

Note

This release deprecates the --model-type=transformer-inference compiler flag. Users are highly encouraged to migrate to the --model-type=transformer compiler flag (see the sketch below).
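A minimal sketch of the migration, assuming flags are passed through the NEURON_CC_FLAGS environment variable as in the framework setup guides:

```python
import os

# Before (deprecated):
#   os.environ["NEURON_CC_FLAGS"] = "--model-type=transformer-inference"
# After:
os.environ["NEURON_CC_FLAGS"] = "--model-type=transformer"
```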

What's New - Instances

AWS Neuron Reference for Nemo Megatron library (neuronx-nemo-megatron) - Trn1/Trn1n

Neuron Distributed (neuronx-distributed) for Training - Trn1/Trn1n

Neuron Distributed (neuronx-distributed) for Inference - Inf2, Trn1/Trn1n

Transformers Neuron (transformers-neuronx) for Inference - Inf2, Trn1/Trn1n

PyTorch Neuron (torch-neuronx) - Trn1/Trn1n, Inf2

Neuron Compiler (neuronx-cc) - Inf2, Trn1/Trn1n

Neuron Tools - Inf1, Inf2, Trn1/Trn1n

Documentation Updates - Inf1, Inf2, Trn1/Trn1n

Minor enhancements and bug fixes - Trn1/Trn1n, Inf2, Inf1

Release Artifacts - Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.

Neuron 2.13.2 (09/01/2023)#

This is a patch release that fixes issues in Kubernetes (K8) deployments related to Neuron Device Plugin crashes and other pod scheduling issues. This release also adds support for zero-based Neuron Device indexing in K8 deployments, see the Neuron K8 release notes for more details on the specific bug fixes.

Updating to the latest Neuron Kubernetes components and Neuron Driver is highly encouraged for customers using Kubernetes.

Please follow these instructions in the setup guide to upgrade to the latest Neuron release.

Neuron 2.13.1 (08/29/2023)#

This release adds support for Llama 2 model training (tutorial) using the neuronx-nemo-megatron library, and adds support for Llama 2 model inference using the transformers-neuronx library (tutorial).

Please follow these instructions in the setup guide to upgrade to the latest Neuron release.

Note

Please install transformers-neuronx from https://pip.repos.neuron.amazonaws.com to get the latest features and improvements.

This release does not support the Llama 2 model with Grouped-Query Attention.

Neuron 2.13.0 (08/28/2023)#

What’s New#

This release introduces support for GPT-NeoX 20B model training in neuronx-distributed, including ZeRO-1 optimizer capability. It also adds support for Stable Diffusion XL and CLIP model inference in torch-neuronx. Neuron 2.13 also introduces the AWS Neuron Reference for NeMo Megatron library, supporting distributed training of LLMs like GPT-3 175B. This release also introduces other new features, performance optimizations, minor enhancements, and bug fixes. This release introduces the following:

What's New - Instances

AWS Neuron Reference for Nemo Megatron library - Trn1/Trn1n

Transformers Neuron (transformers-neuronx) for Inference - Inf2, Trn1/Trn1n

Neuron Distributed (neuronx-distributed) for Training - Trn1/Trn1n

Neuron Distributed (neuronx-distributed) for Inference - Inf2, Trn1/Trn1n

PyTorch Neuron (torch-neuronx) - Trn1/Trn1n, Inf2

Neuron Tools - Inf1, Inf2, Trn1/Trn1n

  • New data type support for the Neuron Collective Communication Test Utility (NCCOM-TEST) --check option: fp16, bf16, (u)int8, (u)int16, and (u)int32.

  • Neuron SysFS support for FLOP count (flop_count) and connected Neuron device IDs (connected_devices). See the Neuron Sysfs User Guide.

  • See more at Neuron System Tools.

Neuron Runtime - Inf1, Inf2, Trn1/Trn1n

  • Runtime version and capture time support in NTFF.

  • Async DMA copy support to improve Neuron device copy times for all instance types.

  • Logging and error message improvements for collectives timeouts and NEFF loading.

  • See more at Neuron Runtime Release Notes.

End of Support Announcements and Documentation Updates - Inf1, Inf2, Trn1/Trn1n

Minor enhancements and bug fixes - Trn1/Trn1n, Inf2, Inf1

Known Issues and Limitations - Trn1/Trn1n, Inf2, Inf1

Release Artifacts - Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.

2.13.0 Known Issues and Limitations#

  • Currently, a NaN may be generated when the model implementation uses torch.dtype(float32.min) or torch.dtype(float32.max) along with XLA_USE_BF16/XLA_DOWNCAST_BF16. This is because float32.min and float32.max get downcast to -/+Inf in bf16, thereby producing a NaN. The short-term fix is to use a small/large fp32 number instead of float32.min/float32.max; for example, for mask creation, use -/+1e4 instead of the min/max values (see the sketch below). The issue will be addressed in future Neuron releases.
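A small sketch of the mask-creation workaround described above, using a finite constant instead of the float32 extremes so the bf16 downcast stays finite:

```python
import torch

seq_len = 128
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Use -1e4 rather than torch.finfo(torch.float32).min: under
# XLA_USE_BF16/XLA_DOWNCAST_BF16 the float32 minimum becomes -inf in bf16,
# and the resulting inf arithmetic produces NaNs.
attn_mask = torch.where(causal, torch.tensor(0.0), torch.tensor(-1e4))
```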

Neuron 2.12.2 (08/19/2023)#

Patch release to fix a jemalloc conflict for all Neuron customers that use Ubuntu 22. The previous releases shipped with a dependency on jemalloc that could lead to compilation failures on Ubuntu 22 only. Please follow these instructions in the setup guide to upgrade to the latest Neuron release.

Neuron 2.12.1 (08/09/2023)#

Patch release to improve the reliability of the Neuron Runtime when running applications on memory-constrained instances. The Neuron Runtime has reduced the contiguous memory requirement for initializing the NeuronCores associated with applications; this reduction allows bring-up when only small amounts of contiguous memory remain on an instance. Please upgrade to the latest Neuron release to use the latest Neuron Runtime.

Neuron 2.12.0 (07/19/2023)#

What’s New#

This release introduces the ZeRO-1 optimizer for model training in torch-neuronx, and introduces beta support for GPT-NeoX, BLOOM, Llama, and Llama 2 (coming soon) models in transformers-neuronx. This release also adds support for model inference serving on Triton Inference Server for Inf2 and Trn1 instances, lazy_load and async_load APIs for model loading in torch-neuronx, and other new features, performance optimizations, minor enhancements, and bug fixes. This release introduces the following:

What's New - Instances

ZeRO-1 optimizer for model training in torch-neuronx - Inf2, Trn1/Trn1n

  • Support for the ZeRO-Stage-1 optimizer (ZeroRedundancyOptimizer() API) for training models using torch-neuronx (see the sketch below).

  • See the tutorial at ZeRO-1 Tutorial.
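As a rough sketch of the ZeroRedundancyOptimizer() usage named above: the wrapper ships with torch-xla, and the model and hyperparameters below are placeholders, not a reference recipe.

```python
import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.zero_redundancy_optimizer import ZeroRedundancyOptimizer

device = xm.xla_device()
model = torch.nn.Linear(1024, 1024).to(device)

# ZeRO-1: shard the optimizer state across data-parallel workers.
optimizer = ZeroRedundancyOptimizer(model.parameters(), torch.optim.AdamW, lr=1e-4)

loss = model(torch.randn(8, 1024).to(device)).sum()
loss.backward()
optimizer.step()   # reduces gradients and updates the local shard
xm.mark_step()
```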

Support for new models and enhancements in transformers-neuronx - Inf2, Trn1/Trn1n

Support for Inf2 and Trn1 instances on Triton Inference Server - Inf2, Trn1

Support for new computer vision models - Inf2, Trn1/Trn1n

  • Performance optimizations in the Stable Diffusion 2.1 model script and [beta] support for Stable Diffusion 1.5 models.

  • [Beta] Script for training the CLIP model for image classification.

  • [Beta] Script for inference of the Multimodal Perceiver model.

  • Please check the aws-neuron-samples repository.

New features in neuronx-distributed for training - Trn1/Trn1n

  • Added a parallel cross-entropy loss function.

  • See more at tp_api_guide.

lazy_load and async_load APIs for model loading in inference and performance enhancements in torch-neuronx - Inf2, Trn1/Trn1n

[Beta] Asynchronous execution support and enhancements in Neuron Runtime - Inf1, Inf2, Trn1/Trn1n

  • Added a beta asynchronous execution feature which can reduce latency by roughly 12% for training workloads. See more at NeuronX Runtime Configuration.

  • AllReduce with an all-to-all communication pattern enabled for 16 ranks on Trn1/Trn1n within the instance (intra-node).

  • See more at Neuron Runtime Release Notes.

Support for the distribution_strategy compiler option in neuronx-cc - Inf2, Trn1/Trn1n

New Micro Benchmarking Performance User Guide and Documentation Updates - Inf1, Inf2, Trn1/Trn1n

Minor enhancements and bug fixes - Trn1/Trn1n, Inf2, Inf1

Known Issues and Limitations - Trn1/Trn1n, Inf2, Inf1

Release Artifacts - Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.

2.12.0 Known Issues and Limitations#

Known Issues in Ubuntu 22 Support#

  • Several vision and NLP models on Ubuntu 22 are not supported due to compilation issues. The issues will be addressed in upcoming releases.

  • The CustomOp feature fails with a segmentation fault on Ubuntu 22. The issue will be addressed in upcoming releases.

Known issues in certain resnet models on Ubuntu 20#

  • There is a known issue with support for the resnet-18, resnet-34, resnet-50, resnet-101, and resnet-152 models on Ubuntu 20. The issues will be addressed in upcoming releases.

Neuron 2.11.0 (06/14/2023)#

What’s New#

This release introduces Neuron Distributed, a new Python library that simplifies training and inference of large models, and improves usability with features like S3 model caching, a standalone profiler tool, and support for Ubuntu 22, as well as other new features, performance optimizations, minor enhancements, and bug fixes. This release introduces the following:

What's New - Instances

New Features and Performance Enhancements in transformers-neuronx - Inf2, Trn1/Trn1n

Neuron Profiler Tool - Inf1, Inf2, Trn1/Trn1n

  • Profiling and visualization of model execution on Trainium and Inferentia devices is now supported as a standalone tool.

  • See more at Neuron Profile User Guide.

Neuron Compilation Cache through S3 - Inf2, Trn1/Trn1n

New script to scan a model for supported/unsupported operators - Inf2, Trn1/Trn1n

  • Script to scan a model for supported/unsupported operators before training; the scan output includes supported and unsupported operators at both the XLA operator and PyTorch operator level.

  • See a sample tutorial at Analyze for Training Tutorial.

Neuron Distributed Library [Beta] - Inf2, Trn1/Trn1n

  • New Python library based on PyTorch enabling distributed training and inference of large models.

  • Initial support for tensor parallelism.

  • See more at NxD Core.

Neuron Calculator and Documentation Updates - Inf1, Inf2, Trn1/Trn1n

Enhancements to Neuron SysFS - Inf1, Inf2, Trn1/Trn1n

Support for Ubuntu 22 - Inf1, Inf2, Trn1/Trn1n

  • See the Setup Guide for setup instructions on Ubuntu 22.

Minor enhancements and bug fixes - Trn1/Trn1n, Inf2, Inf1

Release Artifacts - Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.

Neuron 2.10.0 (05/01/2023)#

What’s New#

This release introduces the following new features, performance optimizations, minor enhancements, and bug fixes:

What's New - Instances

Initial support for computer vision model inference - Inf2, Trn1/Trn1n

  • Added a Stable Diffusion 2.1 model script for the text-to-image generation task

  • Added a VGG model script for the image classification task

  • Added a UNet model script for the image segmentation task

  • Please check the aws-neuron-samples repository

Profiling support in PyTorch Neuron (torch-neuronx) for Inference with TensorBoard - Inf2, Trn1/Trn1n

New Features and Performance Enhancements in transformers-neuronx - Inf2, Trn1/Trn1n

Support models larger than 2GB in TensorFlow 2.x Neuron (tensorflow-neuronx) - Trn1/Trn1n, Inf2

Support models larger than 2GB in TensorFlow 2.x Neuron (tensorflow-neuron) - Inf1

Performance Enhancements in PyTorch C++ Custom Operators (Beta) - Trn1/Trn1n

Weight Deduplication Feature (Inf1) - Inf1

  • Support for sharing weights when loading multiple instances of the same model on different NeuronCores.

  • See more at NeuronX Runtime Configuration.

nccom-test - Collective Communication Benchmarking Tool - Trn1/Trn1n, Inf2

  • Supports benchmarking sweeps over various Neuron collective communication operations. See the NCCOM-TEST User Guide for more details.

Announcing end of support for tensorflow-neuron 2.7 and mxnet-neuron 1.5 - Inf1

Minor enhancements and bug fixes - Trn1/Trn1n, Inf2, Inf1

Release Artifacts - Trn1/Trn1n, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.

Neuron 2.9.1 (04/19/2023)#

Minor patch release to add support for deserialized TorchScript model compilation and support for multi-node training in EKS. Fixes included in this release are critical to enabling training and deploying models with Amazon SageMaker or Amazon EKS.

Neuron 2.9.0 (03/28/2023)#

What’s New#

This release adds support for EC2 Trn1n instances and introduces the following new features, performance optimizations, minor enhancements, and bug fixes:

What's New - Instances

Support for EC2 Trn1n instances - Trn1n

  • Updated Neuron Runtime for Trn1n instances

  • Overall documentation update to include Trn1n instances

New Analyze API in PyTorch Neuron (torch-neuronx) - Trn1, Inf2
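A brief sketch of the analyze API named above, used as a pre-compilation check for operator support; the exact report structure varies by SDK version (an assumption).

```python
import torch
import torch_neuronx

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.GELU()).eval()
example = torch.rand(1, 4)

# Reports which operators would be supported on Neuron before a full compile.
report = torch_neuronx.analyze(model, example)
print(report)
```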

Support models larger than 2GB in PyTorch Neuron (torch-neuron) on Inf1 - Inf1

Performance Improvements - Trn1

  • Up to 10% higher throughput when training the GPT3 6.7B model on multiple nodes

Dynamic Batching support in TensorFlow 2.x Neuron (tensorflow-neuronx) - Trn1, Inf2

NeuronPerf support for Trn1/Inf2 instances - Trn1, Inf2

  • Added Trn1/Inf2 support for PyTorch Neuron (torch-neuronx) and TensorFlow 2.x Neuron (tensorflow-neuronx)

Hierarchical All-Reduce and Reduce-Scatter collective communication - Trn1, Inf2

  • Added support for hierarchical All-Reduce and Reduce-Scatter in the Neuron Runtime to enable better scalability of distributed workloads.

New Tutorials added - Trn1, Inf2

Minor enhancements and bug fixes - Trn1, Inf2, Inf1

Release included packages - Trn1, Inf2, Inf1

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.

Neuron 2.8.0 (02/24/2023)#

What’s New#

This release adds support for EC2 Inf2 instances, introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2, and introduces minor enhancements and bug fixes.

This release introduces the following:

What's New - Details

Support for EC2 Inf2 instances

  • Inference support for Inf2 instances in PyTorch Neuron (torch-neuronx)

  • Inference support for Inf2 instances in TensorFlow 2.x Neuron (tensorflow-neuronx)

  • Overall documentation update to include Inf2 instances

TensorFlow 2.x Neuron (tensorflow-neuronx) support

  • This release introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2

New Neuron GitHub samples

Minor enhancements and bug fixes

Release included packages

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

Neuron 2.7.0 (02/08/2023)#

What’s New#

This release introduces new capabilities and libraries, as well as features and tools that improve usability. This release introduces the following:

What's New - Details

PyTorch 1.13

Support for PyTorch 1.13 in PyTorch Neuron (torch-neuronx). For resources, see PyTorch Neuron.

PyTorch DistributedDataParallel (DDP) API

Support for the PyTorch DistributedDataParallel (DDP) API in PyTorch Neuron (torch-neuronx). For resources on how to use the PyTorch DDP API with Neuron, please check neuronx-ddp-tutorial.

Inference support in torch-neuronx

For more details, please visit the pytorch-neuronx-main page. You can also try the Neuron inference samples in the aws-neuron-samples GitHub repository.

Neuron Custom C++ Operators[Beta]

Initial support for Neuron Custom C++ Operators [Beta]. With Neuron Custom C++ Operators (“CustomOps”) you can now write CustomOps that run on NeuronCore-v2. For more resources, please check the Neuron Custom C++ Operators [Beta] section.

transformers-neuronx [Beta]

transformers-neuronx is a new library enabling LLM model inference. It contains models that are checkpoint-compatible with Hugging Face Transformers, and currently supports transformer decoder models like GPT2, GPT-J, and OPT. Please check the aws-neuron-samples repository.

Neuron sysfs filesystem

The Neuron sysfs filesystem exposes Neuron devices under /sys/devices/virtual/neuron_device, providing visibility into the Neuron Driver and Runtime at the system level. Through simple file reads and writes on the sysfs nodes, you can get information such as Neuron Runtime status, memory usage, driver info, etc. (see the sketch below). For resources about the Neuron sysfs filesystem, visit the Neuron Sysfs User Guide.
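A small sketch of reading that tree from Python; the base path comes from the paragraph above, while the per-device file layout is deliberately not assumed (the walker just prints whatever readable nodes it finds).

```python
from pathlib import Path

base = Path("/sys/devices/virtual/neuron_device")
for dev in sorted(base.glob("neuron*")):
    print(dev.name)
    for node in dev.rglob("*"):
        if node.is_file():
            try:
                print(" ", node.relative_to(dev), "=", node.read_text().strip())
            except OSError:
                pass  # some nodes are write-only or require privileges
```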

TFLOPS support in Neuron System Tools

Neuron System Tools now also report a model's actual TFLOPS rate in both neuron-monitor and neuron-top. More details can be found in the Neuron Tools documentation.

New sample scripts for training

This release adds multiple new sample scripts for training models with torch-neuronx. Please check the aws-neuron-samples repository.

New sample scripts for inference

This release adds multiple new sample scripts for deploying models with torch-neuronx. Please check the aws-neuron-samples repository.

Neuron GitHub samples repository for Amazon EKS

A new AWS Neuron GitHub samples repository for Amazon EKS. Please check the aws-neuron-samples repository.

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

Neuron 2.6.0 (12/12/2022)#

This release introduces support for PyTorch 1.12, and introduces PyTorch Neuron (torch-neuronx) profiling through the Neuron Plugin for TensorBoard. PyTorch Neuron (torch-neuronx) users can now profile their models through the following TensorBoard views:

  • Operator Framework View

  • Operator HLO View

  • Operator Trace View

This release introduces support for the LAMB optimizer in FP32 mode, and adds support for capturing snapshots of inputs, outputs, and graph HLO for debugging.

In addition, this release introduces support for new operators and resolves issues that improve stability for Trn1 customers.

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

Neuron 2.5.0 (11/23/2022)#

Neuron 2.5.0 is a major release which introduces new features and resolves issues that improve stability for Inf1 customers.

Component - New in this release

PyTorch Neuron (torch-neuron)

TensorFlow Neuron (tensorflow-neuron)

  • tf-neuron-auto-multicore tool to enable automatic data parallelism on multiple NeuronCores.

  • Beta support for tracing models larger than 2GB using the extract-weights flag (TF2.x only); see TensorFlow 2.x (tensorflow-neuron) Tracing API.

  • tfn.auto_multicore Python API to enable automatic data parallelism (TF2.x only; see the sketch below).
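A hedged sketch of the tfn.auto_multicore API listed above, replicating a traced TF2.x Neuron model across NeuronCores for data-parallel serving. The argument names and the saved-model path are assumptions based on the tensorflow-neuron samples and may differ by version.

```python
import tensorflow as tf
import tensorflow.neuron as tfn

# Hypothetical, previously traced Neuron SavedModel.
reloaded = tf.keras.models.load_model("resnet50_neuron")
example = tf.random.uniform([1, 224, 224, 3])

# Assumed signature: replicate the model across 4 NeuronCores.
multicore_model = tfn.auto_multicore(reloaded, example, num_cores=4)
outputs = multicore_model(example)
```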

This Neuron release is the last release to include torch-neuron versions 1.7 and 1.8, and tensorflow-neuron versions 2.5 and 2.6.

In addition, this release introduces changes to the Neuron packaging and installation instructions for Inf1 customers, see Introducing Neuron packaging and installation changes for Inf1 customers for more information.

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

Neuron 2.4.0 (10/27/2022)#

This release introduces new features and resolves issues that improve stability. The release introduces the “memory utilization breakdown” feature in both the Neuron Monitor and Neuron Top system tools, adds support for the “NeuronCore Based Scheduling” capability in the Neuron Kubernetes Scheduler, and introduces new operator support in the Neuron Compiler and PyTorch Neuron. This release also adds eight (8) new samples of model fine-tuning using PyTorch Neuron, which can be found in the AWS Neuron Samples GitHub repository.

Neuron 2.3.0 (10/10/2022)#

This Neuron 2.3.0 release extends Neuron 1.x and adds support for the new AWS Trainium-powered Amazon EC2 Trn1 instances. With this release, you can now run deep learning training workloads on Trn1 instances to save training costs by up to 50% over equivalent GPU-based EC2 instances, while getting the highest training performance in the AWS cloud for popular NLP models.

What's New: Neural-network training support, covering PyTorch Neuron (torch-neuronx); Neuron Runtime, Drivers, and Networking Components; Neuron Tools; and Developer Flows.

Tested workloads and known issues:

  • rn2.3.0_tested

  • rn2.3.0-known-issues

The following workloads were tested in this release:

  • Distributed data-parallel pre-training of the Hugging Face BERT model on a single Trn1.32xl instance (32 NeuronCores).

  • Distributed data-parallel pre-training of the Hugging Face BERT model on multiple Trn1.32xl instances.

  • Hugging Face BERT MRPC task fine-tuning on a single NeuronCore or multiple NeuronCores (data-parallel).

  • Megatron-LM GPT3 (6.7B parameters) pre-training on a single Trn1.32xl instance.

  • Megatron-LM GPT3 (6.7B parameters) pre-training on multiple Trn1.32xl instances.

  • Multi-Layer Perceptron (MLP) model training on a single NeuronCore or multiple NeuronCores (data-parallel).

  • For maximum training performance, please set the environment variable XLA_USE_BF16=1 to enable full BF16 and stochastic rounding (see the sketch below).
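A minimal sketch of that setting; the variable must be exported before torch-xla initializes, so in Python it has to be set ahead of the torch_xla import (a shell export works equally well).

```python
import os

# Enable full BF16 with stochastic rounding for maximum training performance.
os.environ["XLA_USE_BF16"] = "1"

import torch_xla.core.xla_model as xm  # import only after the flag is set

device = xm.xla_device()
```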

This document is relevant for: Inf1, Inf2, Trn1, Trn1n