Previous Releases Notes (Neuron 2.x)
This document is relevant for: Inf1, Inf2, Trn1, Trn1n
Neuron 2.17.0 (02/13/2024)#
What’s New#
Neuron 2.17 improves the performance of small collective communication operators (smaller than 16 MB) by up to 30%, which improves large language model (LLM) inference performance by up to 10%. This release also includes improvements to Neuron Profiler and other minor enhancements and bug fixes.
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.
Neuron 2.16.1 (01/18/2024)#
Patch release with compiler bug fixes and updates to Neuron Device Plugin and Neuron Kubernetes Scheduler.
Neuron 2.16.0 (12/21/2023)#
What’s New#
Neuron 2.16 adds support for Llama-2-70B training and inference, upgrades to PyTorch 2.1 (beta), and adds support for PyTorch Lightning Trainer (beta). It also brings performance improvements and adds Amazon Linux 2023 support.
Training highlights: LLM training performance with the NeuronX Distributed library improves by up to 15%. The LLM training user experience is improved by support for PyTorch Lightning Trainer (beta) and by a new model optimizer wrapper that minimizes the changes needed to partition models using NeuronX Distributed primitives.
Inference highlights: PyTorch inference now allows dynamically swapping different fine-tuned weights for an already loaded model. LLM inference throughput and latency with Transformers NeuronX are improved overall. Two new reference model samples cover Llama-2-70b and Mistral-7b model inference.
User experience: This release introduces two new capabilities: Neuron Distributed Event Tracing (NDET), a new tool that improves debuggability, and support for profiling collective communication operators in the Neuron Profiler tool.
More release content can be found in the table below and in each component's release notes.
What’s New | Details | Instances
---|---|---
Transformers NeuronX (transformers-neuronx) for Inference | | Inf2, Trn1/Trn1n
NeuronX Distributed (neuronx-distributed) for Training | | Trn1/Trn1n
NeuronX Distributed (neuronx-distributed) for Inference | | Inf2, Trn1/Trn1n
PyTorch NeuronX (torch-neuronx) | | Trn1/Trn1n, Inf2
Neuron Tools | | Inf1/Inf2/Trn1/Trn1n
Documentation Updates | | Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1
Known Issues and Limitations | | Trn1/Trn1n, Inf2, Inf1
Release Artifacts | | Trn1/Trn1n, Inf2, Inf1
2.16.0 Known Issues and Limitations#
We recommend running multi-node training jobs on AL2023 using Amazon EKS. Parallel Cluster currently does not support AL2023.
There are known compiler issues impacting inference accuracy of certain model configurations of Llama-2-13b when amp = fp16 is used. If this issue is observed, amp=fp32 should be used as a workaround. This issue will be addressed in future Neuron releases.
Execution time reported in the neuron-profile tool is sometimes inaccurate due to a bug in how the time is captured. The bug will be addressed in upcoming Neuron releases.
See component release notes below for any additional known issues.
Neuron 2.15.2 (11/17/2023)#
Patch release that fixes compiler issues related to performance when training using the neuronx-nemo-megatron library.
Neuron 2.15.1 (11/09/2023)#
Patch release to fix execution overhead issues in Neuron Runtime that were inadvertently introduced in the 2.15 release.
Neuron 2.15.0 (10/26/2023)#
What’s New#
This release adds support for PyTorch 2.0 (Beta), increases performance for both training and inference workloads, and adds the ability to train models like Llama-2-70B using neuronx-distributed. With this release, we are also adding pipeline parallelism support for neuronx-distributed, enabling full 3D parallelism support to easily scale training to large model sizes.
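As a rough illustration of what "3D parallelism" implies for sizing a job, the sketch below splits a fixed number of workers into tensor-, pipeline- and data-parallel groups. It is plain Python arithmetic and does not use the neuronx-distributed API; the names `ParallelLayout` and `plan_layout` and the example sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Degrees of the three parallelism dimensions for one training job."""
    tensor_parallel: int
    pipeline_parallel: int
    data_parallel: int


def plan_layout(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> ParallelLayout:
    """Derive the data-parallel degree left over after model parallelism.

    Each model replica occupies tensor_parallel * pipeline_parallel workers,
    so the number of replicas (the data-parallel degree) is the world size
    divided by that product.
    """
    if tensor_parallel < 1 or pipeline_parallel < 1:
        raise ValueError("parallel degrees must be positive")
    workers_per_replica = tensor_parallel * pipeline_parallel
    if world_size % workers_per_replica != 0:
        raise ValueError("world_size must be a multiple of tensor_parallel * pipeline_parallel")
    return ParallelLayout(tensor_parallel, pipeline_parallel, world_size // workers_per_replica)


# Hypothetical example: 256 workers, 8-way tensor parallel, 4-way pipeline parallel.
layout = plan_layout(world_size=256, tensor_parallel=8, pipeline_parallel=4)
print(layout.data_parallel)  # 8
```

The point of the sketch is only the relationship between the three degrees: raising the tensor- or pipeline-parallel degree lets a larger model fit, at the cost of fewer data-parallel replicas for the same number of workers.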
Neuron 2.15 also introduces support for training resnet50, milesial/Pytorch-UNet and deepmind/vision-perceiver-conv models using torch-neuronx, as well as new sample code for flan-t5-xl model inference using neuronx-distributed, in addition to other performance optimizations, minor enhancements and bug fixes.
What’s New | Details | Instances
---|---|---
Neuron Distributed (neuronx-distributed) for Training | | Trn1/Trn1n
Neuron Distributed (neuronx-distributed) for Inference | | Inf2, Trn1/Trn1n
Transformers Neuron (transformers-neuronx) for Inference | | Inf2, Trn1/Trn1n
PyTorch Neuron (torch-neuronx) | | Trn1/Trn1n, Inf2
AWS Neuron Reference for Nemo Megatron library ( | | Trn1/Trn1n
Neuron Compiler (neuronx-cc) | | Inf2/Trn1/Trn1n
Neuron Tools | | Inf1/Inf2/Trn1/Trn1n
Documentation Updates | | Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1
Release Artifacts | | Trn1/Trn1n, Inf2, Inf1
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.
Neuron 2.14.1 (09/26/2023)#
This is a patch release that fixes compiler issues in certain configurations of Llama and Llama-2 model inference using transformers-neuronx.
Note
There is still a known compiler issue for inference of some configurations of Llama and Llama-2 models that will be addressed in a future Neuron release. Customers are advised to use the --optlevel 1 (or -O1) compiler flag to mitigate this known compiler issue.
See the Neuron Compiler CLI Reference Guide (neuronx-cc) for usage of the --optlevel 1 compiler flag. For more on the compiler fix and known issues, see the Neuron Compiler (neuronx-cc) release notes and the Transformers Neuron (transformers-neuronx) release notes.
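One way to apply the mitigation from a Python script is sketched below. It assumes the framework forwards extra compiler options to neuronx-cc through the NEURON_CC_FLAGS environment variable; that variable and the helper name `force_optlevel_1` are not mentioned in this note, so check the compiler reference guide for how flags are passed in your setup.

```python
import os
import shlex


def force_optlevel_1(flags: str) -> str:
    """Return `flags` with any existing optimization level replaced by --optlevel 1."""
    kept = []
    skip_next = False
    for token in shlex.split(flags):
        if skip_next:
            # This token is the value that followed a bare "--optlevel".
            skip_next = False
            continue
        if token == "--optlevel":
            skip_next = True
            continue
        if token.startswith("--optlevel=") or (token.startswith("-O") and token[2:].isdigit()):
            continue
        kept.append(token)
    kept += ["--optlevel", "1"]
    return shlex.join(kept)


# Assumption: NEURON_CC_FLAGS is read when the framework invokes the compiler,
# so it is set before any model is compiled.
os.environ["NEURON_CC_FLAGS"] = force_optlevel_1(os.environ.get("NEURON_CC_FLAGS", ""))
```

Replacing an existing level rather than appending a second one avoids relying on how the compiler resolves duplicate flags.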
Neuron 2.14.0 (09/15/2023)#
What’s New#
This release introduces support for Llama-2-7B model training and T5-3B model inference using neuronx-distributed. It also adds support for Llama-2-13B model training using neuronx-nemo-megatron. Neuron 2.14 also adds support for Stable Diffusion XL (Refiner and Base) model inference using torch-neuronx. This release also introduces other new features, performance optimizations, minor enhancements and bug fixes.
This release introduces the following:
Note
This release deprecates the --model-type=transformer-inference compiler flag. Users are highly encouraged to migrate to the --model-type=transformer compiler flag.
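For scripts that build the compiler flag string programmatically, the migration can be sketched as a small rewrite step. The helper name `migrate_model_type` is hypothetical, and the sketch only handles the two flag spellings shown (`--model-type=VALUE` and `--model-type VALUE`).

```python
import shlex

DEPRECATED_VALUE = "transformer-inference"
REPLACEMENT_VALUE = "transformer"


def migrate_model_type(flags: str) -> str:
    """Rewrite a deprecated --model-type value in a compiler flag string."""
    tokens = shlex.split(flags)
    migrated = []
    for index, token in enumerate(tokens):
        if token == f"--model-type={DEPRECATED_VALUE}":
            # "--model-type=VALUE" spelling.
            migrated.append(f"--model-type={REPLACEMENT_VALUE}")
        elif token == DEPRECATED_VALUE and index > 0 and tokens[index - 1] == "--model-type":
            # "--model-type VALUE" spelling: only the value token changes.
            migrated.append(REPLACEMENT_VALUE)
        else:
            migrated.append(token)
    return shlex.join(migrated)


print(migrate_model_type("--model-type=transformer-inference -O1"))
# --model-type=transformer -O1
```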
What’s New | Details | Instances
---|---|---
AWS Neuron Reference for Nemo Megatron library ( | | Trn1/Trn1n
Neuron Distributed (neuronx-distributed) for Training | | Trn1/Trn1n
Neuron Distributed (neuronx-distributed) for Inference | | Inf2, Trn1/Trn1n
Transformers Neuron (transformers-neuronx) for Inference | | Inf2, Trn1/Trn1n
PyTorch Neuron (torch-neuronx) | | Trn1/Trn1n, Inf2
Neuron Compiler (neuronx-cc) | | Inf2/Trn1/Trn1n
Neuron Tools | | Inf1/Inf2/Trn1/Trn1n
Documentation Updates | | Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1
Release Artifacts | | Trn1/Trn1n, Inf2, Inf1
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.
Neuron 2.13.2 (09/01/2023)#
This is a patch release that fixes issues in Kubernetes (K8) deployments related to Neuron Device Plugin crashes and other pod scheduling issues. This release also adds support for zero-based Neuron Device indexing in K8 deployments. See the Neuron K8 release notes for more details on the specific bug fixes.
Updating to the latest Neuron Kubernetes components and Neuron Driver is highly encouraged for customers using Kubernetes.
Please follow these instructions in the setup guide to upgrade to the latest Neuron release.
Neuron 2.13.1 (08/29/2023)#
This release adds support for Llama 2 model training (tutorial) using the neuronx-nemo-megatron library, and adds support for Llama 2 model inference using the transformers-neuronx library (tutorial).
Please follow these instructions in the setup guide to upgrade to the latest Neuron release.
Note
Please install transformers-neuronx from https://pip.repos.neuron.amazonaws.com to get the latest features and improvements.
This release does not support the Llama 2 model with Grouped-Query Attention.
Neuron 2.13.0 (08/28/2023)#
What’s New#
This release introduces support for GPT-NeoX 20B model training in neuronx-distributed, including ZeRO-1 optimizer capability. It also adds support for Stable Diffusion XL and CLIP model inference in torch-neuronx. Neuron 2.13 also introduces the AWS Neuron Reference for Nemo Megatron library, supporting distributed training of LLMs like GPT-3 175B. This release also introduces other new features, performance optimizations, minor enhancements and bug fixes.
This release introduces the following:
What’s New | Details | Instances
---|---|---
AWS Neuron Reference for Nemo Megatron library | | Trn1/Trn1n
Transformers Neuron (transformers-neuronx) for Inference | | Inf2, Trn1/Trn1n
Neuron Distributed (neuronx-distributed) for Training | | Trn1/Trn1n
Neuron Distributed (neuronx-distributed) for Inference | | Inf2, Trn1/Trn1n
PyTorch Neuron (torch-neuronx) | | Trn1/Trn1n, Inf2
Neuron Tools | | Inf1/Inf2/Trn1/Trn1n
Neuron Runtime | | Inf1, Inf2, Trn1/Trn1n
End of Support Announcements and Documentation Updates | | Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1
Known Issues and Limitations | | Trn1/Trn1n, Inf2, Inf1
Release Artifacts | | Trn1/Trn1n, Inf2, Inf1
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.
2.13.0 Known Issues and Limitations#
Currently, a NaN is generated when the model implementation uses the float32 minimum or maximum value (for example, torch.finfo(torch.float32).min or torch.finfo(torch.float32).max) along with XLA_USE_BF16/XLA_DOWNCAST_BF16. This happens because float32.min or float32.max is downcast to Inf in bf16, which then produces a NaN. As a short-term fix, use a smaller-magnitude fp32 number instead of float32.min/float32.max. For example, for mask creation, use -1e4 or +1e4 instead of the min/max values. The issue will be addressed in future Neuron releases.
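The overflow can be reproduced without any Neuron or PyTorch dependency. The sketch below simulates a float32 to bf16 downcast in pure Python, assuming round-to-nearest-even on the upper 16 bits of the float32 bit pattern; `to_bf16` is a hypothetical helper written for this illustration, not a Neuron or PyTorch function.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to bfloat16 precision (round-to-nearest-even) and return it as a float."""
    if math.isnan(x):
        return x
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add half of the discarded range, plus the tie-breaking bit, then truncate.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Largest finite float32 and its negation (the value a float32 "min" mask would hold).
FLOAT32_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
FLOAT32_MIN = -FLOAT32_MAX

overflowed = to_bf16(FLOAT32_MIN)
safe = to_bf16(-1e4)

print(overflowed)               # -inf
print(safe)                     # -9984.0
print(overflowed - overflowed)  # nan
```

The last line mirrors a common way the NaN appears in practice: subtracting the row maximum in a softmax over a fully masked row computes (-inf) - (-inf). With -1e4 as the mask value, the downcast result stays finite and the same subtraction yields 0.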
Neuron 2.12.2 (08/19/2023)#
Patch release to fix a jemalloc conflict for all Neuron customers that use Ubuntu 22. The previous releases shipped with a dependency on jemalloc that may lead to compilation failures on Ubuntu 22 only. Please follow these instructions in the setup guide to upgrade to the latest Neuron release.
Neuron 2.12.1 (08/09/2023)#
Patch release to improve reliability of Neuron Runtime when running applications on memory-constrained instances. The Neuron Runtime has reduced the contiguous memory requirement for initializing the Neuron Cores associated with applications. This reduction allows bringup when only small amounts of contiguous memory remain on an instance. Please upgrade to the latest Neuron release to use the latest Neuron Runtime.
Neuron 2.12.0 (07/19/2023)#
What’s New#
This release introduces the ZeRO-1 optimizer for model training in torch-neuronx, and introduces beta support for GPT-NeoX, BLOOM, Llama and Llama 2 (coming soon) models in transformers-neuronx. This release also adds support for model inference serving on Triton Inference Server for Inf2 & Trn1 instances, the lazy_load API and async_load API for model loading in torch-neuronx, as well as other new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:
What’s New | Details | Instances
---|---|---
ZeRO-1 optimizer for model training in | | Inf2, Trn1/Trn1n
Support for new models and Enhancements in | | Inf2, Trn1/Trn1n
Support for Inf2 and Trn1 instances on Triton Inference Server | | Inf2, Trn1
Support for new computer vision models | | Inf2, Trn1/Trn1n
New Features in | | Trn1/Trn1n
 | | Inf2, Trn1/Trn1n
[Beta] Asynchronous Execution support and Enhancements in Neuron Runtime | | Inf1, Inf2, Trn1/Trn1n
Support for | | Inf2, Trn1/Trn1n
New Micro Benchmarking Performance User Guide and Documentation Updates | | Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1
Known Issues and Limitations | | Trn1/Trn1n, Inf2, Inf1
Release Artifacts | | Trn1/Trn1n, Inf2, Inf1
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.
2.12.0 Known Issues and Limitations#
Known Issues in Ubuntu 22 Support#
Several vision and NLP models are not supported on Ubuntu 22 due to compilation issues. These issues will be addressed in upcoming releases.
The CustomOp feature fails with a segmentation fault on Ubuntu 22. This issue will be addressed in upcoming releases.
Known issues in certain resnet models on Ubuntu 20#
There is a known issue with support for resnet-18, resnet-34, resnet-50, resnet-101 and resnet-152 models on Ubuntu 20. This issue will be addressed in upcoming releases.
Neuron 2.11.0 (06/14/2023)#
What’s New#
This release introduces Neuron Distributed, a new Python library that simplifies training and inference of large models. It also improves usability with features such as S3 model caching, a standalone profiler tool, and support for Ubuntu 22, and includes other new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:
What’s New | Details | Instances
---|---|---
New Features and Performance Enhancements in | | Inf2, Trn1/Trn1n
Neuron Profiler Tool | | Inf1, Inf2, Trn1/Trn1n
Neuron Compilation Cache through S3 | | Inf2, Trn1/Trn1n
New script to scan a model for supported/unsupported operators | | Inf2, Trn1/Trn1n
Neuron Distributed Library [Beta] | | Inf2, Trn1/Trn1n
Neuron Calculator and Documentation Updates | | Inf1, Inf2, Trn1/Trn1n
Enhancements to Neuron SysFS | | Inf1, Inf2, Trn1/Trn1n
Support for Ubuntu 22 | | Inf1, Inf2, Trn1/Trn1n
Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1
Release Artifacts | | Trn1/Trn1n, Inf2, Inf1
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.
Neuron 2.10.0 (05/01/2023)#
What’s New#
This release introduces new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:
What’s New | Details | Instances
---|---|---
Initial support for computer vision models inference | | Inf2, Trn1/Trn1n
Profiling support in PyTorch Neuron( | | Inf2, Trn1/Trn1n
New Features and Performance Enhancements in transformers-neuronx | | Inf2, Trn1/Trn1n
Support models larger than 2GB in TensorFlow 2.x Neuron ( | | Trn1/Trn1n, Inf2
Support models larger than 2GB in TensorFlow 2.x Neuron ( | | Inf1
Performance Enhancements in PyTorch C++ Custom Operators (Beta) | | Trn1/Trn1n
Weight Deduplication Feature (Inf1) | | Inf1
 | | Trn1/Trn1n, Inf2
Announcing end of support for tensorflow-neuron 2.7 & mxnet-neuron 1.5 versions | | Inf1
Minor enhancements and bug fixes. | | Trn1/Trn1n, Inf2, Inf1
Release Artifacts | | Trn1/Trn1n, Inf2, Inf1
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.
Neuron 2.9.1 (04/19/2023)#
Minor patch release to add support for deserialized TorchScript model compilation and support for multi-node training in EKS. Fixes included in this release are critical to enable training and deploying models with Amazon SageMaker or Amazon EKS.
Neuron 2.9.0 (03/28/2023)#
What’s New#
This release adds support for EC2 Trn1n instances, introduces new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:
What’s New | Details | Instances
---|---|---
Support for EC2 Trn1n instances | | Trn1n
New Analyze API in PyTorch Neuron ( | | Trn1, Inf2
Support models that are larger than 2GB in PyTorch Neuron ( | | Inf1
Performance Improvements | | Trn1
Dynamic Batching support in TensorFlow 2.x Neuron ( | | Trn1, Inf2
NeuronPerf support for Trn1/Inf2 instances | | Trn1, Inf2
Hierarchical All-Reduce and Reduce-Scatter collective communication | | Trn1, Inf2
New Tutorials added | | Trn1, Inf2
Minor enhancements and bug fixes. | | Trn1, Inf2, Inf1
Release included packages | | Trn1, Inf2, Inf1
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
To learn about the model architectures currently supported on Inf1, Inf2, Trn1 and Trn1n instances, please see model_architecture_fit.
Neuron 2.8.0 (02/24/2023)#
What’s New#
This release adds support for EC2 Inf2 instances, introduces initial inference support with TensorFlow 2.x Neuron (tensorflow-neuronx) on Trn1 and Inf2, and introduces minor enhancements and bug fixes.
This release introduces the following:
What’s New | Details
---|---
Support for EC2 Inf2 instances |
TensorFlow 2.x Neuron ( |
New Neuron GitHub samples |
Minor enhancements and bug fixes. |
Release included packages |
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
Neuron 2.7.0 (02/08/2023)#
What’s New#
This release introduces new capabilities and libraries, as well as features and tools that improve usability. This release introduces the following:
What’s New | Details
---|---
PyTorch 1.13 | Support of PyTorch 1.13 version for PyTorch Neuron (
PyTorch DistributedDataParallel (DDP) API | Support of PyTorch DistributedDataParallel (DDP) API in PyTorch Neuron (
Inference support in | For more details please visit the pytorch-neuronx-main page. You can also try Neuron Inference samples https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx in the
Neuron Custom C++ Operators [Beta] | Initial support for Neuron Custom C++ Operators [Beta]. With Neuron Custom C++ Operators (“CustomOps”) you can now write CustomOps that run on NeuronCore-v2 chips. For more resources, see the Neuron Custom C++ Operators [Beta] section.
 |
Neuron sysfs filesystem | Neuron sysfs filesystem exposes Neuron Devices under
TFLOPS support in Neuron System Tools | Neuron System Tools now also report model actual TFLOPs rate in both
New sample scripts for training | This release adds multiple new sample scripts for training models with
New sample scripts for inference | This release adds multiple new sample scripts for deploying models with
Neuron GitHub samples repository for Amazon EKS | A new AWS Neuron GitHub samples repository for Amazon EKS. Please check the aws-neuron-samples repository.
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
Neuron 2.6.0 (12/12/2022)#
This release introduces support for PyTorch 1.12 and introduces PyTorch Neuron (torch-neuronx) profiling through the Neuron Plugin for TensorBoard. PyTorch Neuron (torch-neuronx) users can now profile their models through the following TensorBoard views:
Operator Framework View
Operator HLO View
Operator Trace View
This release introduces the support of LAMB optimizer for FP32 mode, and adds support for capturing snapshots of inputs, outputs and graph HLO for debugging.
In addition, this release introduces the support of new operators and resolves issues that improve stability for Trn1 customers.
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
Neuron 2.5.0 (11/23/2022)#
Neuron 2.5.0 is a major release which introduces new features and resolves issues that improve stability for Inf1 customers.
Component | New in this release
---|---
PyTorch Neuron |
TensorFlow Neuron |
This Neuron release is the last release that will include torch-neuron versions 1.7 and 1.8, and that will include tensorflow-neuron versions 2.5 and 2.6.
In addition, this release introduces changes to the Neuron packaging and installation instructions for Inf1 customers, see Introducing Neuron packaging and installation changes for Inf1 customers for more information.
For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.
Neuron 2.4.0 (10/27/2022)#
This release introduces new features and resolves issues that improve stability. The release introduces a “memory utilization breakdown” feature in both the Neuron Monitor and Neuron Top system tools. The release adds the “NeuronCore Based Scheduling” capability to the Neuron Kubernetes Scheduler and introduces support for new operators in Neuron Compiler and PyTorch Neuron. This release also introduces eight (8) additional samples of model fine-tuning using PyTorch Neuron. The new samples can be found in the AWS Neuron Samples GitHub repository.
Neuron 2.3.0 (10/10/2022)#
This Neuron 2.3.0 release extends Neuron 1.x and adds support for the new AWS Trainium powered Amazon EC2 Trn1 instances. With this release, you can now run deep learning training workloads on Trn1 instances to save training costs by up to 50% over equivalent GPU-based EC2 instances, while getting the highest training performance in AWS cloud for popular NLP models.
What’s New |
|
Tested workloads and known issues |
|
Neural-networks training support
Supported instances: Trn1
Supported Frameworks: PyTorch Neuron (torch-neuronx)
Supported Data-types
FP32, BF16
Supported Rounding Modes
Stochastic Rounding (SR)
Round Nearest ties to Even (RNE)
Supported Automatic Casting Methods
Neuron automatic casting of FP32 tensors / weights / operations to BF16 - Default mode
PyTorch automatic casting
Full BF16 automatic casting (via XLA_USE_BF16=1 environment variable)
PyTorch Neuron (torch-neuronx)
PyTorch 1.11
Supported instances: Trn1
Supported Python versions: Python 3.7, Python 3.8
Eager Debug Mode
Persistent Cache for compilation
Collective compute operations: AllReduce
Optimizers: AdamW, SGD
Tested loss functions: Negative log-likelihood (NLL), Cross-entropy
Training Libraries/Frameworks
torch.distributed
Megatron-LM Reference for Neuron
Training Examples
For More information:
Neuron Runtime, Drivers and Networking Components
Neuron Runtime 2.9
Supported instances: Trn1, Inf1
Elastic Fabric Adapter (EFA) @ 800Gbps
Collective communication operators
AllReduce
AllGather
ReduceScatter
Release Notes:
Neuron Tools
Neuron system tools - Adding Trn1 support to the following tools:
neuron-monitor
neuron-top
neuron-ls
Release Notes:
Developer Flows
Containers
Deep Learning Containers (DLC) supporting PyTorch Neuron (torch-neuronx)
Multi-Instance distributed workloads orchestration:
AWS ParallelCluster (Through custom AMI build)
Amazon Elastic Container Service (ECS)
Supported Amazon Machine Images (AMIs)
Ubuntu 20 Neuron DLAMI-base (Python 3.8)
Amazon Linux2 Neuron DLAMI-base (Python 3.7)
Ubuntu 18 Neuron DLAMI-base (Python 3.7)
Ubuntu 18 AMI (Python 3.7)
Ubuntu 20 AMI (Python 3.8)
Amazon Linux2 AMI (Python 3.7)
The following workloads were tested in this release:
Distributed data-parallel pre-training of Hugging Face BERT model on single Trn1.32xl instance (32 NeuronCores).
Distributed data-parallel pre-training of Hugging Face BERT model on multiple Trn1.32xl instances.
HuggingFace BERT MRPC task finetuning on single NeuronCore or multiple NeuronCores (data-parallel).
Megatron-LM GPT3 (6.7B parameters) pre-training on single Trn1.32xl instance.
Megatron-LM GPT3 (6.7B parameters) pre-training on multi Trn1.32xl instances.
Multi-Layer Perceptron (MLP) model training on single NeuronCore or multiple NeuronCores (data-parallel).
For maximum training performance, please set the environment variable XLA_USE_BF16=1 to enable full BF16 and Stochastic Rounding.
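When the training job is launched from Python rather than from a shell, the variable can be set in the script itself. The sketch below assumes the setting has to be in the process environment before the XLA backend is loaded, so it runs ahead of any framework import; `enable_full_bf16` is a hypothetical helper name.

```python
import os


def enable_full_bf16(env=None):
    """Set XLA_USE_BF16=1 in the given environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    env["XLA_USE_BF16"] = "1"
    return env


# Call this first, then import the training framework (for example torch_xla).
# The import is left out so the sketch runs without the framework installed.
enable_full_bf16()
```

Setting the variable in the shell (`XLA_USE_BF16=1` before the launch command) has the same effect and keeps the training script unchanged.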