This document is relevant for: Inf1, Trn1

Previous Release Notes (Neuron 2.x)#

Neuron 2.5.0 (11/23/2022)#

Neuron 2.5.0 is a major release that introduces new features and resolves issues, improving stability for Inf1 customers.

New in this release, by component:

PyTorch Neuron (torch-neuron)

TensorFlow Neuron (tensorflow-neuron)

  • tf-neuron-auto-multicore tool to enable automatic data parallelism across multiple NeuronCores.

  • Experimental support for tracing models larger than 2 GB using the extract-weights flag (TF2.x only); see TensorFlow 2.x (tensorflow-neuron) Tracing API.

  • tfn.auto_multicore Python API to enable automatic data parallelism (TF2.x only).

This is the last Neuron release to include torch-neuron versions 1.7 and 1.8 and tensorflow-neuron versions 2.5 and 2.6.

In addition, this release introduces changes to the Neuron packaging and installation instructions for Inf1 customers; see Introducing Neuron packaging and installation changes for Inf1 customers for more information.

For more detailed release notes of the new features and resolved issues, see Neuron Components Release Notes.

Neuron 2.4.0 (10/27/2022)#

This release introduces new features and resolves issues that improve stability. It adds a memory utilization breakdown feature to both the Neuron Monitor and Neuron Top system tools, introduces support for the NeuronCore Based Scheduling capability in the Neuron Kubernetes Scheduler, and adds support for new operators in the Neuron Compiler and PyTorch Neuron. This release also adds eight (8) new samples of model fine-tuning using PyTorch Neuron, which can be found in the AWS Neuron Samples GitHub repository.

Neuron 2.3.0 (10/10/2022)#

Overview#

This Neuron 2.3.0 release extends Neuron 1.x and adds support for the new AWS Trainium-powered Amazon EC2 Trn1 instances. With this release, you can now run deep learning training workloads on Trn1 instances, reducing training costs by up to 50% compared to equivalent GPU-based EC2 instances while getting the highest training performance in the AWS cloud for popular NLP models.


New features and capabilities#

  • Neural-networks training support

  • PyTorch Neuron (torch-neuronx)

  • Neuron Runtime, Drivers and Networking Components

  • Neuron Tools

  • Developer Flows

Tested Workloads#

The following workloads were tested in this release:

  • Distributed data-parallel pre-training of the Hugging Face BERT model on a single Trn1.32xl instance (32 NeuronCores).

  • Distributed data-parallel pre-training of the Hugging Face BERT model on multiple Trn1.32xl instances.

  • Hugging Face BERT fine-tuning on the MRPC task, on a single NeuronCore or multiple NeuronCores (data-parallel).

  • Megatron-LM GPT3 (6.7B parameters) pre-training on a single Trn1.32xl instance.

  • Megatron-LM GPT3 (6.7B parameters) pre-training on multiple Trn1.32xl instances.

  • Multi-Layer Perceptron (MLP) model training on a single NeuronCore or multiple NeuronCores (data-parallel).

Known Issues#

  • For maximum training performance, set the environment variable XLA_USE_BF16=1 to enable full BF16 and stochastic rounding.
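As a minimal sketch, the variable can be exported in the shell before launching training (export XLA_USE_BF16=1), or set from Python at the very top of the training script, before the training framework initializes:

```python
import os

# Enable full BF16 and stochastic rounding for maximum training performance.
# This must be set before torch-neuronx / torch-xla initializes, so place it
# at the very top of the training script (or export it in the launching shell).
os.environ["XLA_USE_BF16"] = "1"
```

Setting it after the XLA backend has already initialized has no effect, which is why exporting it in the shell before launch is the safest option.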
