This document is relevant for: Inf1, Inf2, Trn1, Trn1n

Neuron Distributed Release Notes (neuronx-distributed)#

This document lists the release notes for the Neuronx-Distributed library.

Neuron Distributed [0.7.0]#

Date: 04/01/2024

New in this release#

  • Added support for pipeline-parallelism training using PyTorch Lightning

  • Added support for fine-tuning a model and running evaluation on the fine-tuned model using optimum-neuron

  • Added support for auto-partitioning the pipeline parallel stages for training large models

  • Added support for async checkpointing, which reduces checkpoint saving time (see the sketch after this list).

  • Added support for auto-resume from a checkpoint in case the training job crashes.

  • Added support for sequence length autobucketing in inference

  • Added support for inference with bfloat16

  • Improved performance of the Llama-2-7b inference example.
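
Async checkpointing hands the actual write-out to background workers so the training loop can continue, and auto-resume picks up the most recent complete checkpoint after a crash. The following is a minimal sketch only: it assumes the `save_checkpoint`/`load_checkpoint` entry points, an `async_save`-style flag, and that omitting an explicit tag on load resumes from the latest checkpoint. Confirm the exact names and defaults against the Neuronx-Distributed checkpointing documentation.

```python
import torch
import neuronx_distributed as nxd

# Placeholders; in practice these are the parallelized model and optimizer.
model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
global_step = 1000

# Save: the assumed async_save flag lets the call return quickly while
# background workers write the (still sharded) checkpoint to disk.
nxd.save_checkpoint(
    "ckpt",
    tag=f"step_{global_step}",
    model=model,
    optimizer=optimizer,
    async_save=True,  # assumed flag enabling asynchronous checkpointing
)

# Auto-resume: assumed behavior that omitting an explicit tag loads the most
# recent complete checkpoint, e.g. after a crashed job is restarted.
nxd.load_checkpoint("ckpt", model=model, optimizer=optimizer)
```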

Known Issues and Limitations#

  • Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.

Neuron Distributed [0.6.0]#

Date: 12/21/2023

New in this release#

  • Added a Model/Optimizer wrapper that handles parallelization of both the model and the optimizer (see the sketch after this list).

  • Added support for PyTorch Lightning. This allows users to train models using tensor-parallelism and data-parallelism.

  • Added new checkpoint save/load APIs that handle the parallelization when dumping and loading the checkpoint.

  • Added a new QKV module that can replicate the KV heads and produce the query, key, and value states.

  • Reduced model initialization time when the pipeline-parallel distributed strategy is used.

  • Added support for limiting the maximum number of parallel compilations in parallel_model_trace. This resolves many out-of-memory errors by reducing host memory usage.

  • Added an example for Llama-2-7b inference. This example is still early in development and is not well optimized; the current recommendation is to use transformers-neuronx for optimal Llama inference performance.
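
The Model/Optimizer wrapper and the new checkpoint APIs are designed to be used together: a single config object describes the parallelization, the wrappers apply it, and the checkpoint call saves whatever sharded state the wrappers own. The sketch below assumes the top-level names `neuronx_distributed_config`, `initialize_parallel_model`, `initialize_parallel_optimizer`, and `save_checkpoint`, and illustrative keyword arguments; treat all of them as assumptions to be checked against the Neuronx-Distributed trainer documentation.

```python
import torch
import neuronx_distributed as nxd

# Assumed trainer-style API; names and keyword arguments are illustrative.
nxd_config = nxd.neuronx_distributed_config(
    tensor_parallel_size=8,
    optimizer_config={"zero_one_enabled": True, "grad_clipping": True, "max_grad_norm": 1.0},
)

def get_model():
    # Placeholder for the real (unparallelized) module; the wrapper shards it
    # according to nxd_config.
    return torch.nn.Linear(1024, 1024)

model = nxd.initialize_parallel_model(nxd_config, get_model)
optimizer = nxd.initialize_parallel_optimizer(
    nxd_config, torch.optim.AdamW, model.parameters(), lr=1e-4
)

# ... training loop ...

# The checkpoint API saves the sharded model/optimizer state on behalf of the caller.
nxd.save_checkpoint("ckpt", tag="step_1000", model=model, optimizer=optimizer)
```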

Known Issues and Limitations#

  • Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.

  • Pipeline-parallelism is not supported as part of the PyTorch Lightning integration.

Neuron Distributed [0.5.0]#

Date: 10/26/2023

New in this release#

  • Added support for pipeline-parallelism for distributed training.

  • Added support for serialized checkpoint saving/loading, improving checkpoint saving/loading time.

  • Added support for mixed precision training using torch.autocast (see the sketch after this list).

  • Fixed an issue with Zero1 checkpoint saving/loading.
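
Mixed-precision training with torch.autocast wraps only the forward and loss computation; the backward pass reuses the casts recorded during the forward. A minimal sketch follows, assuming an XLA device and that the installed torch/torch-xla version accepts "xla" as the autocast device_type (older stacks may expect a different string), so treat that argument as an assumption.

```python
import torch
import torch_xla.core.xla_model as xm

def train_step(model, inputs, labels, optimizer, loss_fn):
    optimizer.zero_grad()
    # Run the forward pass and loss in bfloat16 under autocast; the parameters
    # themselves stay in float32. The "xla" device_type is an assumption and
    # may differ with the installed torch / torch-xla version.
    with torch.autocast(device_type="xla", dtype=torch.bfloat16):
        loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
    xm.mark_step()  # flush the lazily recorded XLA graph
    return loss.detach()
```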

Known Issues and Limitations#

  • Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.

Neuron Distributed [0.4.0]#

Date: 9/15/2023

New in this release#

  • Added an API for padding attention heads when the number of heads is not divisible by the tensor-parallel degree (see the conceptual sketch after this list)

  • Added a constant threadpool for distributed inference

  • Fixed a bug with padding_idx in ParallelEmbedding layer

  • Fixed an issue with checkpoint loading so that it takes the stride parameter of tensor-parallel layers into account
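
Head padding addresses the case where the number of attention heads does not divide evenly by the tensor-parallel degree: the head count is padded up to the next multiple, the extra heads carry zero weights, and their outputs are ignored after attention. The snippet below is only a conceptual illustration of that arithmetic, not the library's padding API.

```python
import math

def padded_num_heads(num_heads: int, tp_degree: int) -> int:
    """Smallest head count >= num_heads that is divisible by the TP degree."""
    return math.ceil(num_heads / tp_degree) * tp_degree

# Example: a 12-head model at tensor-parallel degree 8 is padded to 16 heads,
# i.e. 2 heads per rank, with the 4 extra (zero-weight) heads ignored at the output.
assert padded_num_heads(12, 8) == 16
```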

Known Issues and Limitations#

  • Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.

Neuron Distributed [0.3.0]#

Date: 8/28/2023

New in this release#

  • Added Zero1 Optimizer support that works with tensor-parallelism (see the sketch after this list)

  • Added support for sequence-parallelism that works with tensor-parallelism

  • Added an IO aliasing feature in the parallel_trace API, which allows marking certain tensors as state tensors

  • Fixed hangs when tracing models with parallel_trace at higher TP degrees
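
The ZeRO-1 support shards optimizer states across the data-parallel replicas while leaving tensor-parallel sharding untouched. Below is a sketch under the assumption that the wrapper is `neuronx_distributed.optimizer.NeuronZero1Optimizer` and that it accepts the sharding and grad-norm process groups shown; it runs after the process group and model-parallel state are initialized, and the exact constructor arguments should be confirmed against the Neuronx-Distributed documentation.

```python
import torch
from neuronx_distributed.optimizer import NeuronZero1Optimizer
from neuronx_distributed.parallel_layers import parallel_state

# Placeholder module; in practice this is the tensor-parallel model.
model = torch.nn.Linear(1024, 1024)

# Assumed constructor: optimizer states are sharded over the data-parallel
# replicas, while gradient-norm reductions also span the tensor-parallel group
# so that gradient clipping sees the full (unsharded) gradient norm.
optimizer = NeuronZero1Optimizer(
    model.parameters(),
    torch.optim.AdamW,
    lr=1e-4,
    grad_clipping=True,
    max_norm=1.0,
    pin_layout=False,
    sharding_groups=parallel_state.get_data_parallel_group(as_list=True),
    grad_norm_groups=parallel_state.get_tensor_model_parallel_group(as_list=True),
)
```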

Known Issues and Limitations#

  • Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.

Neuron Distributed [0.2.0]#

Date: 7/19/2023

New in this release#

  • Added a parallel cross-entropy loss function (see the sketch below).
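
A minimal sketch of how a vocab-parallel loss is typically used: the logits stay sharded along the vocabulary dimension (as produced by a column-parallel LM head with gather_output=False), and the loss function handles the cross-rank reduction. The import path and the per-token reduction behavior follow the Megatron-style convention and are assumptions; check the Neuronx-Distributed API reference for the exact names.

```python
from neuronx_distributed.parallel_layers.loss_functions import parallel_cross_entropy

# vocab_parallel_logits: [batch, seq, vocab / tp_degree], still sharded per rank
# labels:                [batch, seq] token ids in the full vocabulary
def compute_loss(vocab_parallel_logits, labels):
    # Assumed behavior: returns the per-token loss; the caller applies the reduction.
    per_token_loss = parallel_cross_entropy(vocab_parallel_logits, labels)
    return per_token_loss.mean()
```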

Known Issues and Limitations#

  • Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.

Neuron Distributed [0.1.0]#

Date: 6/14/2023

New in this release#

  • Initial release of the Neuron Distributed (neuronx-distributed) library, which enables large language model training/inference.

  • Added support for tensor-parallelism training/inference (see the sketch after this list).
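
Tensor-parallelism is exposed through Megatron-style parallel layers that shard linear layers across NeuronCores. Below is a minimal sketch of the usual column-/row-parallel pairing for an MLP block; the import paths, the initialize_model_parallel keyword, and the gather_output/input_is_parallel arguments follow the conventions in the Neuronx-Distributed developer guide but should be treated as illustrative rather than exact.

```python
import torch
import torch.distributed as dist
from neuronx_distributed.parallel_layers import parallel_state
from neuronx_distributed.parallel_layers.layers import ColumnParallelLinear, RowParallelLinear


class ParallelMLP(torch.nn.Module):
    """Tensor-parallel MLP block in the Megatron-style column/row pairing."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Column-parallel: output features are sharded across the TP ranks,
        # so each rank holds intermediate_size / tp_degree columns.
        self.up_proj = ColumnParallelLinear(hidden_size, intermediate_size, gather_output=False)
        # Row-parallel: consumes the sharded activations and all-reduces the output.
        self.down_proj = RowParallelLinear(intermediate_size, hidden_size, input_is_parallel=True)

    def forward(self, x):
        return self.down_proj(torch.nn.functional.gelu(self.up_proj(x)))


if __name__ == "__main__":
    # Typically launched with torchrun on a Trn1/Inf2 instance; the "xla"
    # process-group backend is provided by torch-xla / torch-neuronx.
    dist.init_process_group("xla")
    parallel_state.initialize_model_parallel(tensor_model_parallel_size=2)
    mlp = ParallelMLP(hidden_size=1024, intermediate_size=4096)
```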

Known Issues and Limitations#

  • Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.

This document is relevant for: Inf1, Inf2, Trn1, Trn1n