This document is relevant for: Inf1, Inf2, Trn1, Trn1n
Neuron Distributed Release Notes (neuronx-distributed)#
This document lists the release notes for the Neuronx-Distributed library.
Neuron Distributed [0.7.0]#
Date: 04/01/2024
New in this release#
Added support for pipeline-parallelism training using PyTorch Lightning.
Added support for fine-tuning a model and running evaluation on the fine-tuned model using optimum-neuron.
Added support for auto-partitioning the pipeline-parallel stages for training large models.
Added support for async checkpointing, which reduces checkpoint save time (see the sketch after this list).
Added support for auto-resume from a checkpoint in case the training job crashes.
Added support for sequence-length autobucketing in inference.
Added support for inference with bfloat16.
Improved performance for the Llama-2-7b inference example.
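A minimal sketch of how the async checkpoint save might be driven from a training loop. The save_checkpoint entry point and its async_save flag are assumptions inferred from this release note, not a confirmed signature:

    # Hypothetical sketch of async checkpointing (names are assumptions).
    import neuronx_distributed as nxd

    def save_async(step, model, optimizer):
        # Control returns to the training loop while shards are written in
        # the background, reducing time spent blocked on checkpoint I/O.
        nxd.save_checkpoint(
            "ckpts",             # checkpoint root directory (hypothetical)
            tag=f"step_{step}",  # per-checkpoint tag / sub-directory
            model=model,
            optimizer=optimizer,
            async_save=True,     # flag assumed from the 0.7.0 notes
        )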
Known Issues and Limitations#
Model checkpointing currently saves a sharded checkpoint, and users have to write their own script to combine the shards; one possible approach is sketched below.
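The shard file layout (tp_rank_*.pt) and the uniform dim-0 concatenation in this hypothetical sketch are assumptions: column-parallel weights shard on dim 0 but row-parallel weights shard on dim 1, so a real script must track each layer's partition dim.

    # Hypothetical shard-combining script; file naming and the uniform
    # dim-0 concatenation are assumptions, not the library's actual layout.
    import glob
    import torch

    shards = [torch.load(p, map_location="cpu")
              for p in sorted(glob.glob("ckpt/tp_rank_*.pt"))]

    full_state = {}
    for name, first in shards[0].items():
        parts = [s[name] for s in shards]
        if all(torch.equal(first, p) for p in parts[1:]):
            full_state[name] = first                    # replicated (e.g. LayerNorm)
        else:
            full_state[name] = torch.cat(parts, dim=0)  # assumed dim-0 sharding

    torch.save(full_state, "ckpt/full_model.pt")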
Neuron Distributed [0.6.0]#
Date: 12/21/2023
New in this release#
Added a Model/Optimizer wrapper that handles parallelization of both the model and the optimizer (see the sketch after this list).
Added support for PyTorch Lightning. This allows users to train models using tensor-parallelism and data-parallelism.
Added new checkpoint save/load APIs that handle the parallelization when dumping/loading checkpoints.
Added a new QKV module that can replicate the KV heads and produce the query, key, and value states.
Reduced model initialization time when the pipeline-parallel distributed strategy is used.
Added support for limiting the maximum number of parallel compilations in parallel_model_trace. This resolves many out-of-memory errors by reducing host memory usage.
Added an example for Llama-2-7b inference. This is still early in development and not well optimized; the current recommendation is to use transformers-neuronx for optimal Llama inference performance.
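A sketch of how the wrapper-based flow might fit together. The entry-point names used here (neuronx_distributed_config, initialize_parallel_model, initialize_parallel_optimizer) are assumptions about the API surface, so treat the signatures as illustrative:

    # Sketch of the model/optimizer wrapper flow; entry-point names are
    # assumptions, so check the NxD documentation for confirmed signatures.
    import torch
    import torch.nn as nn
    import neuronx_distributed as nxd

    def get_model():
        # User-supplied factory; the wrapper parallelizes the returned module.
        return nn.Linear(1024, 1024)

    nxd_config = nxd.neuronx_distributed_config(
        tensor_parallel_size=8,  # shard each parallel layer across 8 cores
    )
    model = nxd.initialize_parallel_model(nxd_config, get_model)
    optimizer = nxd.initialize_parallel_optimizer(
        nxd_config, torch.optim.AdamW, model.parameters(), lr=1e-4
    )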
Known Issues and Limitations#
Model checkpointing currently saves a sharded checkpoint, and users have to write their own script to combine the shards.
Pipeline-parallelism is not supported as part of the PyTorch Lightning integration.
Neuron Distributed [0.5.0]#
Date: 10/26/2023
New in this release#
Added support for pipeline-parallelism for distributed training.
Added support for serialized checkpoint saving/loading, improving checkpoint save/load times.
Added support for mixed-precision training using torch.autocast (see the sketch after this list).
Fixed an issue with Zero1 checkpoint saving/loading.
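A minimal sketch of a mixed-precision training step with torch.autocast on a torch-xla device. The "xla" device_type is an assumption that depends on the torch/torch-xla versions in use:

    # Mixed-precision step sketch; the "xla" device_type is an assumption.
    import torch
    import torch.nn as nn
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()
    model = nn.Linear(512, 512).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    inputs = torch.randn(8, 512, device=device)
    labels = torch.randn(8, 512, device=device)

    # Matmuls run in bfloat16 under autocast; reductions stay in float32.
    with torch.autocast(device_type="xla", dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(inputs), labels)
    loss.backward()
    xm.optimizer_step(optimizer)  # all-reduce gradients, then step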
Known Issues and Limitations#
Model checkpointing currently saves a sharded checkpoint, and users have to write their own script to combine the shards.
Neuron Distributed [0.4.0]#
Date: 09/15/2023
New in this release#
Added an API for padding attention heads when the number of heads is not divisible by the tensor-parallel degree.
Added a constant thread pool for distributed inference.
Fixed a bug with padding_idx in the ParallelEmbedding layer (see the sketch after this list).
Fixed an issue where checkpoint loading did not take into account the stride parameter in tensor-parallel layers.
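For context, a sketch of the ParallelEmbedding usage that the padding_idx fix affects; it assumes a torch-xla process group with one process per Neuron core:

    # ParallelEmbedding sketch (the padding_idx bug fixed in 0.4.0).
    # Assumes a torch-xla process group, one process per Neuron core.
    import torch.distributed as dist
    from neuronx_distributed.parallel_layers import parallel_state, ParallelEmbedding

    dist.init_process_group("xla")
    parallel_state.initialize_model_parallel(tensor_model_parallel_size=2)

    # The vocabulary is sharded across the tensor-parallel group; with the
    # fix, token id 0 is treated as padding and maps to a zero vector.
    embedding = ParallelEmbedding(32000, 4096, padding_idx=0)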
Known Issues and Limitations#
Model checkpointing currently saves a sharded checkpoint, and users have to write their own script to combine the shards.
Neuron Distributed [0.3.0]#
Date: 08/28/2023
New in this release#
Added Zero1 optimizer support that works with tensor-parallelism (see the sketch after this list).
Added support for sequence-parallelism that works with tensor-parallelism.
Added an IO aliasing feature in the parallel_trace API, which allows marking certain tensors as state tensors.
Fixed hangs when tracing models using parallel_trace at higher TP degrees.
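A sketch of wiring the Zero1 optimizer so that optimizer states shard over the data-parallel group while gradient norms reduce across tensor-parallel ranks; the sharding_groups and grad_norm_groups kwargs follow the pattern in the NxD docs but should be treated as assumptions:

    # Zero1 + tensor-parallelism sketch; kwarg names are assumptions.
    import torch
    from neuronx_distributed.optimizer import NeuronZero1Optimizer
    from neuronx_distributed.parallel_layers import parallel_state

    model = torch.nn.Linear(512, 512)  # stand-in for an NxD-parallelized model

    optimizer = NeuronZero1Optimizer(
        model.parameters(),
        torch.optim.AdamW,   # base optimizer class wrapped by Zero1
        lr=1e-4,
        grad_clipping=True,
        max_norm=1.0,
        # Shard optimizer states across data-parallel replicas only:
        sharding_groups=parallel_state.get_data_parallel_group(as_list=True),
        # Compute gradient norms across tensor-parallel ranks for clipping:
        grad_norm_groups=parallel_state.get_tensor_model_parallel_group(as_list=True),
    )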
Known Issues and Limitations#
Model checkpointing currently saves a sharded checkpoint, and users have to write their own script to combine the shards.
Neuron Distributed [0.2.0]#
Date: 07/19/2023
New in this release#
Added a parallel cross-entropy loss function (see the sketch below).
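A sketch of how the loss pairs with a vocab-sharded language-model head; the import path is an assumption about the parallel_layers package layout:

    # Parallel cross-entropy sketch; expects vocab-sharded logits, e.g.
    # from a ColumnParallelLinear LM head with gather_output=False.
    from neuronx_distributed.parallel_layers.loss_functions import parallel_cross_entropy

    def lm_loss(sharded_logits, labels):
        # sharded_logits: [batch, seq, vocab/tp_degree] per rank;
        # labels: [batch, seq] global token ids. No logit all-gather needed.
        per_token = parallel_cross_entropy(sharded_logits, labels)
        return per_token.mean()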
Known Issues and Limitations#
Model checkpointing currently saves a sharded checkpoint, and users have to write their own script to combine the shards.
Neuron Distributed [0.1.0]#
Date: 06/14/2023
New in this release#
Releasing the Neuron Distributed (neuronx-distributed) library for enabling large language model training/inference.
Added support for tensor-parallelism training/inference (see the sketch below).
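A minimal sketch of the tensor-parallel building blocks this release introduced: an MLP whose up-projection is column-sharded and whose down-projection is row-sharded, so each forward pass needs a single all-reduce:

    # Tensor-parallel MLP sketch built from the NxD parallel layers.
    # Assumes parallel_state.initialize_model_parallel was already called.
    import torch.nn as nn
    from neuronx_distributed.parallel_layers import ColumnParallelLinear, RowParallelLinear

    class ParallelMLP(nn.Module):
        def __init__(self, hidden=4096, ffn=16384):
            super().__init__()
            # Each rank holds ffn / tp_degree columns of the up-projection...
            self.up = ColumnParallelLinear(hidden, ffn, gather_output=False)
            # ...and the matching rows of the down-projection.
            self.down = RowParallelLinear(ffn, hidden, input_is_parallel=True)

        def forward(self, x):
            # RowParallelLinear performs the all-reduce in its forward.
            return self.down(nn.functional.gelu(self.up(x)))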
Known Issues and Limitations#
Model checkpointing currently saves a sharded checkpoint, and users have to write their own script to combine the shards.