.. _neuronx-distributed-rn:

Neuron Distributed Release Notes (``neuronx-distributed``)
==========================================================

.. contents:: Table of contents
   :local:
   :depth: 1

This document lists the release notes for the Neuronx-Distributed library.

Neuron Distributed [0.7.0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Date: 04/01/2024

New in this release
-------------------

* Added support for pipeline-parallel training using PyTorch Lightning.
* Added support for fine-tuning a model and running evaluation on the fine-tuned model using optimum-neuron.
* Added support for auto-partitioning the pipeline-parallel stages when training large models.
* Added support for async checkpointing, which reduces checkpoint-saving time.
* Added support for auto-resume from a checkpoint in case the training job crashes.
* Added support for sequence-length autobucketing in inference.
* Added support for inference with bfloat16.
* Improved performance of the Llama-2-7b inference example.

Known Issues and Limitations
----------------------------

* Model checkpointing currently saves a sharded checkpoint, and users have to write a script to combine the shards.

Neuron Distributed [0.6.0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Date: 12/21/2023

New in this release
-------------------

* Added support for a model/optimizer wrapper that handles parallelization of both the model and the optimizer.
* Added support for PyTorch Lightning, allowing users to train models using tensor parallelism and data parallelism.
* Added new checkpoint save/load APIs that handle the parallelization when dumping and loading checkpoints.
* Added a new QKV module that can replicate the KV heads and produce the query, key, and value states.
* Reduced model initialization time when the pipeline-parallel distributed strategy is used.
* Added support for limiting the maximum number of parallel compilations in ``parallel_model_trace``. This resolves many out-of-memory errors by reducing host memory usage.
* Added an example for Llama-2-7b inference. It is still early in development and not well optimized; the current recommendation is to use ``transformers-neuronx`` for optimal Llama inference performance.

Known Issues and Limitations
----------------------------

* Model checkpointing currently saves a sharded checkpoint, and users have to write a script to combine the shards.
* Pipeline parallelism is not supported as part of the PyTorch Lightning integration.

Neuron Distributed [0.5.0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Date: 10/26/2023

New in this release
-------------------

* Added support for pipeline parallelism in distributed training.
* Added support for serialized checkpoint saving/loading, resulting in better checkpoint save/load times.
* Added support for mixed-precision training using ``torch.autocast``; a minimal sketch appears at the end of this section.
* Fixed an issue with ZeRO-1 checkpoint saving/loading.

Known Issues and Limitations
----------------------------

* Model checkpointing currently saves a sharded checkpoint, and users have to write a script to combine the shards.
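For reference, the following is a minimal sketch of a mixed-precision training step with ``torch.autocast``. The model, data, and step count are placeholders (not part of ``neuronx-distributed``), and the ``xla`` autocast device type assumes a ``torch-xla`` build that supports it:

.. code-block:: python

    import torch
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()
    model = torch.nn.Sequential(
        torch.nn.Linear(128, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10)
    ).to(device)                                        # placeholder model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(10):                                 # placeholder training loop
        inputs = torch.randn(32, 128, device=device)    # placeholder batch
        labels = torch.randint(0, 10, (32,), device=device)
        optimizer.zero_grad()
        # Run the forward pass in bfloat16 under autocast; the parameters
        # and the optimizer step remain in float32.
        with torch.autocast(device_type="xla", dtype=torch.bfloat16):
            loss = loss_fn(model(inputs), labels)
        loss.backward()
        xm.optimizer_step(optimizer)                    # reduce gradients (if any) and step
        xm.mark_step()                                  # cut and execute the XLA graph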
Neuron Distributed [0.4.0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Date: 09/15/2023

New in this release
-------------------

* Added an API for padding attention heads when their number is not divisible by the tensor-parallel degree.
* Added a constant threadpool for distributed inference.
* Fixed a bug with ``padding_idx`` in the ``ParallelEmbedding`` layer.
* Fixed an issue with checkpoint loading so that it takes the stride parameter of tensor-parallel layers into account.

Known Issues and Limitations
----------------------------

* Model checkpointing currently saves a sharded checkpoint, and users have to write a script to combine the shards.

Neuron Distributed [0.3.0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Date: 08/28/2023

New in this release
-------------------

* Added ZeRO-1 optimizer support that works with tensor parallelism.
* Added sequence-parallelism support that works with tensor parallelism.
* Added an IO-aliasing feature to the ``parallel_model_trace`` API, which allows marking certain tensors as state tensors.
* Fixed hangs when tracing models with ``parallel_model_trace`` at higher TP degrees.

Known Issues and Limitations
----------------------------

* Model checkpointing currently saves a sharded checkpoint, and users have to write a script to combine the shards.

Neuron Distributed [0.2.0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Date: 07/19/2023

New in this release
-------------------

* Added a parallel cross-entropy loss function.

Known Issues and Limitations
----------------------------

* Model checkpointing currently saves a sharded checkpoint, and users have to write a script to combine the shards.

Neuron Distributed [0.1.0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Date: 06/14/2023

New in this release
-------------------

* Initial release of the Neuron Distributed (``neuronx-distributed``) library for enabling large language model training/inference.
* Added support for tensor-parallel training/inference; a minimal usage sketch follows at the end of these notes.

Known Issues and Limitations
----------------------------

* Model checkpointing currently saves a sharded checkpoint, and users have to write a script to combine the shards.
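For reference, the following is a minimal sketch of the tensor-parallel layers together with the parallel cross-entropy loss introduced above. It assumes a launch via ``torchrun`` with two workers; the layer sizes and tensors are placeholders, and the exact import paths should be checked against your installed ``neuronx-distributed`` version:

.. code-block:: python

    # Run with (assumed launcher): torchrun --nproc_per_node=2 example.py
    import torch
    import torch.distributed
    import torch_xla.core.xla_model as xm
    import torch_xla.distributed.xla_backend  # registers the "xla" process-group backend
    from neuronx_distributed.parallel_layers import parallel_state
    from neuronx_distributed.parallel_layers.layers import ColumnParallelLinear
    from neuronx_distributed.parallel_layers.loss_functions import parallel_cross_entropy

    torch.distributed.init_process_group("xla")
    # Shard tensor-parallel layers across the 2 workers.
    parallel_state.initialize_model_parallel(tensor_model_parallel_size=2)
    device = xm.xla_device()

    # Vocabulary-sharded output projection: each rank holds a slice of the
    # vocab dimension, and gather_output=False keeps the logits sharded.
    lm_head = ColumnParallelLinear(1024, 32000, bias=False, gather_output=False).to(device)

    hidden = torch.randn(4, 128, 1024, device=device)         # placeholder activations
    labels = torch.randint(0, 32000, (4, 128), device=device)  # placeholder targets

    sharded_logits = lm_head(hidden)
    # parallel_cross_entropy consumes the vocab-sharded logits directly and
    # performs the cross-rank reductions internally, returning per-token losses.
    loss = parallel_cross_entropy(sharded_logits, labels).mean()
    xm.mark_step()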