This document is relevant for: Inf1, Inf2, Trn1, Trn2

NxD Inference Release Notes (`neuronx-distributed-inference`)

This document lists the release notes for the NeuronX Distributed Inference library.

Neuronx Distributed Inference [0.1.0] (Beta)

Date: 12/03/2024

Features in this release

NeuronX Distributed (NxD) Inference (neuronx-distributed-inference) is an open-source, PyTorch-based inference library that simplifies deep learning model deployment on AWS Inferentia and Trainium instances. NxD Inference includes a model hub and modules that users can reference to implement their own models on Neuron.

This is the first (Beta) release of NxD Inference. It includes:

Support for Trn2 instances
Compatibility with Hugging Face checkpoints and the generate() API
vLLM integration
Model compilation and serialization
Tensor parallelism
Speculative decoding
Quantization
Dynamic sampling
Llama 3.1 405B inference example on Trn2
Open-source GitHub repository: aws-neuron/neuronx-distributed-inference

For more information about the features supported by NxD Inference, see the NxD Inference Features Configuration Guide.
