This document is relevant for: Inf1, Inf2, Trn1, Trn2
NxD Inference Release Notes (neuronx-distributed-inference)
This document lists the release notes for the Neuronx Distributed Inference library.
Neuronx Distributed Inference [0.1.0] (Beta)
Date: 12/03/2024
Features in this release
NeuronX Distributed (NxD) Inference (neuronx-distributed-inference) is
an open-source PyTorch-based inference library that simplifies deep learning
model deployment on AWS Inferentia and Trainium instances. Neuronx Distributed
Inference includes a model hub and modules that users can reference to
implement their own models on Neuron.
This is the first release of NxD Inference (Beta). It includes:
Support for Trn2 instances
Compatibility with HuggingFace checkpoints and generate() API
vLLM integration
Model compilation and serialization
Tensor parallelism
Speculative decoding
Quantization
Dynamic sampling
Llama3.1 405B Inference Example on Trn2
Open-source GitHub repository: aws-neuron/neuronx-distributed-inference
For more information about the features supported by NxDI, see NxD Inference Features Configuration Guide.