This document is relevant for: Inf2, Trn1, Trn2

Developer guide for Neuronx-Distributed Inference

Neuronx-Distributed (NxD Core) provides fundamental building blocks that enable you to run advanced inference workloads on AWS Inferentia and Trainium instances. These building blocks include parallel linear layers that enable distributed inference, a model builder that compiles PyTorch modules into Neuron models, and more.
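To make the idea behind a parallel linear layer concrete, the following is a minimal conceptual sketch in plain Python. It does not use the NxD Core API. It assumes a column-wise split of the weight matrix, which is one common tensor-parallel scheme, and simulates each rank in a loop. All function names (`matmul`, `split_columns`, `column_parallel_linear`) are hypothetical and exist only for illustration.

```python
# Conceptual sketch only: plain Python, no Neuron or PyTorch APIs.
# Every name below is hypothetical and chosen for illustration.

def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, num_shards):
    """Split weight matrix w column-wise into num_shards equal shards."""
    cols = len(w[0])
    assert cols % num_shards == 0, "output dimension must divide evenly"
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_shards)]

def column_parallel_linear(x, w, num_shards):
    """Each simulated rank computes x @ shard; outputs are concatenated."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Stand-in for an all-gather: concatenate per-rank outputs along columns.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]

full = matmul(x, w)                                   # single-device result
sharded = column_parallel_linear(x, w, num_shards=2)  # two simulated ranks
assert sharded == full
```

Each simulated rank holds only its slice of the weights, so the per-rank memory and compute shrink with the number of shards, while the concatenated output matches the unsharded layer.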

Neuron also offers Neuronx-Distributed (NxD) Inference, a library of optimized model and module implementations built on top of NxD Core. We recommend using NxD Inference to run inference workloads and to onboard custom models. For more information about NxD Inference, see NxD Inference Overview.

For examples of how to build directly on NxD Core, see the following:

Llama 3.2 1B inference sample
T5 3B inference tutorial [html] [notebook]
