Neuron Inference Model Support

This section provides information on model support in NeuronX Distributed Inference (NxDI) and how to determine appropriate configurations for both online and offline use cases.

Llama 3

Meta’s Llama 3 family includes large language models available in multiple sizes and versions. Select the model variant that matches your application requirements:

Llama 3.3 70B

Meta’s multilingual LLM, featuring 70B parameters and Grouped Query Attention.
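
As a rough aid when sizing a deployment, the sketch below estimates the weight footprint of a 70B-parameter model and shows how Grouped Query Attention shrinks the per-token KV cache. It is back-of-the-envelope arithmetic, not part of the NxDI API. The function names are hypothetical, and the architecture values (80 layers, 64 query heads, 8 KV heads, head dimension 128, 2-byte elements) are assumptions taken from the publicly released Llama 3 70B configuration. Verify them against the `config.json` of the checkpoint you deploy.

```python
# Back-of-the-envelope memory estimates for a 70B-parameter model with GQA.
# All architecture values below are assumptions; check the model's config.json.

NUM_PARAMS = 70_000_000_000   # nominal parameter count (assumed round figure)
NUM_LAYERS = 80               # assumed transformer layer count
NUM_QUERY_HEADS = 64          # assumed attention (query) heads
NUM_KV_HEADS = 8              # assumed key/value heads under GQA
HEAD_DIM = 128                # assumed per-head dimension
BYTES_PER_ELEM = 2            # bf16/fp16 storage


def weight_bytes(num_params: int, bytes_per_param: int = BYTES_PER_ELEM) -> int:
    """Bytes needed to hold the model weights at the given precision."""
    return num_params * bytes_per_param


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int,
                             bytes_per_elem: int = BYTES_PER_ELEM) -> int:
    """Bytes of KV cache one token occupies: a key and a value per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


# Weights in 2-byte precision.
weights = weight_bytes(NUM_PARAMS)

# KV cache per token with GQA versus a hypothetical full multi-head layout.
kv_gqa = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
kv_mha = kv_cache_bytes_per_token(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM)

if __name__ == "__main__":
    print(f"weights: {weights / 1e9:.0f} GB")
    print(f"KV cache per token (GQA): {kv_gqa / 1024:.0f} KiB")
    print(f"KV cache reduction from GQA: {kv_mha // kv_gqa}x")
```

Under these assumptions the weights alone need about 140 GB in 2-byte precision, so the model has to be sharded across multiple accelerator cores, and the KV cache budget then scales linearly with batch size and sequence length.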

Quickstart

Note

Instructions for additional models will be available soon. For a complete list of supported model architectures, refer to this developer guide.