Neuron Inference Model Support

This section describes model support in NeuronX Distributed Inference (NxDI) and explains how to choose an appropriate configuration for online and offline inference use cases.

Llama 3

Meta’s Llama 3 family includes large language models available in multiple sizes and versions. Select the model variant that matches your application requirements:

Llama 3.3 70B

Meta’s multilingual LLM, featuring 70B parameters and Grouped Query Attention.

Quickstart

Qwen 3

The Qwen 3 family includes large language models available in multiple sizes and versions. Select the model variant that matches your application requirements:

Qwen3 MoE 235B

A multilingual LLM from the Qwen family, featuring a sparse Mixture-of-Experts architecture and Grouped Query Attention.

Quickstart

Note

Instructions for additional models will be added soon. For a complete list of supported model architectures, refer to this developer guide.