Neuron Inference Model Support
This section describes model support in NeuronX Distributed Inference (NxDI) and how to choose appropriate configurations for both online and offline use cases.
Llama 3
Meta’s Llama 3 family includes large language models available in multiple sizes and versions. Select the model variant that matches your application requirements:
Meta’s multilingual LLM, featuring 70B parameters and Grouped Query Attention.
Qwen 3
The Qwen 3 family includes large language models available in multiple sizes and versions. Select the model variant that matches your application requirements:
A multilingual LLM from the Qwen family, featuring a sparse Mixture-of-Experts architecture and Grouped Query Attention.
Note
Instructions for additional models will be available soon. For a complete list of supported model architectures, refer to this developer guide.