Get started with Inference
This guide explains how to run inference on Neuron devices with NeuronX Distributed Inference (NxDI) and how to choose appropriate configurations for both online and offline use cases.
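As a concrete starting point, the sketch below shows offline (batch) generation with NxDI's Python API. It is a minimal sketch, not a definitive recipe: it assumes the NeuronConfig, LlamaInferenceConfig, NeuronLlamaForCausalLM, HuggingFaceGenerationAdapter, OnDeviceSamplingConfig, and load_pretrained_config interfaces exposed by the neuronx_distributed_inference package, and the model path, traced-model path, tensor-parallel degree, sequence lengths, and prompt are placeholders to adapt to your checkpoint and instance type. Confirm the exact module paths and parameters against the developer guide for your installed NxDI version.

```python
from transformers import AutoTokenizer, GenerationConfig

# These imports follow the NxDI package layout; verify them against the
# developer guide for the NxDI version you have installed.
from neuronx_distributed_inference.models.config import NeuronConfig, OnDeviceSamplingConfig
from neuronx_distributed_inference.models.llama.modeling_llama import (
    LlamaInferenceConfig,
    NeuronLlamaForCausalLM,
)
from neuronx_distributed_inference.utils.hf_adapter import (
    HuggingFaceGenerationAdapter,
    load_pretrained_config,
)

# Placeholder paths: point these at your downloaded checkpoint and at a
# directory where the compiled (traced) artifacts should be written.
model_path = "/home/ubuntu/models/Llama-3.3-70B-Instruct/"
traced_model_path = "/home/ubuntu/traced_models/Llama-3.3-70B-Instruct/"

# Neuron-specific configuration. tp_degree, batch size, and sequence
# lengths are example values; size them for your instance and workload.
neuron_config = NeuronConfig(
    tp_degree=32,
    batch_size=1,
    max_context_length=1024,
    seq_len=2048,
    on_device_sampling_config=OnDeviceSamplingConfig(top_k=1),
)
config = LlamaInferenceConfig(
    neuron_config,
    load_config=load_pretrained_config(model_path),
)

# Compile the model for Neuron, save the traced artifacts, then load them.
model = NeuronLlamaForCausalLM(model_path, config)
model.compile(traced_model_path)
model.load(traced_model_path)

# Run generation through the Hugging Face-style adapter.
tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="right")
tokenizer.pad_token = tokenizer.eos_token
generation_config = GenerationConfig.from_pretrained(model_path)

prompts = ["I believe the meaning of life is"]
inputs = tokenizer(prompts, padding=True, return_tensors="pt")

generation_model = HuggingFaceGenerationAdapter(model)
outputs = generation_model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    generation_config=generation_config,
    max_length=neuron_config.seq_len,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

For online use cases, inference typically runs behind a model server rather than through this direct API; at the time of writing, NxDI integrates with serving frameworks such as vLLM for that purpose. See the Neuron documentation for serving instructions.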
Llama 3
Meta’s Llama 3 family includes large language models available in multiple sizes and versions. Select the model variant that matches your application requirements:
Llama 3.3 70B: Meta’s multilingual LLM, featuring 70B parameters and Grouped Query Attention.
Note: Instructions for additional models will be available soon. For a complete list of supported model architectures, refer to this developer guide.