.. _nxdi-model-reference:

NxD Inference - Production Ready Models
=======================================

Neuronx Distributed Inference provides production-ready models that you can
deploy directly. You can view the source code for all
supported models in the `NxD Inference GitHub repository <https://github.com/aws-neuron/neuronx-distributed-inference/tree/main/src/neuronx_distributed_inference/models>`__.

.. note:: 
   
   To deploy a custom model integration, follow the
   :ref:`model onboarding guide <nxdi-onboarding-models>`. You can use the source
   code for supported models in the `NxD Inference GitHub repository <https://github.com/aws-neuron/neuronx-distributed-inference/tree/main/src/neuronx_distributed_inference/models>`__
   as a starting point and make the changes that your use case requires.

.. contents:: Table of contents
   :local:
   :depth: 2

Using Models to Run Inference
-----------------------------

You can run models through vLLM or integrate directly with NxD
Inference.

Using vLLM
~~~~~~~~~~

If you are using vLLM for production deployment, we recommend that you
use the vLLM API to integrate with NxD Inference. The vLLM API automatically
chooses the correct model and config classes based on the model's config file.
For more information, refer to the :ref:`nxdi-vllm-user-guide-v1`.
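
The idea of selecting an architecture from a model's config file can be
illustrated with a minimal sketch. This is not the vLLM implementation. The
mapping table and the ``architecture_for_config`` helper are hypothetical, and
the sketch assumes a Hugging Face style ``config.json`` that contains a
``model_type`` field. It maps that field to the architecture sections listed
on this page and covers only some of the text models.

.. code-block:: python

   import json

   # Assumed mapping from the Hugging Face ``model_type`` field to the
   # architecture sections on this page. Illustrative and incomplete.
   MODEL_TYPE_TO_ARCHITECTURE = {
       "llama": "Llama (Text)",
       "mistral": "Llama (Text)",  # this page lists Mistral under the Llama architecture
       "mixtral": "Mixtral",
       "dbrx": "DBRX",
       "qwen2": "Qwen2.5",
       "qwen3": "Qwen3",
       "qwen3_moe": "Qwen3 MoE",
   }


   def architecture_for_config(config_text: str) -> str:
       """Return the architecture section name for the contents of a config.json."""
       config = json.loads(config_text)
       model_type = config.get("model_type")
       try:
           return MODEL_TYPE_TO_ARCHITECTURE[model_type]
       except KeyError:
           raise ValueError(
               f"no architecture mapping for model_type={model_type!r}"
           ) from None


   # Hypothetical config contents, reduced to the one field the sketch reads.
   example_config = '{"model_type": "mistral"}'
   print(architecture_for_config(example_config))  # prints: Llama (Text)

Raising an error for an unknown ``model_type`` keeps an unsupported model from
silently falling back to the wrong architecture.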

Integrating Directly with NxD Inference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To use NxD Inference directly, you construct model and configuration
classes. For more information about which model and configuration classes to use for each
model, see :ref:`nxdi-supported-model-architectures`. To see an example of how to
run inference directly with NxD Inference, see the `generation_demo.py
script <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/examples/generation_demo.py>`__.
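
When you integrate directly, it can help to keep the class names for each
architecture in one place. The sketch below records the Neuron config,
inference config, and causal LM class names from the sections on this page as
plain strings. The ``NeuronClasses`` tuple, the ``NEURON_CLASSES`` table, and
the ``describe`` helper are hypothetical names used for illustration and are
not part of NxD Inference. The sketch does not import or construct the real
classes, and it omits FLUX.1, which uses application and pipeline classes
instead.

.. code-block:: python

   from typing import NamedTuple


   class NeuronClasses(NamedTuple):
       """Class names listed under "Neuron Classes" for one architecture."""

       neuron_config: str
       inference_config: str
       model: str


   # Keys are the section titles on this page; values are copied from each
   # "Neuron Classes" list.
   NEURON_CLASSES = {
       "Llama (Text)": NeuronClasses(
           "NeuronConfig", "LlamaInferenceConfig", "NeuronLlamaForCausalLM"),
       "Llama 4": NeuronClasses(
           "Llama4NeuronConfig", "Llama4InferenceConfig", "NeuronLlama4ForCausalLM"),
       "Mixtral": NeuronClasses(
           "MoENeuronConfig", "MixtralInferenceConfig", "NeuronMixtralForCausalLM"),
       "DBRX": NeuronClasses(
           "DbrxNeuronConfig", "DbrxInferenceConfig", "NeuronDbrxForCausalLM"),
       "Qwen2.5": NeuronClasses(
           "Qwen2NeuronConfig", "Qwen2InferenceConfig", "NeuronQwen2ForCausalLM"),
       "Qwen3": NeuronClasses(
           "Qwen3NeuronConfig", "Qwen3InferenceConfig", "NeuronQwen3ForCausalLM"),
       "Qwen3 MoE": NeuronClasses(
           "MoENeuronConfig", "Qwen3MoeInferenceConfig", "NeuronQwen3MoeForCausalLM"),
       "Pixtral-Large-Instruct-2411": NeuronClasses(
           "NeuronConfig", "PixtralInferenceConfig", "NeuronPixtralForCausalLM"),
       "Qwen2-VL-7B-Instruct (Dense)": NeuronClasses(
           "NeuronConfig", "Qwen2VLInferenceConfig", "NeuronQwen2VLForCausalLM"),
       "Qwen3-VL-8B-Thinking (Dense)": NeuronClasses(
           "Qwen3VLNeuronConfig", "Qwen3VLInferenceConfig", "NeuronQwen3VLForCausalLM"),
   }


   def describe(architecture: str) -> str:
       """Summarize the classes to use for an architecture."""
       classes = NEURON_CLASSES[architecture]
       return (f"{architecture}: {classes.neuron_config} -> "
               f"{classes.inference_config} -> {classes.model}")


   print(describe("Qwen3"))
   # prints: Qwen3: Qwen3NeuronConfig -> Qwen3InferenceConfig -> NeuronQwen3ForCausalLM

For the actual construction, compilation, and generation calls, refer to the
``generation_demo.py`` script linked above.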

.. _nxdi-supported-model-architectures:

Supported Model Architectures
-----------------------------

NxD Inference currently provides support for the following model
architectures.

Llama (Text)
~~~~~~~~~~~~

NxD Inference supports Llama text models. The Llama model architecture
covers all Llama text models, including Llama 2, Llama 3, Llama 3.1,
Llama 3.2, and Llama 3.3. You can also use the Llama model architecture
to run models that are based on Llama, such as Mistral.

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: NeuronConfig
- Inference config class: LlamaInferenceConfig
- Causal LM model class: NeuronLlamaForCausalLM

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct (requires
  Trn2)
- https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
- https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct
- https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
- https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

----

Llama 4
~~~~~~~~

NxD Inference supports Llama 4 models, including both the Scout and Maverick checkpoints.
You can use Hugging Face checkpoints. Both checkpoints use early fusion for native multimodality,
which enables them to process text and image inputs. For more information
about how to run Llama 4 inference, see :ref:`/libraries/nxd-inference/tutorials/llama4-tutorial.ipynb`.

.. _neuron-classes-1:

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: Llama4NeuronConfig
- Inference config class: Llama4InferenceConfig
- Causal LM model class: NeuronLlama4ForCausalLM

.. _compatible-checkpoint-examples-1:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct
- https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct

----

Mixtral
~~~~~~~

NxD Inference supports models based on the Mixtral model architecture,
which uses a mixture-of-experts (MoE) architecture.

.. _neuron-classes-2:

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: MoENeuronConfig
- Inference config class: MixtralInferenceConfig
- Causal LM model class: NeuronMixtralForCausalLM

.. _compatible-checkpoint-examples-2:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

----

DBRX
~~~~

NxD Inference supports models based on the DBRX model architecture,
which uses a mixture-of-experts (MoE) architecture.

.. _neuron-classes-3:

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: DbrxNeuronConfig
- Inference config class: DbrxInferenceConfig
- Causal LM model class: NeuronDbrxForCausalLM

.. _compatible-checkpoint-examples-3:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/databricks/dbrx-instruct

----

Qwen2.5
~~~~~~~~

NxD Inference supports models based on the Qwen2.5 model architecture.

.. _neuron-classes-4:

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: Qwen2NeuronConfig
- Inference config class: Qwen2InferenceConfig
- Causal LM model class: NeuronQwen2ForCausalLM

.. _compatible-checkpoint-examples-4:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/Qwen/Qwen2.5-72B-Instruct
- https://huggingface.co/Qwen/Qwen2.5-32B-Instruct
- https://huggingface.co/Qwen/Qwen2.5-14B-Instruct (Not tested, but expected to work out of the box)
- https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
- https://huggingface.co/Qwen/Qwen2.5-3B-Instruct (Not tested, but expected to work out of the box)
- https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct (Not tested, but expected to work out of the box)
- https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct

----

Qwen3
~~~~~~

NxD Inference supports models based on the Qwen3 model architecture.

.. _neuron-classes-5:

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: Qwen3NeuronConfig
- Inference config class: Qwen3InferenceConfig
- Causal LM model class: NeuronQwen3ForCausalLM

.. _compatible-checkpoint-examples-5:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/Qwen/Qwen3-0.6B
- https://huggingface.co/Qwen/Qwen3-1.7B
- https://huggingface.co/Qwen/Qwen3-4B
- https://huggingface.co/Qwen/Qwen3-8B
- https://huggingface.co/Qwen/Qwen3-14B
- https://huggingface.co/Qwen/Qwen3-32B

----

Qwen3 MoE
~~~~~~~~~~

NxD Inference supports the Qwen3 MoE language model, which accepts multilingual text inputs.

.. _neuron-classes-6:

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: MoENeuronConfig
- Inference config class: Qwen3MoeInferenceConfig
- Causal LM model class: NeuronQwen3MoeForCausalLM

.. _compatible-checkpoint-examples-6:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/Qwen/Qwen3-235B-A22B

----

FLUX.1 [BETA]
~~~~~~~~~~~~~~~~~~

NxD Inference supports the FLUX.1-dev model checkpoint for text-to-image generation.
You can use Hugging Face checkpoints. For more information
about how to run FLUX.1-dev inference, see :ref:`/libraries/nxd-inference/tutorials/flux-inference-tutorial.ipynb`.

.. _neuron-classes-7:

Neuron Classes
^^^^^^^^^^^^^^

- Flux Application class: NeuronFluxApplication
- Flux Pipeline class: NeuronFluxPipeline
- Flux Backbone Neuron config class: FluxBackboneInferenceConfig

.. _compatible-checkpoint-examples-7:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/black-forest-labs/FLUX.1-dev

----

Pixtral-Large-Instruct-2411
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NxD Inference supports the Pixtral image understanding model, which processes text and image inputs. You can use the Hugging Face checkpoint.

.. _neuron-classes-8:

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: NeuronConfig
- Inference config class: PixtralInferenceConfig
- Causal LM model class: NeuronPixtralForCausalLM

.. _compatible-checkpoint-examples-8:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411

----

Qwen2-VL-7B-Instruct (Dense)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NxD Inference supports models based on the Qwen2-VL-7B-Instruct (Dense) model architecture.

.. _neuron-classes-9:

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: NeuronConfig
- Inference config class: Qwen2VLInferenceConfig
- Causal LM model class: NeuronQwen2VLForCausalLM

.. _compatible-checkpoint-examples-9:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct

----

Qwen3-VL-8B-Thinking (Dense)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NxD Inference supports models based on the Qwen3-VL-8B-Thinking (Dense) model architecture.

.. _neuron-classes-10:

Neuron Classes
^^^^^^^^^^^^^^

- Neuron config class: Qwen3VLNeuronConfig
- Inference config class: Qwen3VLInferenceConfig
- Causal LM model class: NeuronQwen3VLForCausalLM

.. _compatible-checkpoint-examples-10:

Compatible Checkpoint Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking
