.. _model_samples_inference_inf2_trn1:

Inference Samples/Tutorials (Inf2/Trn1/Trn2)
============================================

.. important::

   Some samples linked on this page have been archived and are provided for historical reference only. They are not tested with recent versions of the Neuron SDK. For the latest inference tutorials, refer to :ref:`NxD Inference Tutorials <nxdi-tutorials-index>`.

.. contents:: Table of contents
   :local:
   :depth: 1


.. _encoder_model_samples_inference_inf2_trn1:
 
Encoders 
--------


.. list-table::
   :widths: 20 15 45 
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Model
     - Frameworks/Libraries
     - Samples and Tutorials

   * - bert-base-cased-finetuned-mrpc
     - torch-neuronx
     - * :ref:`BERT TorchServe tutorial <pytorch-tutorials-torchserve-neuronx>`
       * HuggingFace pretrained BERT tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb>`
       * `LibTorch C++ Tutorial for HuggingFace Pretrained BERT <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorial-libtorch.html#pytorch-tutorials-libtorch>`_
       * `Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker <https://github.com/aws-neuron/aws-neuron-sagemaker-samples/blob/master/inference/inf2-bert-on-sagemaker/inf2_bert_sagemaker.ipynb>`_


   * - bert-base-cased-finetuned-mrpc
     - neuronx-distributed
     - * :ref:`tp_inference_tutorial`


   * - bert-base-uncased
     - torch-neuronx
     - * `HuggingFace Pretrained BERT Inference on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_bert_inference_on_trn1.ipynb>`_

   * - distilbert-base-uncased
     - torch-neuronx
     - * `HuggingFace Pretrained DistilBERT Inference on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_distilbert_Inference_on_trn1.ipynb>`_


   * - roberta-base
     - tensorflow-neuronx
     - * HuggingFace RoBERTa-Base :ref:`[html] </src/examples/tensorflow/tensorflow-neuronx/tfneuronx-roberta-base-tutorial.ipynb>` :github:`[notebook] </src/examples/tensorflow/tensorflow-neuronx/tfneuronx-roberta-base-tutorial.ipynb>`


   * - roberta-large
     - torch-neuronx
     - * `HuggingFace Pretrained RoBERTa Inference on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_roberta_inference_on_trn1.ipynb>`_


.. _decoder_model_samples_inference_inf2_trn1:

Decoders
--------

.. list-table::
   :widths: 20 15 45 
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Model
     - Frameworks/Libraries
     - Samples and Tutorials

   * - gpt2
     - torch-neuronx
     - * `HuggingFace Pretrained GPT2 Feature Extraction on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_gpt2_feature_extraction_on_trn1.ipynb>`_
  
   * - meta-llama/Llama-3.3-70B
     - neuronx-distributed-inference
     - * :ref:`nxdi-trn2-llama3.3-70b-tutorial`
       * :ref:`/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-dp-tutorial.ipynb`
       * :ref:`nxdi-sd-inference-tutorial`

   * - meta-llama/Llama-3.1-8b
     - transformers-neuronx
     - * `Run Hugging Face Llama 3.1 8B autoregressive sampling on Inf2 & Trn1 with 32k sequence length <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-8b-32k-sampling.ipynb>`_
       * `Run Hugging Face Llama 3.1 8B autoregressive sampling on Inf2 & Trn1 with 128k sequence length <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-8b-128k-sampling.ipynb>`_
       * `Run meta-llama/Meta-Llama-3.1-8B autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-3.1-8b-sampling.ipynb>`_
   
   * - meta-llama/Llama-3.1-70b
     - transformers-neuronx
     - * `Run Hugging Face Llama 3.1 70B autoregressive sampling on Trn1 with 64k sequence length <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-70b-64k-sampling.ipynb>`_
       * `Run Hugging Face meta-llama/Meta-Llama-3.1-70B autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-3.1-70b-sampling.ipynb>`_

   * - meta-llama/Llama-3.1-70b-Instruct
     - transformers-neuronx
     - * `Run Hugging Face Llama-3.1-70B-Instruct + Llama-3.2-1B-Instruct Speculative Decoding on Trn1 with transformers-neuronx and vLLM <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-70b-speculative-decoding.ipynb>`_
       * `Run Hugging Face Llama-3.1-70B-Instruct EAGLE Speculative Decoding on Trn1 with transformers-neuronx and vLLM <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-70b-eagle-speculative-decoding.ipynb>`_

   * - meta-llama/Llama-3.1-405b
     - neuronx-distributed-inference
     - * :ref:`Tutorial for deploying Llama-3.1-405B on Trn2 <nxdi-trn2-llama3.1-405b-tutorial>`
       * :ref:`nxdi-trn2-llama3.1-405b-speculative-tutorial`
   
   * - meta-llama/Llama-3.1-405b
     - transformers-neuronx
     - * `Run Hugging Face Llama 3.1 405B autoregressive sampling on Trn1/Trn1n with 16k sequence length <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-405b-multinode-16k-sampling.ipynb>`_

   * - meta-llama/Llama-3-8b
     - transformers-neuronx
     - * `Run Hugging Face meta-llama/Llama-3-8b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-3-8b-sampling.ipynb>`_

   * - meta-llama/Llama-3-70b
     - transformers-neuronx
     - * `Run Hugging Face meta-llama/Llama-3-70b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-3-70b-sampling.ipynb>`_

   * - meta-llama/Llama-2-13b
     - transformers-neuronx
     - * `Run Hugging Face meta-llama/Llama-2-13b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb>`_

   * - meta-llama/Llama-2-70b
     - transformers-neuronx
     - * `Run Hugging Face meta-llama/Llama-2-70b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-70b-sampling.ipynb>`_
       * `Run speculative sampling on Meta Llama models [Beta] <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/speculative_sampling.ipynb>`_

   * - meta-llama/Llama-3.2-1B-Instruct
     - neuronx-distributed
     - * `Run meta-llama/Llama-3.2-1B-Instruct on Inf2 and Trn1 <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`_

   * - meta-llama/codellama-13b
     - transformers-neuronx
     - * `Run meta-llama/codellama-13b-16k-sampling <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/codellama-13b-16k-sampling.ipynb>`_

   * - mistralai/Mistral-7B-Instruct-v0.1
     - transformers-neuronx
     - * :ref:`Run Mistral-7B-Instruct-v0.1 autoregressive sampling on Inf2 & Trn1 <mistral_gqa_code_sample>`

   * - mistralai/Mistral-7B-Instruct-v0.2
     - transformers-neuronx
     - * `Run Hugging Face mistralai/Mistral-7B-Instruct-v0.2 autoregressive sampling on Inf2 & Trn1 [Beta] <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/mistralai-Mistral-7b-Instruct-v0.2.ipynb>`_

   * - Mixtral-8x7B-v0.1
     - transformers-neuronx
     - * `Run Hugging Face mistralai/Mixtral-8x7B-v0.1 autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/mixtral-8x7b-sampling.ipynb>`_

   * - Mixtral-8x7B
     - neuronx-distributed
     - * `Mixtral inference with NeuronX Distributed on Inf2 & Trn1 <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/mixtral>`_


   * - DBRX
     - neuronx-distributed
     - * `DBRX inference with NeuronX Distributed on Inf2 & Trn1 <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/dbrx>`_  

   * - codellama/CodeLlama-13b-hf
     - transformers-neuronx
     - * `Run Hugging Face codellama/CodeLlama-13b-hf autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/codellama-13b-16k-sampling.ipynb>`_

.. _encoder_decoder_model_samples_inference_inf2_trn1:

Encoder-Decoders  
----------------


.. list-table::
   :widths: 20 15 45 
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Model
     - Frameworks/Libraries
     - Samples and Tutorials

   * - t5-large
     - * torch-neuronx
       * optimum-neuron
     - * T5 inference tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/t5-inference-tutorial.ipynb>`

   * - t5-3b
     - neuronx-distributed
     - * T5 inference tutorial :ref:`[html] </src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`

   * - google/flan-t5-xl
     - neuronx-distributed
     - * flan-t5-xl inference tutorial :ref:`[html] </src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`


.. _vision_transformer_model_samples_inference_inf2_trn1:

Vision Transformers  
-------------------

.. list-table::
   :widths: 20 15 45 
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size
   
   * - Model
     - Frameworks/Libraries
     - Samples and Tutorials

   * - google/vit-base-patch16-224
     - torch-neuronx
     - * `HuggingFace Pretrained ViT Inference on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_vit_inference_on_trn1.ipynb>`_

   * - clip-vit-base-patch32
     - torch-neuronx
     - * `HuggingFace Pretrained CLIP Base Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_clip_base_inference_on_inf2.ipynb>`_


   * - clip-vit-large-patch14
     - torch-neuronx
     - * `HuggingFace Pretrained CLIP Large Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_clip_large_inference_on_inf2.ipynb>`_


.. _cnn_model_samples_inference_inf2_trn1:

Convolutional Neural Networks (CNN)
-----------------------------------


.. list-table::
   :widths: 20 15 45 
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Model
     - Frameworks/Libraries
     - Samples and Tutorials

   * - resnet50
     - torch-neuronx
     - * `Torchvision Pretrained ResNet50 Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/tv_pretrained_resnet50_inference_on_trn1.ipynb>`_
       * Torchvision ResNet50 tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb>`

   * - resnet50
     - tensorflow-neuronx
     - * :ref:`tensorflow-servingx-neuronrt-visible-cores`

   * - unet
     - torch-neuronx
     - * `Pretrained UNet Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/pretrained_unet_inference_on_trn1.ipynb>`_

   * - vgg
     - torch-neuronx
     - * `Torchvision Pretrained VGG Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/tv_pretrained_vgg_inference_on_trn1.ipynb>`_


.. _sd_model_samples_inference_inf2_trn1:

Stable Diffusion
----------------

.. list-table::
   :widths: 20 15 45 
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Model
     - Frameworks/Libraries
     - Samples and Tutorials

   * - stable-diffusion-v1-5
     - torch-neuronx
     - * `HuggingFace Stable Diffusion 1.5 (512x512) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sd15_512_inference.ipynb>`_

   * - stable-diffusion-2-1-base
     - torch-neuronx
     - * `HuggingFace Stable Diffusion 2.1 (512x512) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sd2_512_inference.ipynb>`_

   * - stable-diffusion-2-1
     - torch-neuronx
     - * `HuggingFace Stable Diffusion 2.1 (768x768) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sd2_768_inference.ipynb>`_
       * `Deploy & Run Stable Diffusion on SageMaker and Inferentia2 <https://github.com/aws-neuron/aws-neuron-sagemaker-samples/blob/master/inference/stable-diffusion/StableDiffusion2_1.ipynb>`_

   * - stable-diffusion-xl-base-1.0
     - torch-neuronx
     - * `HuggingFace Stable Diffusion XL 1.0 (1024x1024) Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sdxl_base_1024_inference.ipynb>`_
       * `HuggingFace Stable Diffusion XL 1.0 Base and Refiner (1024x1024) Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sdxl_base_and_refiner_1024_inference.ipynb>`_

   * - stable-diffusion-2-inpainting
     - torch-neuronx
     - * `stable-diffusion-2-inpainting model Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sd2_inpainting_936_624_inference.ipynb>`_


.. _diffusion_transformers_samples_inference_inf2_trn1:

Diffusion Transformers
----------------------

.. list-table::
   :widths: 20 15 45 
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Model
     - Frameworks/Libraries
     - Samples and Tutorials

   * - pixart-alpha
     - torch-neuronx
     - * `HuggingFace PixArt Alpha (256x256, 512x512 square resolution) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_pixart_alpha_inference_on_inf2.ipynb>`_

   * - pixart-sigma
     - torch-neuronx
     - * `HuggingFace PixArt Sigma (256x256, 512x512 square resolution) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_pixart_sigma_inference_on_inf2.ipynb>`_

   
.. _audio_model_samples_inference_inf2_trn1:

Audio
-----

.. list-table::
   :widths: 20 15 45 
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Model
     - Frameworks/Libraries
     - Samples and Tutorials
       
   * - wav2vec2-conformer
     - torch-neuronx
     - * `Run HuggingFace Pretrained Wav2Vec2-Conformer with Rotary Position Embeddings Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_wav2vec2_conformer_rope_inference_on_inf2.ipynb>`_
       * `Run HuggingFace Pretrained Wav2Vec2-Conformer with Relative Position Embeddings Inference on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_wav2vec2_conformer_relpos_inference_on_inf2.ipynb>`_


.. _multi_modal_model_samples_inference_inf2_trn1:

Multimodal
----------

.. list-table::
   :widths: 20 15 45 
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size


   * - Model
     - Frameworks/Libraries
     - Samples and Tutorials
       

   * - multimodal-perceiver
     - torch-neuronx
     - * `HuggingFace Multimodal Perceiver Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_perceiver_multimodal_inference.ipynb>`_


   * - language-perceiver
     - torch-neuronx
     - * `HF Pretrained Perceiver Language Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_perceiver_language_inference.ipynb>`_


   * - vision-perceiver-conv
     - torch-neuronx
     - * `HF Pretrained Perceiver Image Classification Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_perceiver_vision_inference.ipynb>`_