This document is relevant for: Inf2, Trn1

Inference Samples/Tutorials (Inf2/Trn1/Trn2)

Encoders

Model	Frameworks/Libraries	Samples and Tutorials
bert-base-cased-finetuned-mrpc	torch-neuronx	BERT TorchServe tutorial; HuggingFace pretrained BERT tutorial [html] [notebook]; LibTorch C++ Tutorial for HuggingFace Pretrained BERT; Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker
bert-base-cased-finetuned-mrpc	neuronx-distributed	Inference with Tensor Parallelism [Beta]
bert-base-uncased	torch-neuronx	HuggingFace Pretrained BERT Inference on Trn1
distilbert-base-uncased	torch-neuronx	HuggingFace Pretrained DistilBERT Inference on Trn1
roberta-base	tensorflow-neuronx	HuggingFace Roberta-Base [html] [notebook]
roberta-large	torch-neuronx	HuggingFace Pretrained RoBERTa Inference on Trn1
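The `neuronx-distributed` row above runs BERT with tensor parallelism. As a conceptual sketch only (plain Python, not the `neuronx-distributed` API), the snippet below shows the two standard ways of sharding a linear layer's weight across ranks: column-parallel (each rank owns a slice of output columns and the results are concatenated) and row-parallel (each rank owns a slice of input rows and the partial results are summed, which is the job of an all-reduce on real devices). All function names here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain (n x k) @ (k x m) matrix product.
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def split_columns(w: Matrix, ranks: int) -> List[Matrix]:
    # Column-parallel: each rank owns a contiguous slice of output columns.
    cols = len(w[0])
    assert cols % ranks == 0, "output dim must divide evenly across ranks"
    step = cols // ranks
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(ranks)]

def split_rows(w: Matrix, ranks: int) -> List[Matrix]:
    # Row-parallel: each rank owns a contiguous slice of input rows.
    rows = len(w)
    assert rows % ranks == 0, "input dim must divide evenly across ranks"
    step = rows // ranks
    return [w[r * step:(r + 1) * step] for r in range(ranks)]

def column_parallel(x: Matrix, w: Matrix, ranks: int) -> Matrix:
    # Every rank sees the full input; per-rank outputs are concatenated
    # along the feature dimension (an all-gather on real hardware).
    shards = [matmul(x, w_r) for w_r in split_columns(w, ranks)]
    return [sum((s[i] for s in shards), []) for i in range(len(x))]

def row_parallel(x: Matrix, w: Matrix, ranks: int) -> Matrix:
    # Each rank sees only its slice of the input features; partial outputs
    # are summed element-wise (the all-reduce step on real hardware).
    w_shards = split_rows(w, ranks)
    step = len(w) // ranks
    partials = [matmul([row[r * step:(r + 1) * step] for row in x], w_r)
                for r, w_r in enumerate(w_shards)]
    return [[sum(p[i][j] for p in partials) for j in range(len(w[0]))]
            for i in range(len(x))]

x = [[1, 2, 3, 4]]
w = [[1, 0], [0, 1], [1, 1], [2, 0]]
print(matmul(x, w), column_parallel(x, w, 2), row_parallel(x, w, 2))
```

Both sharded versions reproduce the unsharded product; the integer inputs keep the comparison exact, whereas with floating-point weights the row-parallel sum can differ in the last bits because the additions happen in a different order.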

Decoders

Model	Frameworks/Libraries	Samples and Tutorials
gpt2	torch-neuronx	HuggingFace Pretrained GPT2 Feature Extraction on Trn1
meta-llama/Llama-3.3-70B	neuronx-distributed-inference	Tutorial: Using Speculative Decoding to improve Llama-3.3-70B inference performance on Trn2 instances; Tutorial: Scaling LLM Inference with Data Parallelism on Trn2
meta-llama/Llama-3.2-11B-Vision-Instruct	neuronx-distributed-inference	Tutorial for deploying Llama3.2 Multimodal Models on Trn1 & Inf2 instances
meta-llama/Llama-3.2-90B-Vision-Instruct	neuronx-distributed-inference	Tutorial for deploying Llama3.2 Multimodal Models on Trn1 & Inf2 instances
meta-llama/Llama-3.1-8b	transformers-neuronx	Run Hugging Face Llama 3.1 8B autoregressive sampling on Inf2 & Trn1 with 32k sequence length; Run Hugging Face Llama 3.1 8B autoregressive sampling on Inf2 & Trn1 with 128k sequence length; Run meta-llama/Meta-Llama-3.1-8B autoregressive sampling on Inf2 & Trn1
meta-llama/Llama-3.1-70b	transformers-neuronx	Run Hugging Face Llama 3.1 70B autoregressive sampling on Trn1 with 64k sequence length; Run Hugging Face meta-llama/Meta-Llama-3.1-70B autoregressive sampling on Inf2 & Trn1
meta-llama/Llama-3.1-70b-Instruct	transformers-neuronx	Run Hugging Face Llama-3.1-70B-Instruct + Llama-3.2-1B-Instruct Speculative Decoding on Trn1 with transformers-neuronx and vLLM; Run Hugging Face Llama-3.1-70B-Instruct EAGLE Speculative Decoding on Trn1 with transformers-neuronx and vLLM
meta-llama/Llama-3.1-405b	neuronx-distributed-inference	Tutorial for deploying Llama-3.1-405B on Trn2; Tutorial: Using Speculative Decoding and Quantization to improve Llama-3.1-405B inference performance on Trn2 instances
meta-llama/Llama-3.1-405b	transformers-neuronx	Run Hugging Face Llama 3.1 405B autoregressive sampling on Trn1/Trn1n with 16k sequence length
meta-llama/Llama-3-8b	transformers-neuronx	Run Hugging Face meta-llama/Llama-3-8b autoregressive sampling on Inf2 & Trn1
meta-llama/Llama-3-70b	transformers-neuronx	Run Hugging Face meta-llama/Llama-3-70b autoregressive sampling on Inf2 & Trn1
meta-llama/Llama-2-13b	transformers-neuronx	Run Hugging Face meta-llama/Llama-2-13b autoregressive sampling on Inf2 & Trn1
meta-llama/Llama-2-70b	transformers-neuronx	Run Hugging Face meta-llama/Llama-2-70b autoregressive sampling on Inf2 & Trn1; Run speculative sampling on Meta Llama models [Beta]
meta-llama/Llama-3.2-1B-Instruct	neuronx-distributed	Run meta-llama/Llama-3.2-1B-Instruct on Inf2 and Trn1
meta-llama/codellama-13b	neuronx-distributed	Run meta-llama/codellama-13b-16k-sampling
mistralai/Mistral-7B-Instruct-v0.1	transformers-neuronx	Run Mistral-7B-Instruct-v0.1 autoregressive sampling on Inf2 & Trn1
mistralai/Mistral-7B-Instruct-v0.2	transformers-neuronx	Run Hugging Face mistralai/Mistral-7B-Instruct-v0.2 autoregressive sampling on Inf2 & Trn1 [Beta]
Mixtral-8x7B-v0.1	transformers-neuronx	Run Hugging Face mistralai/Mixtral-8x7B-v0.1 autoregressive sampling on Inf2 & Trn1
Mixtral-8x7B	neuronx-distributed	Mixtral inference with NeuronX Distributed on Inf2 & Trn1
DBRX	neuronx-distributed	DBRX inference with NeuronX Distributed on Inf2 & Trn1
codellama/CodeLlama-13b-hf	transformers-neuronx	Run Hugging Face codellama/CodeLlama-13b-hf autoregressive sampling on Inf2 & Trn1
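Several Llama rows above use speculative decoding. The sketch below illustrates the greedy variant of the idea with toy stand-in "models" (plain functions that return the next token); it is not the `neuronx-distributed-inference` or `transformers-neuronx` API, and all names are hypothetical. A small draft model proposes `k` tokens, the large target model checks them, the longest agreeing prefix is kept, and the target's own token replaces the first mismatch. Sampling-based variants use a probabilistic acceptance rule instead of exact matching.

```python
from typing import Callable, List

# Toy stand-in for a model: maps the token sequence so far to its greedy next token.
NextToken = Callable[[List[int]], int]

def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(model(seq))
    return seq

def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = list(prompt)
    final_len = len(prompt) + max_new_tokens
    while len(seq) < final_len:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2) The target model verifies the proposal. A real implementation scores
        #    all k positions in one forward pass; this loop calls it per position.
        ctx = list(seq)
        for token in proposal:
            expected = target(ctx)
            ctx.append(expected)
            if expected != token:
                break  # keep the target's token, discard the rest of the draft
        else:
            # Every draft token matched, so the target's next token is a free bonus.
            ctx.append(target(ctx))
        seq = ctx
    return seq[:final_len]
```

Because every kept token is exactly what the target would have chosen greedily, the output matches plain greedy decoding with the target model; the speedup comes from verifying several draft tokens per target pass.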

Encoder-Decoders

Model	Frameworks/Libraries	Samples and Tutorials
t5-large	torch-neuronx, optimum-neuron	T5 inference tutorial [html] [notebook]
t5-3b	neuronx-distributed	T5 inference tutorial [html] [notebook]
google/flan-t5-xl	neuronx-distributed	flan-t5-xl inference tutorial [html] [notebook]

Vision Transformers

Model	Frameworks/Libraries	Samples and Tutorials
google/vit-base-patch16-224	torch-neuronx	HuggingFace Pretrained ViT Inference on Trn1
clip-vit-base-patch32	torch-neuronx	HuggingFace Pretrained CLIP Base Inference on Inf2
clip-vit-large-patch14	torch-neuronx	HuggingFace Pretrained CLIP Large Inference on Inf2
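The model names in this table encode each encoder's patch size. Because a model compiled with torch-neuronx typically expects fixed input shapes, it can help to know the sequence length each vision encoder produces. A small sketch, assuming square 224x224 inputs and a single prepended class token (as in ViT and the CLIP vision tower); the helper name is hypothetical.

```python
def vit_sequence_length(image_size: int, patch_size: int, cls_token: bool = True) -> int:
    """Number of tokens a ViT-style encoder sees for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if cls_token else 0)

# Patch sizes taken from the model names; 224x224 input assumed for all three.
for name, patch in [("google/vit-base-patch16-224", 16),
                    ("clip-vit-base-patch32", 32),
                    ("clip-vit-large-patch14", 14)]:
    print(name, vit_sequence_length(224, patch))
```

Smaller patches mean quadratically more tokens, which is why the patch-14 CLIP Large encoder processes roughly five times as many tokens as the patch-32 Base model at the same resolution.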

Convolutional Neural Networks (CNN)

Model	Frameworks/Libraries	Samples and Tutorials
resnet50	torch-neuronx	Torchvision Pretrained ResNet50 Inference on Trn1 / Inf2; Torchvision ResNet50 tutorial [html] [notebook]
resnet50	tensorflow-neuronx	Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving
unet	torch-neuronx	Pretrained UNet Inference on Trn1 / Inf2
vgg	torch-neuronx	Torchvision Pretrained VGG Inference on Trn1 / Inf2
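The `tensorflow-neuronx` ResNet50 row uses `NEURON_RT_VISIBLE_CORES`, which restricts the NeuronCores a process may use. A common pattern is to start one model-server process per group of cores. The sketch below builds one environment per process, assuming contiguous, equal-sized groups and the single-index or `start-end` range syntax for the variable; the total core count is a parameter you would set for your instance, and the helper names are hypothetical. The server launch is left as a comment.

```python
import os
from typing import Dict, List

def core_groups(total_cores: int, cores_per_process: int) -> List[str]:
    """Split NeuronCores 0..total_cores-1 into contiguous groups, formatted as
    NEURON_RT_VISIBLE_CORES values (a single index or a 'start-end' range)."""
    if total_cores <= 0 or cores_per_process <= 0 or total_cores % cores_per_process:
        raise ValueError("total_cores must be a positive multiple of cores_per_process")
    groups = []
    for start in range(0, total_cores, cores_per_process):
        end = start + cores_per_process - 1
        groups.append(str(start) if start == end else f"{start}-{end}")
    return groups

def serving_envs(total_cores: int, cores_per_process: int) -> List[Dict[str, str]]:
    """One environment per serving process, each pinned to its own core group."""
    return [dict(os.environ, NEURON_RT_VISIBLE_CORES=group)
            for group in core_groups(total_cores, cores_per_process)]

# Hypothetical launch (not executed here): one TensorFlow Serving process per env,
# each on its own port, e.g.
#   subprocess.Popen(["tensorflow_model_server", f"--port={8500 + i}", ...], env=env)
for env in serving_envs(4, 2):
    print(env["NEURON_RT_VISIBLE_CORES"])
```

Giving each process a disjoint core set keeps independent servers from contending for the same NeuronCores.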

Stable Diffusion

Model	Frameworks/Libraries	Samples and Tutorials
stable-diffusion-v1-5	torch-neuronx	HuggingFace Stable Diffusion 1.5 (512x512) Inference on Trn1 / Inf2
stable-diffusion-2-1-base	torch-neuronx	HuggingFace Stable Diffusion 2.1 (512x512) Inference on Trn1 / Inf2
stable-diffusion-2-1	torch-neuronx	HuggingFace Stable Diffusion 2.1 (768x768) Inference on Trn1 / Inf2; Deploy & Run Stable Diffusion on SageMaker and Inferentia2
stable-diffusion-xl-base-1.0	torch-neuronx	HuggingFace Stable Diffusion XL 1.0 (1024x1024) Inference on Inf2; HuggingFace Stable Diffusion XL 1.0 Base and Refiner (1024x1024) Inference on Inf2
stable-diffusion-2-inpainting	torch-neuronx	stable-diffusion-2-inpainting model Inference on Trn1 / Inf2
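Each Stable Diffusion tutorial targets one fixed resolution, because a UNet compiled with torch-neuronx typically accepts only the input shapes it was traced with. The sketch below computes the UNet's latent `sample` shape for a given image size, assuming the usual VAE downsampling factor of 8, 4 latent channels (9 for the SD 2 inpainting UNet, which also takes a mask and masked-image latents), and classifier-free guidance batching the unconditional and conditional latents together as diffusers pipelines do. The helper name is hypothetical.

```python
from typing import Tuple

def sd_unet_sample_shape(height: int, width: int, batch_size: int = 1,
                         in_channels: int = 4, vae_scale_factor: int = 8,
                         guidance: bool = True) -> Tuple[int, int, int, int]:
    """(batch, channels, latent_h, latent_w) of the UNet's latent input."""
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of vae_scale_factor")
    # Classifier-free guidance runs unconditional and conditional latents
    # through the UNet in one batch, doubling its batch dimension.
    unet_batch = batch_size * 2 if guidance else batch_size
    return (unet_batch, in_channels, height // vae_scale_factor, width // vae_scale_factor)

print(sd_unet_sample_shape(512, 512))                 # SD 1.5 / SD 2.1 base
print(sd_unet_sample_shape(768, 768))                 # SD 2.1
print(sd_unet_sample_shape(1024, 1024))               # SDXL base
print(sd_unet_sample_shape(512, 512, in_channels=9))  # SD 2 inpainting UNet
```

A different resolution or batch size changes these shapes, so it would need its own compiled UNet.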

Diffusion Transformers

Model	Frameworks/Libraries	Samples and Tutorials
pixart-alpha	torch-neuronx	HuggingFace PixArt Alpha (256x256, 512x512 square resolution) Inference on Trn1 / Inf2
pixart-sigma	torch-neuronx	HuggingFace PixArt Sigma (256x256, 512x512 square resolution) Inference on Trn1 / Inf2

Audio

Model	Frameworks/Libraries	Samples and Tutorials
wav2vec2-conformer	torch-neuronx	Run HuggingFace Pretrained Wav2Vec2-Conformer with Rotary Position Embeddings Inference on Inf2; Run HuggingFace Pretrained Wav2Vec2-Conformer with Relative Position Embeddings Inference on Inf2 & Trn1
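Wav2Vec2-Conformer models consume raw 16 kHz audio, and a traced model typically expects a fixed input length. The sketch below computes how many frames the convolutional feature encoder emits for a given number of samples and pads or trims a waveform to a fixed length. It assumes the default `conv_kernel` / `conv_stride` values of the Hugging Face Wav2Vec2 configs (check the checkpoint's `config.json`), and zero-padding is itself an assumption: whether padded frames need masking depends on the checkpoint. Helper names are hypothetical.

```python
from typing import List, Sequence

# Assumed default feature-encoder geometry (Hugging Face conv_kernel / conv_stride).
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)

def feature_frames(num_samples: int,
                   kernels: Sequence[int] = CONV_KERNEL,
                   strides: Sequence[int] = CONV_STRIDE) -> int:
    """Frames produced by a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        if length < kernel:
            return 0  # input shorter than this layer's kernel
        length = (length - kernel) // stride + 1
    return length

def pad_or_trim(samples: Sequence[float], target_len: int,
                pad_value: float = 0.0) -> List[float]:
    """Force a waveform to the fixed length the compiled model was traced with."""
    clipped = list(samples[:target_len])
    return clipped + [pad_value] * (target_len - len(clipped))

# One second of 16 kHz audio:
print(feature_frames(16_000))
```

The overall stride of the encoder is 320 samples (5 x 2^6), so one second of audio yields roughly 50 frames, and the transformer layers see a sequence length fixed by the chosen input duration.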

Multimodal

Model	Frameworks/Libraries	Samples and Tutorials
multimodal-perceiver	torch-neuronx	HuggingFace Multimodal Perceiver Inference on Trn1 / Inf2
language-perceiver	torch-neuronx	HF Pretrained Perceiver Language Inference on Trn1 / Inf2
vision-perceiver-conv	torch-neuronx	HF Pretrained Perceiver Image Classification Inference on Trn1 / Inf2
