This document is relevant for: Inf1, Inf2, Trn1, Trn1n
Model Architecture Fit Guidelines#
Introduction#
AWS Neuron SDK enables you to train and deploy a wide range of deep learning models on EC2 Inf1, EC2 Inf2, EC2 Trn1, and EC2 Trn1n instances, which are powered by Inferentia, Inferentia2, and Trainium devices. The table below details the NeuronDevices and NeuronCores that power each instance:
| Instance | NeuronDevices | NeuronCores | # NeuronCores in a NeuronDevice |
|---|---|---|---|
| Trn1 | 16 x Trainium | 32 x NeuronCore-v2 | 2 |
| Trn1n | 16 x Trainium | 32 x NeuronCore-v2 | 2 |
| Inf2 | 12 x Inferentia2 | 24 x NeuronCore-v2 | 2 |
| Inf1 | 16 x Inferentia | 64 x NeuronCore-v1 | 4 |
This document describes which deep learning model architectures are a good fit for instances powered by Inferentia, Inferentia2, and Trainium.
Model Support Overview#
AWS Trainium and AWS Inferentia2 (NeuronCore-v2)#
Last update - 02/25/2023
| Model Family / Neural Network Architecture | Category | Hardware Architecture | Training with PyTorch Neuron (torch-neuronx) | Inference with PyTorch Neuron (torch-neuronx) | Inference with TensorFlow Neuron (tensorflow-neuronx) |
|---|---|---|---|---|---|
| Transformer Encoders | NLP | Good Fit | Supported | Supported | Supported |
| Transformer Decoders | NLP | Good Fit | Supported | Supported | |
| Transformer Encoder-Decoder (Sequence-to-sequence) | NLP | Good Fit | Supported | | |
| LSTMs | NLP and Computer Vision | Good Fit | | | |
| Vision Transformer | Computer Vision | Good Fit | Supported | | |
| Diffusion models | Computer Vision | Good Fit | | | |
| Convolutional Neural Network (CNN) models | Computer Vision | Good Fit | Supported | | |
| R-CNNs | Computer Vision | Good Fit | | | |
Note
Supported means that at least one model from the model family or neural-network architecture has already been enabled.
AWS Inferentia (NeuronCore v1)#
Last update - 10/10/2022
| Model Family / Neural Network Architecture | Category | Hardware Architecture | PyTorch Neuron (torch-neuron) | TensorFlow Neuron (tensorflow-neuron TF1.x) | TensorFlow Neuron (tensorflow-neuron TF2.x) |
|---|---|---|---|---|---|
| Transformer Encoders | NLP | Good Fit | Supported | Supported | Supported |
| Transformer Decoders | NLP | Not a Good Fit | NA | NA | NA |
| Transformer Encoder-Decoder (Sequence-to-sequence) | NLP | Not a Good Fit | NA | NA | NA |
| LSTMs | NLP and Computer Vision | Good Fit | Supported | NA | NA |
| Vision Transformer | Computer Vision | Good Fit | | | |
| Diffusion models | Computer Vision | Good Fit | NA | NA | |
| Convolutional Neural Network (CNN) models | Computer Vision | Good Fit | Supported | Supported | |
| R-CNNs | Computer Vision | Supported with limitations | Supported with limitations | NA | NA |
Note
Supported means that at least one model from the model family or neural-network architecture has already been enabled.
Clarifications on Inferentia (1st generation) Model Architecture#
Natural Language Processing (NLP) Models with Transformer#
Transformer Encoders#
Autoencoding models use only the encoder part of the Transformer architecture. Representatives of this family include models like BERT, DistilBERT, XLM-BERT, RoBERTa, and BioBERT. Since the encoding process in these models can be parallelized, you can expect them to run well on both Inferentia and Trainium.
Architecture Fit - Autoencoding models are a good fit for Inferentia.
Neuron Support - The Neuron SDK supports running autoencoding models for inference on Inferentia; a minimal compilation sketch follows. Please see the benchmark results for these models. To get started with NLP models, you can refer to the Neuron PyTorch, TensorFlow, and MXNet NLP tutorials.
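A minimal sketch of that workflow (the checkpoint name, sequence length, and inputs below are illustrative assumptions, following the pattern of the Neuron BERT tutorials):

```python
import torch
import torch_neuron  # registers the torch.neuron namespace
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative checkpoint; any BERT-family encoder follows the same flow.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased-finetuned-mrpc", torchscript=True)
model.eval()

# Neuron compiles for the exact shapes seen at trace time, so pad to a
# fixed sequence length.
inputs = tokenizer.encode_plus(
    "The company is based in Seattle.",
    "The company's headquarters are in Seattle.",
    max_length=128, padding="max_length", truncation=True,
    return_tensors="pt")
example = (inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"])

# Compile for Inferentia and save the compiled artifact.
model_neuron = torch.neuron.trace(model, example)
model_neuron.save("bert_neuron.pt")
```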
Decoder models, or autoregressive models with Transformer#
Autoregressive models keep only the decoder part of the Transformer architecture. Representatives of this family include models like GPT-3, GPT-2, etc.
Architecture Fit - Autoregressive models are not a good fit for Inferentia. The decoder is usually the most significant performance bottleneck, since it must be executed once per output token and therefore causes frequent memory accesses, as the sketch below illustrates. Because of this, these models typically perform best only when the decoder's maximum sequence length is short (e.g., 128).
Neuron Support - The Neuron SDK does not support autoregressive model inference on Inferentia.
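To make the bottleneck concrete, here is an illustrative, framework-agnostic greedy-decoding loop (`decoder` is a hypothetical callable that maps token ids to logits). The full decoder runs once per generated token, so its weights are re-read from memory for every output position:

```python
import torch

def greedy_decode(decoder, input_ids, max_new_tokens=128):
    """`decoder` is a hypothetical callable: token ids -> logits."""
    for _ in range(max_new_tokens):
        logits = decoder(input_ids)               # one full forward pass per token
        next_id = logits[:, -1:].argmax(dim=-1)   # most likely next token
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids
```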
Encoder-decoder models, or sequence-to-sequence models with Transformer#
Sequence-to-sequence models use both the encoder and decoder of the Transformer architecture. Representatives of this family include models like T5, BART, and Marian MT.
Architecture Fit - Sequence-to-sequence models are not a good fit for Inferentia. As with the decoder-only models described above, the decoder is usually the most significant performance bottleneck, since it must be executed once per output token and therefore causes frequent memory accesses. Because of this, even when you enable these models to run on Inferentia by wrapping the decoder, they typically perform best only when the decoder's maximum sequence length is short (e.g., 128).
Neuron Support - The Neuron SDK does not support sequence-to-sequence model inference on Inferentia out of the box. However, you can run a model by defining wrappers around its encoder and decoder portions, as sketched below. For a complete example, please refer to the MarianMT tutorial on Inferentia.
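A minimal sketch of the wrapper idea, assuming a Hugging Face-style model with a `get_encoder()` method (`model`, `input_ids`, and `attention_mask` are placeholders; the MarianMT tutorial shows the complete pattern, including the decoder wrapper):

```python
import torch
import torch_neuron  # registers the torch.neuron namespace

class NeuronEncoder(torch.nn.Module):
    """Wrap the encoder so its inputs and outputs are plain tensors."""
    def __init__(self, model):
        super().__init__()
        self.encoder = model.get_encoder()

    def forward(self, input_ids, attention_mask):
        # return_dict=False yields a tuple of tensors, which
        # torch.jit.trace (and hence torch.neuron.trace) can handle.
        return self.encoder(input_ids=input_ids,
                            attention_mask=attention_mask,
                            return_dict=False)

# `model`, `input_ids`, and `attention_mask` are assumed to exist with
# fixed shapes; the decoder would be wrapped and traced the same way.
encoder_neuron = torch.neuron.trace(NeuronEncoder(model),
                                    (input_ids, attention_mask))
```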
Computer Vision Models#
Convolutional Neural Network (CNN) based models#
CNN-based models are used for applications in image classification and object detection. Representatives of this family include models like ResNet, ResNeXt, VGG, YOLO, and SSD.
Architecture Fit - CNN-based models are a good fit for Inferentia.
Neuron Support - The Neuron SDK supports CNN-based model inference on Inferentia; a minimal sketch follows. Please see the benchmark results for these models. To get started with these models, you can refer to the Neuron PyTorch, TensorFlow, and MXNet tutorials.
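For example, compiling torchvision's ResNet-50 follows the standard trace-and-save flow (a minimal sketch in the spirit of the Neuron ResNet-50 tutorial):

```python
import torch
import torch_neuron  # registers the torch.neuron namespace
from torchvision import models

model = models.resnet50(pretrained=True)
model.eval()

# Fixed input shape; Neuron compiles for the shapes seen at trace time.
image = torch.zeros([1, 3, 224, 224], dtype=torch.float32)

model_neuron = torch.neuron.trace(model, example_inputs=[image])
model_neuron.save("resnet50_neuron.pt")
```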
Region-based CNN (R-CNN) models#
Region-based CNN (R-CNN) models are commonly used for object detection and image segmentation tasks. Popular variants of the R-CNN model include R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN.
Architecture Fit - R-CNN models can have a few limitations and considerations on Inferentia:
- RoI Align operators: At this time, RoI Align operators typically cannot run efficiently on NeuronCore v1. As a result, RoI Align operators are mapped directly to CPU during compilation. R-CNN models that predict a low number of bounding boxes (<100) experience the best performance on Inferentia.
- Large ResNet backbone: R-CNNs that have a large ResNet backbone (such as ResNet-50 or ResNet-101) experience the greatest performance improvement on Inferentia, because a larger portion of the R-CNN compute is accelerated.
Neuron Support - Torch models must be traceable using torch.jit.trace() for compilation on Inferentia. Most Detectron2-based R-CNNs are not jit-traceable by default, so they cannot readily be compiled for optimized inference on Inferentia. The Running R-CNNs on Inf1 application note demonstrates how to compile and improve the performance of R-CNN models on Inferentia, and provides an end-to-end example of running a Detectron2 R-CNN on Inferentia.
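As a quick pre-check before attempting Neuron compilation, you can verify traceability directly (a sketch; `model` and `example_inputs` are placeholders for your module and its fixed-shape inputs):

```python
import torch

def is_jit_traceable(model, example_inputs) -> bool:
    """Return True if torch.jit.trace succeeds on the given inputs."""
    try:
        torch.jit.trace(model, example_inputs)
        return True
    except Exception as err:
        print(f"Not traceable as-is: {err}")
        return False
```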
Models with Long Short-Term Memory (LSTM) networks#
LSTMs use an internal state to process sequential data. LSTMs are commonly used to model temporal sequences of data in language processing and computer vision applications.
Architecture Fit - Models with LSTM cells are a good fit for Inferentia.
Neuron Support - Models with LSTM networks are supported on Inferentia; please see Developer Guide - PyTorch Neuron (torch-neuron) LSTM Support.
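As an illustration, a small LSTM traces like any other torch-neuron model once its input shapes are fixed (a minimal sketch with arbitrary sizes; see the LSTM developer guide for the supported usage patterns):

```python
import torch
import torch_neuron  # registers the torch.neuron namespace

class LSTMModel(torch.nn.Module):
    def __init__(self, input_size=64, hidden_size=128):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size, hidden_size)

    def forward(self, inputs):
        output, (h_n, c_n) = self.lstm(inputs)
        return output

model = LSTMModel().eval()
inputs = torch.rand(25, 1, 64)  # (seq_len, batch, input_size), fixed at trace time
model_neuron = torch.neuron.trace(model, example_inputs=[inputs])
```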
Diffusion Models#
Architecture Fit - Diffusion models are a good fit for Inferentia.
Neuron Support - Diffusion models are not supported on Inferentia as of the latest Neuron release. Please track the Neuron Roadmap for details.
Known Issues on Inferentia (NeuronCore v1)#
Support of large models (impacts torch-neuron and tensorflow-neuron (TF1.x))#
During compilation on Inferentia (NeuronCore v1), torch-neuron and tensorflow-neuron (TF1.x) export a protobuf that contains the model's graph structure and weights. This causes an issue when the total size of the model's weights exceeds the 2 GB protobuf limit. As a result, customers who want to run large models such as RegNet, Stable Diffusion, and t5-11b might run into protobuf errors during compilation.
This is a known issue related to the compilation process, not a hardware limitation. Enabling compilation of such large models for inference on Inferentia (NeuronCore v1) is a feature we intend to address in a future release. Please track the Neuron Roadmap for details.
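One rough way to anticipate the issue is to sum a model's parameter bytes before compiling (a sketch; `model` is a placeholder for your torch.nn.Module):

```python
import torch

def weight_bytes(model: torch.nn.Module) -> int:
    # Total size of all parameters in bytes (weights only, excluding buffers).
    return sum(p.numel() * p.element_size() for p in model.parameters())

# Example: flag models approaching the 2 GB protobuf limit.
if weight_bytes(model) > 1.8 * 2**30:
    print("Warning: weights may exceed the 2 GB protobuf limit during compilation.")
```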
Note
Neuron release 2.5.0 added experimental support for tracing models larger than 2 GB in `tensorflow-neuron (TF2.x)`; please see the extract-weights flag in the TensorFlow 2.x (tensorflow-neuron) Tracing API. A minimal tracing sketch follows.
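For reference, a minimal TF2.x tracing sketch (the model choice is illustrative, and the extract-weights option itself is documented in the Tracing API reference; its exact form is not guessed here):

```python
import tensorflow as tf
import tensorflow.neuron as tfn

# Illustrative Keras model; any TF2.x model traces the same way.
model = tf.keras.applications.ResNet50(weights=None)
example = tf.random.uniform([1, 224, 224, 3])

# Compile for Inferentia with the TF2.x tracing API. For models whose
# weights exceed 2 GB, enable the experimental extract-weights option
# described in the Tracing API reference.
model_neuron = tfn.trace(model, example)
model_neuron.save("resnet50_neuron")
```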