Bring Your Own Neuron Container to SageMaker Hosting (inf2 or trn1)#

Neuron developer flow on SageMaker Hosting

You can use a SageMaker Notebook or an EC2 instance to compile models and build your own containers for deployment on SageMaker Hosting using ml.inf2 instances. In this developer flow, you provision a SageMaker Notebook or an EC2 instance to train and compile your model for Inferentia. Then you deploy your model to SageMaker Hosting using the SageMaker Python SDK.
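The deployment step could be sketched with the SageMaker Python SDK as follows. This is a hedged sketch only: the S3 model path, container image URI, and role ARN are hypothetical placeholders, not values from this guide.

```python
# Sketch only: deploying a Neuron-compiled model to SageMaker Hosting on an
# ml.inf2 instance with the SageMaker Python SDK. All identifiers below
# (bucket, repository, role) are placeholders.

def endpoint_config(model_data, image_uri, role_arn):
    """Collect the arguments the SageMaker SDK Model class expects."""
    return {
        "model_data": model_data,  # s3:// URI of the compiled model.tar.gz
        "image_uri": image_uri,    # your Neuron inference container in ECR
        "role": role_arn,          # execution role used by SageMaker Hosting
    }

def deploy(cfg, instance_type="ml.inf2.xlarge"):
    # Imported lazily so the sketch can be read without the SDK installed.
    from sagemaker.model import Model

    model = Model(**cfg)
    return model.deploy(initial_instance_count=1, instance_type=instance_type)

# Usage (requires AWS credentials and the sagemaker package):
#   predictor = deploy(endpoint_config(
#       "s3://my-bucket/model.tar.gz",
#       "123456789012.dkr.ecr.us-east-1.amazonaws.com/neuron-inference:latest",
#       "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
#   ))
```

Larger ml.inf2 sizes can be substituted for `ml.inf2.xlarge` if the model needs more NeuronCores.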

You may not need to create a container to bring your own code to Amazon SageMaker. When you are using a framework such as TensorFlow or PyTorch that has direct support in SageMaker, you can simply supply the Python code that implements your algorithm using the SDK entry points for that framework.

Follow the steps below to set up your environment. Once your environment is set up, you’ll be able to follow the Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker Sample.

Setup Environment#

  1. Create a Compilation Instance:

    If you are using an EC2 instance for compilation only, you can use any instance type; it is recommended that you start with a c5.4xlarge instance. If you are using an EC2 instance to both compile and test a model, use an Inf2 instance. Follow these steps to launch an Inf2 instance:

    • Follow the instructions at launch an Amazon EC2 Instance to launch an Inf2 instance. When choosing the instance type in the EC2 console, make sure to select the correct Inf2 instance size. For more information about Inf2 instance sizes and pricing, see the Inf2 web page.

    • When choosing an Amazon Machine Image (AMI), make sure to select a Deep Learning AMI with Conda options. Note that Neuron Conda environments are supported only in the Ubuntu 18 DLAMI and the Amazon Linux 2 DLAMI; they are not supported in the Amazon Linux DLAMI.

    • After launching the instance, follow the instructions in Connect to your instance to connect to the instance.


    You can also launch the instance from the AWS CLI; see AWS CLI commands to launch inf2 instances.
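For reference, the same launch can be scripted with boto3. This is a hedged sketch: the AMI ID and key pair name below are placeholders, and you would pick a Deep Learning AMI ID valid in your region.

```python
# Sketch only: the keyword arguments for the EC2 RunInstances API to launch
# an Inf2 instance. The AMI ID and key pair name are placeholders.

def inf2_run_instances_params(ami_id, key_name, instance_type="inf2.xlarge"):
    return {
        "ImageId": ami_id,            # a Deep Learning AMI ID for your region
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
    }

# Usage (requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.run_instances(**inf2_run_instances_params("ami-0123456789abcdef0", "my-key"))
```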

    If you are using a SageMaker Notebook for compilation, follow the instructions in Get Started with Notebook Instances to provision the environment.

    It is recommended that you start with an ml.c5.4xlarge instance for the compilation. Also, increase the volume size of your SageMaker Notebook instance to accommodate the models and containers built locally. A volume of 10GB is sufficient.
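These two recommendations (an ml.c5.4xlarge notebook with a larger volume) map directly onto the SageMaker CreateNotebookInstance API. A hedged boto3 sketch, where the notebook name and role ARN are placeholders:

```python
# Sketch only: CreateNotebookInstance parameters reflecting the
# recommendations above. Name and role ARN are placeholders.

def notebook_instance_params(name, role_arn):
    return {
        "NotebookInstanceName": name,
        "InstanceType": "ml.c5.4xlarge",  # recommended compilation instance
        "VolumeSizeInGB": 10,             # room for models and local container builds
        "RoleArn": role_arn,
    }

# Usage (requires AWS credentials):
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_notebook_instance(**notebook_instance_params(
#       "neuron-compilation",
#       "arn:aws:iam::123456789012:role/SageMakerExecutionRole"))
```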


    To compile the model in the SageMaker Notebook instance, you’ll need to install the Neuron Compiler and Neuron Framework Extensions. Follow the Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker Sample to install them.

  2. Set up the environment to compile a model, build your own container and deploy:

    To compile your model on EC2 or a SageMaker Notebook, follow the Set up a development environment section of the EC2 Setup Environment documentation.

    Refer to Adapting Your Own Inference Container documentation for information on how to bring your own containers to SageMaker Hosting.

    Make sure to attach the AmazonEC2ContainerRegistryPowerUser managed policy to your IAM role, so you’re able to build and push containers from your SageMaker Notebook instance.
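As a hedged sketch, attaching that managed policy with boto3 might look like the following; the role name is a placeholder, while the policy ARN is the standard AWS-managed one.

```python
# Sketch only: attach the AWS-managed AmazonEC2ContainerRegistryPowerUser
# policy to the notebook's execution role. The role name is a placeholder.

ECR_POWERUSER_POLICY_ARN = (
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser"
)

def attach_ecr_poweruser(role_name):
    # Imported lazily so the sketch can be read without boto3 installed.
    import boto3

    iam = boto3.client("iam")
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn=ECR_POWERUSER_POLICY_ARN,
    )

# Usage (requires AWS credentials with IAM permissions):
#   attach_ecr_poweruser("MySageMakerNotebookRole")
```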


    The container image can be created by following the Tutorial How to Build and Run a Neuron Container.
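Once the image is built, it has to be tagged and pushed to ECR before SageMaker Hosting can pull it. A hedged sketch of composing the image URI follows; the account ID, region, and repository name shown are placeholders.

```python
# Sketch only: compose the ECR image URI that both `docker push` and the
# SageMaker model definition use. All values shown are placeholders.

def ecr_image_uri(account_id, region, repository, tag="latest"):
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"

# Typical push sequence (run in a shell, with placeholders substituted):
#   aws ecr get-login-password --region us-east-1 \
#     | docker login --username AWS --password-stdin \
#       123456789012.dkr.ecr.us-east-1.amazonaws.com
#   docker tag neuron-inference:latest \
#     123456789012.dkr.ecr.us-east-1.amazonaws.com/neuron-inference:latest
#   docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/neuron-inference:latest
```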