This document is relevant for: Inf1
Deploy Neuron Container on Elastic Container Service (ECS) for Inference#
Description#
You can use the Neuron version of the AWS Deep Learning Containers to run inference on Amazon Elastic Container Service (ECS). In this developer flow, you set up an ECS cluster with Inf1/Inf2 instances, create a task definition for your inference service, and deploy it to your cluster. This developer flow assumes:
- The model has already been compiled, either with the Framework API on an EC2 instance or through Compilation with SageMaker Neo.
- Your container image has already been built and pushed to a registry from which it can be retrieved.
Setup Environment#
- Set up an Amazon ECS cluster:
Follow the instructions on Setting up Amazon ECS for Deep Learning Containers
- Define an Inference Task:
Use the instructions in the DLC Inference on ECS Tutorial to define a task and create a service for the appropriate framework.
When creating tasks for Inferentia instances on ECS, be aware of the considerations and requirements listed in Working with inference workloads on Amazon ECS.
Use the container image created in the tutorial How to Build and Run a Neuron Container as the image in your task definition.

Note: Before deploying your task definition to your ECS cluster, make sure to push the image to Amazon ECR. Refer to Pushing a Docker image for more information.
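As an illustration, a task definition for an Inf1 instance might look like the following sketch. The family name, container name, image URI, memory value, and port mapping are placeholders you would replace with your own; the device mapping assumes a single Neuron device (`/dev/neuron0`), as on an inf1.xlarge or inf1.2xlarge instance, and must list one entry per Neuron device on larger instance sizes.

```json
{
  "family": "neuron-inference",
  "requiresCompatibilities": ["EC2"],
  "containerDefinitions": [
    {
      "name": "neuron-container",
      "image": "<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:latest",
      "essential": true,
      "memory": 8192,
      "portMappings": [
        { "containerPort": 8080, "hostPort": 8080 }
      ],
      "linuxParameters": {
        "devices": [
          {
            "hostPath": "/dev/neuron0",
            "containerPath": "/dev/neuron0",
            "permissions": ["read", "write"]
          }
        ],
        "capabilities": { "add": ["IPC_LOCK"] }
      }
    }
  ]
}
```

The `linuxParameters.devices` section is what exposes the Inferentia hardware to the container; without it, the Neuron runtime inside the container cannot see the device.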