This document is relevant for: Inf1

Deploy Neuron Container on Elastic Container Service (ECS) for Inference#

Description#

Neuron developer flow for DLC on ECS

You can use the Neuron version of the AWS Deep Learning Containers (DLC) to run inference on Amazon Elastic Container Service (ECS). In this developer flow, you set up an ECS cluster with Inf1/Inf2 instances, create a task definition for your inference service, and deploy it to your cluster. This developer flow assumes:

  1. The model has already been compiled through Compilation with Framework API on EC2 instance or through Compilation with SageMaker Neo.

  2. Your container is already set up to retrieve the model from storage.

Setup Environment#

  1. Set up an Amazon ECS cluster:

    Follow the instructions in Setting up Amazon ECS for Deep Learning Containers.
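
    As a sketch of this step, the cluster itself can be created with the AWS CLI; the cluster name below is an arbitrary example, and the Inf1/Inf2 container instances you launch afterwards join it through the ECS agent configuration:

    ```shell
    # Create the ECS cluster (the name is an example -- choose your own).
    aws ecs create-cluster --cluster-name neuron-ecs-cluster

    # On each Inf1/Inf2 container instance (ECS-optimized AMI), point the
    # ECS agent at the cluster before the agent starts:
    echo "ECS_CLUSTER=neuron-ecs-cluster" >> /etc/ecs/ecs.config
    ```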

  2. Define an Inference Task:

    Follow the instructions in the DLC Inference on ECS Tutorial to define a task and create a service for the appropriate framework.

    When creating tasks for Inferentia instances on ECS, be aware of the considerations and requirements listed in Working with inference workloads on Amazon ECS.
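
    As a minimal sketch, a task definition for an Inferentia container might look like the following; the family name, image URI, port, and memory values are placeholders, and the linuxParameters block passes the first Neuron device on the host through to the container:

    ```json
    {
      "family": "neuron-inference",
      "requiresCompatibilities": ["EC2"],
      "containerDefinitions": [
        {
          "name": "neuron-container",
          "image": "<account-id>.dkr.ecr.<region>.amazonaws.com/neuron-inference:latest",
          "essential": true,
          "memory": 8192,
          "portMappings": [{ "containerPort": 8080, "hostPort": 8080 }],
          "linuxParameters": {
            "devices": [
              {
                "containerPath": "/dev/neuron0",
                "hostPath": "/dev/neuron0",
                "permissions": ["read", "write"]
              }
            ]
          }
        }
      ]
    }
    ```

    A task definition like this can be registered with `aws ecs register-task-definition --cli-input-json file://taskdef.json` before you create the service.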

  3. Use the container image created in the How to Build and Run a Neuron Container tutorial as the image in your task definition.

    Note

    Before deploying your task definition to your ECS cluster, make sure to push the image to ECR. Refer to Pushing a Docker image for more information.
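
    Pushing the image, as the note above describes, can be sketched with the AWS CLI and Docker; the account ID, Region, and repository name below are placeholders, and the repository is assumed to already exist:

    ```shell
    ACCOUNT=123456789012    # placeholder account ID
    REGION=us-west-2        # placeholder Region
    REPO=neuron-inference   # placeholder ECR repository name

    # Authenticate Docker to ECR, tag the locally built Neuron image, and push it.
    aws ecr get-login-password --region ${REGION} \
      | docker login --username AWS --password-stdin ${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com
    docker tag neuron-inference:latest ${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:latest
    docker push ${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:latest
    ```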