This document is relevant for: Trn1, Trn2
Deploy Neuron Container on Elastic Container Service (ECS) for Training
Description
You can use the Neuron version of the AWS Deep Learning Containers to run training on Amazon Elastic Container Service (ECS). In this developer flow, you set up an ECS cluster with trn1 instances, create a task definition for your training container, and deploy it to your cluster. This developer flow assumes:
- The model has already been compiled, either through Compilation with Framework API on an EC2 instance or through Compilation with SageMaker Neo.
- Your container is already set up to retrieve the compiled model from storage.
Setup Environment
- Set up an Amazon ECS cluster:
Follow the instructions in Setting up Amazon ECS for Deep Learning Containers.
- Define a Training Task:
Follow the instructions in the DLC Training on ECS Tutorial to define a task and create a service for the appropriate framework.
When creating tasks for trn1 instances on ECS, be aware of the considerations and requirements listed in Working with training workloads on Amazon ECS.
Use the container image created in the tutorial How to Build and Run a Neuron Container as the image in your task definition.
Note
Before deploying your task definition to your ECS cluster, make sure to push the image to ECR. Refer to Pushing a Docker image for more information.
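To make the task-definition step more concrete, the following is a minimal Python sketch of what such a definition could look like. It is an illustration under stated assumptions, not the tutorial's official template: the family name, container name, memory value, training command, account ID, region, and repository name are all hypothetical placeholders. The sketch assumes that Neuron devices appear on the host as /dev/neuron0, /dev/neuron1, and so on, and that they are passed to the container through the linuxParameters.devices field of the ECS task definition. The number of devices depends on the instance size you choose, so it is left as a parameter.

```python
import json


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Return the image URI in the standard Amazon ECR registry format."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def build_task_definition(image_uri: str, num_neuron_devices: int = 1) -> dict:
    """Build a task definition dict for an EC2-launch-type training task.

    Assumption: each Neuron device is exposed on the host as /dev/neuron<N>
    and is mapped to the same path inside the container.
    """
    devices = [
        {
            "hostPath": f"/dev/neuron{i}",
            "containerPath": f"/dev/neuron{i}",
            "permissions": ["read", "write"],
        }
        for i in range(num_neuron_devices)
    ]
    return {
        "family": "neuron-training-example",  # hypothetical family name
        "requiresCompatibilities": ["EC2"],
        "containerDefinitions": [
            {
                "name": "neuron-training",  # hypothetical container name
                "image": image_uri,  # the image you pushed to ECR
                "essential": True,
                "memory": 8192,  # MiB, placeholder value
                "command": ["python", "train.py"],  # hypothetical entry point
                "linuxParameters": {"devices": devices},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID, region, and repository name.
    uri = ecr_image_uri("123456789012", "us-west-2", "neuron-training", "latest")
    print(json.dumps(build_task_definition(uri), indent=2))
```

If you use boto3, a dict of this shape can be passed as keyword arguments to the ECS client's register_task_definition call, for example boto3.client("ecs").register_task_definition(**task_definition). Adjust the device count, memory, and command to match your instance type and training script.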