This document is relevant for: Inf1, Inf2, Trn1, Trn1n

Amazon ECS

This section collects resources for using Neuron with Amazon ECS clusters, including how to deploy inference and training workloads on Inferentia and Trainium ECS clusters.

Using Neuron Node Problem Detector Plugin with ECS

The Neuron node problem detector and recovery plugin improves resiliency by detecting and remediating errors. To get started with the plugin on an ECS cluster, refer to Neuron Problem Detector And Recovery.

Running Inference workload

To run machine learning inference workloads on Amazon ECS using AWS Deep Learning Containers, refer to Deploy Neuron Container on Elastic Container Service (ECS) for Inference. That guide walks you through the end-to-end process of building a Docker container with your model, running it, and deploying it on an ECS cluster with Inferentia instances.
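As a rough illustration of what such a deployment involves, the sketch below builds an ECS task definition that maps Neuron devices from the host into the container through the standard `linuxParameters.devices` field. The helper name `neuron_task_definition`, the family name, the image placeholder, the memory value, and the assumption that devices appear on the host as `/dev/neuron0`, `/dev/neuron1`, and so on are illustrative and not taken from this page; the linked guide remains the authoritative reference.

```python
import json


def neuron_task_definition(family, image, num_devices=1, memory_mib=8192):
    """Build an ECS task definition dict for a single container.

    Hypothetical helper for illustration only. It assumes the Neuron
    devices are exposed on the host as /dev/neuron0 .. /dev/neuron<N-1>.
    """
    # One device mapping per Neuron device the container should see.
    devices = [
        {
            "hostPath": f"/dev/neuron{i}",
            "containerPath": f"/dev/neuron{i}",
            "permissions": ["read", "write"],
        }
        for i in range(num_devices)
    ]
    return {
        "family": family,
        # Device mappings require the EC2 launch type.
        "requiresCompatibilities": ["EC2"],
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "memory": memory_mib,  # assumed value, size to your model
                "linuxParameters": {"devices": devices},
            }
        ],
    }


if __name__ == "__main__":
    # "<your-image-uri>" is a placeholder, not a real image.
    task_def = neuron_task_definition("neuron-inference-example", "<your-image-uri>")
    print(json.dumps(task_def, indent=2))
```

The printed JSON could then be saved to a file and registered with `aws ecs register-task-definition --cli-input-json file://<file>`, assuming the cluster's container instances are Inferentia instances with the Neuron driver installed.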

Running Training workload

To run machine learning training workloads on Amazon ECS using AWS Deep Learning Containers, refer to Deploy Neuron Container on Elastic Container Service (ECS) for Training. That guide walks you through the end-to-end process of building a Docker container with your model, running it, and deploying it on an ECS cluster with Trainium instances.
