This document is relevant for: Inf1, Inf2, Trn1, Trn1n

Amazon EKS

In this section, you'll find resources to help you use Neuron with Amazon EKS, covering deployment of inference and training workloads on EKS clusters with Inferentia and Trainium instances.

EKS Setup

This guide covers setting up the Neuron device plugin, scheduler extension, node problem detector, and monitoring plugins. These components enable efficient resource utilization, monitoring, and resilience when using Inferentia and Trainium instances for inference and training workloads on Kubernetes clusters. To get started with using AWS Neuron and setting up the required plugins on an EKS cluster, please refer to EKS Setup For Neuron.
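Once the device plugin is installed, pods request Neuron devices through the extended resource it advertises, and the scheduler extension places them on nodes with free devices. The sketch below is a minimal illustrative pod spec, not taken from the setup guide: the resource name `aws.amazon.com/neuron` is the one advertised by the Neuron device plugin, while the pod name, image, and command are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: neuron-check          # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest  # placeholder image
      command: ["ls", "/dev"]  # Neuron devices show up as /dev/neuron* inside the container
      resources:
        limits:
          aws.amazon.com/neuron: 1  # one Neuron device, allocated by the device plugin
```

A pod that omits the `aws.amazon.com/neuron` limit will not be granted access to any Neuron devices, even on an Inferentia or Trainium node.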

Running an Inference Workload

This guide walks you through the end-to-end process of building a Docker container for your model and deploying it on an EKS cluster with Inferentia instances. For running machine learning inference workloads on Amazon EKS using AWS Deep Learning Containers, refer to Deploy Neuron Container on Elastic Kubernetes Service (EKS) for Inference.
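An inference service built this way is typically exposed as a Deployment whose pods each claim a Neuron device. The fragment below is an illustrative sketch, not the guide's manifest: the Deployment name, label, container image, port, and instance type are all placeholders you would replace with your own values.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: neuron-inference       # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: neuron-inference
  template:
    metadata:
      labels:
        app: neuron-inference
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: inf2.xlarge  # example Inferentia instance type
      containers:
        - name: model-server
          image: <account>.dkr.ecr.<region>.amazonaws.com/<your-model-image>:latest  # placeholder
          ports:
            - containerPort: 8080  # placeholder serving port
          resources:
            limits:
              aws.amazon.com/neuron: 1  # one Neuron device per replica
```

The `nodeSelector` keeps replicas on Inferentia nodes, and the resource limit lets the device plugin bind each pod to a specific Neuron device.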

Running a Training Workload

This guide walks you through the end-to-end process of building a Docker container for your model and deploying it on an EKS cluster with Trainium instances. For running machine learning training workloads on Amazon EKS using AWS Deep Learning Containers, refer to Deploy a simple mlp training script as a Kubernetes job.
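A training run like this is a natural fit for a Kubernetes Job, which runs the container to completion rather than keeping it alive as a service. The fragment below is an illustrative sketch under that assumption: the Job name, image, entrypoint, and instance type are placeholders, not the guide's actual manifest.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: mlp-train              # placeholder name
spec:
  backoffLimit: 0              # fail fast instead of retrying a broken training run
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        node.kubernetes.io/instance-type: trn1.2xlarge  # example Trainium instance type
      containers:
        - name: trainer
          image: <account>.dkr.ecr.<region>.amazonaws.com/<your-training-image>:latest  # placeholder
          command: ["python3", "train.py"]  # placeholder training entrypoint
          resources:
            limits:
              aws.amazon.com/neuron: 1  # one Neuron device for the training pod
```

When the training script exits successfully the Job is marked Complete; a non-zero exit marks it Failed, which makes pass/fail status visible with `kubectl get jobs`.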
