Amazon EKS

Deploy containerized Neuron workloads on Amazon Elastic Kubernetes Service. EKS provides managed Kubernetes with Neuron device plugins, topology-aware scheduling, health monitoring, and Dynamic Resource Allocation for Trainium and Inferentia instances.

Get started

Set up EKS for Neuron

Create an EKS cluster with Neuron nodes, install the device plugin, configure the scheduler extension, and verify resource allocation.

Neuron Helm chart

Install the device plugins, scheduler extensions, node problem detector, and DRA driver with a single Helm command.

Run workloads

Run inference on EKS

Deploy inference containers on EKS using Neuron Deep Learning Containers on Inferentia and Trainium instances.

Run training on EKS

Deploy distributed training workloads on EKS with Trainium instances and Neuron DLCs.

Advanced topics

Dynamic Resource Allocation (DRA)

Use Kubernetes DRA for attribute-based device selection and topology-aware allocation on Kubernetes 1.34 and later.

Neuron UltraServer Operator (Beta)

Use the operator, built on the Neuron DRA driver, for topology-aware provisioning and lifecycle management of Neuron UltraServer workloads on EKS.

Schedule MPI jobs on UltraServers

Run MPI jobs across Trn2 UltraServer nodes in EKS for multi-node inference and training.

EKS prerequisites

Review the detailed prerequisites for setting up an EKS cluster with Neuron support.