Choose your deployment path#

AWS Neuron supports multiple deployment configurations for training and inference on Trainium and Inferentia instances. This page helps you choose the right combination of environment, compute service, and infrastructure based on your workload requirements.

Quick start: which path is right for you?#

| I want to… | Recommended path | Get started |
| --- | --- | --- |
| Prototype a model on a single instance | DLAMI on EC2 | Neuron DLAMI User Guide |
| Serve an LLM with vLLM | vLLM DLC on EC2 or EKS | Quickstart: Configure and deploy a vLLM server using Neuron Deep Learning Container (DLC) |
| Run distributed training across multiple nodes | DLC on EKS or ParallelCluster | Amazon EKS |
| Run batch training jobs on a schedule | DLC on AWS Batch | AWS Batch |
| Use managed infrastructure with minimal setup | Amazon SageMaker | Amazon SageMaker |
| Build a production Kubernetes inference service | DLC on EKS with DRA | AWS Neuron Dynamic Resource Allocation (DRA) |
| Run containerized tasks without Kubernetes | DLC on ECS | Amazon ECS |
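
If you want to embed this decision table in internal tooling (for example, a project scaffolding script), it can be captured as plain data. The sketch below is illustrative only: the `QUICK_START` name and the `recommend` helper are hypothetical and not part of the Neuron SDK; the entries are copied from the table above.

```python
# Goal -> (recommended path, getting-started guide), copied from the quick-start table.
# QUICK_START and recommend() are hypothetical helpers, not Neuron SDK APIs.
QUICK_START = {
    "Prototype a model on a single instance": ("DLAMI on EC2", "Neuron DLAMI User Guide"),
    "Serve an LLM with vLLM": (
        "vLLM DLC on EC2 or EKS",
        "Quickstart: Configure and deploy a vLLM server using Neuron Deep Learning Container (DLC)",
    ),
    "Run distributed training across multiple nodes": ("DLC on EKS or ParallelCluster", "Amazon EKS"),
    "Run batch training jobs on a schedule": ("DLC on AWS Batch", "AWS Batch"),
    "Use managed infrastructure with minimal setup": ("Amazon SageMaker", "Amazon SageMaker"),
    "Build a production Kubernetes inference service": (
        "DLC on EKS with DRA",
        "AWS Neuron Dynamic Resource Allocation (DRA)",
    ),
    "Run containerized tasks without Kubernetes": ("DLC on ECS", "Amazon ECS"),
}


def recommend(keyword: str) -> list[tuple[str, str]]:
    """Return (goal, recommended path) pairs whose goal mentions the keyword."""
    needle = keyword.lower()
    return [
        (goal, path)
        for goal, (path, _guide) in QUICK_START.items()
        if needle in goal.lower()
    ]


for goal, path in recommend("vLLM"):
    print(f"{goal}: {path}")
```

The match is a naive substring search, so a keyword such as "kubernetes" also returns the "without Kubernetes" row. Treat the result as a list of candidates to read, not as a final answer.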

Choose your environment #

Your first decision is how you want the Neuron SDK installed. Neuron supports three environment types: two pre-configured options (DLAMIs and DLCs) and custom Docker containers that you build yourself. Each suits a different workflow.

Deep Learning AMIs (DLAMIs)#

Use when: You want the fastest path to running code on a single EC2 instance. DLAMIs come with Neuron drivers, frameworks, and virtual environments pre-installed. Launch an instance and start working in minutes.

Best for:

Interactive development with SSH or Jupyter notebooks
Prototyping and experimentation on a single instance
Teams that want pre-configured virtual environments for PyTorch, JAX, or vLLM

DLAMI types:

Multi-Framework DLAMI — includes PyTorch, JAX, vLLM, and NxD libraries in separate virtual environments. Use this when you want to explore multiple frameworks or switch between training and inference workflows.
Single Framework DLAMI — optimized for one framework version. Use this for production deployments where you know exactly which framework you need.
Base DLAMI — includes only Neuron drivers, EFA, and tools. Use this as a foundation for containerized applications or custom builds where you install your own packages.

Get started: Neuron DLAMI User Guide

Deep Learning Containers (DLCs)#

Use when: You need portable, reproducible environments for orchestrated deployments. DLCs are Docker images pre-built with the Neuron SDK and a specific framework, and they are available in Amazon ECR.

Best for:

Production deployments on EKS, ECS, or AWS Batch
CI/CD pipelines that require consistent environments
Multi-node distributed training where each node runs the same container
vLLM inference serving in containerized environments

Available containers:

PyTorch Training, PyTorch Inference, PyTorch vLLM Inference, JAX Training

Get started: Neuron Deep Learning Containers | Quickstart: Configure and deploy a vLLM server using Neuron Deep Learning Container (DLC)

Custom Docker containers #

Use when: You need full control over the container environment — custom dependencies, specific package versions, or a CI/CD pipeline that builds images from scratch.

Best for:

Teams with existing Docker build pipelines
Workloads requiring packages not included in DLCs
Environments with strict security or compliance requirements

Get started: Getting started with Neuron DLC using Docker | Customize Neuron DLC

Choose your compute service #

Your second decision is where to run your workload. Each AWS compute service offers different trade-offs between control, automation, and operational overhead.

Amazon EC2 (direct instance access)#

Use when: You want direct access to Neuron hardware on a single instance or a small number of instances. EC2 gives you full control over the instance lifecycle.

Best for:

Development and prototyping
Single-node training and inference
Interactive debugging with SSH access
Running Jupyter notebooks

Typical workflow: Launch a DLAMI, SSH in, activate a virtual environment, run your code.
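
The launch step of this workflow can also be scripted. The following is a minimal sketch, assuming you use boto3 and already know the AMI ID of the DLAMI you want. The AMI ID and key pair name are placeholders, and `build_launch_request` is a hypothetical helper. The function only builds the arguments for the EC2 `run_instances` call, so it runs without AWS credentials.

```python
def build_launch_request(ami_id: str, key_name: str,
                         instance_type: str = "trn1.2xlarge") -> dict:
    """Build keyword arguments for boto3's EC2 run_instances call."""
    return {
        "ImageId": ami_id,              # the Neuron DLAMI you selected
        "InstanceType": instance_type,  # for example an Inf2 or Trn1 size
        "KeyName": key_name,            # key pair used for SSH access
        "MinCount": 1,
        "MaxCount": 1,
    }


# Placeholder values: replace them with your own AMI ID and key pair name.
request = build_launch_request("ami-EXAMPLE", "my-key-pair")
print(request)

# With boto3 installed and credentials configured, the launch itself would be:
#   import boto3
#   boto3.client("ec2").run_instances(**request)
```

Networking, security groups, and storage are left at account defaults here. A real launch usually sets them explicitly.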

Get started: Amazon EC2

Amazon EKS (Kubernetes orchestration)#

Use when: You need Kubernetes-based orchestration for containerized Neuron workloads. EKS provides device plugins, topology-aware scheduling, health monitoring, and Dynamic Resource Allocation (DRA) for Neuron devices.

Best for:

Production inference services with auto-scaling
Multi-node distributed training with EFA networking
Teams already using Kubernetes for workload management
Workloads requiring topology-aware device allocation (DRA)
Multi-node inference on Trn2 UltraServers

Key capabilities:

Neuron device plugin: exposes Neuron hardware to the Kubernetes scheduler
Neuron Helm chart: installs all infrastructure components with a single command (see Components Included)
Dynamic Resource Allocation (DRA): attribute-based device selection and topology-aware allocation on Kubernetes 1.34 and later (see AWS Neuron Dynamic Resource Allocation (DRA))
UltraServer support: schedule MPI jobs across Trn2 UltraServer nodes (see How to schedule MPI jobs to run on Neuron UltraServer on EKS)
Node problem detector: automatic health monitoring and node replacement (see Deploy Neuron Node Problem Detector and Recovery)
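
To make the device plugin's role concrete, here is a hedged sketch of a pod manifest that requests Neuron devices through an extended resource. It assumes the plugin advertises devices under the resource name `aws.amazon.com/neuron`; confirm the exact name against the device plugin documentation for your Neuron release. The pod name, the image URI, and the `neuron_pod_manifest` helper are placeholders. This shows the device plugin route, not DRA.

```python
import json


def neuron_pod_manifest(name: str, image: str, neuron_devices: int = 1) -> dict:
    """Build a Pod manifest that asks the scheduler for Neuron devices."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Assumed resource name exposed by the Neuron device plugin.
                    "resources": {"limits": {"aws.amazon.com/neuron": neuron_devices}},
                }
            ],
        },
    }


# Placeholder image URI: substitute the DLC image you actually deploy.
manifest = neuron_pod_manifest(
    "neuron-demo", "<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>"
)
print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output can be saved to a file and applied directly once the placeholders are filled in.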

Get started: Amazon EKS | Kubernetes environment setup for Neuron

Amazon ECS (container task orchestration)#

Use when: You want container orchestration without Kubernetes. ECS provides task-based scheduling for Neuron containers with lower operational overhead than EKS.

Best for:

Teams already using ECS for container workloads
Simpler container deployments that don’t need Kubernetes features
Workloads where task-based scheduling is sufficient

Get started: Amazon ECS

AWS Batch (batch job scheduling)#

Use when: You have training jobs that run on a schedule or in response to events, and you want AWS to manage compute scaling automatically.

Best for:

Periodic or scheduled training jobs
Workloads with variable compute demand
Teams that want automatic resource provisioning and cleanup
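
As an illustration of how a training job reaches AWS Batch, the sketch below builds the arguments for the boto3 Batch `submit_job` call. It assumes a job queue and a job definition (pointing at a Neuron DLC image) already exist. The queue name, definition name, training command, and the `build_batch_job` helper are all hypothetical.

```python
def build_batch_job(job_name: str, job_queue: str, job_definition: str,
                    command: list[str]) -> dict:
    """Build keyword arguments for boto3's Batch submit_job call."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Override the container command defined in the job definition.
        "containerOverrides": {"command": list(command)},
    }


# Placeholder names: replace them with your own queue, definition, and script.
job = build_batch_job(
    "neuron-train-demo",
    "my-neuron-queue",
    "my-neuron-training-def",
    ["python", "train.py"],
)
print(job)

# With boto3 installed and credentials configured, submission would be:
#   import boto3
#   boto3.client("batch").submit_job(**job)
```

Running the job on a schedule or in response to events is configured separately and is not shown here.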

Get started: AWS Batch

AWS ParallelCluster (HPC with Slurm)#

Use when: You need an HPC cluster with Slurm for large-scale distributed training. ParallelCluster manages the cluster lifecycle, including head nodes, compute fleets, and shared storage.

Best for:

Large-scale distributed training across many Trn1 nodes
Teams familiar with Slurm job scheduling
Workloads requiring shared filesystems (EFS, FSx)

Get started: AWS ParallelCluster

Amazon SageMaker (managed ML platform)#

Use when: You want a fully managed ML platform that handles infrastructure provisioning, training orchestration, and model deployment. SageMaker abstracts away compute management.

Best for:

Teams that prefer managed services over self-managed infrastructure
End-to-end ML workflows (data preparation → training → deployment)
Fine-tuning foundation models with SageMaker JumpStart
Resilient training with SageMaker HyperPod (automatic checkpointing and recovery)

Get started: Amazon SageMaker

Common deployment patterns #

Single-instance development and prototyping #

Launch a Multi-Framework DLAMI on an Inf2 or Trn1 instance. Activate the virtual environment for your framework and iterate on your model. This is the fastest path from zero to running code.

vLLM inference serving #

Deploy a vLLM DLC on EC2 for single-instance serving, or on EKS for production with auto-scaling. The vLLM DLC includes the Neuron vLLM plugin with continuous batching, speculative decoding, and OpenAI-compatible APIs.

Quickstart: Deploy a DLC with vLLM
For production: Deploy on EKS with DRA for topology-aware device allocation
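
Because the server exposes OpenAI-compatible APIs, a client needs nothing Neuron-specific. The sketch below builds a request for the `/v1/completions` endpoint using only the Python standard library. It assumes the server listens on `http://localhost:8000` (vLLM's default port). The model name is a placeholder for whatever model your server was started with, and `build_completion_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and model name: adjust both to match your deployment.
req = build_completion_request("http://localhost:8000", "my-model", "Hello, Neuron!")
print(req.full_url)

# To send the request against a running server:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["text"])
```

Building the request separately from sending it keeps the example runnable without a live server. The same payload shape works with any OpenAI-compatible client library.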

Multi-node distributed training #

Use DLCs on EKS or ParallelCluster for distributed training across multiple Trn1 or Trn2 nodes with EFA networking. For training jobs that run on a schedule or in response to events, AWS Batch is an alternative.

EKS path: Set up EKS → Deploy training
ParallelCluster path: Set up ParallelCluster
Batch path: Train on AWS Batch

Production Kubernetes inference with DRA #

For production inference on EKS with Trn2 instances, use Dynamic Resource Allocation (DRA) for topology-aware device scheduling. DRA removes the need for custom scheduler extensions and enables attribute-based device selection.

Set up EKS with Helm chart
Configure DRA for topology-aware allocation
For UltraServer workloads: Schedule MPI jobs
