Choose your deployment path#
AWS Neuron supports multiple deployment configurations for training and inference on Trainium and Inferentia instances. This page helps you choose the right combination of environment, compute service, and infrastructure based on your workload requirements.
Quick start: which path is right for you?#
I want to… |
Recommended path |
Get started |
|---|---|---|
Prototype a model on a single instance |
DLAMI on EC2 |
|
Serve an LLM with vLLM |
vLLM DLC on EC2 or EKS |
Quickstart: Configure and deploy a vLLM server using Neuron Deep Learning Container (DLC) |
Run distributed training across multiple nodes |
DLC on EKS or ParallelCluster |
|
Run batch training jobs on a schedule |
DLC on AWS Batch |
|
Use managed infrastructure with minimal setup |
Amazon SageMaker |
|
Build a production Kubernetes inference service |
DLC on EKS with DRA |
|
Run containerized tasks without Kubernetes |
DLC on ECS |
Choose your environment#
Your first decision is how you want the Neuron SDK installed. Neuron provides three pre-configured environment types, each suited to different workflows.
Deep Learning AMIs (DLAMIs)#
Use when: You want the fastest path to running code on a single EC2 instance. DLAMIs come with Neuron drivers, frameworks, and virtual environments pre-installed. Launch an instance and start working in minutes.
Best for:
Interactive development with SSH or Jupyter notebooks
Prototyping and experimentation on a single instance
Teams that want pre-configured virtual environments for PyTorch, JAX, or vLLM
DLAMI types:
Multi-Framework DLAMI — includes PyTorch, JAX, vLLM, and NxD libraries in separate virtual environments. Use this when you want to explore multiple frameworks or switch between training and inference workflows.
Single Framework DLAMI — optimized for one framework version. Use this for production deployments where you know exactly which framework you need.
Base DLAMI — includes only Neuron drivers, EFA, and tools. Use this as a foundation for containerized applications or custom builds where you install your own packages.
Get started: Neuron DLAMI User Guide
Deep Learning Containers (DLCs)#
Use when: You need portable, reproducible environments for orchestrated deployments. DLCs are Docker images pre-built with Neuron SDK and a specific framework, available in Amazon ECR.
Best for:
Production deployments on EKS, ECS, or AWS Batch
CI/CD pipelines that require consistent environments
Multi-node distributed training where each node runs the same container
vLLM inference serving in containerized environments
Available containers:
PyTorch Training, PyTorch Inference, PyTorch vLLM Inference, JAX Training
Get started: Neuron Deep Learning Containers | Quickstart: Configure and deploy a vLLM server using Neuron Deep Learning Container (DLC)
Custom Docker containers#
Use when: You need full control over the container environment — custom dependencies, specific package versions, or a CI/CD pipeline that builds images from scratch.
Best for:
Teams with existing Docker build pipelines
Workloads requiring packages not included in DLCs
Environments with strict security or compliance requirements
Get started: Getting started with Neuron DLC using Docker | Customize Neuron DLC
Choose your compute service#
Your second decision is where to run your workload. Each AWS compute service offers different trade-offs between control, automation, and operational overhead.
Amazon EC2 (direct instance access)#
Use when: You want direct access to Neuron hardware on a single instance or a small number of instances. EC2 gives you full control over the instance lifecycle.
Best for:
Development and prototyping
Single-node training and inference
Interactive debugging with SSH access
Running Jupyter notebooks
Typical workflow: Launch a DLAMI, SSH in, activate a virtual environment, run your code.
Get started: Amazon EC2
Amazon EKS (Kubernetes orchestration)#
Use when: You need Kubernetes-based orchestration for containerized Neuron workloads. EKS provides device plugins, topology-aware scheduling, health monitoring, and Dynamic Resource Allocation (DRA) for Neuron devices.
Best for:
Production inference services with auto-scaling
Multi-node distributed training with EFA networking
Teams already using Kubernetes for workload management
Workloads requiring topology-aware device allocation (DRA)
Multi-node inference on Trn2 UltraServers
Key capabilities:
Neuron device plugin — exposes Neuron hardware to the Kubernetes scheduler
Neuron Helm chart — installs all infrastructure components with a single command (Components Included)
Dynamic Resource Allocation (DRA) — attribute-based device selection and topology-aware allocation on K8s 1.34+ (AWS Neuron Dynamic Resource Allocation (DRA))
UltraServer support — schedule MPI jobs across Trn2 UltraServer nodes (How to schedule MPI jobs to run on Neuron UltraServer on EKS)
Node problem detector — automatic health monitoring and node replacement (Deploy Neuron Node Problem Detector and Recovery)
Get started: Amazon EKS | Kubernetes environment setup for Neuron
Amazon ECS (container task orchestration)#
Use when: You want container orchestration without Kubernetes. ECS provides task-based scheduling for Neuron containers with simpler operational overhead than EKS.
Best for:
Teams already using ECS for container workloads
Simpler container deployments that don’t need Kubernetes features
Workloads where task-based scheduling is sufficient
Get started: Amazon ECS
AWS Batch (batch job scheduling)#
Use when: You have training jobs that run on a schedule or in response to events, and you want AWS to manage compute scaling automatically.
Best for:
Periodic or scheduled training jobs
Workloads with variable compute demand
Teams that want automatic resource provisioning and cleanup
Get started: AWS Batch
AWS ParallelCluster (HPC with Slurm)#
Use when: You need an HPC cluster with Slurm for large-scale distributed training. ParallelCluster manages the cluster lifecycle including head nodes, compute fleets, and shared storage.
Best for:
Large-scale distributed training across many Trn1 nodes
Teams familiar with Slurm job scheduling
Workloads requiring shared filesystems (EFS, FSx)
Get started: AWS ParallelCluster
Amazon SageMaker (managed ML platform)#
Use when: You want a fully managed ML platform that handles infrastructure provisioning, training orchestration, and model deployment. SageMaker abstracts away the compute management.
Best for:
Teams that prefer managed services over self-managed infrastructure
End-to-end ML workflows (data preparation → training → deployment)
Fine-tuning foundation models with SageMaker JumpStart
Resilient training with SageMaker HyperPod (automatic checkpointing and recovery)
Get started: Amazon SageMaker
Common deployment patterns#
Single-instance development and prototyping#
Launch a Multi-Framework DLAMI on an Inf2 or Trn1 instance. Activate the virtual environment for your framework and iterate on your model. This is the fastest path from zero to running code.
vLLM inference serving#
Deploy a vLLM DLC on EC2 for single-instance serving, or on EKS for production with auto-scaling. The vLLM DLC includes the Neuron vLLM plugin with continuous batching, speculative decoding, and OpenAI-compatible APIs.
For production: Deploy on EKS with DRA for topology-aware device allocation
Multi-node distributed training#
Use DLCs on EKS or ParallelCluster for distributed training across multiple Trn1 or Trn2 nodes with EFA networking.
EKS path: Set up EKS → Deploy training
ParallelCluster path: Set up ParallelCluster
Batch path: Train on AWS Batch
Production Kubernetes inference with DRA#
For production inference on EKS with Trn2 instances, use Dynamic Resource Allocation (DRA) for topology-aware device scheduling. DRA replaces the need for custom scheduler extensions and enables attribute-based device selection.
Set up EKS with Helm chart
Configure DRA for topology-aware allocation
For UltraServer workloads: Schedule MPI jobs
Further reading#
Pre-configured environments — Compare DLAMIs, DLCs, and custom Docker environments
Neuron infrastructure components — Neuron Kubernetes plugins, monitoring, and scheduling
Neuron Containers FAQ — Common questions about Neuron container deployments
Third-party solutions — Partner integrations (Ray, Domino)