This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3
AWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes: Support files
This directory contains scripts, manifests, and templates supporting AWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes. You can view and download these files from the links below.
Download the scripts and YAML files as a TAR/GZIP archive
Preserve the directory structure when you extract the archive. The driver installation script uses this relative folder structure to find the corresponding YAML files.
Directory structure:
containers/files/
├── manifests/
│   ├── clusterrole.yaml
│   ├── clusterrolebinding.yaml
│   ├── daemonset.yaml
│   ├── deviceclass.yaml
│   ├── namespace.yaml
│   └── serviceaccount.yaml
└── examples/
    ├── scripts/
    │   └── install-dra-driver.sh
    └── specs/
        ├── 1x4-connected-devices.yaml
        ├── 2-node-inference-us.yaml
        ├── 4-node-inference-us.yaml
        ├── all-devices.yaml
        ├── lnc-setting-trn2.yaml
        └── specific-driver-version.yaml
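Because the installation script locates the YAML files by relative path, it can be worth confirming the extracted layout before running it. The following is a minimal sketch and is not one of the published files: the `check_layout` function name is hypothetical, and it assumes you pass the directory that directly contains `manifests/` and `examples/`.

```shell
# Hypothetical helper (not part of the archive): report any file from the
# directory listing above that is missing after extraction.
# Usage: check_layout <dir>   where <dir> contains manifests/ and examples/
check_layout() {
  local root="${1:-.}"
  local missing=0
  local f
  for f in \
    manifests/clusterrole.yaml \
    manifests/clusterrolebinding.yaml \
    manifests/daemonset.yaml \
    manifests/deviceclass.yaml \
    manifests/namespace.yaml \
    manifests/serviceaccount.yaml \
    examples/scripts/install-dra-driver.sh \
    examples/specs/1x4-connected-devices.yaml \
    examples/specs/2-node-inference-us.yaml \
    examples/specs/4-node-inference-us.yaml \
    examples/specs/all-devices.yaml \
    examples/specs/lnc-setting-trn2.yaml \
    examples/specs/specific-driver-version.yaml
  do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # Exit status 0 when every file is present, 1 otherwise.
  return "$missing"
}
```

Calling `check_layout containers/files` prints one `missing:` line per absent file and returns a non-zero status if anything is missing, so it can gate the install step in a larger script.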
Installation Scripts
These scripts automate the deployment and configuration of the Neuron DRA driver on your Kubernetes cluster.
| File Name | Description | Download |
|---|---|---|
| install-dra-driver.sh | Automated deployment script for the Neuron DRA driver that applies all necessary manifests and waits for successful deployment. | |
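The behavior described above (apply the manifests, then wait for the deployment) can be sketched roughly as follows. This is not the contents of `install-dra-driver.sh`. The apply order, the `install_neuron_dra` function name, the `NEURON_DRA_NAMESPACE` and `NEURON_DRA_DAEMONSET` variables, the 300-second timeout, and the `KUBECTL` override are all assumptions made for illustration; the real namespace and DaemonSet names are whatever `namespace.yaml` and `daemonset.yaml` define.

```shell
# Illustrative sketch only; the real install-dra-driver.sh may differ.
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

install_neuron_dra() {
  manifests_dir="${1:-manifests}"
  # Assumed inputs: set these to the names used in namespace.yaml and daemonset.yaml.
  ns="${NEURON_DRA_NAMESPACE:?set to the namespace defined in namespace.yaml}"
  ds="${NEURON_DRA_DAEMONSET:?set to the DaemonSet name in daemonset.yaml}"

  # Assumed order: namespace first so namespaced objects can be created,
  # DaemonSet last so its ServiceAccount and RBAC rules already exist.
  for m in namespace serviceaccount clusterrole clusterrolebinding deviceclass daemonset; do
    $KUBECTL apply -f "$manifests_dir/$m.yaml" || return 1
  done

  # Block until the DaemonSet rollout completes, or fail after the timeout.
  $KUBECTL rollout status "daemonset/$ds" -n "$ns" --timeout=300s
}
```

Running it with `KUBECTL=echo` prints the `kubectl` arguments instead of executing them, which is a cheap way to review what would be applied before touching a cluster.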
Kubernetes Manifests
Core Kubernetes resources required to deploy and configure the Neuron DRA driver with proper RBAC permissions.
| File Name | Description | Download |
|---|---|---|
| clusterrole.yaml | ClusterRole definition with permissions required for the Neuron DRA driver to manage device resources. | |
| clusterrolebinding.yaml | ClusterRoleBinding that associates the service account with the required cluster role permissions. | |
| daemonset.yaml | DaemonSet configuration for deploying the Neuron DRA driver on all compatible Trainium nodes. | |
| deviceclass.yaml | DeviceClass resource that defines the Neuron device class for DRA resource allocation. | |
| namespace.yaml | Namespace definition for isolating Neuron DRA driver resources within the cluster. | |
| serviceaccount.yaml | ServiceAccount configuration for the Neuron DRA driver with appropriate security context. | |
Resource Claim Specifications
Example resource claim templates and pod specifications demonstrating different Neuron device allocation patterns for various workload requirements.
| File Name | Description | Download |
|---|---|---|
| 1x4-connected-devices.yaml | Resource claim template for allocating 4 connected Neuron devices with topology constraints for optimal performance. | |
| 2-node-inference-us.yaml | Multi-node inference configuration for distributed workloads across 2 Trainium nodes. | |
| 4-node-inference-us.yaml | Large-scale inference setup for distributed workloads spanning 4 Trainium nodes. | |
| all-devices.yaml | Resource claim template that allocates all available Neuron devices on a trn2.48xlarge instance. | |
| lnc-setting-trn2.yaml | Logical NeuronCore configuration template optimized for Trainium2 instances. | |
| specific-driver-version.yaml | Example configuration for requesting specific Neuron driver versions in resource claims. | |
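To try one of these specifications, you would typically apply it and then inspect the resulting DRA objects. The following is a minimal sketch, assuming the driver is already installed and the commands are run from the directory that contains `examples/`. The `try_spec` function name and the `KUBECTL` override are hypothetical, and the resource kinds queried are the standard Kubernetes DRA kinds rather than anything specific to these files.

```shell
# Illustrative sketch only. KUBECTL can be overridden (for example
# KUBECTL=echo) to preview the commands without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

try_spec() {
  spec="${1:?usage: try_spec <file name under examples/specs/>}"
  # Create the objects defined in the chosen example specification.
  $KUBECTL apply -f "examples/specs/$spec" || return 1
  # List claim templates and claims in the current namespace.
  $KUBECTL get resourceclaimtemplates,resourceclaims
  # List the devices that DRA drivers have published to the cluster.
  $KUBECTL get resourceslices
}
```

For example, `try_spec all-devices.yaml` applies that file and lists the claims it produces. If a specification targets a particular namespace, add `-n <namespace>` to the `get` command for claims.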