This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3

AWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes: Support files#

This directory contains the scripts, manifests, and templates that support AWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes. You can view and download each file from the links below.

Download the scripts and YAML files as a TAR/GZIP archive

Preserve the directory structure when you extract the archive. The driver installation script uses this relative folder structure to find the corresponding YAML files.
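As a minimal sketch of what preserving the structure means in practice, the following builds a stand-in archive with the same layout and extracts it with `tar` into a target directory. The archive name `neuron-dra-files.tar.gz` and the stand-in files are assumptions for illustration only; substitute the archive you actually downloaded.

```shell
#!/bin/sh
set -eu

# Hypothetical archive name; use the file name of your actual download.
ARCHIVE="neuron-dra-files.tar.gz"

# Work in a scratch directory so the demo does not touch real files.
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Build a stand-in archive that mimics the documented layout.
mkdir -p src/containers/files/manifests src/containers/files/examples/scripts
touch src/containers/files/manifests/daemonset.yaml
touch src/containers/files/examples/scripts/install-dra-driver.sh
tar -czf "$ARCHIVE" -C src containers

# Extract without flattening: -x extract, -z gunzip, -f archive, -C target directory.
mkdir -p extracted
tar -xzf "$ARCHIVE" -C extracted

# The relative paths survive extraction, so a script that resolves
# manifests relative to its own location can still find them.
find extracted -type f | sort
```

Extracting into a dedicated directory with `-C`, rather than copying individual files out of the archive, keeps `manifests/` and `examples/` in the same relative positions the installation script expects.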

Directory structure:

containers/files/
           ├── manifests/
           │   ├── clusterrole.yaml
           │   ├── clusterrolebinding.yaml
           │   ├── daemonset.yaml
           │   ├── deviceclass.yaml
           │   ├── namespace.yaml
           │   └── serviceaccount.yaml
           └── examples/
               ├── scripts/
               │   └── install-dra-driver.sh
               └── specs/
                   ├── 1x4-connected-devices.yaml
                   ├── 2-node-inference-us.yaml
                   ├── 4-node-inference-us.yaml
                   ├── all-devices.yaml
                   ├── lnc-setting-trn2.yaml
                   └── specific-driver-version.yaml

Installation Scripts#

These scripts automate the deployment and configuration of the Neuron DRA driver on your Kubernetes cluster.

| File Name | Description | Download |
|---|---|---|
| install-dra-driver.sh | Automated deployment script for the Neuron DRA driver that applies all necessary manifests and waits for successful deployment. | Download |
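The script itself is not reproduced on this page. The following is a sketch of the apply-then-wait flow the description implies, not the actual contents of `install-dra-driver.sh`. The apply order, the namespace `neuron-dra`, the DaemonSet name `neuron-dra-driver`, and the manifest path relative to the current directory are all assumptions; read the real values from `namespace.yaml` and `daemonset.yaml`. The sketch only prints commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
set -eu

# Preview mode is the default: commands are printed, not executed.
# Set DRY_RUN=0 to run them against the current kubectl context.
DRY_RUN="${DRY_RUN:-1}"

# Assumed values; read the real ones from namespace.yaml and daemonset.yaml.
NAMESPACE="${NAMESPACE:-neuron-dra}"
DAEMONSET="${DAEMONSET:-neuron-dra-driver}"
MANIFEST_DIR="${MANIFEST_DIR:-containers/files/manifests}"

# Print the command in preview mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_driver() {
  # Namespace first, then identity and RBAC, then the device class and workload.
  for name in namespace serviceaccount clusterrole clusterrolebinding deviceclass daemonset; do
    run kubectl apply -f "$MANIFEST_DIR/$name.yaml"
  done
  # Wait until the DaemonSet rollout completes, or fail after five minutes.
  run kubectl rollout status "daemonset/$DAEMONSET" -n "$NAMESPACE" --timeout=300s
}

install_driver
```

Applying the namespace before the namespaced resources avoids "namespace not found" errors, and `kubectl rollout status` gives the script a non-zero exit code if the driver pods never become ready.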

Kubernetes Manifests#

Core Kubernetes resources required to deploy and configure the Neuron DRA driver with proper RBAC permissions.

| File Name | Description | Download |
|---|---|---|
| clusterrole.yaml | ClusterRole definition with permissions required for the Neuron DRA driver to manage device resources. | Download |
| clusterrolebinding.yaml | ClusterRoleBinding that associates the service account with the required cluster role permissions. | Download |
| daemonset.yaml | DaemonSet configuration for deploying the Neuron DRA driver on all compatible Trainium nodes. | Download |
| deviceclass.yaml | DeviceClass resource that defines the Neuron device class for DRA resource allocation. | Download |
| namespace.yaml | Namespace definition for isolating Neuron DRA driver resources within the cluster. | Download |
| serviceaccount.yaml | ServiceAccount configuration for the Neuron DRA driver with appropriate security context. | Download |

Resource Claim Specifications#

Example resource claim templates and pod specifications demonstrating different Neuron device allocation patterns for various workload requirements.

| File Name | Description | Download |
|---|---|---|
| 1x4-connected-devices.yaml | Resource claim template for allocating 4 connected Neuron devices with topology constraints for optimal performance. | Download |
| 2-node-inference-us.yaml | Multi-node inference configuration for distributed workloads across 2 Trainium nodes. | Download |
| 4-node-inference-us.yaml | Large-scale inference setup for distributed workloads spanning 4 Trainium nodes. | Download |
| all-devices.yaml | Resource claim template that allocates all available Neuron devices on a trn2.48xlarge instance. | Download |
| lnc-setting-trn2.yaml | Logical NeuronCore configuration template optimized for Trainium2 instances. | Download |
| specific-driver-version.yaml | Example configuration for requesting specific Neuron driver versions in resource claims. | Download |
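The spec files are not reproduced on this page. To illustrate the general claim-template pattern only, the sketch below writes a ResourceClaimTemplate that requests four devices and a Pod that consumes it, using the upstream Kubernetes `resource.k8s.io/v1beta1` schema. The device class name `neuron.example.com`, the object names, and the container image are hypothetical placeholders, and the API version your cluster serves may differ. It does not show the topology constraints that `1x4-connected-devices.yaml` adds.

```shell
#!/bin/sh
set -eu

# Output file for the illustrative manifest.
OUT="${OUT:-claim-sketch.yaml}"

cat > "$OUT" <<'EOF'
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: four-neuron-devices          # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: devices
        deviceClassName: neuron.example.com   # hypothetical; see deviceclass.yaml
        allocationMode: ExactCount
        count: 4
---
apiVersion: v1
kind: Pod
metadata:
  name: neuron-workload              # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: registry.example.com/neuron-app:latest   # placeholder image
    resources:
      claims:
      - name: neuron-claim           # must match spec.resourceClaims[].name
  resourceClaims:
  - name: neuron-claim
    resourceClaimTemplateName: four-neuron-devices
EOF

echo "Wrote $OUT; to validate against a DRA-enabled cluster, run:"
echo "  kubectl apply --dry-run=server -f $OUT"
```

Because the Pod references a template rather than a fixed claim, Kubernetes creates a separate ResourceClaim for each Pod, so replicas do not compete for the same allocation.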
