Neuron infrastructure components

Neuron provides infrastructure components for managing Neuron hardware in containerized and Kubernetes environments. These components handle device discovery, scheduling, health monitoring, and resource allocation.

Overview

Neuron plugins overview

Overview of all Neuron infrastructure components: device plugin, scheduler extension, node problem detector, monitor, and Dynamic Resource Allocation (DRA) driver.

Installation

Neuron Helm chart

Install all Neuron infrastructure components with a single Helm command. This is the recommended installation method for EKS.

Scheduling and device management

Scheduler extension

Topology-aware scheduling for optimal Neuron device allocation in Kubernetes.

Scheduler flow diagram

Visual diagram of how the Neuron scheduler extension integrates with Kubernetes components.

Monitoring and health

Neuron monitor

Collect and expose Neuron device metrics with Prometheus integration for observability and alerting.

Node problem detector and recovery

Detect Neuron device hardware failures and trigger automatic node replacement.

NPD permissions (IRSA)

Configure IAM roles for service accounts (IRSA) to grant the node problem detector the permissions it needs.