This document is relevant for: Inf1, Inf2, Trn1, Trn1n

Neuron Plugins for Containerized Environments

This section summarizes the Neuron infrastructure artifacts available for containerized environments.

  • Neuron Node Problem Detector - This plugin improves resiliency by detecting and remediating errors. For detailed instructions on running this plugin in an EKS environment, refer to EKS Setup For Neuron. To use this plugin on ECS, refer to Neuron Problem Detector And Recovery.

  • Neuron Device Plugin - The Neuron device plugin manages Neuron hardware resources in a Kubernetes environment. It integrates with the Kubernetes device plugin framework to advertise Neuron resources and make them available to Pods. For more information on using Neuron with Kubernetes, refer to EKS Setup For Neuron.

  • Neuron Scheduler Extension - The Neuron scheduler extension is a Kubernetes artifact that helps with optimal allocation of Neuron cores. Installing the scheduler extension is optional if a workload Pod consumes all Neuron resources on a node. For more information on using Neuron with Kubernetes, refer to EKS Setup For Neuron.
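
To illustrate how a workload might consume the resources advertised by the device plugin, here is a minimal sketch in Python that builds a Pod manifest requesting Neuron cores. The resource name `aws.amazon.com/neuroncore`, the helper `neuron_pod_manifest`, the Pod name, and the image are assumptions made for illustration and do not come from this page. Confirm the resource names your cluster actually exposes against EKS Setup For Neuron before relying on them.

```python
import json

# Assumption: extended resource name advertised by the Neuron device plugin.
# Verify against your cluster (for example, in the node's allocatable resources).
NEURON_CORE_RESOURCE = "aws.amazon.com/neuroncore"


def neuron_pod_manifest(name, image, neuron_cores):
    """Build a minimal Pod manifest (as a dict) that requests Neuron cores.

    `name` and `image` are placeholders chosen by the caller; this helper is
    hypothetical and not part of any Neuron tooling.
    """
    if not isinstance(neuron_cores, int) or neuron_cores < 1:
        raise ValueError("neuron_cores must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        # Extended resources are requested as whole integers
                        # under `limits`.
                        "limits": {NEURON_CORE_RESOURCE: neuron_cores},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image name; substitute a real Neuron-enabled image.
    manifest = neuron_pod_manifest(
        "neuron-demo", "example.com/my-neuron-image:latest", 2
    )
    print(json.dumps(manifest, indent=2))
```

The script prints the manifest as JSON, which could be saved to a file and submitted with `kubectl apply -f`. A Pod that requests only a subset of a node's Neuron cores, as in this sketch, is the case where the scheduler extension becomes relevant.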
