This document is relevant for: Inf1, Inf2, Trn1, Trn2

Neuron Monitor is the primary observability tool for Neuron devices. For details, refer to the Neuron Monitor guide. This tutorial describes how to deploy Neuron Monitor as a DaemonSet on a Kubernetes cluster.

Deploy Neuron Monitor DaemonSet

Download the Neuron Monitor YAML file: k8s-neuron-monitor-daemonset.yml

Apply the downloaded YAML file to create the DaemonSet on the cluster with the following command:

kubectl apply -f k8s-neuron-monitor-daemonset.yml

Verify that the Neuron Monitor DaemonSet is running:

kubectl get ds neuron-monitor --namespace neuron-monitor

Expected result (with 2 nodes in cluster):

NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
neuron-monitor   2         2         2       2            2           <none>          27h
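
The DESIRED and READY columns above can also be compared in a script instead of by eye. The following is a minimal sketch, not part of the Neuron tooling: `ds_is_ready` is a hypothetical helper name, and it assumes the two counts are passed on standard input as `DESIRED READY`. The commented usage line reads them from the standard DaemonSet status fields `desiredNumberScheduled` and `numberReady`.

```shell
# Hypothetical helper: reads one line "DESIRED READY" from stdin and
# succeeds only if DESIRED is greater than zero and equals READY.
ds_is_ready() {
    read -r desired ready
    [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$desired" = "$ready" ]
}

# Usage against a live cluster (not run here):
# kubectl get ds neuron-monitor --namespace neuron-monitor \
#   -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}' \
#   | ds_is_ready && echo "neuron-monitor is ready on all nodes"
```

The helper only compares numbers, so it can be exercised without a cluster, for example with `echo "2 2" | ds_is_ready`.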

Get the neuron-monitor pod names. The pods run in the same namespace as the DaemonSet:

kubectl get pods --namespace neuron-monitor

Expected result:

NAME                   READY   STATUS    RESTARTS   AGE
neuron-monitor-slsxf   1/1     Running   0          17m
neuron-monitor-wc4f5   1/1     Running   0          17m
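
When the pod names are needed in a script, `kubectl get pods -o name` prints one `pod/<name>` entry per line, which is easier to process than the table above. The sketch below assumes that output format; `pod_names` is a hypothetical helper that strips the `pod/` prefix.

```shell
# Hypothetical helper: turn "pod/<name>" lines (the format printed by
# "kubectl get pods -o name") into bare pod names, one per line.
pod_names() {
    sed 's|^pod/||'
}

# Usage against a live cluster (not run here):
# kubectl get pods --namespace neuron-monitor -o name | pod_names
```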

Verify that the Prometheus endpoint is available. Replace the pod name with one from your cluster:

kubectl exec neuron-monitor-wc4f5 --namespace neuron-monitor -- wget -q --output-document - http://127.0.0.1:8000

Expected result:

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 362.0
python_gc_objects_collected_total{generation="1"} 0.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
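
The endpoint output mixes `# HELP` and `# TYPE` comment lines with the actual samples. As a rough sketch of how the check could be automated, the helpers below filter the samples and test for a given metric name. Both function names are hypothetical, and the metric name in the usage line is simply the one shown in the expected result above.

```shell
# Hypothetical helper: print only sample lines, dropping "# HELP" and
# "# TYPE" comments and blank lines from Prometheus text output on stdin.
metric_samples() {
    grep -v -e '^#' -e '^[[:space:]]*$'
}

# Hypothetical helper: succeed if the metric named in $1 has at least
# one sample in the Prometheus text output read from stdin.
has_metric() {
    metric_samples | grep -q "^$1[{ ]"
}

# Usage against a live cluster (not run here):
# kubectl exec neuron-monitor-wc4f5 --namespace neuron-monitor -- \
#   wget -q --output-document - http://127.0.0.1:8000 \
#   | has_metric python_gc_objects_collected_total && echo "endpoint OK"
```

Matching on `{` or a space after the name keeps a metric from matching another metric that merely shares its prefix.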
