Tutorial: Kubernetes environment setup for Neuron
Customers who use Kubernetes can conveniently integrate Inf1 instances into their workflows. A device plugin is provided that advertises Inferentia devices to Kubernetes as a system hardware resource. It is deployed to a cluster as a daemon set using the provided manifest. This tutorial walks through deploying the daemon set and running inferences against a BERT model through a Kubernetes service running on Inferentia.
Step 1: Deploy the Neuron device plugin
A device plugin exposes Inferentia to Kubernetes as a resource. The device plugin container is provided through an ECR repository, which is referenced in the device plugin manifest below.
Run the following command:
```
kubectl apply -f https://raw.githubusercontent.com/aws/aws-neuron-sdk/master/docs/neuron-containers-tools/k8s-neuron-device-plugin.yml
```

Note that kubectl needs the raw YAML file, not the GitHub HTML page, so the URL above points at raw.githubusercontent.com.
Make sure that the Neuron device plugin is running successfully:
```
kubectl get ds neuron-device-plugin-daemonset --namespace kube-system
```

Expected output:

```
NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
neuron-device-plugin-daemonset   1         1         1       1            1           <none>          17h
```
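As an optional sanity check (not part of the original steps), you can confirm that each Inf1 node now advertises the aws.amazon.com/neuron resource. The sketch below uses kubectl's custom-columns output; the column names are illustrative:

```
# List each node with its count of allocatable Inferentia devices.
# Dots inside the resource name must be escaped in the column spec.
kubectl get nodes \
  "-o=custom-columns=NODE:.metadata.name,NEURON:.status.allocatable.aws\.amazon\.com/neuron"
```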
You can now request Inferentia devices in a Kubernetes manifest as in the following example. The number of Inferentia devices can be adjusted using the aws.amazon.com/neuron resource. The runtime expects 128 2-MB hugepages per Inferentia device; therefore, hugepages-2Mi has to be set to 256Mi times the number of Inferentia devices (for example, 512Mi for two devices, since 128 × 2 MB = 256 MB per device).
```
resources:
  limits:
    hugepages-2Mi: 256Mi
    aws.amazon.com/neuron: 1
  requests:
    memory: 1024Mi
```
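For context, here is a minimal sketch of a complete Pod spec using that resources block. The pod name and the container image are hypothetical placeholders, not part of the original tutorial; substitute your own inference image:

```
apiVersion: v1
kind: Pod
metadata:
  name: bert-inference                    # hypothetical pod name
spec:
  containers:
    - name: bert-inference
      # Hypothetical image reference; replace with your inference image.
      image: <account-id>.dkr.ecr.<region>.amazonaws.com/bert-inference:latest
      resources:
        limits:
          hugepages-2Mi: 256Mi            # 128 x 2-MB pages per device
          aws.amazon.com/neuron: 1        # one Inferentia device
        requests:
          memory: 1024Mi
```

Because the limits include one Inferentia device, the scheduler will only place this pod on a node where the device plugin has advertised an available aws.amazon.com/neuron resource.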