This document is relevant for: Inf1
, Inf2
, Trn1
, Trn2
EC2 Instance#
Introduction#
Use of Neuron in containers on EC2 can be simple to achieve by following these steps
tutorial-docker-env-setup-for-neuron
More details on EC2 setup can be found at
DLC Images#
The location for DLC images for Neuron can be obtained from here
To get the list of images for neuron, the following commands can be used.
aws ecr list-images --registry-id 763104351884 --repository-name tensorflow-inference-neuron
aws ecr list-images --registry-id 763104351884 --repository-name pytorch-inference-neuron
Setup recommendations#
The EC2 Inf1 instance needs to have the aws-neuron-runtime-base and aws-neruon-dkms package installed.
The DLC inference container runs the framework server (like tensorflow-model-server or TorchServe) and also the neuron runtime that interacts with the neuron driver running in the host.
For more details on setting up the container, check the tensorflow or pytorch. Make sure the appropriate framework container image is used.
Debug Hints#
Use the docker log command to get the neuron rtd logs in the container.
docker logs <container-name>
- Look for errors like the following
If we see nrtd[8]: [TDRV:tdrv_init_mla_phase1] Could not open the device index:0, it either means that some other container is using that device or the host is running the neuron-rtd process.
Check to see that host is not running neuron-rtd
sudo systemctl status neuron-rtd
This document is relevant for: Inf1
, Inf2
, Trn1
, Trn2