This document is relevant for: Inf1, Inf2, Trn1, Trn2

EC2 Instance#

Introduction#

Neuron can be used in containers on EC2 by following these steps:

  • tutorial-docker-env-setup-for-neuron

  • More details on EC2 setup can be found at

DLC Images#

  • The available DLC images for Neuron can be found here

  • To list the available Neuron images, use the following commands:

    aws ecr list-images --registry-id 763104351884 --repository-name tensorflow-inference-neuron

    aws ecr list-images --registry-id 763104351884 --repository-name pytorch-inference-neuron
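After identifying an image, Docker can be authenticated to the DLC registry and the image pulled. A minimal sketch, assuming the us-east-1 region; the tag is a placeholder to be replaced with a real tag from the list-images output:

```shell
# Assumption: us-east-1 region; DLC repositories exist in most commercial regions.
REGION="us-east-1"
REGISTRY="763104351884.dkr.ecr.${REGION}.amazonaws.com"
# Placeholder: substitute a tag returned by `aws ecr list-images`.
TAG="<tag-from-list-images>"

# Authenticate Docker to the DLC registry.
aws ecr get-login-password --region "${REGION}" \
  | docker login --username AWS --password-stdin "${REGISTRY}"

# Pull the Neuron inference image.
docker pull "${REGISTRY}/pytorch-inference-neuron:${TAG}"
```

The registry account 763104351884 is the one shown in the list-images commands above.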

Setup recommendations#

  • The EC2 Inf1 instance needs to have the aws-neuron-runtime-base and aws-neuron-dkms packages installed.

  • The DLC inference container runs the framework server (such as tensorflow-model-server or TorchServe) along with the Neuron runtime, which interacts with the Neuron driver running on the host.

  • For more details on setting up the container, refer to the TensorFlow or PyTorch container documentation. Make sure the appropriate framework container image is used.
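As a sketch of how such a container is typically launched, the host's Neuron device can be exposed to the container with Docker's --device flag. The image name and tag below are placeholders, and port 8500 assumes tensorflow-model-server's default gRPC port:

```shell
# Placeholder image/tag; substitute a real DLC image from the registry.
IMAGE="763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-neuron:<tag>"

# Expose the first Neuron device to the container; on instances with more
# devices, pass one --device flag per /dev/neuron* entry on the host.
docker run -d --name neuron-inference \
  --device=/dev/neuron0 \
  -p 8500:8500 \
  "${IMAGE}"
```

Only one container can hold a given Neuron device at a time, which is why the debug hints below check for device-open conflicts.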

Debug Hints#

  • Use the docker logs command to get the Neuron runtime (neuron-rtd) logs from the container.

    docker logs <container-name>

  • Look for errors like the following
    • If you see nrtd[8]: [TDRV:tdrv_init_mla_phase1] Could not open the device index:0, it means either that another container is using that device or that the host is running the neuron-rtd process.

    • Check that the host is not running neuron-rtd:

      sudo systemctl status neuron-rtd
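The two checks above can be combined into a quick triage sequence. The container name is a placeholder, and the grep pattern matches the device-open error shown above:

```shell
# Placeholder: substitute your running container's name.
CONTAINER="<container-name>"

# docker logs writes the container's stderr to its own stderr, so redirect
# with 2>&1 before piping to grep; -i makes the match case-insensitive.
if docker logs "${CONTAINER}" 2>&1 | grep -qi "Could not open the device"; then
  # Device busy: check whether the host itself holds it via neuron-rtd.
  sudo systemctl status neuron-rtd
fi
```

If neuron-rtd is active on the host, stop it before starting the container so the container's runtime can claim the device.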
