Getting started with Neuron DLC using Docker

Launch Trn1 Instance

Please follow the instructions at launch an Amazon EC2 Instance to Launch an instance, when choosing the instance type at the EC2 console. Please make sure to select the correct instance type.
To get more information about instances sizes and pricing see: Trn1 web page, Inf2 web page, Inf1 web page
Select your Amazon Machine Image (AMI) of choice, please note that Neuron supports Amazon Linux 2 AMI(HVM) - Kernel 5.10.
When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.
After launching the instance, follow the instructions in Connect to your instance to connect to the instance

Note

If you are facing a connectivity issue during the model loading process on a Trn1 instance with Ubuntu, that could probably be because of Ubuntu limitations with multiple interfaces. To solve this problem, please follow the steps mentioned here.

Users are highly encouraged to use DLAMI to launch the instances, since DLAMIs come with the required fix.

Install Drivers

# Configure Linux for Neuron repository updates

sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
[neuron]
name=Neuron YUM Repository
baseurl=https://yum.repos.neuron.amazonaws.com
enabled=1
metadata_expire=0
EOF
sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

# Update OS packages
sudo dnf update -y

# Install OS headers
sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y

# Remove preinstalled packages and Install Neuron Driver and Runtime
sudo dnf remove aws-neuron-dkms -y
sudo dnf remove aws-neuronx-dkms -y
sudo dnf install aws-neuronx-dkms-2.*  -y

# Install EFA Driver(only required for multi-instance training)
curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz
wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key
cat aws-efa-installer.key | gpg --fingerprint
wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig
tar -xvf aws-efa-installer-latest.tar.gz
cd aws-efa-installer && sudo bash efa_installer.sh --yes
cd
sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer

Install Docker

sudo dnf install -y docker.io
sudo usermod -aG docker $USER

Logout and log back in to refresh membership.

Verify Docker

docker run hello-world

Expected result:

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/

For more examples and ideas, visit:
https://docs.docker.com/get-started/

Verify Neuron Component

Once the environment is setup, a container can be started with –device=/dev/neuron# to specify desired set of Inferentia/Trainium devices to be exposed to the container. To find out the available neuron devices on your instance, use the command ls /dev/neuron*.

When running neuron-ls inside a container, you will only see the set of exposed Trainiums. For example:

docker run --device=/dev/neuron0 neuron-test neuron-ls

Would produce the following output in trn1.32xlarge:

+--------+--------+--------+---------+
| NEURON | NEURON | NEURON |   PCI   |
| DEVICE | CORES  | MEMORY |   BDF   |
+--------+--------+--------+---------+
| 0      | 2      | 32 GB  | 10:1c.0 |
+--------+--------+--------+---------+

Build and Run Docker Image

Tutorial How to Build and Run a Neuron Container

Run Tutorial

Run Training in PyTorch Neuron Container

Getting started with Neuron DLC using Docker

Getting started with Neuron DLC using Docker#