This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3

Update PyTorch in a Deep Learning Container#

Update your DLC-based PyTorch Neuron deployment to the latest release.

Update the container image#

DLC images are versioned and tagged with the Neuron SDK version. To update, pull the latest image tag from ECR:

# Training
docker pull public.ecr.aws/neuron/pytorch-training-neuronx:<new_image_tag>

# Inference
docker pull public.ecr.aws/neuron/pytorch-inference-neuronx:<new_image_tag>

# vLLM Inference
docker pull public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:<new_image_tag>

Replace <new_image_tag> with the tag for the desired SDK version (e.g., 2.9.0-neuronx-py312-sdk2.29.0-ubuntu24.04).
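If you script image updates, the embedded Neuron SDK version can be extracted from the tag string itself. A minimal sketch, assuming the `<framework>-neuronx-py<python>-sdk<sdk>-<os>` tag layout shown in the example above (the variable names are illustrative):

```shell
# Illustrative tag, following the format of the example above
tag="2.9.0-neuronx-py312-sdk2.29.0-ubuntu24.04"

# Pull out the SDK version between "-sdk" and the OS suffix
sdk_version=$(echo "$tag" | sed -n 's/.*-sdk\([0-9.]*\)-.*/\1/p')
echo "$sdk_version"   # prints 2.29.0
```

This is useful, for example, to confirm that the image you are about to pull matches the Neuron SDK release installed on the host.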

For the full list of available images and tags, see Neuron Deep Learning Containers on the ECR Public Gallery.

Update Neuron driver on the host#

The Neuron driver runs on the host, not inside the container. Update it separately when moving to a new Neuron SDK release.

# Ubuntu
sudo apt-get update
sudo apt-get install -y aws-neuronx-dkms

# Amazon Linux 2023
sudo dnf install -y aws-neuronx-dkms
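After installing, you can confirm which driver package version is present. A sketch using the standard package-manager queries (the version string below is an illustrative value, not a specific Neuron release):

```shell
# Query the installed aws-neuronx-dkms package version:
#   Ubuntu:            dpkg-query -W -f '${Version}\n' aws-neuronx-dkms
#   Amazon Linux 2023: rpm -q --qf '%{VERSION}\n' aws-neuronx-dkms

# Debian-style versions carry a distro suffix; trim it to get the bare version.
pkg_version="2.19.64.0-1ubuntu1"   # illustrative value
echo "${pkg_version%%-*}"           # prints 2.19.64.0
```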

Verify the update#

Launch the new container and verify:

docker run -it \
  --device=/dev/neuron0 \
  --cap-add SYS_ADMIN \
  --cap-add IPC_LOCK \
  public.ecr.aws/neuron/pytorch-training-neuronx:<new_image_tag> \
  bash

Inside the container:

python3 -c "import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')"
neuron-ls
⚠️ Troubleshooting: Version mismatch between host driver and container

If you see runtime errors after updating the container image but not the host driver:

  1. Check the host driver version: modinfo neuron on the host

  2. Update the host driver to match the SDK version in the container

  3. Reboot if the driver update requires it: sudo reboot
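The comparison in step 2 can be scripted with `sort -V`. A minimal sketch (the helper name and version values are illustrative; on a real host the installed version would come from `modinfo -F version neuron`):

```shell
# Succeeds if version $1 is >= version $2
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# On a real host: installed=$(modinfo -F version neuron)
installed="2.19.64.0"   # illustrative value
required="2.18.12.0"    # illustrative minimum for the target SDK

if version_at_least "$installed" "$required"; then
  echo "host driver OK"
else
  echo "host driver too old: update aws-neuronx-dkms"
fi
```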
