This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3
Update PyTorch in a Deep Learning Container#
Update your DLC-based PyTorch Neuron deployment to the latest release.
Update the container image#
DLC images are versioned and tagged with the Neuron SDK version. To update, pull the latest image tag from ECR:
# Training
docker pull public.ecr.aws/neuron/pytorch-training-neuronx:<new_image_tag>
# Inference
docker pull public.ecr.aws/neuron/pytorch-inference-neuronx:<new_image_tag>
# vLLM Inference
docker pull public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:<new_image_tag>
Replace <new_image_tag> with the tag for the desired SDK version (e.g.,
2.9.0-neuronx-py312-sdk2.29.0-ubuntu24.04).
Check available tags for each repository at the ECR Public Gallery.
For the full list of available images and tags, see Neuron Deep Learning Containers.
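The example tag above embeds the SDK version after an sdk prefix. As a minimal sketch, assuming tags keep that layout, a small helper can extract the SDK version from a tag so you know which release the host driver should match. The name sdk_version_from_tag is hypothetical and not part of any Neuron tooling.

```shell
# Hypothetical helper: extract the Neuron SDK version from a DLC image tag.
# Assumes the tag contains a "-sdk<version>" component, as in the example tag.
sdk_version_from_tag() {
  local tag="$1"
  # Print only the digits and dots that follow "-sdk"; print nothing if absent.
  printf '%s\n' "$tag" | sed -n 's/.*-sdk\([0-9][0-9.]*\).*/\1/p'
}

sdk_version_from_tag "2.9.0-neuronx-py312-sdk2.29.0-ubuntu24.04"
```

If the tag has no sdk component, the function prints nothing, which a calling script can treat as "unknown version".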
Update Neuron driver on the host#
The Neuron driver runs on the host, not inside the container. Update it separately when moving to a new Neuron SDK release.
# Ubuntu (apt)
sudo apt-get update
sudo apt-get install -y aws-neuronx-dkms
# Amazon Linux 2023 (dnf)
sudo dnf install -y aws-neuronx-dkms
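If you script the driver update across hosts with different distributions, a dry-run wrapper can help. The following is a rough sketch, not an official tool: detect_pkg_manager and driver_update_cmd are hypothetical helper names, and the script only prints the command from this page that fits the detected package manager rather than running it.

```shell
# Hypothetical helpers (not part of the Neuron SDK): detect the host's package
# manager and print the matching driver update command without executing it.
detect_pkg_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "apt-get"
  elif command -v dnf >/dev/null 2>&1; then
    echo "dnf"
  else
    echo "unknown"
  fi
}

driver_update_cmd() {
  case "$1" in
    apt-get) echo "sudo apt-get update && sudo apt-get install -y aws-neuronx-dkms" ;;
    dnf)     echo "sudo dnf install -y aws-neuronx-dkms" ;;
    *)       echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Dry run: show the command for this host; review it before running it.
driver_update_cmd "$(detect_pkg_manager)" || true
```

Printing the command instead of executing it keeps the sketch safe to run on any machine, including ones that are not Neuron hosts.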
Verify the update#
Launch the new container and verify:
docker run -it \
--device=/dev/neuron0 \
--cap-add SYS_ADMIN \
--cap-add IPC_LOCK \
public.ecr.aws/neuron/pytorch-training-neuronx:<new_image_tag> \
bash
Inside the container:
python3 -c "import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')"
neuron-ls
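The docker run command above exposes only /dev/neuron0. On instances with more than one Neuron device you may want to pass every device node. A minimal sketch, assuming the devices appear on the host as /dev/neuron0, /dev/neuron1, and so on; neuron_device_flags is a hypothetical helper name:

```shell
# Hypothetical helper: build one --device flag per Neuron device node.
# The directory argument defaults to /dev and exists mainly to ease testing.
neuron_device_flags() {
  local dev_dir="${1:-/dev}" flags="" dev
  for dev in "$dev_dir"/neuron[0-9]*; do
    # Skip the unexpanded glob pattern when no device nodes exist.
    [ -e "$dev" ] || continue
    flags="$flags --device=$dev"
  done
  # Strip the leading space left by the first append.
  printf '%s\n' "${flags# }"
}

# Show the flags for this host (an empty line if no Neuron devices are found).
neuron_device_flags

# Example use, with the flags expanded unquoted so each becomes its own argument:
#   docker run -it $(neuron_device_flags) --cap-add SYS_ADMIN --cap-add IPC_LOCK \
#     public.ecr.aws/neuron/pytorch-training-neuronx:<new_image_tag> bash
```

Inside the container, neuron-ls should then list the same devices that were passed in.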
⚠️ Troubleshooting: Version mismatch between host driver and container
If you see runtime errors after updating the container image but not the host driver:
Check the host driver version on the host:
modinfo neuron
Update the host driver to match the SDK version in the container.
Reboot if the driver update requires it:
sudo reboot
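After the reboot, it can be worth confirming that the driver module actually loaded before relaunching the container. The sketch below assumes the kernel module is listed as neuron in /proc/modules, consistent with the modinfo neuron command above; neuron_module_loaded is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the "neuron" kernel module is loaded.
# The file argument defaults to /proc/modules and exists mainly to ease testing.
neuron_module_loaded() {
  local modules_file="${1:-/proc/modules}"
  # Each line of /proc/modules starts with the module name followed by a space.
  grep -q '^neuron ' "$modules_file" 2>/dev/null
}

if neuron_module_loaded; then
  # modinfo -F prints a single field of the module metadata.
  echo "neuron module loaded, version: $(modinfo -F version neuron 2>/dev/null)"
else
  echo "neuron module not loaded"
fi
```

If the module is not loaded after a driver update, reinstalling aws-neuronx-dkms and rebooting again is a reasonable next step to try.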