This document is relevant for: Inf1, Inf2, Trn1, Trn2
AWS Neuron SDK 2.25.0 release notes#
Date of release: July 31, 2025
Release highlights#
Neuron 2.25.0 delivers updates across several key areas: inference performance optimizations, expanded model support, enhanced profiling capabilities, improved monitoring and observability tools, framework updates, and refreshed development environments and container offerings. The release includes bug fixes across the SDK components, along with updated tutorials and documentation for new features and model deployments.
Inference Optimizations (NxD Core and NxDI)#
Neuron 2.25.0 introduces performance optimizations and new capabilities including:
Context and Data Parallel support for improved batch scaling
Chunked Attention for improved long sequence processing
Automatic Aliasing (Beta) for fast tensor operations
Disaggregated Serving (Beta) improvements
Model Support (NxDI)#
Neuron 2.25.0 expands model support to include:
Qwen3 dense models (0.6B to 32B parameters)
Flux.1-dev model for text-to-image generation (Beta)
Monitoring and Observability#
neuron-ls now displays CPU and NUMA node affinity information
neuron-ls adds NeuronCore IDs display for each Neuron Device
neuron-monitor improves accuracy of device utilization metrics
Framework Updates#
JAX 0.6.1 support added, maintaining compatibility with versions 0.4.31-0.4.38 and 0.5
vLLM support upgraded to version 0.9.x (V0 architecture)
Development Environment Updates#
Neuron SDK updated to version 2.25.0 in:
Deep Learning AMIs on Ubuntu 22.04 and Amazon Linux 2023
Multi-framework DLAMI with environments for both PyTorch and JAX
PyTorch 2.7 Single Framework DLAMI
JAX 0.6 Single Framework DLAMI
Container Support#
Neuron SDK updated to version 2.25.0 in:
PyTorch 2.7 Training and Inference DLCs
JAX 0.6 Training DLC
vLLM 0.9.1 Inference DLC
Neuron Device Plugin and Scheduler container images for Kubernetes integration
Component release notes#
Select a card below to review detailed release notes for each component of the Neuron SDK version 2.25.0. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.
Support announcements#
This section signals the official deprecation or end of support for specific features, tools, and APIs.
End-of-support announcements#
An “end-of-support (EoS)” announcement is a notification that a feature, tool, or API will not be supported in the future. Plan accordingly!
In a future release, the Neuron Compiler default flag --auto-cast=matmult will change to --auto-cast=none. This means the Neuron Compiler will no longer perform auto-casting and will instead use the data types of the operators in the incoming HLO. If the current behavior is desired, users can explicitly pass the --auto-cast=matmult and --auto-cast-type=bf16 options to the compiler. Note: This change will not affect the Neuron NxDI, NxDT, and TNx frameworks, as these are set to --auto-cast=none by default. However, Torch-Neuronx users may be affected and must adjust their settings if they rely on the previous auto-casting behavior.
Starting from Neuron Release 2.24, the Hugging Face Transformers NeuronX library is deprecated and in maintenance mode. transformers-neuronx releases will now only address critical security issues. In Neuron Release 2.26, Neuron will end support for transformers-neuronx. Current users of transformers-neuronx are advised to migrate to NeuronX Distributed Inference.
PyTorch version 2.6 will no longer be supported in a coming release. Current users of PyTorch 2.6 are advised to upgrade to PyTorch 2.7, which is supported in this release.
Support for Python 3.9 will end in a coming release. Currently, we support versions of Python up to 3.11. Current users of Python 3.9 are advised to upgrade to Python 3.11, which is supported in this release.
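For Torch-Neuronx users who want to keep today's auto-casting behavior after the default changes, the two flags named above can be pinned explicitly. The following is a minimal sketch, not an official utility: `pin_auto_cast_flags` is a hypothetical helper, and it assumes your workflow reads compiler flags from the `NEURON_CC_FLAGS` environment variable. It only recognizes the `--flag=value` spelling used in the announcement.

```python
import os
import shlex

# The two flags named in the announcement; together they reproduce
# the current default auto-casting behavior.
LEGACY_AUTO_CAST_FLAGS = ["--auto-cast=matmult", "--auto-cast-type=bf16"]


def pin_auto_cast_flags(existing: str = "") -> str:
    """Return a flag string with the legacy auto-cast flags appended.

    Hypothetical helper. Any --auto-cast=... or --auto-cast-type=...
    token already present is dropped so each option appears once.
    Only the "--flag=value" spelling is recognized.
    """
    kept = [
        token
        for token in shlex.split(existing)
        if not token.startswith(("--auto-cast=", "--auto-cast-type="))
    ]
    return shlex.join(kept + LEGACY_AUTO_CAST_FLAGS)


if __name__ == "__main__":
    # Assumption: the compiler flags are picked up from NEURON_CC_FLAGS.
    current = os.environ.get("NEURON_CC_FLAGS", "")
    os.environ["NEURON_CC_FLAGS"] = pin_auto_cast_flags(current)
    print(os.environ["NEURON_CC_FLAGS"])
```

If you compile through a tracing API that takes a list of compiler arguments instead, the same two strings in `LEGACY_AUTO_CAST_FLAGS` can be passed there directly.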
Ending support in 2.25.0#
Items listed here are officially no longer supported starting with Neuron 2.25.0.
The following tutorials are no longer supported and have been moved to the AWS Neuron SDK doc archive:
Neuron 2.25 is the last release supporting NxDT Megatron Models. Future Neuron releases will not include support for NxDT Megatron Models. Current users of the NxDT Megatron Models are advised to use the Hugging Face model instead by setting the CONF_FILE variable in the train.sh file to the config model you want to use.
With version 2.25.0, Neuron no longer supports vLLM version 0.7.2. Current users of vLLM 0.7.2 are advised to upgrade to vLLM 0.9.1, which is supported in this release.
Transformers for NeuronX is no longer supported. For more details, see the prior announcement.
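The version-related announcements in this section (Python 3.9, PyTorch 2.6, vLLM 0.7.2) can be checked against an existing environment before upgrading. The sketch below is illustrative only: `support_warnings` and `installed_versions` are hypothetical helpers, and the distribution names `torch` and `vllm` are assumptions about how the packages are installed in your environment.

```python
import sys
from importlib import metadata


def support_warnings(python_version, packages):
    """Return warnings for versions named in the support announcements.

    python_version: tuple whose first two items are (major, minor).
    packages: dict mapping distribution name to installed version string.
    """
    warnings = []
    if tuple(python_version[:2]) == (3, 9):
        warnings.append("Python 3.9 support will end in a coming release; upgrade to 3.11.")
    torch_version = packages.get("torch", "")
    if torch_version == "2.6" or torch_version.startswith("2.6."):
        warnings.append("PyTorch 2.6 support will end in a coming release; upgrade to 2.7.")
    if packages.get("vllm", "").startswith("0.7.2"):
        warnings.append("vLLM 0.7.2 is no longer supported in 2.25.0; upgrade to 0.9.1.")
    return warnings


def installed_versions(names=("torch", "vllm")):
    """Look up installed versions; packages that are not installed are skipped."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    for message in support_warnings(sys.version_info, installed_versions()):
        print("WARNING:", message)
```

Keeping the check in a pure function that takes versions as arguments makes it easy to run against a recorded environment description as well as the live interpreter.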
Previous releases#