This document is relevant for: Inf1, Inf2, Trn1, Trn2
AWS Neuron SDK 2.25.0 release notes#
Date of release: July 31, 2025
Release highlights#
Neuron 2.25.0 delivers updates across several key areas: inference performance optimizations, expanded model support, enhanced profiling capabilities, improved monitoring and observability tools, framework updates, and refreshed development environments and container offerings. The release includes bug fixes across the SDK components, along with updated tutorials and documentation for new features and model deployments.
Inference Optimizations (NxD Core and NxDI)#
Neuron 2.25.0 introduces performance optimizations and new capabilities including:
On-device Forward Pipeline, reducing latency by up to 43% in models like Pixtral
Context and Data Parallel support for improved batch scaling
Chunked Attention for efficient long sequence processing
128k context length support for Llama 70B models
Automatic Aliasing (Beta) for faster tensor operations
Disaggregated Serving (Beta) showing 20% improvement in ITL/TTST
Model Support (NxDI)#
Neuron 2.25.0 expands model support to include:
Qwen3 dense models (0.6B to 32B parameters)
Flux.1-dev model for text-to-image generation (Beta)
Pixtral-Large-Instruct-2411 for image-to-text generation (Beta)
Profiling Updates#
Enhancements to profiling capabilities include:
Addition of timestamp sync points to align device execution with CPU events
Expanded JSON output providing the same detailed data set used by the Neuron Profiler UI
New total active time metric showing accelerator utilization as percentage of total runtime
Fixed DMA active time calculation for more accurate measurements
Monitoring and Observability#
neuron-ls now displays CPU and NUMA node affinity information
neuron-ls adds a NeuronCore IDs display for each Neuron Device
neuron-monitor improves the accuracy of device utilization metrics
Framework Updates#
JAX 0.6.1 support added, maintaining compatibility with versions 0.4.31-0.4.38 and 0.5
vLLM support upgraded to version 0.9.x (V0 engine)
Development Environment Updates#
Neuron SDK updated to version 2.25.0 in:
Deep Learning AMIs on Ubuntu 22.04 and Amazon Linux 2023
Multi-framework DLAMI with environments for both PyTorch and JAX
PyTorch 2.7 Single Framework DLAMI
JAX 0.6 Single Framework DLAMI
Container Support#
Neuron SDK updated to version 2.25.0 in:
PyTorch 2.7 Training and Inference DLCs
JAX 0.6 Training DLC
vLLM 0.9.1 Inference DLC
Neuron Device Plugin and Scheduler container images for Kubernetes integration
Component release notes#
Select a card below to review detailed release notes for each component of the Neuron SDK version 2.25.0. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.
Support announcements#
This section signals the official deprecation or end of support for specific features, tools, and APIs.
End-of-support announcements#
An “end-of-support (EoS)” announcement is a notification that a feature, tool, or API will not be supported in a future release. Plan your migration accordingly.
In a future release, the Neuron Compiler default flag --auto-cast=matmult will change to --auto-cast=none. This means the Neuron Compiler will no longer perform auto-casting and will instead use the data types of the operators in the incoming HLO. If the current behavior is desired, users can explicitly pass the --auto-cast=matmult and --auto-cast-type=bf16 options to the compiler. Note: this change will not affect the Neuron NxDI, NxDT, and TNx frameworks, as these are set to --auto-cast=none by default. However, Torch-Neuronx users may be affected and must adjust their settings if they rely on the previous auto-casting behavior.
Starting from Neuron Release 2.24, the Hugging Face Transformers NeuronX library (transformers-neuronx) is deprecated and in maintenance mode. transformers-neuronx releases will now only address critical security issues. In Neuron Release 2.26, Neuron will end support for transformers-neuronx. Current users of transformers-neuronx are advised to migrate to NeuronX Distributed Inference.
PyTorch version 2.6 will no longer be supported in a coming release. Current users of PyTorch 2.6 are advised to upgrade to PyTorch 2.7, which is supported in this release.
Support for Python 3.9 will end in a coming release. Currently, Neuron supports Python versions up to 3.11. Current users of Python 3.9 are advised to upgrade to Python 3.11, which is supported in this release.
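To keep today's auto-casting behavior after the default changes, the two flags above can be passed to the compiler explicitly. A minimal sketch, assuming a Torch-Neuronx workflow where extra compiler options are forwarded through the NEURON_CC_FLAGS environment variable:

```shell
# Pin the current default explicitly, so the future switch to
# --auto-cast=none does not silently change numerics:
export NEURON_CC_FLAGS="--auto-cast=matmult --auto-cast-type=bf16"
echo "$NEURON_CC_FLAGS"
```

Frameworks already set to --auto-cast=none (NxDI, NxDT, TNx) need no change.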
Ending support in 2.25.0#
Items listed here are officially no longer supported starting with Neuron 2.25.0.
The following tutorials are no longer supported and have been moved to the AWS Neuron SDK doc archive:
Fine-tuning Hugging Face T5 for text summarization
Running SSD300 with AWS Neuron
Neuron 2.25 is the last release to support NxDT Megatron models; future Neuron releases will not include support for them. Current users of NxDT Megatron models are advised to use the Hugging Face models instead by setting the CONF_FILE variable in the train.sh file to the model config you want to use.
With version 2.25.0, Neuron no longer supports vLLM version 0.7.2. Current users of vLLM 0.7.2 are advised to upgrade to vLLM 0.9.1, which is supported in this release.
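The CONF_FILE switch described above amounts to a one-line edit in train.sh. A sketch, where the config name is purely illustrative (check your NxDT checkout for the actual Hugging Face config files available):

```shell
# Hypothetical excerpt from train.sh: point CONF_FILE at a
# Hugging Face model config instead of a Megatron one.
# "hf_llama3_8B_config" is an illustrative name, not a real file.
CONF_FILE=hf_llama3_8B_config
echo "$CONF_FILE"
```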
Transformers for NeuronX is no longer supported. For more details, see the prior announcement.
Previous releases#