This document is relevant for: Inf1, Inf2, Trn1, Trn2

AWS Neuron SDK 2.25.0 release notes#

Date of release: July 31, 2025

Release highlights#

Neuron 2.25.0 delivers updates across several key areas: inference performance optimizations, expanded model support, enhanced profiling capabilities, improved monitoring and observability tools, framework updates, and refreshed development environments and container offerings. The release includes bug fixes across the SDK components, along with updated tutorials and documentation for new features and model deployments.

Inference Optimizations (NxD Core and NxDI)#

Neuron 2.25.0 introduces performance optimizations and new capabilities including:

  • On-device Forward Pipeline, reducing latency by up to 43% in models like Pixtral

  • Context and Data Parallel support for improved batch scaling

  • Chunked Attention for efficient long sequence processing

  • 128k context length support for Llama 70B models

  • Automatic Aliasing (Beta) for faster tensor operations

  • Disaggregated Serving (Beta) showing 20% improvement in ITL/TTST

Model Support (NxDI)#

Neuron 2.25.0 expands model support to include:

  • Qwen3 dense models (0.6B to 32B parameters)

  • Flux.1-dev model for text-to-image generation (Beta)

  • Pixtral-Large-Instruct-2411 for image-to-text generation (Beta)

Profiling Updates#

Enhancements to profiling capabilities include:

  • Addition of timestamp sync points to align device execution with CPU events

  • Expanded JSON output providing the same detailed data set used by the Neuron Profiler UI

  • New total active time metric showing accelerator utilization as a percentage of total runtime

  • Fixed DMA active time calculation for more accurate measurements
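
To make the total active time metric above concrete, the sketch below computes an active-time percentage from a list of busy intervals. It is an illustration only, not the Neuron Profiler's implementation: the input format (start/end timestamps in one shared unit), the choice to count overlapping intervals once, and the name `active_time_percent` are all assumptions made here.

```python
from typing import Iterable, Tuple


def active_time_percent(
    busy_intervals: Iterable[Tuple[float, float]],
    runtime_start: float,
    runtime_end: float,
) -> float:
    """Return busy time as a percentage of the total runtime window.

    Hypothetical helper: intervals are (start, end) pairs in the same unit
    as the window. Overlapping intervals are merged so time is not counted
    twice, and intervals are clipped to the window.
    """
    total = runtime_end - runtime_start
    if total <= 0:
        raise ValueError("runtime window must have positive length")

    active = 0.0
    current_start = current_end = None
    for start, end in sorted(busy_intervals):
        # Clip each interval to the runtime window.
        start, end = max(start, runtime_start), min(end, runtime_end)
        if end <= start:
            continue
        if current_end is None or start > current_end:
            # Disjoint from the running interval: flush it and start a new one.
            if current_end is not None:
                active += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the running interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        active += current_end - current_start

    return 100.0 * active / total
```

For example, busy intervals (0, 10), (5, 20), and (50, 60) inside a window from 0 to 100 merge into 30 units of active time, so the function returns 30.0.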

Monitoring and Observability#

  • neuron-ls now displays CPU and NUMA node affinity information

  • neuron-ls adds NeuronCore IDs display for each Neuron Device

  • neuron-monitor improves accuracy of device utilization metrics

Framework Updates#

  • JAX 0.6.1 support added, maintaining compatibility with versions 0.4.31-0.4.38 and 0.5

  • vLLM support upgraded to version 0.9.x V0

Development Environment Updates#

Neuron SDK updated to version 2.25.0 in:

  • Deep Learning AMIs on Ubuntu 22.04 and Amazon Linux 2023

  • Multi-framework DLAMI with environments for both PyTorch and JAX

  • PyTorch 2.7 Single Framework DLAMI

  • JAX 0.6 Single Framework DLAMI

Container Support#

Neuron SDK updated to version 2.25.0 in:

  • PyTorch 2.7 Training and Inference DLCs

  • JAX 0.6 Training DLC

  • vLLM 0.9.1 Inference DLC

  • Neuron Device Plugin and Scheduler container images for Kubernetes integration

Component release notes#

The entries below point to the detailed release notes for each component of Neuron SDK version 2.25.0. The component release notes cover new and improved features, breaking changes, bug fixes, and known issues for that component area.

PyTorch framework 2.25.0 release notes

Neuron features and solutions that support the PyTorch ML framework.

JAX framework 2.25.0 release notes

Neuron features and solutions that support the JAX ML framework.

NxD Training 2.25.0 release notes

Neuron features and tools for LLM and agent ML model training.

NxD Inference 2.25.0 release notes

Neuron features and tools for LLM and agent ML model inference.

NxD Core 2.25.0 release notes

Common features and tools for Neuron-based training and inference.

Neuron Compiler 2.25.0 release notes

The Neuron compiler for AWS Trainium and Inferentia, and its libraries and tools.

Neuron Kernel Interface (NKI) 2.25.0 release notes

Neuron’s Python-based programming interface for developing and optimizing Neuron kernels.

Neuron Runtime 2.25.0 release notes

The Neuron kernel driver and C++ libraries for AWS Inferentia and Trainium instances.

Neuron Developer Tools 2.25.0 release notes

Tools that support end-to-end development for AWS Neuron.

Neuron Deep Learning Amazon Machine Images (DLAMIs) 2.25.0 release notes

AWS-specific machine images for building and deploying Neuron-based ML solutions.

Neuron Deep Learning Containers (DLCs) 2.25.0 release notes

AWS-specific container definitions for building and deploying Neuron-based ML solutions.

Documentation and samples 2.25.0 release notes

Changes to the Neuron docs and code samples.

Neuron 2.25.0 release artifacts

The libraries and packages updated in this release.

Support announcements#

This section signals the official deprecation or end of support for specific features, tools, and APIs.

End-of-support announcements#

An “end-of-support (EoS)” announcement is a notification that a feature, tool, or API will not be supported in the future. Plan accordingly!

  • In a future release, the Neuron Compiler default flag --auto-cast=matmult will change to --auto-cast=none.

    This means the Neuron Compiler will no longer perform auto-casting and will instead use the data types of the operators in the incoming HLO. To keep the current behavior, explicitly pass the --auto-cast=matmult and --auto-cast-type=bf16 options to the compiler.

    Note: This change will not affect Neuron NxDI, NxDT, and TNx Frameworks as these are set to --auto-cast=none by default. However, Torch-Neuronx users may experience an impact and must adjust their settings if they rely on the previous auto-casting behavior.

  • Starting from Neuron Release 2.24, the Hugging Face Transformers NeuronX library is deprecated and in maintenance mode. transformers-neuronx releases will now only address critical security issues. In Neuron Release 2.26, Neuron will end support for transformers-neuronx. Current users of transformers-neuronx are advised to migrate to NeuronX Distributed Inference.

  • PyTorch version 2.6 will no longer be supported in a coming release. Current users of PyTorch 2.6 are advised to upgrade to PyTorch 2.7, which is supported in this release.

  • Support for Python 3.9 will end in a coming release. Currently, we support versions of Python up to 3.11. Current users of Python 3.9 are advised to upgrade to Python 3.11, which is supported in this release.
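
For the --auto-cast default change announced above, one way to keep the current behavior is to pass the two options explicitly before the default changes. The sketch below is a minimal example, assuming your workflow reads compiler options from the `NEURON_CC_FLAGS` environment variable (users of `torch_neuronx.trace` would pass the same options through its `compiler_args` argument instead). `pin_auto_cast` is a hypothetical helper name, and it leaves the flags untouched when an explicit --auto-cast choice is already present.

```python
import os
import shlex

# Options named in the announcement for keeping the current behavior.
PINNED_OPTIONS = ["--auto-cast=matmult", "--auto-cast-type=bf16"]


def pin_auto_cast(flags: str) -> str:
    """Add the auto-cast options to a flag string unless already chosen.

    Hypothetical helper: `flags` is a space-separated compiler flag string,
    such as the value of NEURON_CC_FLAGS.
    """
    tokens = shlex.split(flags)

    def is_set(name: str) -> bool:
        # Treat both "--name=value" and "--name value" as already set.
        return any(t == name or t.startswith(name + "=") for t in tokens)

    if is_set("--auto-cast"):
        # An explicit choice was already made; do not override it.
        return flags
    extra = [opt for opt in PINNED_OPTIONS if not is_set(opt.split("=", 1)[0])]
    return shlex.join(tokens + extra)


if __name__ == "__main__":
    os.environ["NEURON_CC_FLAGS"] = pin_auto_cast(os.environ.get("NEURON_CC_FLAGS", ""))
    print(os.environ["NEURON_CC_FLAGS"])
```

Setting the variable before any compilation is triggered keeps the pinned options in effect for the rest of the process.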

Ending support in 2.25.0#

Items listed here are officially no longer supported starting with Neuron 2.25.0.

  • The following tutorials are no longer supported and have been moved to the AWS Neuron SDK doc archive:

    • Fine-tuning Hugging Face T5 for text summarization

    • Running SSD300 with AWS Neuron

  • Neuron 2.25 is the last release supporting NxDT Megatron Models. Future Neuron releases will not include support for NxDT Megatron Models. Current users of the NxDT Megatron Models are advised to use the Hugging Face model instead by setting the CONF_FILE variable in the train.sh file to the model config you want to use.

  • With version 2.25.0, Neuron no longer supports vLLM version 0.7.2. Current users of vLLM 0.7.2 are advised to upgrade to vLLM 0.9.1, which is supported in this release.

  • Transformers for NeuronX is no longer supported. For more details, see the prior announcement.
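
The version-related announcements in this section can be checked mechanically. The sketch below is a hedged example, not an official Neuron tool: `support_notes` is a hypothetical helper that takes version strings (which you might obtain from `platform.python_version()` and `importlib.metadata.version`) and returns the announcements that apply to them. It encodes only the Python 3.9, PyTorch 2.6, and vLLM 0.7.2 items stated above.

```python
import re
from typing import List, Optional, Tuple


def _major_minor(version: str) -> Tuple[int, int]:
    """Parse the leading 'X.Y' of a version string such as '2.6.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def support_notes(
    python: str,
    torch: Optional[str] = None,
    vllm: Optional[str] = None,
) -> List[str]:
    """Return the support announcements that apply to the given versions."""
    notes = []
    if _major_minor(python) == (3, 9):
        notes.append("Python 3.9: support ends in a coming release; upgrade to 3.11.")
    if torch is not None and _major_minor(torch) == (2, 6):
        notes.append("PyTorch 2.6: support ends in a coming release; upgrade to 2.7.")
    # Match exactly 0.7.2 (optionally with a suffix), not 0.7.20 and later.
    if vllm is not None and re.match(r"0\.7\.2(\D|$)", vllm):
        notes.append("vLLM 0.7.2: no longer supported as of Neuron 2.25.0; upgrade to 0.9.1.")
    return notes


if __name__ == "__main__":
    import platform

    for note in support_notes(platform.python_version()):
        print(note)
```

Passing the versions in as strings keeps the check independent of which packages are installed, so it can run in a CI job before an upgrade.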

