This document is relevant for: Inf1, Inf2, Trn1, Trn2

AWS Neuron SDK 2.26.0 release notes#

Date of release: September 18, 2025

What’s new?#

AWS Neuron SDK 2.26.0 adds support for PyTorch 2.8, JAX 0.6.2, and Python 3.11, and introduces inference improvements on Trainium2 (Trn2). This release includes expanded model support, enhanced parallelism features, new Neuron Kernel Interface (NKI) APIs, and improved development tools for optimization and profiling.

Inference Updates#

NxD Inference - Model support expands with beta releases of Llama 4 Scout and Maverick variants on Trn2. The FLUX.1-dev image generation models are now available in beta on Trn2 instances.

Expert parallelism is now supported in beta, enabling MoE expert distribution across multiple NeuronCores. This release introduces on-device forward pipeline execution in beta and adds sequence parallelism in MoE routers for model deployment flexibility.
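As a rough illustration of what expert parallelism means, the sketch below distributes MoE experts across NeuronCores round-robin. This is plain Python with hypothetical names, shown for intuition only; it is not the NxD Inference API.

```python
# Illustrative sketch of expert-parallel placement: each NeuronCore hosts a
# disjoint subset of the MoE experts. Names here are assumptions for the
# example, not NxD Inference identifiers.
def assign_experts(num_experts: int, num_cores: int) -> dict[int, list[int]]:
    """Map each core ID to the list of expert IDs it hosts (round-robin)."""
    placement = {core: [] for core in range(num_cores)}
    for expert in range(num_experts):
        placement[expert % num_cores].append(expert)
    return placement

# 8 experts over 4 cores: each core hosts 2 experts.
print(assign_experts(8, 4))  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

During inference, the router then sends each token only to the cores holding its selected experts, so no single core needs to hold every expert's weights.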

Neuron Kernel Interface (NKI)#

New APIs enable additional optimization capabilities:

  • gelu_apprx_sigmoid: GELU activation with sigmoid approximation

  • select_reduce: Selective element copying with maximum reduction

  • sequence_bounds: Sequence bounds computation

API enhancements include:

  • tile_size: Added total_available_sbuf_size field

  • dma_transpose: Added axes parameter for 4D transpose

  • activation: Added gelu_apprx_sigmoid operation
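For context on the new gelu_apprx_sigmoid operation: the sigmoid approximation of GELU replaces the exact Gaussian-CDF form with x · sigmoid(1.702 · x). The plain-Python sketch below illustrates only the math; the constant 1.702 is the commonly used fitting value and is an assumption here, since NKI's kernel implementation may differ.

```python
import math

def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_sigmoid_approx(x: float) -> float:
    """Sigmoid approximation: x * sigmoid(1.702 * x).
    1.702 is the conventional fitting constant (an assumption here)."""
    return x / (1.0 + math.exp(-1.702 * x))

# The approximation tracks exact GELU closely over typical activations.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(gelu_exact(x) - gelu_sigmoid_approx(x)) < 0.05
```

The sigmoid form avoids evaluating erf, which is why it is attractive as a hardware activation-function variant.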

Developer Tools#

Neuron Profiler improvements include the ability to select multiple semaphores at once to correlate pending activity with semaphore waits and increments. Additionally, system profile grouping now uses a global NeuronCore ID instead of a process-local ID for visibility across distributed workloads. The Profiler also adds warnings for events dropped due to limited buffer space.

The nccom-test utility adds State Buffer support on Trn2 for collective operations, including all-reduce, all-gather, and reduce-scatter. Error reporting now includes messages for invalid all-to-all collective sizes, helping developers identify and resolve issues.

Deep Learning AMI and Containers#

The Deep Learning AMI now supports PyTorch 2.8 on Amazon Linux 2023 and Ubuntu 22.04. Container updates include PyTorch 2.8.0 and Python 3.11 across all DLCs. The transformers-neuronx environment and package have been removed from the PyTorch inference DLAMIs and DLCs.

Component release notes#

Select a card below to review detailed release notes for updated components of the Neuron SDK version 2.26.0. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.

PyTorch support 2.26.0 release notes

Neuron features and solutions that support the PyTorch ML framework.

JAX support 2.26.0 release notes

Neuron features and solutions that support the JAX ML framework.

NxD Inference 2.26.0 release notes

Neuron features and tools for LLM and agent ML model inference.

NxD Core 2.26.0 release notes

Common features and tools for Neuron-based training and inference.

Neuron Kernel Interface (NKI) 2.26.0 release notes

Neuron’s Python-based programming interface for developing and optimizing Neuron kernels.

Neuron Runtime 2.26.0 release notes

The Neuron kernel driver and C++ libraries for AWS Inferentia and Trainium instances.

Neuron Developer Tools 2.26.0 release notes

Tools that support end-to-end development for AWS Neuron.

Neuron Deep Learning AWS Machine Images (DLAMIs) 2.26.0 release notes

AWS-specific machine images for building and deploying Neuron-based ML solutions.

Neuron Deep Learning Containers (DLCs) 2.26.0 release notes

AWS-specific container definitions for building and deploying Neuron-based ML solutions.

Neuron 2.26.0 release artifacts

The libraries and packages updated in this release.

Support announcements#

This section announces the official end of support for specific features, tools, and APIs.

End-of-support announcements#

An “end-of-support (EoS)” announcement is a notification that a feature, tool, or API will not be supported in the future. Plan accordingly!

  • The Neuron Compiler default for the --auto-cast option will change from --auto-cast=matmult to --auto-cast=none in a future release.

  • Neuron version 2.26.0 is the last release to support parallel_model_trace. The next version of the Neuron SDK will deprecate this NxD Inference function in favor of the ModelBuilder.trace() method, which provides a more robust and flexible approach to tracing and compiling models for Neuron devices. ModelBuilder.trace() enables more advanced features, such as weight layout optimization support, along with other quality-of-life and stability improvements for SPMD tracing.

    Customers directly invoking parallel_model_trace can now use the ModelBuilderV2 APIs; for more details, see the ModelBuilderV2 API Reference. Customers using models directly in NxDI are not affected, since NxDI models are already built on MBv1, which is unaffected by this change.

Ending support in 2.26.0#

“End-of-support” means that AWS Neuron no longer supports the feature, tool, or API indicated in the note as of this release.

  • End-of-support for the Transformers NeuronX library begins with the 2.26.0 release of the AWS Neuron SDK. As a result, the PyTorch inference Deep Learning Container (DLC) no longer includes the transformers-neuronx package, and Neuron no longer provides the transformers_neuronx virtual environment in either the single- or multi-framework DLAMIs. For more details, see Announcing end of support for Transformers NeuronX library starting in Neuron 2.26 release.

  • Starting with Neuron Release 2.26, Neuron driver versions above 2.21 will only support non-Inf1 instances (such as Trn1, Inf2, or other instance types). For Inf1 instance users, Neuron driver versions less than 2.21 will remain supported with regular security patches.

  • The Beta versions of the PyTorch NeuronCore Placement APIs are no longer supported with this release.

Known issues: Samples#

  • When running the UNet training sample with the Neuron compiler, you may encounter this error: Estimated peak HBM usage exceeds 16GB.

    • To work around this error, include the function conv_wrap in your model. (You can find a usable example of this function in the UNet sample model code.) Then define a custom backward pass for your model, following the instructions and example in the PyTorch documentation. The UNet sample also illustrates how this is done for the convolution layers in UNet.
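A custom backward pass of the kind this workaround describes is typically written with torch.autograd.Function. The sketch below is a generic PyTorch example under that assumption; ConvWrap and its explicit gradient recomputation are illustrative, not the actual conv_wrap from the Neuron UNet sample.

```python
import torch

class ConvWrap(torch.autograd.Function):
    """Illustrative wrapper giving a convolution an explicit backward pass.
    This is a generic PyTorch sketch, not the Neuron sample's conv_wrap."""

    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        return torch.nn.functional.conv2d(x, weight)

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        # Compute input/weight gradients explicitly rather than relying on
        # the autograd-recorded graph, which can reduce what the compiler
        # must keep live at peak memory.
        grad_x = torch.nn.grad.conv2d_input(x.shape, weight, grad_out)
        grad_w = torch.nn.grad.conv2d_weight(x, weight.shape, grad_out)
        return grad_x, grad_w

# Usage: ConvWrap.apply(x, w) behaves like conv2d in forward and backward.
x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3, requires_grad=True)
ConvWrap.apply(x, w).sum().backward()
```

The gradients produced match those of torch.nn.functional.conv2d; only how they are computed changes, which is the lever the UNet workaround uses against peak HBM usage.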

Previous releases#
