This document is relevant for: Inf1, Inf2, Trn1, Trn2
AWS Neuron SDK 2.25.0: NxD Inference release notes#
Date of release: July 31, 2025
Version: 0.5.9230
Improvements#
Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!
Qwen3 (dense) model support#
Added support for Qwen3 dense models, which are tested on Trn1. Compatible models include:
For more information, see NxD Inference - Production Ready Models.
Other improvements#
Added simplified functions that you can use to validate the accuracy of logits returned by a model. These new functions include check_accuracy_logits_v2 and generated_expected_logits, which provide more flexibility than check_accuracy_logits. For more information, see Evaluating Models on Neuron.
Added scratchpad_page_size attribute to NeuronConfig. You can specify this attribute to configure the scratchpad page size used during compilation and at runtime. The scratchpad is a shared memory buffer used for internal model variables and other data. For more information, see NeuronConfig.
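To illustrate what validating logit accuracy means in practice, here is a minimal, self-contained sketch of a tolerance-based comparison between the logits a model returns and a set of expected (golden) logits. It is not the NxD Inference implementation and does not call check_accuracy_logits_v2. The function name find_logit_mismatches, its parameters, and the tolerance values are hypothetical and chosen only for illustration.

```python
import math


def find_logit_mismatches(actual, expected, rtol=0.05, atol=1e-5):
    """Compare two [position][vocab] nested lists of logits.

    Hypothetical helper for illustration only. Returns a list of
    (position, token_id, actual_value, expected_value) tuples for every
    logit that falls outside the tolerance.
    """
    if len(actual) != len(expected):
        raise ValueError("actual and expected cover different numbers of positions")

    mismatches = []
    for pos, (actual_row, expected_row) in enumerate(zip(actual, expected)):
        if len(actual_row) != len(expected_row):
            raise ValueError(f"vocabulary size differs at position {pos}")
        for token_id, (a, e) in enumerate(zip(actual_row, expected_row)):
            # math.isclose passes when |a - e| <= max(rtol * max(|a|, |e|), atol).
            if not math.isclose(a, e, rel_tol=rtol, abs_tol=atol):
                mismatches.append((pos, token_id, a, e))
    return mismatches


# Two positions, vocabulary of three tokens (made-up numbers).
expected_logits = [[2.0, -1.0, 0.5], [0.1, 3.0, -2.0]]
actual_logits = [[2.01, -1.0, 0.5], [0.1, 3.0, -1.0]]

mismatches = find_logit_mismatches(actual_logits, expected_logits)
for pos, token_id, a, e in mismatches:
    print(f"position {pos}, token {token_id}: got {a}, expected {e}")
```

In this example only the last logit of the second position exceeds the tolerance. For real models, use the functions documented in Evaluating Models on Neuron rather than a hand-written check like this one.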
Breaking changes#
Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.
Removed support for Meta checkpoint compatibility in Llama3.2 Multimodal modeling code. You can continue to use Hugging Face checkpoints. Hugging Face provides a conversion script that you can run to convert a Meta checkpoint to a Hugging Face checkpoint.
Bug fixes#
We’re always fixing bugs. It’s a developer’s life! Here’s what we fixed in 2.25.0:
Fixed accuracy issues when using Automatic Prefix Caching (APC) with EAGLE speculation.
Fixed continuous batching for Llama3.2 Multimodal where the input batch size is less than the compiled batch size.
Added support for continuous batching when running Neuron modeling code on CPU.
Set a manual seed in benchmark_sampling to improve the stability of data-dependent benchmarks like speculation.
Other minor fixes and improvements.
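The seed fix above matters because the speed of a speculation benchmark depends on the inputs it happens to draw, so unseeded random inputs can shift results from run to run. The following sketch shows the general idea with Python's standard random module. It is not the code in benchmark_sampling, and the helper make_prompt_token_ids and its arguments are hypothetical.

```python
import random


def make_prompt_token_ids(batch_size, seq_len, vocab_size, seed=None):
    """Generate random prompt token IDs for a benchmark run.

    Hypothetical helper for illustration only. With a fixed seed, every
    run draws the same token IDs, so a data-dependent measurement sees
    identical inputs each time.
    """
    rng = random.Random(seed)  # private generator; leaves global random state untouched
    return [
        [rng.randrange(vocab_size) for _ in range(seq_len)]
        for _ in range(batch_size)
    ]


# Same seed, same inputs: two benchmark runs are directly comparable.
run_a = make_prompt_token_ids(batch_size=2, seq_len=8, vocab_size=32000, seed=0)
run_b = make_prompt_token_ids(batch_size=2, seq_len=8, vocab_size=32000, seed=0)
print(run_a == run_b)
```

Using a dedicated generator instead of seeding the global one keeps the benchmark inputs reproducible without affecting other code that relies on random numbers.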