This document is relevant for: Inf1, Inf2, Trn1, Trn2
AWS Neuron SDK 2.26.0: NxD Core release notes#
Date of release: September 18, 2025
Version: 0.15.22259
NxD Core inference improvements#
Non-distributed inference in parallel layers#
Updated parallel layers to support non-distributed inference when parallel state isn’t initialized.
In non-parallel environments, RowParallelLinear and ColumnParallelLinear now function as nn.Linear, and ParallelEmbedding now functions as nn.Embedding. This change lets you simplify model code that runs both on device and on CPU, because you can use the same parallel layers in both cases.
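As a rough illustration of the simplification, the sketch below builds a pair of projection layers from the parallel layer classes with no CPU-specific branch. The helper name build_projection_layers and the layer sizes are hypothetical, and the import path and keyword arguments are assumptions based on the usual NxD layer constructors, so check them against the API reference for your installed version.

```python
def build_projection_layers(hidden_size, intermediate_size):
    """Return (up_proj, down_proj) built from NxD parallel layers.

    Hypothetical helper: the same code path is meant to serve both
    device and CPU runs, with no separate nn.Linear branch.
    """
    # Imported lazily so this module can be imported without the Neuron SDK.
    # Assumed import path; verify against your installed NxD Core version.
    from neuronx_distributed.parallel_layers.layers import (
        ColumnParallelLinear,
        RowParallelLinear,
    )

    # Column-parallel projection up to the intermediate size.
    up_proj = ColumnParallelLinear(
        hidden_size, intermediate_size, bias=False, gather_output=False
    )
    # Row-parallel projection back down to the hidden size.
    down_proj = RowParallelLinear(
        intermediate_size, hidden_size, bias=False, input_is_parallel=True
    )
    return up_proj, down_proj
```

Per the note above, when parallel state isn’t initialized these layers are expected to behave like nn.Linear, so the same helper could be called from a CPU-only test without initializing any process groups.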
Other improvements#
Added a compiler_flag_hook argument to ModelBuilder, which you can use to override compiler flags for different submodels and buckets.
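The exact signature that ModelBuilder expects for this hook isn’t stated in these notes. The sketch below only illustrates the idea of choosing flags per submodel and bucket. The function name, its two parameters, the submodel tags, and the override table are all assumptions made for illustration, not the documented interface.

```python
# Flags used when no override matches (illustrative choice).
DEFAULT_FLAGS = "--model-type=transformer -O1"

# Hypothetical override table keyed by (submodel tag, bucket index).
# A bucket index of None means "any bucket of this submodel".
OVERRIDES = {
    ("token_generation", None): "--model-type=transformer -O2",
    ("context_encoding", 0): "--model-type=transformer -O1 --auto-cast=none",
}


def example_compiler_flag_hook(tag, bucket_index):
    """Return the compiler flag string for one submodel and bucket.

    Assumed signature: the real hook may receive different arguments.
    """
    # Prefer an exact (tag, bucket) match, then a per-submodel match.
    for key in ((tag, bucket_index), (tag, None)):
        if key in OVERRIDES:
            return OVERRIDES[key]
    return DEFAULT_FLAGS
```

Keeping the overrides in a table rather than in nested conditionals makes it easy to see at a glance which submodels and buckets deviate from the defaults.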
Bug fixes#
Here’s what we fixed in 2.26.0:
Inference#
Added more instance types to the hardware enum. For example, inf2 now maps to trn1.
Other minor bug fixes and improvements.
Known issues#
Something doesn’t work. Check here to find out if we already knew about it. We hope to fix these soon!
Inference#
At high batch sizes (>=32), we have observed performance degradation with shard-on-load for some models, such as Llama3.1-8B. Our current recommendation is to disable this feature by enabling save_sharded_checkpoint in NeuronConfig when you trace and compile the model.
spmd_mode = True does not work when provided to the parallel_model_trace API. parallel_model_trace will be deprecated in the next Neuron SDK release.
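For the shard-on-load issue above, one way to apply the recommendation consistently is to derive the setting from the batch size. This is a minimal sketch: the helper name and its threshold parameter are hypothetical, and it assumes NeuronConfig accepts save_sharded_checkpoint as a keyword argument.

```python
def sharded_checkpoint_overrides(batch_size, threshold=32):
    """Return keyword overrides for NeuronConfig (hypothetical helper).

    At or above the threshold, enable save_sharded_checkpoint so that
    shard-on-load is not used, following the recommendation above.
    """
    if batch_size >= threshold:
        return {"save_sharded_checkpoint": True}
    # Below the threshold, leave the library defaults untouched.
    return {}
```

The returned dictionary could then be unpacked into the configuration, for example NeuronConfig(batch_size=batch_size, **sharded_checkpoint_overrides(batch_size)). Because the degradation was observed only for some models, treat the threshold as a starting point and measure performance on your own model.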
Previous release notes#