This document is relevant for: Inf1, Inf2, Trn1, Trn2

What’s New

Neuron 2.24.0 (06/24/2025)

Neuron version 2.24 introduces new inference capabilities, including prefix caching, disaggregated inference (Beta), and context parallelism (Beta). This release also adds NKI language enhancements and new profiling visualizations for debugging and performance analysis. Neuron 2.24 adds support for PyTorch 2.7 and JAX 0.6, updates existing DLAMIs and DLCs, and introduces a new vLLM inference container.

Inference

NxD Inference (NxDI) includes the following enhancements:

Prefix caching: Improves Time To First Token (TTFT) by up to 3x when requests share a common prompt prefix.
Disaggregated inference (Beta): Uses a 1P1D (1 Prefill, 1 Decode) architecture to reduce prefill-decode interference and improve goodput.
Context parallelism (Beta): Improves TTFT for longer sequence lengths by processing context encoding in parallel across multiple NeuronCores.
Model support: Added beta support for Qwen 2.5 text models.
NxD Inference Library: Upgraded to support PyTorch 2.7 and Transformers 4.48.
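
To illustrate why prefix caching can reduce TTFT, here is a minimal, self-contained sketch of block-level prefix reuse. It is not the NxD Inference implementation or API: the `ToyPrefixCache` class, the block size, and the hashing scheme are all hypothetical, chosen only to show how a second request that shares a prompt prefix can skip most of its prefill work.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, in tokens


class ToyPrefixCache:
    """Toy model of prefix caching (illustrative only, not the NxDI API)."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self._blocks = {}  # prefix hash -> placeholder for a cached KV block

    def _block_keys(self, tokens):
        """Return one key per full block; each key covers the whole prefix so far."""
        keys = []
        running = hashlib.sha256()
        full = len(tokens) - len(tokens) % self.block_size
        for start in range(0, full, self.block_size):
            block = tokens[start:start + self.block_size]
            running.update(",".join(map(str, block)).encode() + b";")
            keys.append(running.copy().hexdigest())
        return keys

    def prefill(self, tokens):
        """Return (reused_tokens, computed_tokens) for this prompt."""
        keys = self._block_keys(tokens)
        reused_blocks = 0
        for key in keys:
            if key not in self._blocks:
                break  # reuse must be contiguous from the start of the prompt
            reused_blocks += 1
        for key in keys[reused_blocks:]:
            self._blocks[key] = True  # stand-in for storing the KV block
        reused = reused_blocks * self.block_size
        return reused, len(tokens) - reused


if __name__ == "__main__":
    cache = ToyPrefixCache()
    system_prompt = list(range(8))  # shared prefix of 8 token ids
    print(cache.prefill(system_prompt + [100, 101]))       # first request
    print(cache.prefill(system_prompt + [200, 201, 202]))  # shares the prefix
```

In this toy model the first request computes all 10 tokens, while the second reuses the 8 shared prefix tokens and computes only its 3 new ones. Real speedups depend on the model, prompt lengths, and cache hit rate.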

Hugging Face Optimum Neuron 0.2.0 now supports the PyTorch-based NxD Core backend for LLM inference, which simplifies the implementation of new PyTorch model architectures. Models including Llama-3.1-8B and Llama-3.3-70B have migrated from Transformers NeuronX to the NxD backend.

Training

Library Upgrades

NxD Training (NxDT) Library: Upgraded to support PyTorch 2.7 and Transformers 4.48.
JAX Training Support: Upgraded to JAX 0.6.0.
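
After upgrading, one way to confirm that an environment matches the framework versions listed in these notes is a small check with the standard library's `importlib.metadata`. This is a sketch, not an official Neuron tool: the `EXPECTED_PREFIXES` table and the helper names are hypothetical, and a given environment normally needs only a subset of these packages (for example, PyTorch or JAX, not both).

```python
from importlib.metadata import PackageNotFoundError, version

# Version prefixes taken from the release notes above; adjust for your setup.
EXPECTED_PREFIXES = {"torch": "2.7", "transformers": "4.48", "jax": "0.6"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_release(installed, expected_prefix):
    """True if `installed` is the expected release or a patch/build of it."""
    if installed is None:
        return False
    return installed == expected_prefix or installed.startswith(expected_prefix + ".")


def check_environment(expected=EXPECTED_PREFIXES):
    """Map each package name to (installed_version, matches_expected)."""
    report = {}
    for name, prefix in expected.items():
        found = installed_version(name)
        report[name] = (found, matches_release(found, prefix))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "ok" if ok else "mismatch or not installed"
        print(f"{name}: {found} ({status})")
```

Matching on a `major.minor.` prefix accepts patch releases and local build tags such as `2.7.0+cpu`, while rejecting look-alikes such as `2.70.1`.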

Neuron Kernel Interface (NKI)

New nki.language.gather_flattened: Provides efficient parallel tensor element gathering.
Enhanced accuracy: Improved valid range of nki.language.sqrt and nki.isa.activation(nl.sqrt).
Advanced indexing: Improved performance for nki.isa.nc_match_replace8.
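
As a rough mental model of what a flattened gather does, the pure-Python reference below gathers elements independently for each partition (the first axis) after flattening the remaining axes. This is an assumption about the semantics of nki.language.gather_flattened, not NKI code, and it should be checked against the NKI API reference; the function names here are hypothetical.

```python
def _flatten(x):
    """Flatten arbitrarily nested lists into a flat list of scalars."""
    if isinstance(x, list):
        out = []
        for item in x:
            out.extend(_flatten(item))
        return out
    return [x]


def gather_flattened_reference(data, indices):
    """Per-partition gather over flattened free dimensions (assumed semantics).

    `data` and `indices` are nested lists whose first axis is the partition
    axis. The result has one flat list per partition; unlike a real kernel,
    it is not reshaped back to the shape of `indices`.
    """
    if len(data) != len(indices):
        raise ValueError("data and indices must have the same partition count")
    result = []
    for data_row, index_row in zip(data, indices):
        flat_data = _flatten(data_row)
        flat_idx = _flatten(index_row)
        for i in flat_idx:
            if not 0 <= i < len(flat_data):
                raise IndexError(f"index {i} out of range for {len(flat_data)} elements")
        result.append([flat_data[i] for i in flat_idx])
    return result


if __name__ == "__main__":
    data = [[10, 11, 12, 13], [20, 21, 22, 23]]  # 2 partitions x 4 elements
    indices = [[3, 0], [1, 1]]                   # 2 partitions x 2 lookups
    print(gather_flattened_reference(data, indices))  # [[13, 10], [21, 21]]
```

Each partition is processed independently here, which is the property that allows the lookups to run in parallel across partitions.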

Neuron Tools

Neuron Profiler Enhancements

Framework stack traces: Maps device instructions to model source code.
Scratchpad memory usage visualization: Shows tensor-level memory usage over time with HLO name association.
On-device collectives barriers: Identifies synchronization overhead.
HBM throughput visualization: Tracks data movement involving High Bandwidth Memory (HBM) over time.

NCCOM-TEST Improvements

Added --report-to-json-file flag: Outputs results in JSON format.
Added --show-input-output-size flag: Explicitly displays input and output sizes based on operations.

Neuron Deep Learning Containers (DLCs)

Updated containers with PyTorch 2.7 support for inference and training.
Added a new inference container that includes NxD Inference and vLLM with FastAPI.
JAX DLCs now support JAX 0.6.0 training.

Neuron Deep Learning AMIs (DLAMIs)

Updated MultiFramework DLAMIs to include PyTorch 2.7 and JAX 0.6.0.
Added new Single Framework DLAMIs for PyTorch 2.7 and JAX 0.6.0.

Neuron 2.24 Feature Release Notes

What’s New	Details	Instances
NxD Core (neuronx-distributed)	NxD Core Release Notes (neuronx-distributed)	`Trn1` / `Trn1n`, `Trn2`
NxD Inference (neuronx-distributed-inference)	NxD Inference Release Notes (neuronx-distributed-inference)	`Inf2`, `Trn1` / `Trn1n`, `Trn2`
NxD Training (neuronx-distributed-training)	NxD Training Release Notes (neuronx-distributed-training)	`Trn1` / `Trn1n`, `Trn2`
PyTorch NeuronX (torch-neuronx)	PyTorch Neuron (torch-neuronx) release notes	`Inf2`, `Trn1` / `Trn1n`, `Trn2`
Neuron Compiler (neuronx-cc)	Neuron Compiler (neuronx-cc) release notes	`Inf2`, `Trn1` / `Trn1n`, `Trn2`
Neuron Kernel Interface (NKI)	Neuron Kernel Interface (NKI) release notes	`Inf2`, `Trn1` / `Trn1n`
Neuron Tools	Neuron System Tools	`Inf1`, `Inf2`, `Trn1` / `Trn1n`
Neuron Runtime	Neuron Runtime Release Notes	`Inf1`, `Inf2`, `Trn1` / `Trn1n`
Transformers NeuronX (transformers-neuronx) for Inference	Transformers Neuron (transformers-neuronx) release notes	`Inf2`, `Trn1` / `Trn1n`
Neuron Deep Learning AMIs (DLAMIs)	Neuron DLAMI User Guide	`Inf1`, `Inf2`, `Trn1` / `Trn1n`
Neuron Deep Learning Containers (DLCs)	Neuron DLC Release Notes	`Inf1`, `Inf2`, `Trn1` / `Trn1n`
Release Announcements	Announcing end of support for Beta PyTorch NeuronCore Placement APIs starting next release; Announcing end of support for NKI block dimension starting next release; announce-eos-pytorch25; Announcing end of support for TensorFlow Neuron Inf1 SSD300 tutorial starting next release; Announcing end of support for Transformers NeuronX library starting in Neuron 2.26 release; Announcing end of support for XLA_USE_BF16 and XLA_DOWNCAST_BF16 environment variables starting next release; Announcing end of support for Llama 3.2 Meta checkpoint; Neuron no longer supports nki_jit API in PyTorch Neuron starting this release. See more at Announcements.	`Inf1`, `Inf2`, `Trn1` / `Trn1n`

For detailed release artifacts, see Release Artifacts.

Previous Releases
