This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3

AWS Neuron SDK 2.27.0: NxD Inference release notes#

Date of release: December 19, 2025

What’s New#

Trn3 Platform Support — Added support for running NxD Inference on Trn3 instances.

vLLM V1 support — This release adds support for vLLM V1 through the vllm-neuron plugin. You can use vLLM V1 through the new vLLM V1-based Neuron DLC or the vLLM virtual environment in Neuron DLAMIs. See the vLLM V1 guide for more information.
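As a rough sketch of what vLLM V1 usage looks like from inside the DLC or DLAMI environment (the model ID below is a placeholder, and the exact Neuron-specific configuration should be taken from the vLLM V1 guide), serving and querying a model follows standard vLLM conventions:

```shell
# Launch an OpenAI-compatible vLLM server. When the vllm-neuron plugin is
# installed in the environment, it is discovered automatically as a
# platform plugin. The model ID is a placeholder, not a tested example.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --max-model-len 8192 \
    --max-num-seqs 8

# Query the server on its default port (8000) via the OpenAI-compatible API.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "Hello", "max_tokens": 16}'
```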

Qwen3 MoE Model Support (Beta) — NxD Inference supports the Qwen3 MoE language model, which accepts multilingual text inputs. You can use the Hugging Face checkpoint. For more information about how to run Qwen3 MoE inference, see Tutorial: Qwen3 MoE Inference.

Pixtral Model Support (Beta) — NxD Inference supports the Pixtral image understanding model, which processes text and image inputs. You can use the Hugging Face checkpoint. For more information about how to run Pixtral inference, see Tutorial: Deploy Pixtral Large on Trn2 instances.

Known Issues#

  • Pixtral deployment with vLLM V0 supports up to batch size 32 and sequence length 10240; vLLM V1 deployment supports up to batch size 4 and sequence length 10240.

  • The performance of Qwen3 MoE and Pixtral on Trn2 is not fully optimized. We will address these issues in a future release.

  • The vllm-neuron plugin source code on GitHub is currently not compatible with the 2.27.0 SDK. Customers are advised to use the inference DLAMI and DLC published with the 2.27.0 SDK for vLLM V1 support. The vllm-neuron GitHub repository will be updated soon for compatibility with the 2.27.0 SDK.
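To stay within the Pixtral limits noted above when deploying with vLLM V1, a hedged sketch of a launch command (the checkpoint ID is assumed, not taken from this document) caps concurrency and context length accordingly:

```shell
# Keep the deployment within the supported vLLM V1 limits for Pixtral:
# batch size up to 4 and sequence length up to 10240.
# "mistralai/Pixtral-Large-Instruct-2411" is an assumed checkpoint ID.
vllm serve mistralai/Pixtral-Large-Instruct-2411 \
    --max-num-seqs 4 \
    --max-model-len 10240
```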
