This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3

Neuron no longer supports vLLM V0 starting with Neuron 2.28

Starting with the Neuron 2.28 release, Neuron no longer supports vLLM V0. This includes the vLLM V0 Neuron forks in the AWS Neuron upstreaming-to-vllm GitHub repo and the vLLM V0-based Neuron Inference Deep Learning Containers.

We recommend that customers use the vLLM V1-based inference containers, as documented in the vLLM V1 user guide. Neuron will also update the existing vLLM-based tutorials to use vLLM V1 in the coming release.

For more information on vLLM V1 support, see vLLM on Neuron.
