NxD Inference Tutorials
Welcome to the NeuronX Distributed (NxD) Inference tutorials collection. These step-by-step guides help you deploy and optimize large language models (LLMs) on AWS Neuron hardware. Learn how to run models such as Llama3 and GPT with optimization techniques including speculative decoding, tensor parallelism, and disaggregated inference.
Pixtral Tutorial
Learn how to deploy mistralai/Pixtral-Large-Instruct-2411 on a single trn2.48xlarge instance.
Qwen3 MoE Inference
Learn how to deploy Qwen/Qwen3-235B-A22B using NxD Inference, with various performance tuning options.
Qwen3 VL 8B Tutorial
Learn how to deploy Qwen/Qwen3-VL-8B-Thinking on a single trn2.48xlarge instance.
Qwen2 VL Inference
Learn how to deploy Qwen/Qwen2-VL-7B-Instruct using NxD Inference, with various performance tuning options.