NxD Inference Tutorials
Welcome to the NeuronX Distributed (NxD) Inference tutorials collection. These step-by-step guides help you deploy and optimize large language models (LLMs) on AWS Neuron hardware. Learn how to run models such as Llama3 and GPT with optimization techniques including speculative decoding, tensor parallelism, and disaggregated inference.
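Before diving into the model-specific tutorials below, here is a minimal sketch of the common NxD Inference workflow: configure tensor parallelism and input shapes, compile a Hugging Face checkpoint for Neuron, and generate text through a Hugging Face-style adapter. The class and module names follow the Llama example in the NxD Inference documentation; the model and output paths, as well as the specific configuration values, are placeholder assumptions you should adapt to your checkpoint and instance type.

```python
# Minimal NxD Inference sketch, assuming the neuronx-distributed-inference
# package. Paths and config values below are hypothetical examples.
from transformers import AutoTokenizer, GenerationConfig

from neuronx_distributed_inference.models.config import (
    NeuronConfig,
    OnDeviceSamplingConfig,
)
from neuronx_distributed_inference.models.llama.modeling_llama import (
    LlamaInferenceConfig,
    NeuronLlamaForCausalLM,
)
from neuronx_distributed_inference.utils.hf_adapter import (
    HuggingFaceGenerationAdapter,
    load_pretrained_config,
)

model_path = "/home/ubuntu/models/Llama-3.1-8B-Instruct"    # hypothetical checkpoint path
compiled_path = "/home/ubuntu/traced_models/Llama-3.1-8B"   # where compiled artifacts go

# Core deployment knobs: tensor-parallel degree and batch/sequence shapes.
neuron_config = NeuronConfig(
    tp_degree=32,   # shard weights across 32 NeuronCores (e.g., one trn1.32xlarge)
    batch_size=1,
    seq_len=2048,
    on_device_sampling_config=OnDeviceSamplingConfig(top_k=1),  # greedy sampling on-device
)
config = LlamaInferenceConfig(
    neuron_config,
    load_config=load_pretrained_config(model_path),
)

# Compile (trace) the model for Neuron, then load it onto the device.
model = NeuronLlamaForCausalLM(model_path, config)
model.compile(compiled_path)
model.load(compiled_path)

# Generate through the Hugging Face-style generation adapter.
tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("What is machine learning?", return_tensors="pt")
generation_model = HuggingFaceGenerationAdapter(model)
outputs = generation_model.generate(
    inputs.input_ids,
    generation_config=GenerationConfig(max_new_tokens=128),
    attention_mask=inputs.attention_mask,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

The tutorials below follow this same compile-then-load pattern, varying the model class, parallelism settings, and instance type for each workload.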
Pixtral Tutorial
Learn how to deploy mistralai/Pixtral-Large-Instruct-2411 on a single trn2.48xlarge instance.
Qwen3 MoE Inference
Learn how to deploy Qwen/Qwen3-235B-A22B with NxD Inference, using various performance tuning options.