NxD Inference Tutorials
Welcome to the NeuronX Distributed (NxD) Inference tutorials collection. These step-by-step guides help you deploy and optimize large language models (LLMs) on AWS Neuron hardware. Learn how to run models such as Llama 3 and GPT with optimization techniques including speculative decoding, tensor parallelism, and disaggregated inference.
Pixtral Tutorial
Learn how to deploy the mistralai/Pixtral-Large-Instruct-2411 model on a single trn2.48xlarge instance.