This document is relevant for: Inf2, Trn1, Trn2, Trn3

Developer Guide

Learn how to optimize your models with the Neuron Compiler (neuronx-cc). These guides cover mixed precision training, performance-accuracy tuning, and custom kernel implementations for AWS Trainium and Inferentia instances.

Mixed Precision and Performance-Accuracy Tuning

Learn how to use FP32, TF32, FP16, and BF16 data types with the Neuron Compiler’s auto-cast options to balance performance and accuracy. Understand the tradeoffs between different data types and how to configure compiler settings for optimal model execution.
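To make the tradeoff concrete, the sketch below round-trips a value through half precision and a bfloat16 emulation using only the standard library. This does not invoke neuronx-cc; it only illustrates the kind of rounding the compiler's auto-cast options introduce when narrowing FP32 operands (the bf16 helper truncates rather than rounds, an assumption made for brevity).

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE-754 half precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by truncating an fp32 bit pattern to its
    upper 16 bits (round-toward-zero, for illustration only)."""
    bits = struct.unpack('I', struct.pack('f', x))[0]
    return struct.unpack('f', struct.pack('I', bits & 0xFFFF0000))[0]

x = 1.0001
print(to_fp16(x))  # 1.0 -- the 0.0001 increment is below fp16 resolution
print(to_bf16(x))  # 1.0 -- bf16 has even fewer mantissa bits
print(struct.unpack('f', struct.pack('f', x))[0] != 1.0)  # True: fp32 keeps it
```

Whether this loss matters depends on the model, which is why the guide walks through choosing between casting everything, casting only matrix-multiply operands, or disabling auto-cast entirely.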

How to Use Convolution Kernels in UNet Training Models

Modify UNet training models to use custom convolution kernels with NKI (Neuron Kernel Interface). This implementation helps avoid out-of-memory errors when training convolution-heavy models on Trainium instances.
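For reference, the operation such a kernel computes is a direct 2-D convolution. The pure-Python sketch below (a hypothetical helper, not NKI code) shows the "valid"-padding cross-correlation that a custom on-device kernel would implement; computing it directly, tile by tile, is one way to avoid materializing the large intermediate buffers that can exhaust device memory.

```python
def conv2d_valid(image, kernel):
    """Direct 2-D cross-correlation with 'valid' padding over
    nested lists: out[i][j] = sum over the kernel window."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for i in range(len(out)):
        for j in range(len(out[0])):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = acc
    return out

image = [[1.0] * 3 for _ in range(3)]   # 3x3 input of ones
kernel = [[1.0] * 2 for _ in range(2)]  # 2x2 kernel of ones
print(conv2d_valid(image, kernel))      # [[4.0, 4.0], [4.0, 4.0]]
```

The guide itself covers how to express this computation in NKI and wire it into the UNet training loop.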