This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3
Neuron Runtime Deep Dives
Curious about how the Neuron Runtime works? Looking for deeper explorations of the computer science, techniques, and algorithms behind it? This section collects deep-dive topics on the engineering and lessons learned in developing the Neuron Runtime, written by the AWS engineers who built it.
NeuronX Runtime Deep Dives
Explore the structure and contents of NEFF files, the compiled model format used by the Neuron Runtime.
Learn how the Neuron Runtime overlaps computation and communication to maximize performance on AWS Inferentia and Trainium chips.
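The core idea behind overlapping computation and communication is double buffering: while the device computes on one batch, the transfer of the next batch is already in flight, so transfer latency hides behind compute time. The following is a toy Python sketch of that pattern; the function names and the thread-based "transfer" are ours for illustration and are not Neuron Runtime APIs.

```python
# Toy double-buffering sketch (illustrative only, not Neuron Runtime code):
# overlap "transfer" of the next batch with "compute" on the current one.
from concurrent.futures import ThreadPoolExecutor
import time

def transfer(batch):           # stand-in for a host-to-device copy
    time.sleep(0.01)
    return batch

def compute(batch):            # stand-in for on-device execution
    time.sleep(0.01)
    return [x * 2 for x in batch]

def pipelined(batches):
    """Process batches so transfer of batch k+1 overlaps compute of batch k.

    Assumes `batches` is non-empty.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(transfer, batches[0])
        for nxt in batches[1:]:
            ready = pending.result()            # wait for in-flight transfer
            pending = io.submit(transfer, nxt)  # start the next transfer...
            results.append(compute(ready))      # ...while computing on this one
        results.append(compute(pending.result()))
    return results
```

With per-batch transfer and compute times roughly equal, this pipeline takes about half the wall-clock time of running the two phases strictly back to back.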
Understand, monitor, and optimize memory usage on AWS Neuron devices including tensors, model constants, scratchpad allocations, and more.
Optimize performance by allocating tensors directly into High Bandwidth Memory (HBM) on Neuron devices, eliminating CPU-device memory transfer overhead.
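Conceptually, allocating output tensors directly in device memory replaces a per-call allocate-and-copy with reuse of preallocated buffers that the device writes in place. The sketch below illustrates that buffer-reuse pattern in plain Python; `BufferPool` and `run_inference` are hypothetical names of ours, not Neuron Runtime APIs, and `bytearray` stands in for an HBM-resident buffer.

```python
# Conceptual buffer-reuse sketch (hypothetical, not the Neuron Runtime API):
# preallocate "device" buffers once and write results into them in place,
# instead of allocating and copying a fresh output on every call.
class BufferPool:
    def __init__(self, count, size):
        # Preallocate all buffers up front, analogous to reserving HBM.
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        return self._free.pop()

    def release(self, buf):
        self._free.append(buf)

def run_inference(pool, payload):
    out = pool.acquire()           # reuse a preallocated buffer
    out[:len(payload)] = payload   # "kernel" writes in place; no new allocation
    return out
```

The point of the pattern is that the steady-state hot path performs no allocation and no extra copy; buffers cycle between `acquire` and `release`.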
Best practices and optimization techniques for achieving optimal performance with the AWS Neuron Runtime.
Dive into the structure and analysis of Neuron Runtime core dumps to troubleshoot and debug runtime issues effectively.
Neuron Collectives Deep Dives
Explore Ring, Mesh, and Recursive Doubling-Halving algorithms for coordinating data exchange across multiple nodes via EFA networks.
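The ring algorithm is simple enough to sketch. In a ring all-reduce over N nodes, the data is split into N chunks; a reduce-scatter phase passes partial sums around the ring for N-1 steps, then an all-gather phase circulates the completed chunks for another N-1 steps. The following in-process Python simulation is illustrative only (the function name is ours, and real Neuron collectives move data over EFA, not Python lists):

```python
# In-process simulation of ring all-reduce (sum) over n simulated nodes.
# chunks_per_node[i][j] holds node i's value for chunk j (n nodes, n chunks).
def ring_allreduce(chunks_per_node):
    n = len(chunks_per_node)
    data = [list(row) for row in chunks_per_node]
    # Phase 1: reduce-scatter. At step s, node i sends chunk (i - s) mod n to
    # node i+1, which accumulates it. After n-1 steps, node i holds the fully
    # reduced chunk (i + 1) mod n.
    for s in range(n - 1):
        for i in range(n):
            c = (i - s) % n
            data[(i + 1) % n][c] += data[i][c]
    # Phase 2: all-gather. Each node forwards its completed chunk around the
    # ring; at step s, node i sends chunk (i + 1 - s) mod n to node i+1.
    for s in range(n - 1):
        for i in range(n):
            c = (i + 1 - s) % n
            data[(i + 1) % n][c] = data[i][c]
    return data
```

Each node sends and receives only 2(N-1)/N of the total data volume, which is why the ring algorithm is bandwidth-optimal for large payloads even though its 2(N-1) steps add latency.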
Learn about Ring, Mesh, KangaRing, and RDH algorithms optimized for high-bandwidth NeuronLink communication within single nodes.
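Recursive doubling-halving (RDH) can likewise be sketched for a power-of-two node count: a recursive-halving reduce-scatter in which partners at distance n/2, n/4, ..., 1 exchange half of their active range, followed by a recursive-doubling all-gather that merges the completed segments back. The simulation below is our own illustration, not runtime code, and assumes the vector length is divisible by the node count:

```python
# Simulation of recursive doubling-halving (RDH) all-reduce (sum).
# Assumes a power-of-two node count and vector length divisible by it.
def rdh_allreduce(vectors):
    n = len(vectors)
    assert n & (n - 1) == 0, "node count must be a power of two"
    data = [list(v) for v in vectors]
    m = len(data[0])
    lo, hi = [0] * n, [m] * n       # each node's active segment [lo, hi)
    # Reduce-scatter by recursive halving: partners split their shared range,
    # each keeping (and reducing) one half.
    dist = n // 2
    while dist >= 1:
        new = [row[:] for row in data]
        for i in range(n):
            p = i ^ dist            # partner at this distance
            mid = (lo[i] + hi[i]) // 2
            keep_lo, keep_hi = (lo[i], mid) if i < p else (mid, hi[i])
            for k in range(keep_lo, keep_hi):
                new[i][k] = data[i][k] + data[p][k]
            lo[i], hi[i] = keep_lo, keep_hi
        data = new
        dist //= 2
    # All-gather by recursive doubling: partners exchange their completed
    # segments, doubling each node's range every round.
    dist = 1
    while dist < n:
        new = [row[:] for row in data]
        nlo, nhi = lo[:], hi[:]
        for i in range(n):
            p = i ^ dist
            for k in range(lo[p], hi[p]):
                new[i][k] = data[p][k]
            nlo[i] = min(lo[i], lo[p])
            nhi[i] = max(hi[i], hi[p])
        data, lo, hi = new, nlo, nhi
        dist *= 2
    return data
```

Compared with the ring, RDH finishes in 2·log2(N) steps instead of 2(N-1), trading some bandwidth efficiency for much lower latency on small payloads.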