This document is relevant for: Inf1, Inf2, Trn1, Trn2

Neuron Runtime Deep Dives#

Curious about how the Neuron Runtime works? Looking for deeper explorations of the computer science, techniques, and algorithms behind it? This section collects deep-dive topics on the engineering and lessons learned in building the Neuron Runtime, written by the AWS engineers who developed it.

NeuronX Runtime Deep Dives#

Understand NEFF Files

Explore the structure and contents of NEFF (Neuron Executable File Format) files, the compiled model format used by the Neuron Runtime.

Compute-Communication Overlap

Learn how the Neuron Runtime overlaps computation and communication to maximize performance on AWS Inferentia and Trainium chips.
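To give a flavor of the general pattern (not the Neuron Runtime's actual scheduler, which pipelines DMA transfers and compute engines in hardware), here is a minimal Python sketch of double buffering: while chunk i is being computed, chunk i+1 is transferred in the background. All function names here are illustrative placeholders.

```python
import threading

def compute(chunk):
    # Stand-in for on-device compute over one data chunk.
    return sum(x * x for x in chunk)

def communicate(chunk):
    # Stand-in for a transfer/collective; a real runtime uses DMA engines.
    return list(chunk)

def overlapped_pipeline(chunks):
    """Compute on chunk i while chunk i+1 is being transferred."""
    results = []
    pending = {"data": None}

    def transfer(c):
        pending["data"] = communicate(c)

    # Kick off the first transfer eagerly.
    t = threading.Thread(target=transfer, args=(chunks[0],))
    t.start()
    for i in range(len(chunks)):
        t.join()                    # Wait for chunk i's transfer to land.
        ready = pending["data"]
        if i + 1 < len(chunks):     # Start chunk i+1's transfer...
            t = threading.Thread(target=transfer, args=(chunks[i + 1],))
            t.start()
        results.append(compute(ready))  # ...while computing on chunk i.
    return results

print(overlapped_pipeline([[1, 2], [3, 4], [5, 6]]))
```

Python threads only illustrate the dependency structure here; the point is that each transfer is issued before the compute that hides it, so communication latency overlaps useful work.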

Neuron Device Memory

Understand, monitor, and optimize memory usage on AWS Neuron devices, including tensors, model constants, scratchpad allocations, and more.
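For a back-of-the-envelope feel for the budgeting involved, the toy sketch below tallies the memory categories named above against a device's HBM capacity. All figures are made up for illustration; real numbers come from Neuron tooling such as neuron-monitor.

```python
GiB = 1024 ** 3

# Hypothetical usage per category (the categories come from this page's
# summary: tensors, model constants, scratchpad).
usage = {
    "model constants (weights)": 9.5 * GiB,
    "tensors (I/O + activations)": 2.0 * GiB,
    "scratchpad allocations": 0.5 * GiB,
}

hbm_capacity = 16 * GiB  # hypothetical capacity; check your device's spec

total = sum(usage.values())
for name, nbytes in usage.items():
    print(f"{name:>28}: {nbytes / GiB:5.2f} GiB")
print(f"{'total':>28}: {total / GiB:5.2f} GiB "
      f"({100 * total / hbm_capacity:.0f}% of {hbm_capacity / GiB:.0f} GiB HBM)")
```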

Direct HBM Tensor Allocation

Optimize performance by allocating tensors directly into High Bandwidth Memory (HBM) on Neuron devices, eliminating CPU-to-device memory transfer overhead.
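To illustrate the idea (with placeholder names, not the actual Neuron Runtime API), the sketch below contrasts staging a tensor through host memory with allocating it directly in HBM: the direct path skips the host buffer and the extra host-to-device copy.

```python
class MockDevice:
    """Toy stand-in for an accelerator; real allocation goes through
    the Neuron Runtime, not this hypothetical interface."""
    def allocate_hbm(self, n):
        return bytearray(n)   # pretend this buffer lives in device HBM
    def allocate_host(self, n):
        return bytearray(n)   # pretend this is a pinned host buffer
    def copy_host_to_device(self, src, dst):
        dst[:] = src          # the transfer that direct allocation avoids

def fill(buf):
    for i in range(len(buf)):
        buf[i] = i % 256

def staged_upload(dev, n):
    host = dev.allocate_host(n)          # stage in host memory first...
    fill(host)
    hbm = dev.allocate_hbm(n)
    dev.copy_host_to_device(host, hbm)   # ...then pay for an extra copy
    return hbm

def direct_upload(dev, n):
    hbm = dev.allocate_hbm(n)            # write straight into HBM;
    fill(hbm)                            # no staging buffer, no extra copy
    return hbm

dev = MockDevice()
assert staged_upload(dev, 1024) == direct_upload(dev, 1024)
```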

Runtime Performance Tips

Best practices and tuning techniques for getting the best performance from the AWS Neuron Runtime.

Neuron Runtime Core Dumps

Dive into the structure and analysis of Neuron Runtime core dumps to troubleshoot and debug runtime issues effectively.

Neuron Collectives Deep Dives#

Inter-node Collectives Communication

Explore the Ring, Mesh, and Recursive Doubling-Halving (RDH) algorithms used to coordinate data exchange across multiple nodes over Elastic Fabric Adapter (EFA) networks.
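As a generic illustration of the Ring algorithm (a simulation, not Neuron's implementation), the sketch below performs an all-reduce as a reduce-scatter pass followed by an all-gather pass. Each rank exchanges only one chunk per step, so per-rank traffic stays near 2(N-1)/N of the data size regardless of node count.

```python
def ring_allreduce(inputs):
    """Simulate ring all-reduce over n ranks: reduce-scatter, then
    all-gather. Vector length must be divisible by n."""
    n = len(inputs)
    data = [list(v) for v in inputs]          # data[r] = rank r's vector
    c = len(inputs[0]) // n                   # chunk size
    sl = lambda j: slice(j * c, (j + 1) * c)  # slice covering chunk j

    # Reduce-scatter: in step s, rank r sends chunk (r - s) % n to rank
    # r + 1, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        msgs = [(r, (r - s) % n, data[r][sl((r - s) % n)]) for r in range(n)]
        for r, j, payload in msgs:            # snapshot = simultaneous exchange
            dst = (r + 1) % n
            data[dst][sl(j)] = [a + b for a, b in zip(data[dst][sl(j)], payload)]

    # Rank r now owns the fully reduced chunk (r + 1) % n.
    # All-gather: circulate the finished chunks for n - 1 more steps.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n, data[r][sl((r + 1 - s) % n)]) for r in range(n)]
        for r, j, payload in msgs:
            data[(r + 1) % n][sl(j)] = payload
    return data

# Four ranks, vectors of length 4; every rank ends with the elementwise sum.
print(ring_allreduce([[1, 2, 3, 4], [5, 6, 7, 8],
                      [9, 10, 11, 12], [13, 14, 15, 16]]))
```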

Intra-node Collectives Communication

Learn about the Ring, Mesh, KangaRing, and RDH algorithms optimized for high-bandwidth NeuronLink communication within a single node.
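For background on why several algorithms coexist, the standard latency-bandwidth cost model (per-step latency α, per-byte time β, message size m, N workers) predicts the following all-reduce times for Ring and RDH; KangaRing is Neuron-specific and not modeled here.

```latex
% All-reduce cost under the alpha-beta model:
%   alpha = per-step latency, beta = per-byte time, m = bytes, N = workers
T_{\mathrm{Ring}} = 2(N-1)\,\alpha + \frac{2(N-1)}{N}\, m\,\beta
\qquad
T_{\mathrm{RDH}} = 2\log_2 N\,\alpha + \frac{2(N-1)}{N}\, m\,\beta
```

Both move the same total volume, but RDH finishes in logarithmically many steps, so it tends to win for small, latency-bound messages, while Ring's uniform neighbor-to-neighbor traffic maps well onto high-bandwidth links.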

This document is relevant for: Inf1, Inf2, Trn1, Trn2