This document is relevant for: Inf1, Inf2, Trn1, Trn2

Neuron Runtime Deep Dives#

Curious about how the Neuron Runtime works? Looking for deeper explorations of the computer science, techniques, and algorithms behind it? This section collects deep-dive topics on the engineering and lessons learned in building the Neuron Runtime, written by the AWS engineers who developed it.

NeuronX Runtime Deep Dives#

Understand NEFF Files

Explore the structure and contents of NEFF (Neuron Executable File Format) files, the compiled model format used by the Neuron Runtime.

Compute-Communication Overlap

Learn how the Neuron Runtime overlaps computation and communication to maximize performance on AWS Inferentia and Trainium chips.
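To give a flavor of the general pattern (not the Neuron Runtime's actual scheduler, which pipelines DMA transfers and compute engines in hardware), here is a minimal Python sketch of double buffering: while chunk i is being computed, chunk i+1 is transferred in the background. All function names here are illustrative placeholders.

```python
import threading

def compute(chunk):
    # Stand-in for on-device compute over one data chunk.
    return sum(x * x for x in chunk)

def communicate(chunk):
    # Stand-in for a transfer/collective; a real runtime uses DMA engines.
    return list(chunk)

def overlapped_pipeline(chunks):
    """Compute on chunk i while chunk i+1 is being transferred."""
    results = []
    pending = {"data": None}

    def transfer(c):
        pending["data"] = communicate(c)

    # Kick off the first transfer eagerly.
    t = threading.Thread(target=transfer, args=(chunks[0],))
    t.start()
    for i in range(len(chunks)):
        t.join()                    # Wait for chunk i's transfer to land.
        ready = pending["data"]
        if i + 1 < len(chunks):     # Start chunk i+1's transfer...
            t = threading.Thread(target=transfer, args=(chunks[i + 1],))
            t.start()
        results.append(compute(ready))  # ...while computing on chunk i.
    return results

print(overlapped_pipeline([[1, 2], [3, 4], [5, 6]]))
```

Python threads only illustrate the dependency structure here; the point is that each transfer is issued before the compute that hides it, so communication latency overlaps useful work.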

Neuron Device Memory

Understand, monitor, and optimize memory usage on AWS Neuron devices, including tensors, model constants, scratchpad allocations, and more.
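For a back-of-the-envelope feel for the budgeting involved, the toy sketch below tallies the memory categories named above against a device's HBM capacity. All figures are made up for illustration; real numbers come from Neuron tooling such as neuron-monitor.

```python
GiB = 1024 ** 3

# Hypothetical usage per category (the categories come from this page's
# summary: tensors, model constants, scratchpad).
usage = {
    "model constants (weights)": 9.5 * GiB,
    "tensors (I/O + activations)": 2.0 * GiB,
    "scratchpad allocations": 0.5 * GiB,
}

hbm_capacity = 16 * GiB  # hypothetical capacity; check your device's spec

total = sum(usage.values())
for name, nbytes in usage.items():
    print(f"{name:>28}: {nbytes / GiB:5.2f} GiB")
print(f"{'total':>28}: {total / GiB:5.2f} GiB "
      f"({100 * total / hbm_capacity:.0f}% of {hbm_capacity / GiB:.0f} GiB HBM)")
```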

Direct HBM Tensor Allocation

Optimize performance by allocating tensors directly into High Bandwidth Memory (HBM) on Neuron devices, eliminating CPU-to-device memory transfer overhead.
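To illustrate the idea (with placeholder names, not the actual Neuron Runtime API), the sketch below contrasts staging a tensor through host memory with allocating it directly in HBM: the direct path skips the host buffer and the extra host-to-device copy.

```python
class MockDevice:
    """Toy stand-in for an accelerator; real allocation goes through
    the Neuron Runtime, not this hypothetical interface."""
    def allocate_hbm(self, n):
        return bytearray(n)   # pretend this buffer lives in device HBM
    def allocate_host(self, n):
        return bytearray(n)   # pretend this is a pinned host buffer
    def copy_host_to_device(self, src, dst):
        dst[:] = src          # the transfer that direct allocation avoids

def fill(buf):
    for i in range(len(buf)):
        buf[i] = i % 256

def staged_upload(dev, n):
    host = dev.allocate_host(n)          # stage in host memory first...
    fill(host)
    hbm = dev.allocate_hbm(n)
    dev.copy_host_to_device(host, hbm)   # ...then pay for an extra copy
    return hbm

def direct_upload(dev, n):
    hbm = dev.allocate_hbm(n)            # write straight into HBM;
    fill(hbm)                            # no staging buffer, no extra copy
    return hbm

dev = MockDevice()
assert staged_upload(dev, 1024) == direct_upload(dev, 1024)
```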

Runtime Performance Tips

Best practices and tuning techniques for getting the best performance from the AWS Neuron Runtime.

Neuron Runtime Core Dumps

Dive into the structure and analysis of Neuron Runtime core dumps to troubleshoot and debug runtime issues effectively.

Neuron Collectives Deep Dives#

Inter-node Collectives Communication

Explore the Ring, Mesh, and Recursive Doubling-Halving (RDH) algorithms used to coordinate data exchange across multiple nodes over Elastic Fabric Adapter (EFA) networks.
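As a generic illustration of the Ring algorithm (a simulation, not Neuron's implementation), the sketch below performs an all-reduce as a reduce-scatter pass followed by an all-gather pass. Each rank exchanges only one chunk per step, so per-rank traffic stays near 2(N-1)/N of the data size regardless of node count.

```python
def ring_allreduce(inputs):
    """Simulate ring all-reduce over n ranks: reduce-scatter, then
    all-gather. Vector length must be divisible by n."""
    n = len(inputs)
    data = [list(v) for v in inputs]          # data[r] = rank r's vector
    c = len(inputs[0]) // n                   # chunk size
    sl = lambda j: slice(j * c, (j + 1) * c)  # slice covering chunk j

    # Reduce-scatter: in step s, rank r sends chunk (r - s) % n to rank
    # r + 1, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        msgs = [(r, (r - s) % n, data[r][sl((r - s) % n)]) for r in range(n)]
        for r, j, payload in msgs:            # snapshot = simultaneous exchange
            dst = (r + 1) % n
            data[dst][sl(j)] = [a + b for a, b in zip(data[dst][sl(j)], payload)]

    # Rank r now owns the fully reduced chunk (r + 1) % n.
    # All-gather: circulate the finished chunks for n - 1 more steps.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n, data[r][sl((r + 1 - s) % n)]) for r in range(n)]
        for r, j, payload in msgs:
            data[(r + 1) % n][sl(j)] = payload
    return data

# Four ranks, vectors of length 4; every rank ends with the elementwise sum.
print(ring_allreduce([[1, 2, 3, 4], [5, 6, 7, 8],
                      [9, 10, 11, 12], [13, 14, 15, 16]]))
```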

Intra-node Collectives Communication

Learn about the Ring, Mesh, KangaRing, and RDH algorithms optimized for high-bandwidth NeuronLink communication within a single node.
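For background on why several algorithms coexist, the standard latency-bandwidth cost model (per-step latency α, per-byte time β, message size m, N workers) predicts the following all-reduce times for Ring and RDH; KangaRing is Neuron-specific and not modeled here.

```latex
% All-reduce cost under the alpha-beta model:
%   alpha = per-step latency, beta = per-byte time, m = bytes, N = workers
T_{\mathrm{Ring}} = 2(N-1)\,\alpha + \frac{2(N-1)}{N}\, m\,\beta
\qquad
T_{\mathrm{RDH}} = 2\log_2 N\,\alpha + \frac{2(N-1)}{N}\, m\,\beta
```

Both move the same total volume, but RDH finishes in logarithmically many steps, so it tends to win for small, latency-bound messages, while Ring's uniform neighbor-to-neighbor traffic maps well onto high-bandwidth links.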

This document is relevant for: Inf1, Inf2, Trn1, Trn2