This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3

NeuronX Runtime#

The NeuronX Runtime is a high-performance execution engine that enables deep learning models to run on AWS Inferentia and Trainium accelerators. It consists of a kernel driver and C/C++ libraries that provide low-level APIs for accessing Neuron devices, managing model execution, and coordinating collective communications across NeuronCores.

The Neuron Runtime serves as the foundation for all ML framework integrations (TensorFlow, PyTorch, JAX, and Apache MXNet): it loads compiled models in the Neuron Executable File Format (NEFF) and orchestrates their execution on Neuron hardware. It is optimized for high-throughput, low-latency inference and training workloads, with features including:

  • Efficient model execution: Loads and executes NEFF files on NeuronCores with optimized memory management

  • Multi-model support: Manages multiple models across multiple NeuronCores with flexible allocation strategies

  • Collective communications: Provides high-performance collective operations for distributed training and inference

  • Device management: Handles NeuronCore allocation, device discovery, and resource management

  • Debugging support: Offers core dump generation, debug streams, and detailed logging for troubleshooting

  • Configuration flexibility: Extensive environment variables for fine-tuning runtime behavior

The Neuron Runtime is typically used transparently through ML framework plugins, but also provides direct C/C++ APIs for developers building custom frameworks or requiring low-level device control.

Get Started#

About the NeuronX Runtime

Learn about the AWS Neuron Runtime, its features, and its capabilities for accessing Inferentia and Trainium devices.

Quickstart: Generate a Core Dump

Learn how to generate a Neuron runtime core dump for debugging runtime failures and analyzing device state.

Reference#

Runtime Developer Guide

Comprehensive guide to the Neuron Runtime API for developers building custom frameworks that call libnrt APIs directly.

Runtime API Reference Documentation

Reference documentation for the APIs exposed in the Neuron Runtime's public headers.

Runtime Configuration

Learn how to configure the Neuron Runtime using environment variables to control NeuronCore allocation, logging, and more.
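As a hedged illustration of this configuration mechanism, the fragment below sets two commonly documented Neuron Runtime environment variables; the specific values shown here are examples, and the Runtime Configuration guide is the authoritative list of variables and accepted values.

```shell
# Restrict this process to NeuronCores 0 through 3 (example range).
export NEURON_RT_VISIBLE_CORES=0-3

# Raise the console log threshold so only errors are printed (example level).
export NEURON_RT_LOG_LEVEL=ERROR
```

These variables are read by the runtime at process startup, so they must be set before the framework or application that loads libnrt is launched.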

Troubleshooting on Inf1 and Trn1

Solutions for common issues encountered when using the Neuron Runtime on Inferentia and Trainium instances.

Frequently Asked Questions

Answers to common questions about the Neuron Runtime, including compatibility, configuration, and usage.

Learn More#

Explore the Neuron Runtime

Deep dives into the Neuron Runtime, including NEFF files, compute-communication overlap, device memory, and core dumps.

Collectives#

About Collectives

Learn about Neuron Runtime collectives.

Deep Dive: Inter-node Collective Communication

Explore and understand techniques for communication across nodes in the Neuron Runtime.

Deep Dive: Intra-node Collective Communication

Explore and understand techniques for communication within nodes in the Neuron Runtime.

Release Notes#

Runtime Release Notes

Latest updates, improvements, and bug fixes for the Neuron Runtime library, driver, and collectives.