This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3
NeuronX Runtime#
The NeuronX Runtime is a high-performance execution engine that enables deep learning models to run on AWS Inferentia and Trainium accelerators. It consists of a kernel driver and C/C++ libraries that provide low-level APIs for accessing Neuron devices, managing model execution, and coordinating collective communications across NeuronCores.
The Neuron Runtime serves as the foundation for all ML framework integrations (TensorFlow, PyTorch, JAX, and Apache MXNet), loading compiled models in Neuron Executable File Format (NEFF) and orchestrating their execution on Neuron hardware. It is optimized for high-throughput and low-latency inference and training workloads, with features including:
Efficient model execution: Loads and executes NEFF files on NeuronCores with optimized memory management
Multi-model support: Manages multiple models across multiple NeuronCores with flexible allocation strategies
Collective communications: Provides high-performance collective operations for distributed training and inference
Device management: Handles NeuronCore allocation, device discovery, and resource management
Debugging support: Offers core dump generation, debug streams, and detailed logging for troubleshooting
Configuration flexibility: Exposes extensive environment variables for fine-tuning runtime behavior
The Neuron Runtime is typically used transparently through ML framework plugins, but also provides direct C/C++ APIs for developers building custom frameworks or requiring low-level device control.
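For developers using the direct C/C++ APIs, the typical call sequence is init, load, execute, teardown. The sketch below follows the LibNRT API naming (`nrt_init`, `nrt_load`, `nrt_unload`, `nrt_close`); it is an outline only, not a complete program: it assumes the `nrt.h` header and `libnrt` shared library from the Neuron SDK plus an attached Neuron device, and it elides reading the NEFF file and building input/output tensor sets. Consult the Runtime API reference for the authoritative signatures.

```c
/* Sketch only: requires the Neuron SDK (nrt.h, libnrt) and Neuron hardware. */
#include <nrt/nrt.h>
#include <stdio.h>

int main(void) {
    /* Initialize the runtime without a framework wrapper. */
    NRT_STATUS status = nrt_init(NRT_FRAMEWORK_TYPE_NO_FW, "", "");
    if (status != NRT_SUCCESS) {
        fprintf(stderr, "nrt_init failed: %d\n", status);
        return 1;
    }

    /* Load a compiled NEFF onto one NeuronCore (start_nc = 0, nc_count = 1).
     * neff_data/neff_size would come from reading the .neff file from disk:
     *
     *   nrt_model_t *model = NULL;
     *   status = nrt_load(neff_data, neff_size, 0, 1, &model);
     *
     * Then allocate input/output tensor sets and run inference with
     * nrt_execute(model, inputs, outputs), and unload with nrt_unload(model). */

    /* Tear down the runtime. */
    nrt_close();
    return 0;
}
```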
Get Started#
Reference#
Learn More#
Collectives#
Release Notes#