This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3

Neuron Runtime API Reference

This section is the API reference for the Neuron Runtime (NRT) and the Neuron Driver Library (NDL). These APIs give low-level access to AWS Neuron devices, with interfaces for model loading, execution, memory management, and collective operations.

Source code for these APIs can be found at: aws-neuron/aws-neuron-sdk.

Core Runtime APIs

NRT API

Main Neuron Runtime API for model loading, execution, and tensor management

NRT Status

Status codes and error handling for runtime operations

NRT Version

Version information and compatibility checking

Asynchronous Execution APIs

NRT Async

Asynchronous execution API for non-blocking operations

NRT Async Send/Recv

Asynchronous tensor send and receive operations

Profiling and Debugging APIs

NRT Profile

Profiling API for performance analysis and optimization

NRT System Trace

System trace capture and event fetching

Debug Stream

Debug event streaming from Logical Neuron Cores

Collective Operations API

NEC API

Neuron Elastic Collectives (NEC) for distributed operations

Neuron Driver Library (NDL) APIs

NDL API

Low-level Neuron Driver Library for device access and control

Neuron Driver Shared

Shared definitions between runtime and driver

Tensor Batch Operations

Batch operation structures for tensor transfers

Neuron Datastore API

Neuron Datastore

Neuron Datastore (NDS) for sharing metrics and model information

Experimental APIs

NRT Experimental

Experimental features and APIs (subject to change)
