This document is relevant for: Inf2, Trn1, Trn2, Trn3

Neuron Runtime How-To Guides

Task-focused guides for developers working directly with the Neuron Runtime (libnrt). Use these when you are building a custom framework on top of the runtime, migrating an existing C/C++ application to the explicit async APIs, or tuning runtime behavior through environment variables. If you are using Neuron through PyTorch, JAX, or TensorFlow, the framework handles most of the runtime interaction for you; see the NeuronX Runtime overview for where these guides fit in the larger runtime surface area.

Runtime developer guide

Build a C/C++ application against libnrt directly. Covers the runtime architecture, driver and library installation, NEFF loading, tensor staging, execution, and the collective communication library used for distributed workloads.

Migrate to the Explicit Async APIs

Move a C/C++ application off the legacy NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS implicit async mode and onto the nrta_* explicit async APIs. Covers scheduling with sequence numbers, polling and event-based completion tracking, per-request error handling, and queue backpressure.

Runtime Configuration Guide

Configure the Neuron Runtime through environment variables. Covers NeuronCore visibility and allocation, execution timeouts, logging verbosity, core dump behavior, and other runtime knobs you set before launching your application.