This document is relevant for: Inf2, Trn1, Trn2, Trn3
Neuron Runtime How-To Guides
Task-focused guides for developers working directly with the Neuron Runtime
(libnrt). Use these when you are building a custom framework on top of
the runtime, migrating an existing C/C++ application to the explicit async
APIs, or tuning runtime behavior through environment variables. If you are
using Neuron through PyTorch, JAX, or TensorFlow, the framework handles most
of the runtime interaction for you; see the NeuronX Runtime overview
for where these guides fit in the larger runtime surface area.
Move a C/C++ application off the legacy
NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS implicit async mode
and onto the nrta_* explicit async APIs. Covers scheduling with
sequence numbers, polling and event-based completion tracking,
per-request error handling, and queue backpressure.
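To make the terms in that summary concrete, the following is a minimal, self-contained sketch of sequence-number scheduling, polling for completion, and queue backpressure. It does not call libnrt: every name in it (`demo_queue_t`, `demo_submit`, `demo_device_step`, `demo_is_complete`, `demo_run`, `DEMO_QUEUE_DEPTH`) is hypothetical and is not part of the `nrta_*` API. It also assumes requests complete in submission order, which may not match the real runtime. Consult the migration guide for the actual calls and their semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a runtime queue; NOT the libnrt / nrta_* API. */
#define DEMO_QUEUE_DEPTH 4

typedef enum { DEMO_OK = 0, DEMO_QUEUE_FULL = 1 } demo_status_t;

typedef struct {
    uint64_t submitted; /* sequence number of the most recent submission */
    uint64_t completed; /* highest sequence number known to have finished */
} demo_queue_t;

/* Number of requests scheduled but not yet completed. */
static size_t demo_inflight(const demo_queue_t *q) {
    assert(q->submitted >= q->completed);
    return (size_t)(q->submitted - q->completed);
}

/* Try to schedule one request. On success, *seq_out receives its sequence
 * number; when the queue is full, the caller must back off and retry. */
static demo_status_t demo_submit(demo_queue_t *q, uint64_t *seq_out) {
    if (demo_inflight(q) >= DEMO_QUEUE_DEPTH) {
        return DEMO_QUEUE_FULL;
    }
    *seq_out = ++q->submitted;
    return DEMO_OK;
}

/* Simulate the device finishing the oldest in-flight request.
 * Assumption: completion happens in submission order. */
static void demo_device_step(demo_queue_t *q) {
    if (q->completed < q->submitted) {
        q->completed++;
    }
}

/* Polling check: has the request with this sequence number finished? */
static bool demo_is_complete(const demo_queue_t *q, uint64_t seq) {
    return seq <= q->completed;
}

/* Submit `total` requests, letting the simulated device drain one request
 * whenever the queue pushes back, then poll until the last one completes.
 * Returns the last sequence number issued (0 if total is 0). */
static uint64_t demo_run(demo_queue_t *q, unsigned total, unsigned *stalls_out) {
    uint64_t seq = 0;
    unsigned stalls = 0;
    for (unsigned i = 0; i < total; i++) {
        while (demo_submit(q, &seq) == DEMO_QUEUE_FULL) {
            stalls++;            /* backpressure: queue is at capacity */
            demo_device_step(q); /* wait for one slot to free up */
        }
    }
    while (!demo_is_complete(q, seq)) {
        demo_device_step(q);     /* poll until the final request finishes */
    }
    *stalls_out = stalls;
    return seq;
}
```

Calling `demo_run` on a zero-initialized `demo_queue_t` with `total` set to 10 issues sequence numbers 1 through 10. The first four submissions succeed immediately and each of the remaining six hits backpressure once before a slot frees up. In a real application the retry loop would typically wait on a completion event or do other work rather than spin.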