This document is relevant for: Inf2, Trn1, Trn2

Inference with torch-neuronx (Inf2 & Trn1/Trn2)#

Deploy inference workloads using PyTorch NeuronX on Inf2, Trn1, and Trn2 instances.

Get Started#

Setup (torch-neuronx)

Install and configure PyTorch NeuronX for inference workloads on Inf2, Trn1, and Trn2 instances.

Tutorials#

Inference Tutorials

Step-by-step tutorials covering BERT, ResNet-50, and T5 inference, plus deployment with TorchServe and LibTorch C++.

Reference#

API Reference Guide

Inference API reference for PyTorch NeuronX, including the trace, weight replacement, core placement, and data parallel APIs.

Developer Guide

In-depth developer guide covering core placement, trace vs XLA, data parallelism, and auto-bucketing.

Additional Resources#

Additional Examples

More inference examples and sample code from the AWS Neuron Samples repository.

Misc

Supported operators, release notes, and additional inference resources.
