This document is relevant for: Inf2, Trn1, Trn2

Tutorials for Inference (torch-neuronx)

  • HuggingFace pretrained BERT tutorial [html] [notebook]

  • TorchServe tutorial [html]

  • LibTorch C++ tutorial (for torch-neuron and torch-neuronx) [html]

  • Torchvision ResNet50 tutorial [html] [notebook]

  • T5 inference tutorial [html] [notebook]
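
The tutorials above all follow the same basic trace-compile-save workflow: trace the model once with representative example inputs, save the compiled artifact, then reload it for inference. A minimal sketch of that flow is below, using `torch.jit.trace` as a CPU stand-in so it runs anywhere (on a Neuron instance you would call `torch_neuronx.trace` with the same model and example-input arguments); the `TinyClassifier` module is a hypothetical placeholder, not a model from the tutorials.

```python
import torch

# Hypothetical stand-in model; the tutorials use BERT, ResNet50, T5, etc.
class TinyClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyClassifier().eval()
example = torch.rand(1, 8)  # representative example input used for tracing

# On a Neuron instance: traced = torch_neuronx.trace(model, example)
# Here torch.jit.trace illustrates the same trace step on CPU.
traced = torch.jit.trace(model, example)

# Save the compiled artifact, then reload it for inference.
torch.jit.save(traced, "tiny_classifier.pt")
restored = torch.jit.load("tiny_classifier.pt")
out = restored(example)
print(tuple(out.shape))  # (1, 2)
```

The key design point the tutorials share is that tracing fixes the input shapes: the saved artifact expects inputs shaped like `example`, so choose example inputs that match your deployment batch size and sequence length.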

Note

To run these tutorials in a Jupyter Notebook, see:

  • Jupyter Notebook QuickStart

  • Running Jupyter Notebook as script


By AWS

© Copyright 2026, Amazon.com.