Helper Tools

  • Check Model
  • GatherInfo

By AWS

© Copyright 2025, Amazon.com.