Posts tagged announce-eos-tensorflow

Announcing end of support for TensorFlow for Inferentia2 (Inf2) starting with Neuron 2.29

  • 26 February 2026
  • announce-eos-tensorflow

This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3

Neuron release 2.28 is the last release to support TensorFlow for Inferentia2 (Inf2). Future Neuron releases will not include TensorFlow support for Inf2 instances.



By AWS

© Copyright 2026, Amazon.com.