This document is relevant for: Inf1, Inf2, Trn1, Trn2

Neuron Documentation Release Notes#

Neuron 2.21.0#

Date: 12/20/2024

Neuron Architecture and Features

  • Added Trainium2 Architecture guide. See Trainium2 Architecture

  • Added Trn2 Architecture guide. See Amazon EC2 Trn2 Architecture

  • Added Logical NeuronCore configuration guide. See Logical NeuronCore configuration

  • Added NeuronCore-v3 Architecture guide. See NeuronCore-v3 Architecture
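The Logical NeuronCore configuration guide describes grouping physical NeuronCores into logical ones on Trn2. As a minimal sketch, the grouping can be selected through an environment variable before the process starts; the variable name NEURON_LOGICAL_NC_CONFIG is an assumption here, and the guide itself is authoritative:

```python
import os

# Sketch only: select a Logical NeuronCore (LNC) configuration before any
# Neuron framework code runs. The variable name is an assumption for
# illustration; consult the Logical NeuronCore configuration guide.
# "2" would map two physical NeuronCores to one logical NeuronCore.
os.environ["NEURON_LOGICAL_NC_CONFIG"] = "2"
```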

Neuron Compiler

  • Added NKI tutorial for SPMD usage with multiple Neuron Cores on Trn2. See tutorial

  • Updated NKI FAQ with Trn2 FAQs. See NKI FAQ

  • Added Direct Allocation Developer Guide

  • Updated nki.isa API guide with support for new APIs.

  • Updated nki.language API guide with support for new APIs.

  • Updated nki.compiler API guide with support for new APIs.

  • Updated NKI datatype guide with support for float8_e5m2.

  • Updated kernels with support for the allocated_fused_self_attn_for_SD_small_head_size and allocated_fused_rms_norm_qkv kernels

Neuron Runtime

  • Updated troubleshooting doc with information on device out-of-memory errors after upgrading to Neuron Driver 2.19 or later. See small_allocations_mempool

NeuronX Distributed Inference

  • Added Application Note to introduce NxD Inference. See Introducing NeuronX Distributed (NxD) Inference

  • Added NxD Inference Supported Features Guide. See NxD Inference Features Configuration Guide

  • Added NxD Inference tutorial for deploying Llama 3.1 405B (Trn2). See Tutorial: Deploying Llama3.1 405B (Trn2)

  • Added NxD Inference API Reference Guide. See nxd-inference-api-guides

  • Added NxD Inference Production Ready Models (Model Hub) Guide. See NxD Inference - Production Ready Models

  • Added migration guide from NxD Core examples to NxD Inference. See Migrating from NxD Core inference examples to NxD Inference

  • Added migration guide from Transformers NeuronX to NeuronX Distributed Inference. See Migrating from Transformers NeuronX to NeuronX Distributed (NxD) Inference

  • Added vLLM User Guide for NxD Inference. See vLLM User Guide for NxD Inference

  • Added tutorial for deploying Llama3.2 Multimodal Models. See Tutorial: Deploying Llama3.2 Multimodal Models

NeuronX Distributed Training

  • Updated Training APIs, Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism, YAML Configuration Settings, and Checkpoint Conversion with support for fused Q,K,V

  • Updated YAML Configuration Settings with support for the Trn2 configuration API

  • Updated Direct Checkpoint Conversion with support for HuggingFace model conversion

  • Added tutorial for HuggingFace Llama3.1/Llama3-70B pretraining. See HuggingFace Llama3.1/Llama3-70B Pretraining

  • Added tutorial for HuggingFace Llama3-8B Direct Preference Optimization (DPO) based fine-tuning. See HuggingFace Llama3-8B Direct Preference Optimization (DPO) based Fine-tuning

Transformers NeuronX

  • Updated Transformers NeuronX (transformers-neuronx) Developer Guide and PyTorch NeuronX Tracing API for Inference with support for CPU compilation.

  • Updated Transformers NeuronX (transformers-neuronx) Developer Guide to enable skipping the first AllGather introduced by flash decoding, at the cost of duplicated Q weights.

  • Updated Transformers NeuronX (transformers-neuronx) Developer Guide with support for EAGLE speculation

Neuron Tools

  • Added Neuron Profiler 2.0 Beta User Guide with support for system profiles, integration with Perfetto, distributed workload support, and more. See Neuron Profiler 2.0 (Beta) User Guide

  • Updated nccom-test user guide to include support for Trn2. See NCCOM-TEST User Guide

  • Updated neuron-ls user guide to include support for Trn2. See Neuron LS User Guide

  • Updated neuron-monitor user guide to include support for Trn2. See Neuron Monitor User Guide

  • Updated neuron-top user guide to include support for Trn2. See Neuron Top User Guide

  • Added Ask Q Developer documentation for general Neuron guidance and jumpstarting NKI kernel development. See Ask Q Developer

PyTorch NeuronX

  • Added troubleshooting note for eager debug mode errors. See PyTorch Neuron (torch-neuronx) for Training Troubleshooting Guide

  • Added torch-neuronx C++11 ABI documentation. See Install with support for C++11 ABI

  • Added migration guide from XLA_USE_BF16/XLA_DOWNCAST_BF16. See Migration From XLA_USE_BF16/XLA_DOWNCAST_BF16

  • Updated BERT tutorial to not use XLA_DOWNCAST_BF16, and updated the BERT-Large pretraining phase to BFloat16 BERT-Large pretraining with AdamW and stochastic rounding. See Hugging Face BERT Pretraining Tutorial (Data-Parallel)

  • Added Application Note for PyTorch 2.5 support. See Introducing PyTorch 2.5 Support

  • Updated PyTorch NeuronX Environment Variables document with support for PyTorch 2.5. See PyTorch NeuronX Environment Variables
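The XLA_USE_BF16/XLA_DOWNCAST_BF16 migration replaces implicit, process-wide downcasting via environment variables with explicit casting in the model code. A minimal sketch of the explicit-cast approach, using a placeholder torch.nn.Linear model (not a Neuron-specific API):

```python
import torch

# Sketch of migrating off XLA_USE_BF16/XLA_DOWNCAST_BF16: instead of
# letting an environment variable downcast every tensor, cast the model
# and its inputs to bfloat16 explicitly. The Linear module is a
# placeholder model for illustration only.
model = torch.nn.Linear(4, 2).to(torch.bfloat16)
inputs = torch.randn(1, 4).to(torch.bfloat16)

outputs = model(inputs)  # computation stays in bfloat16 end to end
```

Explicit casting makes the precision of each tensor visible in the code itself, which is the main motivation the migration guide gives for dropping the environment variables.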

Misc

  • Added a third-party developer flow solutions page. See Third-party solutions

  • Added a third-party libraries page. See Third-party libraries

End of support announcements

  • Announcing end of support for the Neuron DET tool starting next release

  • Announcing migration of NxD Core examples from the NxD Core repository to the NxD Inference repository in the next release

  • Announcing end of support for Python 3.8 in future releases

  • Announcing end of support for PyTorch 1.13 starting next release

  • Announcing end of support for PyTorch 2.1 starting next release

  • Announcing end of support for Ubuntu20 DLCs and DLAMIs

  • Announcing maintenance mode for torch-neuron 1.9 and 1.10 versions

Neuron 2.20.0#

Date: 09/16/2024

Neuron Compiler

  • Added Getting Started with NKI guide for implementing a simple “Hello World” style NKI kernel and running it on a Neuron Device (Trainium/Inferentia2). See Getting Started with NKI

  • Added NKI Programming Model guide for explaining the three main stages of the NKI programming model. See NKI Programming Model

  • Added NKI Kernel as a Framework Custom Operator guide for explaining how to insert a NKI kernel as a custom operator into a PyTorch or JAX model using simple code examples. See NKI Kernel as a Framework Custom Operator

  • Added NKI Tutorials for the following kernels: Tensor addition, Transpose2D, AveragePool2D, Matrix multiplication, RMSNorm, Fused Self Attention, LayerNorm, and Fused Mamba. See nki.kernels

  • Added NKI Kernels guide for optimized kernel examples. See nki.kernels

  • Added Trainium/Inferentia2 Architecture Guide for NKI. See Trainium/Inferentia2 Architecture Guide for NKI

  • Added Profiling NKI kernels with Neuron Profile. See Profiling NKI kernels with Neuron Profile

  • Added NKI Performance Guide for explaining a recipe to find performance bottlenecks of NKI kernels and apply common software optimizations to address such bottlenecks. See NKI Performance Guide

  • Added NKI API Reference Manual with nki framework and types, nki.language, nki.isa, NKI API Common Fields, and NKI API Errors. See NKI API Reference Manual

  • Added NKI FAQ. See NKI FAQ

  • Added NKI Known Issues. See NKI Known Issues

  • Updated Neuron Glossary with NKI terms. See Neuron Glossary

  • Added new NKI samples repository

  • Added average_pool2d, fused_mamba, layernorm, matrix_multiplication, rms_norm, sd_attention, tensor_addition, and transpose_2d kernel tutorials to the NKI samples repository. See NKI samples repository

  • Added unit and integration tests for each kernel. See NKI samples repository

  • Updated Custom Operators API Reference Guide with updated terminology (HBM). See Custom Operators API Reference Guide [Beta]
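The Getting Started with NKI and NKI Programming Model guides above build a "Hello World" style kernel from the load-compute-store stages. A pseudocode sketch of that pattern (`nl` stands for `nki.language`; actually running a kernel requires a Trainium or Inferentia2 device, so this is illustrative only):

```
# Pseudocode sketch of a minimal NKI tensor-addition kernel.
# Stage 1: load input tiles from device memory (HBM) into on-chip SBUF.
# Stage 2: compute on the tiles.
# Stage 3: store the result tile back to HBM.

@nki.jit
def tensor_add_kernel(a_input, b_input):
    a_tile = nl.load(a_input)         # HBM -> SBUF
    b_tile = nl.load(b_input)         # HBM -> SBUF
    c_tile = a_tile + b_tile          # elementwise add on-chip
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype,
                          buffer=nl.shared_hbm)
    nl.store(c_output, value=c_tile)  # SBUF -> HBM
    return c_output
```

The Getting Started with NKI guide shows the authoritative version of this kernel and how to invoke it on a Neuron Device.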

NeuronX Distributed Training (NxDT)

NeuronX Distributed Core (NxD Core)

JAX Neuron

  • Added JAX Neuron Main page. See JAX Neuron (beta)

  • Added JAX Neuron plugin instructions. See jax-neuronx-setup

  • Added JAX Neuron setup instructions. See JAX Setup

PyTorch NeuronX

Transformers NeuronX

Neuron Runtime

  • Updated Neuron Runtime Troubleshooting guide with the latest hardware error codes and logs, and with a section on Neuron Runtime execution failures at out-of-bound access. See Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1

  • Updated Neuron Sysfs User Guide with new sysfs entries and device reset instructions. See Neuron Sysfs User Guide

  • Added Neuron Runtime Input Dump on Trn1 documentation. See nrt-input-dumps
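The Neuron Sysfs User Guide above describes per-device entries exposed under sysfs. A hypothetical helper for reading such an entry; the sysfs root path and entry name used here are assumptions for illustration, not the documented layout:

```python
from pathlib import Path

# Hypothetical helper for reading a Neuron sysfs entry. The root path and
# entry names are illustrative assumptions; see the Neuron Sysfs User
# Guide for the documented entry layout on an Inf/Trn instance.
SYSFS_ROOT = Path("/sys/devices/virtual/neuron_device")

def read_sysfs_entry(device_index: int, entry: str):
    """Return the entry's text for neuron<device_index>, or None if absent."""
    path = SYSFS_ROOT / f"neuron{device_index}" / entry
    if not path.is_file():
        return None  # not on a Neuron instance, or the entry does not exist
    return path.read_text().strip()
```

On a machine without Neuron devices the helper simply returns None, so it degrades cleanly outside Inf/Trn instances.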

Containers

Neuron Tools

Software Maintenance and Misc

Neuron 2.19.0#

Date: 07/03/2024

Neuron 2.18.0#

Date: 04/01/2024

Neuron 2.16.0#

Date: 12/21/2023

Neuron 2.15.0#

Date: 10/26/2023

Known Issues and Limitations#

The following tutorials are currently not working; they will be updated once a fix is available.

Neuron 2.14.0#

Date: 09/15/2023

  • Neuron Calculator now supports multiple model configurations for Tensor Parallel Degree computation. See Neuron Calculator

  • Announcement to deprecate --model-type=transformer-inference flag. See Announcing deprecation for --model-type=transformer-inference compiler flag

  • Updated HF ViT benchmarking script to use --model-type=transformer flag. See [script]

  • Updated torch_neuronx.analyze API documentation. See PyTorch NeuronX Analyze API for Inference

  • Updated performance benchmarking numbers for models on Inf1, Inf2, and Trn1 instances with 2.14 release bits. See _benchmark

  • New tutorial for training Llama2 7B with Tensor Parallelism and ZeRO-1 Optimizer using neuronx-distributed. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer

  • New tutorial for T5-3B model inference using neuronx-distributed (tutorial)

  • Updated Neuron Persistent Cache documentation to clarify the flags parsed by the neuron_cc_wrapper tool, which is a wrapper over the Neuron Compiler CLI. See Neuron Persistent Cache

  • Added tokenizers_parallelism=true in various notebook scripts to suppress tokenizer warnings, making errors easier to detect
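The tokenizer-warning suppression above boils down to setting one environment variable before the tokenizer is first used; the upper-case TOKENIZERS_PARALLELISM spelling is the variable read by the HuggingFace tokenizers library:

```python
import os

# Set before the HuggingFace tokenizers library is first used: this
# suppresses the fork-related parallelism warning that would otherwise
# clutter notebook output and hide real errors.
os.environ["TOKENIZERS_PARALLELISM"] = "true"
```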

  • Updated Neuron device plugin and scheduler YAMLs to point to latest images. See yaml configs

  • Added notebook script to fine-tune deepmind/language-perceiver model using torch-neuronx. See sample script

  • Added notebook script to fine-tune clip-large model using torch-neuronx. See sample script

  • Added SD XL Base+Refiner inference sample script using torch-neuronx. See sample script

  • Upgraded the default diffusers library from 0.14.0 to the latest 0.20.2 in the Stable Diffusion 1.5 and Stable Diffusion 2.1 inference scripts. See sample scripts

  • Added Llama-2-13B model training script using neuronx-nemo-megatron (tutorial)

Neuron 2.13.0#

Date: 08/28/2023

Neuron 2.12.0#

Date: 07/19/2023

Neuron 2.11.0#

Date: 06/14/2023
