This document is relevant for: Inf1, Inf2, Trn1, Trn1n

Neuron Documentation Release Notes#

Neuron 2.20.0#

Date: 09/16/2024

Neuron Compiler

  • Added Getting Started with NKI guide for implementing a simple “Hello World” style NKI kernel and running it on a Neuron Device (Trainium/Inferentia2). See Getting Started with NKI

  • Added NKI Programming Model guide explaining the three main stages of the NKI programming model. See NKI Programming Model

  • Added NKI Kernel as a Framework Custom Operator guide explaining how to insert an NKI kernel as a custom operator into a PyTorch or JAX model using simple code examples. See NKI Kernel as a Framework Custom Operator

  • Added NKI Tutorials for the following kernels: Tensor addition, Transpose2D, AveragePool2D, Matrix multiplication, RMSNorm, Fused Self Attention, LayerNorm, and Fused Mamba. See nki.kernels

  • Added NKI Kernels guide for optimized kernel examples. See nki.kernels

  • Added Trainium/Inferentia2 Architecture Guide for NKI. See Trainium/Inferentia2 Architecture Guide for NKI

  • Added Profiling NKI kernels with Neuron Profile. See Profiling NKI kernels with Neuron Profile

  • Added NKI Performance Guide, a recipe for finding performance bottlenecks in NKI kernels and applying common software optimizations to address them. See NKI Performance Guide

  • Added NKI API Reference Manual with nki framework and types, nki.language, nki.isa, NKI API Common Fields, and NKI API Errors. See NKI API Reference Manual

  • Added NKI FAQ. See NKI FAQ

  • Added NKI Known Issues. See NKI Known Issues

  • Updated Neuron Glossary with NKI terms. See Neuron Glossary

  • Added new NKI samples repository

  • Added average_pool2d, fused_mamba, layernorm, matrix_multiplication, rms_norm, sd_attention, tensor_addition, and transpose_2d kernel tutorials to the NKI samples repository. See NKI samples repository

  • Added unit and integration tests for each kernel. See NKI samples repository

  • Updated Custom Operators API Reference Guide with updated terminology (HBM). See Custom Operators API Reference Guide [Beta]
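For readers new to NKI, the “Hello World” pattern from the Getting Started guide can be sketched roughly as follows. This is a minimal illustration, assuming the `nki.jit` decorator and the `nki.language` API described in the new NKI API Reference Manual; it requires the Neuron compiler (`neuronxcc`) and a Trainium/Inferentia2 device to actually run, so treat it as a sketch rather than a verified example:

```python
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl


@nki.jit
def nki_tensor_add(a_input, b_input):
    # Load input tiles from device memory (HBM) into on-chip SBUF
    a = nl.load(a_input)
    b = nl.load(b_input)

    # Compute element-wise addition on-chip
    c = nl.add(a, b)

    # Allocate the output tensor in HBM and store the result back
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype,
                          buffer=nl.shared_hbm)
    nl.store(c_output, value=c)
    return c_output
```

The load/compute/store shape shown here mirrors the three stages covered in the NKI Programming Model guide; see the Getting Started with NKI guide for the authoritative version.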

NeuronX Distributed Training (NxDT)

NeuronX Distributed Core (NxD Core)

JAX Neuron

PyTorch NeuronX

Transformers NeuronX

Neuron Runtime

  • Updated Neuron Runtime Troubleshooting guide with the latest hardware error codes and logs, and with guidance for Neuron Runtime execution failures on out-of-bound access. See Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1

  • Updated Neuron Sysfs User Guide with new sysfs entries and device reset instructions. See Neuron Sysfs User Guide

  • Added Neuron Runtime Input Dump on Trn1 documentation. See nrt-input-dumps

Containers

Neuron Tools

Software Maintenance and Misc

Neuron 2.19.0#

Date: 07/03/2024

Neuron 2.18.0#

Date: 04/01/2024

Neuron 2.16.0#

Date: 12/21/2023

Neuron 2.15.0#

Date: 10/26/2023

Known Issues and Limitations#

The following tutorials are currently not working. They will be updated once a fix is available.

Neuron 2.14.0#

Date: 09/15/2023

  • Neuron Calculator now supports multiple model configurations for Tensor Parallel Degree computation. See Neuron Calculator

  • Announcement to deprecate --model-type=transformer-inference flag. See Announcing deprecation for --model-type=transformer-inference compiler flag

  • Updated HF ViT benchmarking script to use --model-type=transformer flag. See [script]

  • Updated torch_neuronx.analyze API documentation. See PyTorch NeuronX Analyze API for Inference

  • Updated Performance benchmarking numbers for models on Inf1, Inf2, and Trn1 instances with 2.14 release bits. See _benchmark

  • New tutorial for training Llama2 7B with Tensor Parallelism and ZeRO-1 Optimizer using neuronx-distributed. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer

  • New tutorial for T5-3B model inference using neuronx-distributed (tutorial)

  • Updated Neuron Persistent Cache documentation to clarify the flags parsed by the neuron_cc_wrapper tool, which is a wrapper over the Neuron Compiler CLI. See Neuron Persistent Cache

  • Added tokenizers_parallelism=true in various notebook scripts to suppress tokenizer warnings, making it easier to detect errors

  • Updated Neuron device plugin and scheduler YAMLs to point to latest images. See yaml configs

  • Added notebook script to fine-tune deepmind/language-perceiver model using torch-neuronx. See sample script

  • Added notebook script to fine-tune clip-large model using torch-neuronx. See sample script

  • Added SD XL Base+Refiner inference sample script using torch-neuronx. See sample script

  • Upgraded default diffusers library from 0.14.0 to latest 0.20.2 in Stable Diffusion 1.5 and Stable Diffusion 2.1 inference scripts. See sample scripts

  • Added Llama-2-13B model training script using neuronx-nemo-megatron (tutorial)
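Once the updated Neuron device plugin is deployed in a Kubernetes cluster, pods request Neuron devices through the `aws.amazon.com/neuron` extended resource. A hypothetical pod spec fragment for illustration (the image name is a placeholder; see the linked YAML configs for the authoritative manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: neuron-inference
spec:
  containers:
    - name: app
      image: <your-inference-image>   # placeholder: your application image
      resources:
        limits:
          aws.amazon.com/neuron: 1    # number of Neuron devices for this container
```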
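The tokenizer warnings mentioned above come from the Hugging Face tokenizers library, which reads the TOKENIZERS_PARALLELISM environment variable. A minimal sketch of silencing the warning in a notebook (assuming the standard environment-variable mechanism; set it before the first tokenizer call):

```python
import os

# Set before any tokenizer is used: "true" keeps tokenizer
# parallelism enabled while silencing the fork-safety warning;
# "false" disables tokenizer parallelism entirely.
os.environ["TOKENIZERS_PARALLELISM"] = "true"

print(os.environ["TOKENIZERS_PARALLELISM"])
```

Setting the variable at the top of a notebook keeps subsequent cell output free of repeated warnings, so genuine errors stand out.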

Neuron 2.13.0#

Date: 08/28/2023

Neuron 2.12.0#

Date: 07/19/2023

Neuron 2.11.0#

Date: 06/14/2023
