This document is relevant for: Inf1, Inf2, Trn1, Trn2
Neuron Documentation Release Notes#
Neuron 2.21.0#
Date: 12/20/2024
Neuron Architecture and Features
- Added Trainium2 Architecture guide. See Trainium2 Architecture
- Added Trn2 Architecture guide. See Amazon EC2 Trn2 Architecture
- Added Logical NeuronCore configuration guide. See Logical NeuronCore configuration
- Added NeuronCore-v3 Architecture guide. See NeuronCore-v3 Architecture
Neuron Compiler
- Added NKI tutorial for SPMD usage with multiple Neuron Cores on Trn2. See tutorial
- Updated NKI FAQ with Trn2 FAQs. See NKI FAQ
- Added Direct Allocation Developer Guide
- Updated nki.isa API guide with support for new APIs.
- Updated nki.language API guide with support for new APIs.
- Updated nki.compiler API guide with support for new APIs.
- Updated NKI datatype guide with support for float8_e5m2.
- Updated kernels with support for allocated_fused_self_attn_for_SD_small_head_size and allocated_fused_rms_norm_qkv kernels
Neuron Runtime
- Updated troubleshooting doc with information on device out-of-memory errors after upgrading to Neuron Driver 2.19 or later. See small_allocations_mempool
NeuronX Distributed Inference
- Added Application Note to introduce NxD Inference. See Introducing NeuronX Distributed (NxD) Inference
- Added NxD Inference Supported Features Guide. See NxD Inference Features Configuration Guide
- Added NxD Inference Tutorial for Deploying Llama 3.1 405B (Trn2). See Tutorial: Deploying Llama3.1 405B (Trn2)
- Added NxD Inference API Reference Guide. See nxd-inference-api-guides
- Added NxD Inference Production Ready Models (Model Hub) Guide. See NxD Inference - Production Ready Models
- Added Migration Guide from NxD examples to NxD Inference. See Migrating from NxD Core inference examples to NxD Inference
- Added Migration Guide from Transformers NeuronX to NeuronX Distributed Inference. See Migrating from Transformers NeuronX to NeuronX Distributed(NxD) Inference
- Added vLLM User Guide for NxD Inference. See vLLM User Guide for NxD Inference
- Added tutorial for deploying Llama3.2 Multimodal Models. See Tutorial: Deploying Llama3.2 Multimodal Models
NeuronX Distributed Training
- Updated Training APIs, Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism, YAML Configuration Settings, and Checkpoint Conversion with support for fused Q,K,V
- Updated YAML Configuration Settings with support for Trn2 configuration API
- Updated Direct Checkpoint Conversion with support for HuggingFace Model Conversion
- Added tutorial for HuggingFace Llama3.1/Llama3-70B Pretraining. See HuggingFace Llama3.1/Llama3-70B Pretraining
- Added tutorial for HuggingFace Llama3-8B Direct Preference Optimization (DPO) based Fine-tuning. See HuggingFace Llama3-8B Direct Preference Optimization (DPO) based Fine-tuning
Transformers NeuronX
- Updated Transformers NeuronX (transformers-neuronx) Developer Guide and PyTorch NeuronX Tracing API for Inference with support for CPU compilation.
- Updated Transformers NeuronX (transformers-neuronx) Developer Guide to enable skipping the first Allgather introduced by flash decoding at the cost of duplicate Q weights.
- Updated Transformers NeuronX (transformers-neuronx) Developer Guide with support for EAGLE speculation
Neuron Tools
- Added Neuron Profiler 2.0 Beta User Guide with support for system profiles, integration with Perfetto, distributed workload support, etc. See Neuron Profiler 2.0 (Beta) User Guide
- Updated nccom-test user guide to include support for Trn2. See NCCOM-TEST User Guide
- Updated neuron-ls user guide to include support for Trn2. See Neuron LS User Guide
- Updated neuron-monitor user guide to include support for Trn2. See Neuron Monitor User Guide
- Updated neuron-top user guide to include support for Trn2. See Neuron Top User Guide
- Added Ask Q Developer documentation for general Neuron guidance and jumpstarting NKI kernel development. See Ask Q Developer
PyTorch NeuronX
- Added troubleshooting note for eager debug mode errors. See PyTorch Neuron (torch-neuronx) for Training Troubleshooting Guide
- Added torch-neuronx cxx11 ABI documentation. See Install with support for C++11 ABI
- Added Migration Guide From XLA_USE_BF16 / XLA_DOWNCAST_BF16. See Migration From XLA_USE_BF16/XLA_DOWNCAST_BF16
- Updated BERT tutorial to not use XLA_DOWNCAST_BF16 and updated BERT-Large pretraining phase to BFloat16 BERT-Large pretraining with AdamW and stochastic rounding. See Hugging Face BERT Pretraining Tutorial (Data-Parallel)
- Added Application Note for PyTorch 2.5 support. See Introducing PyTorch 2.5 Support
- Updated PyTorch NeuronX Environment Variables document with support for PyTorch 2.5. See PyTorch NeuronX Environment Variables
Misc
- Added a third-party developer flow solutions page. See Third-party solutions
- Added a third-party libraries page. See Third-party libraries
End of support announcements
- Announcing end of support for Neuron DET tool starting next release
- Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release
- Announcing end of support for Python 3.8 in future releases
- Announcing end of support for PyTorch 1.13 starting next release
- Announcing end of support for PyTorch 2.1 starting next release
- Announcing end of support for Ubuntu20 DLCs and DLAMIs
- Announcing maintenance mode for torch-neuron 1.9 and 1.10 versions
Neuron 2.20.0#
Date: 09/16/2024
Neuron Compiler
Added Getting Started with NKI guide for implementing a simple “Hello World” style NKI kernel and running it on a Neuron Device (Trainium/Inferentia2). See Getting Started with NKI
Added NKI Programming Model guide for explaining the three main stages of the NKI programming model. See NKI Programming Model
Added NKI Kernel as a Framework Custom Operator guide for explaining how to insert a NKI kernel as a custom operator into a PyTorch or JAX model using simple code examples. See NKI Kernel as a Framework Custom Operator
Added NKI Tutorials for the following kernels: Tensor addition, Transpose2D, AveragePool2D, Matrix multiplication, RMSNorm, Fused Self Attention, LayerNorm, and Fused Mamba. See nki.kernels
Added NKI Kernels guide for optimized kernel examples. See nki.kernels
Added Trainium/Inferentia2 Architecture Guide for NKI. See Trainium/Inferentia2 Architecture Guide for NKI
Added Profiling NKI kernels with Neuron Profile. See Profiling NKI kernels with Neuron Profile
Added NKI Performance Guide for explaining a recipe to find performance bottlenecks of NKI kernels and apply common software optimizations to address such bottlenecks. See NKI Performance Guide
Added NKI API Reference Manual with nki framework and types, nki.language, nki.isa, NKI API Common Fields, and NKI API Errors. See NKI API Reference Manual
Added NKI FAQ. See NKI FAQ
Added NKI Known Issues. See NKI Known Issues
Updated Neuron Glossary with NKI terms. See Neuron Glossary
Added new NKI samples repository
Added average_pool2d, fused_mamba, layernorm, matrix_multiplication, rms_norm, sd_attention, tensor_addition, and transpose_2d kernel tutorials to the NKI samples repository. See NKI samples repository
Added unit and integration tests for each kernel. See NKI samples repository
Updated Custom Operators API Reference Guide with updated terminology (HBM). See Custom Operators API Reference Guide [Beta]
NeuronX Distributed Training (NxDT)
Added NxDT (Beta) Developer Guide. See Developer Guide
Added NxDT Developer Guide for Migrating from NeMo to Neuronx Distributed Training. See NxD Training Compatibility with NeMo
Added NxDT Developer Guide for Migrating from Neuron-NeMo-Megatron to Neuronx Distributed Training. See Migrating from Neuron-NeMo-Megatron to Neuronx Distributed Training
Added NxDT Developer Guide for Integrating a new dataset/dataloader. See Integrating a new dataset/dataloader
Added NxDT Developer Guide for Integrating a new model. See Integrating a New Model
Added NxDT Developer Guide for Registering an optimizer and LR scheduler. See Registering an optimizer and LR scheduler
Added NxDT YAML Configuration Overview. See YAML Configuration Settings
Added Neuronx Distributed Training Library Features documentation. See Neuronx Distributed Training Library Features
Added Installation instructions for NxDT. See Setup
Added Known Issues and Workarounds for NxDT. See Known Issues and Workarounds
NeuronX Distributed Core (NxD Core)
Updated Developer guide for save/load checkpoint (neuronx-distributed) with ZeRO-1 Optimizer State Offline Conversion. See Developer guide for save/load checkpoint
Added Developer guide for Standard Mixed Precision with NeuronX Distributed. See Developer guide for Standard Mixed Precision
Updated NeuronX Distributed API Guide with LoRA finetuning support. See Distributed Strategies APIs
Added Developer guide for LoRA finetuning with NeuronX Distributed. See Developer guide for LoRA finetuning
Updated CodeLlama tutorial with latest package versions. See tutorial
Added tutorial for Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning with NeuronX Distributed. See Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning
Updated links in Llama2 NxD Finetuning tutorial. See Fine-tuning Llama2 7B with tensor parallelism and ZeRO-1 optimizer using Neuron PyTorch-Lightning
Updated tokenizer download command in tutorials. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer, Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism, and Training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer
JAX Neuron
Added JAX Neuron Main page. See JAX Neuron (beta)
Added JAX Neuron plugin instructions. See jax-neuronx-setup
Added JAX Neuron setup instructions. See JAX Setup
PyTorch NeuronX
Updated Developer Guide for Training with PyTorch NeuronX with support for convolution in AMP. See Developer Guide for Training with PyTorch NeuronX.
Added inference samples for Wav2Vec2 conformer models with Relative Position Embeddings and Rotary Position Embedding. See sample and sample.
Updated the ViT sample with updated accelerate version. See sample
Updated PyTorch NeuronX Environment Variables with NEURON_TRANSFER_WITH_STATIC_RING_OPS. See PyTorch NeuronX Environment Variables
Added inference samples for PixArt Alpha and PixArt Sigma models. See sample and sample
Added benchmarking scripts for PixArt Alpha. See benchmarking script
Transformers NeuronX
Updated Transformers NeuronX Developer Guide with Multi-node inference support (TP/PP). See Transformers NeuronX (transformers-neuronx) Developer Guide
Updated Transformers NeuronX Developer Guide with BDH layout support. See Transformers NeuronX (transformers-neuronx) Developer Guide
Updated Transformers NeuronX Developer Guide with Flash Decoding to support long sequence lengths up to 128k. See Transformers NeuronX (transformers-neuronx) Developer Guide
Updated Transformers NeuronX Developer Guide with presharded weights support. See Transformers NeuronX (transformers-neuronx) Developer Guide
Added Llama 3.1 405b sample with 16k sequence length. See tutorial
Added Llama 3.1 70b 64k tutorial. See tutorial
Added Llama 3.1 8b 128k tutorial. See tutorial
Removed the sample llama-3-8b-32k-sampling.ipynb and replaced it with Llama-3.1-8B model sample llama-3.1-8b-32k-sampling.ipynb. See sample
Neuron Runtime
Updated Neuron Runtime Troubleshooting guide with the latest hardware error codes and logs, and with an entry for “Neuron Runtime execution fails at out-of-bound access”. See Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1
Updated Neuron Sysfs User Guide with new sysfs entries and device reset instructions. See Neuron Sysfs User Guide
Added Neuron Runtime Input Dump on Trn1 documentation. See nrt-input-dumps
Containers
Added Neuron Helm Chart repository to help streamline the deployment of AWS Neuron components on Amazon EKS. See repo
Updated Kubernetes container deployment process with Neuron Helm Chart documentation. See k8s-neuron-helm-chart
Added guide for Deploying Neuron Container on Elastic Container Service (ECS). See Deploy Neuron Container on Elastic Container Service (ECS) for Training
Added documentation for Neuron Plugins for Containerized Environments. See Neuron Plugins for Containerized Environments
Updated guide for locating DLC images. See Neuron Deep Learning Containers
Neuron Tools
Updated Neuron Profiler User Guide with Alternative output formats. See Neuron Profile User Guide
Software Maintenance and Misc
Updated the Neuron Software Maintenance Policy. See Neuron Software Maintenance policy
Added announcement and updated documentation for the start of end of support for Tensorflow-Neuron 1.x. See Tensorflow-Neuron 1.x no longer supported
Added announcement and updated documentation for the start of end of support for the ‘neuron-device-version’ field. See ‘neuron-device-version’ field in neuron-monitor no longer supported
Added announcement and updated documentation for the start of end of support for the ‘neurondevice’ resource name. See ‘neurondevice’ resource name in Neuron Device K8s plugin no longer supported
Added announcement and updated documentation for the start of end of support for AL2. See Neuron Runtime no longer supports Amazon Linux 2 (AL2)
Added announcement for maintenance mode for torch-neuron versions 1.9 and 1.10. See Announcing maintenance mode for torch-neuron 1.9 and 1.10 versions
Added supported Protobuf versions to the Neuron Release Artifacts. See Release Artifacts
Updated Neuron Github Roadmap. See Roadmap
Neuron 2.19.0#
Date: 07/03/2024
Updated Transformers NeuronX Developer guide with support for inference for longer sequence lengths with Flash Attention kernel. See Developer Guide.
Updated Transformers NeuronX developer guide with QKV Weight Fusion support. See Developer Guide.
Updated Transformers NeuronX continuous batching developer guide with updated vLLM instructions and models supported. See Developer Guide.
Updated Neuronx Distributed User guide with interleaved pipeline support. See Distributed Strategies APIs
Added Codellama 13b 16k tutorial with NeuronX Distributed Inference library. See sample
Updated PyTorch NeuronX Environment variables with custom SILU enabled via NEURON_CUSTOM_SILU. See PyTorch NeuronX Environment Variables
Updated ZeRO1 support to have FP32 master weights support and BF16 all-gather. See ZeRO-1 Tutorial.
Updated PyTorch 2.1 Application note with workaround for slower loss convergence for NxD LLaMA-3 70B pretraining using ZeRO1 tutorial. See Introducing PyTorch 2.1 Support.
Updated Neuron DLAMI guide with support for new 2.19 DLAMIs. See Neuron DLAMI User Guide.
Updated HF-BERT pre-training documentation for port forwarding. See Hugging Face BERT Pretraining Tutorial (Data-Parallel)
Updated T5 inference tutorial with transformer flag. See sample
Added support for Llama3 model training. See Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism and Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer
Added support for Flash Attention kernel for training longer sequences in NeuronX Distributed. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer and Distributed Strategies APIs
Updated Llama2 inference tutorial using NxD Inference library. See sample
Added new guide for Neuron node problem detection and recovery tool. See configuration and tutorial.
Added new guide for Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes. Supports monitoring with Prometheus and Grafana. See tutorial
Updated Neuron scheduler extension documentation about enforcing allocation of contiguous Neuron Devices for the pods based on the Neuron instance type. See tutorial
Updated Neuron Profiler User Guide with various UI enhancements. See Neuron Profile User Guide
Added NeuronPerf support in Llama2 inference tutorial in NeuronX Distributed. See sample
Added announcement for maintenance mode of MxNet. See Neuron support for MxNet enters maintenance mode
Added announcement for end of support of Neuron TensorFlow 1.x (Inf1). See Announcing end of support for Tensorflow-Neuron 1.x
Added announcement for end of support of AL2. See Announcing end of support for Neuron Runtime support of Amazon Linux 2 (AL2)
Added announcement for end of support of ‘neuron-device-version’ field in neuron-monitor. See Announcing end of support for ‘neuron-device-version’ field in neuron-monitor
Added announcement for end of support of ‘neurondevice’ resource name in Neuron Device K8s plugin. See Announcing end of support for ‘neurondevice’ resource name in Neuron Device K8s plugin
Added announcement for end of support for Protobuf versions <= 3.19 for PyTorch NeuronX. See Announcing end of support for Protobuf versions <= 3.19 for PyTorch NeuronX, NeuronX Distributed, and Transformers NeuronX libraries
Neuron 2.18.0#
Date: 04/01/2024
Updated PyTorch NeuronX developer guide with Snapshotting support. See Snapshotting With Torch-Neuronx 2.1.
Updated Distributed Strategies APIs and Developer guide for Pipeline Parallelism with support for auto_partition API.
Updated Distributed Strategies APIs with enhanced checkpointing support with load API and async_save API.
Updated documentation for PyTorch Lightning to train models using pipeline parallelism. See API guide and Developer Guide.
Updated NeuronX Distributed developer guide with support for Autobucketing
Added PyTorch NeuronX developer guide for Autobucketing.
Updated Distributed Strategies APIs and Training Llama-3.1-70B, Llama-3-70B or Llama-2-13B/70B with Tensor Parallelism and Pipeline Parallelism with support for asynchronous checkpointing.
Updated Transformers NeuronX Developer guide with support for streamer and stopping criteria APIs. See Developer Guide.
Updated Transformers NeuronX Developer guide with instructions for Repeating N-Gram Filtering. See Developer Guide.
Updated Transformers NeuronX developer guide with Top-K on-device sampling support [Beta]. See Developer Guide.
Updated Transformers NeuronX developer guide with Checkpointing support and automatic model selection. See Developer Guide.
Updated Transformers NeuronX Developer guide with support for speculative sampling [Beta]. See Developer Guide.
Added sample for training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer with neuronx-distributed. See Training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer.
Added Tutorial for codellama/CodeLlama-13b-hf model inference with 16K seq length using Transformers Neuronx. See sample.
Added Mixtral-8x7B Inference Sample/Notebook using TNx. See sample.
Added Mistral-7B-Instruct-v0.2 inference sample using TNx. See sample.
Added announcement for Maintenance mode of TensorFlow 1.x. See Tensorflow-Neuron 1.x enters maintenance mode.
Updated PyTorch 2.1 documentation to reflect stable (out of beta) support. See Introducing PyTorch 2.1 Support.
Updated PyTorch NeuronX environment variables to reflect stable (out of beta) support. See PyTorch NeuronX Environment Variables.
Updated Release Artifacts with supported HuggingFace Transformers versions.
Added user guide instructions for Neuron DLAMI. See Neuron DLAMI User Guide.
Updated PyTorch Neuron for Trainium Hugging Face BERT MRPC task finetuning using Hugging Face Trainer API tutorial with latest Hugging Face Trainer API.
Updated Neuron Runtime API guide with support for nr_tensor_allocate. See Developer’s Guide - NeuronX Runtime.
Updated Neuron Sysfs User Guide with support for serial_number unique identifier.
Updated Custom Operators API Reference Guide [Beta] limitations and fixed nested sublists. See Neuron Custom C++ Operators Developer Guide [Beta].
Fixed issue in ZeRO-1 Tutorial.
Fixed potential hang during synchronization step in nccom-test. See NCCOM-TEST User Guide.
Updated troubleshooting guide with additional hardware error messaging. See Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1.
Updated DLC documentation. See Customize Neuron DLC and Deploy Neuron Container on EC2.
Neuron 2.16.0#
Date: 12/21/2023
Added setup guide instructions for AL2023 OS. See Setup Guide
Added announcement for name change of Neuron Components. See Announcing Name Change for Neuron Components
Added announcement for End of Support for PyTorch 1.10. See Announcing End of Support for PyTorch Neuron version 1.10
Added announcement for End of Support for PyTorch 2.0 Beta. See Announcing End of Support for PyTorch NeuronX version 2.0 (beta)
Added announcement for moving NeuronX Distributed sample model implementations. See Announcing deprecation for NeuronX Distributed Training Samples in Neuron Samples Repository
Updated Transformers NeuronX developer guide with support for Grouped Query Attention(GQA). See developer guide
Added sample for Llama-2-70b model inference. See tutorial
Added documentation for PyTorch Lightning to train models using tensor parallelism and data parallelism. See api guide, developer guide and tutorial
Added documentation for Model and Optimizer Wrapper training API that handles the parallelization. See api guide and Developer guide for model and optimizer wrapper
Added documentation for new save_checkpoint and load_checkpoint APIs to save/load checkpoints during distributed training. See Developer guide for save/load checkpoint
Added documentation for a new Query-Key-Value(QKV) module in NeuronX Distributed for Training. See api guide and tutorial
Added new developer guide for Inference using NeuronX Distributed. See developer guide
Added Llama-2-7B model inference script ([html] [notebook])
Added App note on Support for PyTorch 2.1 (Beta). See Introducing PyTorch 2.1 Support
Added developer guide for replace_weights API to replace the separated weights. See PyTorch Neuron (torch-neuronx) Weight Replacement API for Inference
Added [Beta] script for training stabilityai/stable-diffusion-2-1-base and runwayml/stable-diffusion-v1-5 models. See script
Added [Beta] script for training facebook/bart-large model. See script
Added [Beta] script for stabilityai/stable-diffusion-2-inpainting model inference. See script
Added documentation for new Neuron Distributed Event Tracing (NDET) tool to help visualize execution trace logs and diagnose errors in multi-node workloads. See Neuron Distributed Event Tracing (NDET) User Guide
Updated Neuron Profile User guide with support for multi-worker jobs. See Neuron Profile User Guide
Minor updates to Custom Ops API reference guide. See Custom Operators API Reference Guide [Beta]
Neuron 2.15.0#
Date: 10/26/2023
New Introducing PyTorch 2.0 Support (End of Support) application note with torch-neuronx
New llama2_70b_tp_pp_tutorial and (sample script) using neuronx-distributed
New Model samples and tutorials documentation for a consolidated list of code samples and tutorials published by AWS Neuron.
New Neuron Software Classification documentation for alpha, beta, and stable Neuron SDK definitions and updated documentation references.
New Pipeline Parallelism Overview and Developer guide for Pipeline Parallelism documentation in neuronx-distributed
Updated Neuron Distributed API Guide regarding pipeline-parallelism support and checkpointing
New Activation Memory Reduction application note and Developer guide for Activation Memory reduction in neuronx-distributed
New Weight Sharing (Deduplication) notebook script
Added Finetuning script for google/electra-small-discriminator with torch-neuronx
Added ResNet50 training (Beta) tutorial and scripts with torch-neuronx
Added Vision Perceiver training sample with torch-neuronx
Added flan-t5-xl model inference tutorial using neuronx-distributed
Added HuggingFace Stable Diffusion 4X Upscaler model Inference on Trn1 / Inf2 sample script with torch-neuronx
Updated GPT-NeoX 6.9B and 20B model scripts to include selective checkpointing.
Added serialization support and removed -O1 flag constraint to Llama-2-13B model inference script tutorial with transformers-neuronx
Updated BERT script and Llama-2-7B script with PyTorch 2.0 support
Added option-argument llm-training to the existing --distribution_strategy compiler option to make specific optimizations related to training distributed models in Neuron Compiler CLI Reference Guide (neuronx-cc)
Updated Neuron Sysfs User Guide to include mem_ecc_uncorrected and sram_ecc_uncorrected hardware statistics.
Updated PyTorch NeuronX Tracing API for Inference to include io alias documentation
Updated Transformers NeuronX (transformers-neuronx) Developer Guide with serialization support.
Upgraded numpy version to 1.22.2 for various scripts
Updated LanguagePerceiver fine-tuning script to stable
Announcing End of Support for OPT example in transformers-neuronx
Announcing End of Support for “nemo” option-argument
Known Issues and Limitations#
Following tutorials are currently not working. These tutorials will be updated once there is a fix.
Neuron 2.14.0#
Date: 09/15/2023
Neuron Calculator now supports multiple model configurations for Tensor Parallel Degree computation. See Neuron Calculator
Announcement to deprecate --model-type=transformer-inference flag. See Announcing deprecation for --model-type=transformer-inference compiler flag
Updated HF ViT benchmarking script to use --model-type=transformer flag. See [script]
Updated torch_neuronx.analyze API documentation. See PyTorch NeuronX Analyze API for Inference
Updated Performance benchmarking numbers for models on Inf1, Inf2 and Trn1 instances with 2.14 release bits. See _benchmark
New tutorial for Training Llama2 7B with Tensor Parallelism and ZeRO-1 Optimizer using neuronx-distributed. See Training Llama3.1-8B, Llama3-8B and Llama2-7B with Tensor Parallelism and ZeRO-1 Optimizer
New tutorial for T5-3B model inference using neuronx-distributed (tutorial)
Updated Neuron Persistent Cache documentation to clarify the flags parsed by the neuron_cc_wrapper tool, which is a wrapper over Neuron Compiler CLI. See Neuron Persistent Cache
Added tokenizers_parallelism=true in various notebook scripts to suppress tokenizer warnings, making errors easier to detect
Updated Neuron device plugin and scheduler YAMLs to point to latest images. See yaml configs
Added notebook script to fine-tune deepmind/language-perceiver model using torch-neuronx. See sample script
Added notebook script to fine-tune clip-large model using torch-neuronx. See sample script
Added SD XL Base+Refiner inference sample script using torch-neuronx. See sample script
Upgraded default diffusers library from 0.14.0 to latest 0.20.2 in Stable Diffusion 1.5 and Stable Diffusion 2.1 inference scripts. See sample scripts
Added Llama-2-13B model training script using neuronx-nemo-megatron (tutorial)
Neuron 2.13.0#
Date: 08/28/2023
Added tutorials for GPT-NEOX 6.9B and 20B models training using neuronx-distributed. See more at Tutorials for NeuronX Distributed
Added TensorFlow 2.x (tensorflow-neuronx) analyze_model API section. See more at TensorFlow 2.x (tensorflow-neuron) analyze_model API
Updated setup instructions to fix path of existing virtual environments in DLAMIs. See more at setup guide
Updated setup instructions to fix pinned versions in upgrade instructions of setup guide. See more at setup guide
Updated tensorflow-neuron HF distilbert tutorial to improve performance by removing HF pipeline. See more at [html] [notebook]
Updated training troubleshooting guide in torch-neuronx to describe network Connectivity Issue on trn1/trn1n 32xlarge with Ubuntu. See more at PyTorch Neuron (torch-neuronx) for Training Troubleshooting Guide
Added “Unsupported Hardware Operator Code” section to Neuron Runtime Troubleshooting page. See more at Neuron Runtime Troubleshooting on Inf1, Inf2 and Trn1
Removed ‘beta’ tag from neuronx-distributed section for training. neuronx-distributed Training is now considered stable and neuronx-distributed inference is considered as beta.
Added FLOP count (flop_count) and connected Neuron Device ids (connected_devices) to sysfs userguide. See Neuron Sysfs User Guide
Added tutorial for T5 model inference. See more at [notebook]
Updated neuronx-distributed api guide and inference tutorial. See more at Distributed Strategies APIs and Inference with Tensor Parallelism [Beta]
Announcing End of support for AWS Neuron reference for Megatron-LM starting Neuron 2.13. See more at AWS Neuron reference for Megatron-LM no longer supported
Announcing end of support for torch-neuron version 1.9 starting Neuron 2.14. See more at Announcing end of support for torch-neuron version 1.9
Upgraded numpy version to 1.21.6 in various training scripts for Text Classification
Added license for Nemo Megatron to SDK Maintenance Policy. See more at Neuron Software Maintenance policy
Updated bert-japanese training Script to use multilingual-sentiments dataset. See hf-bert-jp (aws-neuron/aws-neuron-samples)
Added sample script for LLaMA V2 13B model inference using transformers-neuronx. See neuron samples repo
Added samples for training GPT-NEOX 20B and 6.9B models using neuronx-distributed. See neuron samples repo
Added sample scripts for CLIP and Stable Diffusion XL inference using torch-neuronx. See neuron samples repo
Added sample scripts for vision and language Perceiver models inference using torch-neuronx. See neuron samples repo
Added camembert training/finetuning example for Trn1 under hf_text_classification in torch-neuronx. See neuron samples repo
Updated Fine-tuning Hugging Face BERT Japanese model sample in torch-neuronx. See neuron samples repo
See more neuron samples changes in neuron samples release notes
Added samples for pre-training GPT-3 23B, 46B and 175B models using neuronx-nemo-megatron library. See aws-neuron-parallelcluster-samples
Announced End of Support for GPT-3 training using aws-neuron-reference-for-megatron-lm library. See aws-neuron-parallelcluster-samples
Updated bert-fine-tuning SageMaker sample by replacing amazon_reviews_multi dataset with amazon_polarity dataset. See aws-neuron-sagemaker-samples
Neuron 2.12.0#
Date: 07/19/2023
Added best practices user guide for benchmarking performance of Neuron Devices. See Benchmarking Guide and Helper scripts
Announcing end of support for Ubuntu 18. See more at Announcing end of support for Ubuntu 18
Improved sidebar navigation in Documentation.
Removed support for Distributed Data Parallel(DDP) Tutorial.
Neuron 2.11.0#
Date: 06/14/2023
New Neuron Calculator Documentation section to help determine number of Neuron Cores needed for LLM Inference.
Added App Note Generative LLM inference with Neuron
New ML Libraries Documentation section to have NxD Core and Transformers NeuronX (transformers-neuronx)
Improved Installation and Setup Guides for the different platforms supported. See more at Setup Guide
Added Tutorial How to prepare trn1.32xlarge for multi-node execution