AWS Neuron Reference for NeMo Megatron

AWS Neuron Reference for NeMo Megatron is a library of modified versions of the open-source NeMo and Apex packages, adapted for use with AWS Neuron and Amazon EC2 Trn1 instances. The library supports tensor-parallel, pipeline-parallel, and data-parallel configurations for distributed training of large language models such as GPT-3 175B. Its APIs are optimized for XLA-based computation and high-performance communication on Trainium instances. To improve memory utilization, the library uses techniques such as sequence parallelism, which reduces the activation memory footprint, and selective or full activation checkpointing, which allows larger model configurations to fit in device memory. SPMD optimizations are also applied wherever possible to reduce the number of compiled graphs.
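For orientation, the sketch below shows how these parallelism and memory-saving knobs are typically combined in a NeMo-style training configuration. It uses the option names from the upstream NeMo GPT config (tensor_model_parallel_size, pipeline_model_parallel_size, sequence_parallel, activations_checkpoint_granularity); treat it as an illustration of the configuration surface under those assumptions, not a verified neuronx-nemo-megatron recipe.

```python
# A minimal sketch (not an official example): combining tensor, pipeline, and data
# parallelism with sequence parallelism and activation checkpointing. The option
# names mirror the upstream NeMo GPT config; exact names and defaults may differ
# in neuronx-nemo-megatron.
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "model": {
        # Shard each transformer layer's weights across 8 Neuron Cores.
        "tensor_model_parallel_size": 8,
        # Split the layer stack into 4 pipeline stages.
        "pipeline_model_parallel_size": 4,
        # Shard activations along the sequence dimension to cut activation memory.
        "sequence_parallel": True,
        # Recompute activations during the backward pass: "selective" recomputes
        # only the attention internals, "full" recomputes entire layers.
        "activations_checkpoint_granularity": "selective",
    }
})

# Data parallelism is whatever remains of the world size after tensor and
# pipeline parallelism, e.g. 64 workers / (8 * 4) = 2 data-parallel replicas.
world_size = 64
dp_size = world_size // (cfg.model.tensor_model_parallel_size
                         * cfg.model.pipeline_model_parallel_size)
print(f"data-parallel replicas: {dp_size}")
```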

Setup (neuronx-nemo-megatron)

The library can be installed from the neuronx-nemo-megatron GitHub repository.

Tutorials (neuronx-nemo-megatron)

Important Tips for Training with Neuron NeMo Megatron

Do Not Create the Attention Mask

If you are using your own data pipeline, do not create an attention mask for each record. Neuron NeMo Megatron is optimized to build the attention mask directly on the Neuron Cores just before it is used. Creating an attention mask per sample consumes excess host (CPU) memory and often causes out-of-memory errors on the CPU.
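As a point of reference, a data-pipeline sketch along these lines returns tokens, labels, a loss mask, and position ids, but deliberately omits a per-sample attention mask. The class and field names here are illustrative assumptions (they mirror common Megatron-style GPT batches), not a documented neuronx-nemo-megatron interface.

```python
# A minimal sketch, assuming a custom PyTorch dataset feeding Neuron NeMo Megatron.
# The key point: no per-sample attention mask is materialized here, since a
# [seq_length, seq_length] mask per record multiplies host memory use and can
# cause CPU out-of-memory errors. The mask is created on the Neuron Cores at use time.
import numpy as np
import torch
from torch.utils.data import Dataset

class PackedTextDataset(Dataset):
    def __init__(self, token_file: str, seq_length: int):
        # Pre-tokenized corpus stored as a flat array of int32 token ids.
        self.tokens = np.memmap(token_file, dtype=np.int32, mode="r")
        self.seq_length = seq_length

    def __len__(self):
        # Each sample needs seq_length + 1 tokens (inputs plus shifted labels).
        return (len(self.tokens) - 1) // self.seq_length

    def __getitem__(self, idx):
        start = idx * self.seq_length
        chunk = torch.as_tensor(
            self.tokens[start:start + self.seq_length + 1], dtype=torch.long
        )
        return {
            "tokens": chunk[:-1],
            "labels": chunk[1:],
            "loss_mask": torch.ones(self.seq_length, dtype=torch.float),
            "position_ids": torch.arange(self.seq_length, dtype=torch.long),
            # Intentionally no "attention_mask" entry.
        }
```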