This document is relevant for: Inf2, Trn1, Trn2

Trainium2 Architecture Guide for NKI

This guide covers the hardware architecture of the third-generation NeuronDevice, Trainium2. It assumes that readers have read the Trainium/Inferentia2 Architecture Guide in detail and understand the basics of the NeuronDevice architecture.

Fig. 35 shows a block diagram of a Trainium2 device, which consists of:

  • 8 NeuronCores (v3).

  • 4 HBM stacks with a total device memory capacity of 96 GiB and a total bandwidth of 2.9 TB/s.

  • 128 DMA (Direct Memory Access) engines to move data within and across devices.

  • 20 CC-Cores for collective communication.

  • 4 NeuronLink-v3 for device-to-device collective communication.
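When sizing kernels, it can help to turn the device-level figures in the list above into per-core and per-stack numbers. The following is a minimal sketch in plain Python, not part of the NKI API: the `Trainium2Spec` class and its method names are hypothetical, and the derived values assume that HBM capacity, HBM bandwidth, and DMA engines are divided evenly across stacks and NeuronCores, which this guide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trainium2Spec:
    """Hypothetical container for the device-level figures listed above."""

    neuron_cores: int = 8            # NeuronCores (v3)
    hbm_stacks: int = 4              # HBM stacks
    hbm_capacity_gib: int = 96       # total device memory, GiB
    hbm_bandwidth_tb_s: float = 2.9  # total HBM bandwidth, TB/s
    dma_engines: int = 128           # DMA engines
    cc_cores: int = 20               # CC-Cores for collective communication
    neuronlinks_v3: int = 4          # NeuronLink-v3

    def hbm_gib_per_stack(self) -> float:
        # Assumption: capacity is split equally across the HBM stacks.
        return self.hbm_capacity_gib / self.hbm_stacks

    def hbm_gib_per_core(self) -> float:
        # Assumption: an even share of HBM capacity per NeuronCore.
        return self.hbm_capacity_gib / self.neuron_cores

    def hbm_bandwidth_tb_s_per_stack(self) -> float:
        # Assumption: bandwidth is split equally across the HBM stacks.
        return self.hbm_bandwidth_tb_s / self.hbm_stacks

    def dma_engines_per_core(self) -> float:
        # Assumption: DMA engines are spread evenly over the NeuronCores.
        return self.dma_engines / self.neuron_cores


spec = Trainium2Spec()
print(f"HBM per stack:        {spec.hbm_gib_per_stack():.1f} GiB")
print(f"HBM per NeuronCore:   {spec.hbm_gib_per_core():.1f} GiB")
print(f"Bandwidth per stack:  {spec.hbm_bandwidth_tb_s_per_stack():.3f} TB/s")
print(f"DMA engines per core: {spec.dma_engines_per_core():.0f}")
```

These ratios are only arithmetic on the totals above. The actual mapping of HBM stacks and DMA engines to NeuronCores may differ, so treat them as rough planning numbers.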


Fig. 35 Trainium2 Device Diagram.

For a high-level comparison of the Trainium1 and Trainium2 architecture specifications, see the Neuron architecture guide.
