This document is relevant for: Inf2, Trn1, Trn2
Trainium2 Architecture Guide for NKI
This guide covers the hardware architecture of the third-generation NeuronDevice, Trainium2. We assume readers have read the Trainium/Inferentia2 Architecture Guide in detail and understand the basics of NeuronDevice architecture.
Fig. 35 shows a block diagram of a Trainium2 device, which consists of:
8 NeuronCores (v3).
4 HBM stacks with a total device memory capacity of 96 GiB and bandwidth of 2.9 TB/s.
128 DMA (Direct Memory Access) engines to move data within and across devices.
20 CC-Cores for collective communication.
4 NeuronLink-v3 for device-to-device collective communication.
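The device-level figures listed above can be turned into rough per-stack and per-core averages with simple arithmetic. The following is a minimal sketch only: the `Trainium2Spec` class and its field names are hypothetical (they are not part of NKI or the Neuron SDK), and the even split across stacks and cores is an assumption made for illustration, not a statement about how the hardware actually partitions memory or DMA engines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trainium2Spec:
    """Hypothetical container for the device-level figures listed above."""
    neuron_cores: int = 8             # NeuronCores (v3)
    hbm_stacks: int = 4               # HBM stacks
    hbm_capacity_gib: float = 96.0    # total device memory, GiB
    hbm_bandwidth_tbps: float = 2.9   # total HBM bandwidth, TB/s
    dma_engines: int = 128            # DMA engines
    cc_cores: int = 20                # CC-Cores
    neuronlinks: int = 4              # NeuronLink-v3


spec = Trainium2Spec()

# Averages under the ASSUMPTION of an even split (illustrative only).
hbm_gib_per_stack = spec.hbm_capacity_gib / spec.hbm_stacks
hbm_gib_per_core = spec.hbm_capacity_gib / spec.neuron_cores
hbm_tbps_per_core = spec.hbm_bandwidth_tbps / spec.neuron_cores
dma_engines_per_core = spec.dma_engines / spec.neuron_cores

print(f"HBM capacity per stack : {hbm_gib_per_stack:.1f} GiB")
print(f"HBM capacity per core  : {hbm_gib_per_core:.1f} GiB")
print(f"HBM bandwidth per core : {hbm_tbps_per_core:.4f} TB/s")
print(f"DMA engines per core   : {dma_engines_per_core:.0f}")
```

These averages are useful only as back-of-the-envelope estimates when sizing a kernel's working set or data movement. The actual mapping of HBM and DMA resources to NeuronCores is not specified in this section.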
For a high-level comparison of the Trainium1 and Trainium2 architecture specifications, see the Neuron architecture guide.