Amazon EC2 Trn2 Architecture#
Trn2 is an Amazon EC2 accelerated computing instance, purpose built for high-performance deep learning training and inference. This page provides an architecture overview of the trn2.48xlarge and trn2u.48xlarge instances, and Trn2 UltraServer.
Trn2 instance sizes#
Trn2 instances and UltraServers are available in the following sizes and configurations:
- trn2.48xlarge 
- trn2u.48xlarge 
- Trn2 UltraServer 
trn2.48xlarge / trn2u.48xlarge#
Trn2 instances are powered by 16 Trainium2 chips connected using a high-bandwidth, low-latency NeuronLink-v3 chip-to-chip interconnect. The NeuronLink-v3 chip-to-chip interconnect enables collective communication between Trainium2 chips during distributed training and inference. It also allows for the pooling of memory resources from all 16 Trainium2 chips.
In a trn2.48xlarge or trn2u.48xlarge instance, 16 Trainium2 chips are connected using a 4x4, 2D Torus topology. The following diagram shows the intra-instance connections of a trn2.48xlarge or trn2u.48xlarge instance
 
Trn2 UltraServer#
A Trn2 UltraServer comprises four trn2u.48xlarge instances connected together via the NeuronLink-v3 chip-to-chip interconnect. This allows for a total of 64 Trainium2 chips to be interconnected within a Trn2 UltraServer. Trainium2 chips with the same coordinates in each Trn2 instance are connected in a ring topology. The following figure shows the inter-instance ring connection between Trainium2 chips.
 
Trn2 instance specifications#
The following table shows the performance metrics for Trainium2 based instances.
| Perfomance specification | trn2.48xlarge / trn2u.48xlarge | Trn2 UltraServer | 
|---|---|---|
| # of Trainium2 chips | 16 | 64 | 
| vCPUs | 192 | 768 | 
| Host Memory (GiB) | 2,048 | 8,192 | 
| FP8 PFLOPS | 20.8 | 83.2 | 
| FP16/BF16/TF32 PFLOPS | 10.7 | 42.8 | 
| FP8/FP16/BF16/TF32 Sparse PFLOPS | 41 | 164 | 
| FP32 PFLOPS | 2.9 | 11.6 | 
| Device Memory (GiB) | 1,536 | 6,144 | 
| Device Memory Bandwidth (TB/sec) | 46.4 | 185.6 | 
| Intra-instance NeuronLink-v3 bandwidth (GB/sec/chip) | 1,024 | 1,024 | 
| Inter-instance NeuronLink-v3 bandwidth (GB/sec/chip) | Not applicable | 256 | 
| EFAv3 bandwidth (Gbps) | 3,200 | 3,200 | 
