This document is relevant for: Inf1
Amazon EC2 Inf1 Architecture#
On this page, we provide an architectural overview of Amazon EC2 Inf1 instances and the Inferentia NeuronChips that power them (Inferentia chips from here on).
Inf1 Architecture#
EC2 Inf1 instances are powered by up to 16 Inferentia chips and are available in four sizes:
| Instance size | # of Inferentia chips | vCPUs | Host Memory (GiB) | FP16/BF16 TFLOPS | INT8 TOPS | Device Memory (GiB) | Device Memory bandwidth (GiB/sec) | NeuronLink-v1 chip-to-chip bandwidth (GiB/sec/chip) | EFA bandwidth (Gbps) | 
|---|---|---|---|---|---|---|---|---|---|
| Inf1.xlarge | 1 | 4 | 8 | 64 | 128 | 8 | 50 | N/A | up to 25 | 
| Inf1.2xlarge | 1 | 8 | 16 | 64 | 128 | 8 | 50 | N/A | up to 25 | 
| Inf1.6xlarge | 4 | 24 | 48 | 256 | 512 | 32 | 200 | 32 | 25 | 
| Inf1.24xlarge | 16 | 96 | 192 | 1024 | 2048 | 128 | 800 | 32 | 100 | 
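The table above can be encoded as a small lookup for capacity planning. A minimal sketch, using only the chip-count and device-memory figures from the table; the helper name and the idea of sizing purely by device memory are illustrative assumptions (real sizing also depends on runtime overhead, batching, and data type):

```python
# Sizing sketch: pick the smallest Inf1 size whose total device memory
# can hold a model of a given size. Figures come from the table above.

INF1_SIZES = [
    # (instance size, # of Inferentia chips, device memory in GiB)
    ("inf1.xlarge", 1, 8),
    ("inf1.2xlarge", 1, 8),
    ("inf1.6xlarge", 4, 32),
    ("inf1.24xlarge", 16, 128),
]

def smallest_inf1_for(model_gib: float) -> str:
    """Return the smallest instance size whose aggregate device memory
    is at least `model_gib` GiB (ignores runtime overhead)."""
    for name, _chips, mem_gib in INF1_SIZES:
        if mem_gib >= model_gib:
            return name
    raise ValueError("model does not fit on a single Inf1 instance")
```

For example, a 20 GiB model would land on `inf1.6xlarge` under this simplified rule, since its 32 GiB of device memory is the smallest that fits.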
Inf1 offers a direct chip-to-chip interconnect called NeuronLink-v1, which enables co-optimizing for latency and throughput via the NeuronCore Pipeline technology.
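NeuronCore Pipeline is requested at compile time. A hedged sketch, assuming the Neuron compiler's `--neuroncore-pipeline-cores` option passed through `torch_neuron.trace`; the helper function, model, and core count below are illustrative, and the trace call is commented out because it requires the Neuron SDK:

```python
# Sketch: build the compiler argument list that asks the Neuron compiler
# to shard a model graph across multiple NeuronCores for pipelined
# execution over NeuronLink-v1.

def pipeline_compiler_args(num_cores: int) -> list:
    """Compiler args requesting NeuronCore Pipeline across `num_cores`
    NeuronCores (hypothetical helper; flag name from the Neuron SDK)."""
    return ["--neuroncore-pipeline-cores", str(num_cores)]

# Example for inf1.6xlarge: 4 Inferentia chips x 4 NeuronCores per chip.
args = pipeline_compiler_args(16)

# With the Neuron SDK installed, the args would be passed at trace time:
# import torch, torch_neuron
# model_neuron = torch_neuron.trace(model, example_inputs,
#                                   compiler_args=args)
```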
