This document is relevant for: Inf1
AWS Inf1 Architecture
On this page, we provide an architectural overview of the AWS Inf1 instances and the Inferentia NeuronDevices that power them (referred to as Inferentia devices from here on).
Inf1 Architecture
The EC2 Inf1 instance is powered by up to 16 Inferentia devices, allowing customers to choose among four instance sizes:
| Instance size | # of Inferentia devices | vCPUs | Host Memory (GiB) | FP16/BF16 TFLOPS | INT8 TOPS | Device Memory (GiB) | Device Memory bandwidth (GiB/sec) | NeuronLink-v1 device-to-device bandwidth (GiB/sec/device) | EFA bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|---|---|
| Inf1.xlarge | 1 | 4 | 8 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.2xlarge | 1 | 8 | 16 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.6xlarge | 4 | 24 | 48 | 256 | 512 | 32 | 200 | 32 | 25 |
| Inf1.24xlarge | 16 | 96 | 192 | 1024 | 2048 | 128 | 800 | 32 | 100 |
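For capacity-planning scripts, the table above can be captured as a small lookup. This is an illustrative sketch, not part of the Neuron SDK; the dictionary and helper names are hypothetical, and it assumes 4 NeuronCores per Inferentia device.

```python
# Inf1 instance specs transcribed from the table above (illustrative, not an SDK API).
INF1_SPECS = {
    "inf1.xlarge":   {"devices": 1,  "vcpus": 4,  "host_gib": 8,   "fp16_tflops": 64,   "device_gib": 8},
    "inf1.2xlarge":  {"devices": 1,  "vcpus": 8,  "host_gib": 16,  "fp16_tflops": 64,   "device_gib": 8},
    "inf1.6xlarge":  {"devices": 4,  "vcpus": 24, "host_gib": 48,  "fp16_tflops": 256,  "device_gib": 32},
    "inf1.24xlarge": {"devices": 16, "vcpus": 96, "host_gib": 192, "fp16_tflops": 1024, "device_gib": 128},
}

def total_neuroncores(instance: str, cores_per_device: int = 4) -> int:
    """Total NeuronCores on an instance, assuming 4 NeuronCores per Inferentia device."""
    return INF1_SPECS[instance]["devices"] * cores_per_device

print(total_neuroncores("inf1.6xlarge"))   # → 16
print(total_neuroncores("inf1.24xlarge"))  # → 64
```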
Inf1 offers a direct device-to-device interconnect called NeuronLink-v1, which enables co-optimizing latency and throughput via the Neuron Core Pipeline technology.