This document is relevant for: Inf1
AWS Inf1 Architecture#
This page provides an architectural overview of the AWS Inf1 instances and the Inferentia NeuronDevices that power them (Inferentia devices from here on).
Inf1 Architecture#
The EC2 Inf1 instance is powered by up to 16 Inferentia devices and is available in four instance sizes:
| Instance size | # of Inferentia devices | vCPUs | Host Memory (GiB) | FP16/BF16 TFLOPS | INT8 TOPS | Device Memory (GiB) | Device Memory Bandwidth (GiB/sec) | NeuronLink-v1 device-to-device bandwidth (GiB/sec/device) | EFA bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|---|---|
| Inf1.xlarge | 1 | 4 | 8 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.2xlarge | 1 | 8 | 16 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.6xlarge | 4 | 24 | 48 | 256 | 512 | 32 | 200 | 32 | 25 |
| Inf1.24xlarge | 16 | 96 | 192 | 1024 | 2048 | 128 | 800 | 32 | 100 |
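The per-instance figures above are straight multiples of the per-device figures: each Inferentia device contributes 64 FP16/BF16 TFLOPS, 128 INT8 TOPS, 8 GiB of device DRAM, and 50 GiB/sec of memory bandwidth. A minimal Python sketch that reproduces the table's arithmetic:

```python
# Per-device figures for Inferentia v1, taken from the table above.
PER_DEVICE = {
    "fp16_bf16_tflops": 64,
    "int8_tops": 128,
    "device_memory_gib": 8,
    "memory_bandwidth_gib_s": 50,
}

def instance_totals(num_devices: int) -> dict:
    """Aggregate per-device figures for a given instance size."""
    return {key: value * num_devices for key, value in PER_DEVICE.items()}

# Inf1.24xlarge (16 devices): 1024 TFLOPS, 2048 TOPS, 128 GiB, 800 GiB/sec,
# matching the last row of the table.
print(instance_totals(16))
```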
Inf1 offers a direct device-to-device interconnect called NeuronLink-v1, which enables co-optimizing latency and throughput via the NeuronCore Pipeline technology.
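As an illustration, the sketch below shows roughly how a model is compiled for NeuronCore Pipeline with torch-neuron on Inf1. The toy network and the core count are placeholders; `--neuroncore-pipeline-cores` is the neuron-cc compiler flag that enables pipelining, and the snippet assumes the torch-neuron package is installed on an Inf1 instance:

```python
import torch
import torch.nn as nn
import torch_neuron  # provides torch.neuron.trace (Inf1 / neuron-cc path)

# Toy network standing in for a real inference model.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(), nn.Flatten(),
    nn.Linear(16 * 222 * 222, 10),
).eval()
example = torch.rand(1, 3, 224, 224)

# Ask the compiler to shard the graph across 4 NeuronCores so that weights
# stay resident on-chip and activations stream from core to core (pipelining).
model_neuron = torch.neuron.trace(
    model,
    example_inputs=[example],
    compiler_args=["--neuroncore-pipeline-cores", "4"],
)
model_neuron.save("model_pipeline_4core.pt")
```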

Inferentia Architecture#
At the heart of the Inf1 instance are up to 16 Inferentia devices.

Each Inferentia device consists of:
- Compute: 4x NeuronCore-v1 cores, delivering 128 INT8 TOPS and 64 FP16/BF16 TFLOPS
- Device Memory: 8 GiB of device DRAM (for storing parameters and intermediate state), with 50 GiB/sec of bandwidth (a quick sizing sketch follows this list)
- NeuronLink: a device-to-device interconnect that enables co-optimization of latency and throughput via the NeuronCore Pipeline technology
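Since each device's 8 GiB of DRAM holds both parameters and intermediate state, a back-of-the-envelope check can tell you whether a model's FP16/BF16 weights even fit on a single device. The helper below is illustrative, not part of the Neuron SDK:

```python
def weights_fit_on_device(num_params: float,
                          bytes_per_param: int = 2,      # FP16/BF16
                          device_memory_gib: float = 8.0) -> bool:
    """Rough upper-bound check: do the weights alone fit in one Inferentia
    device's DRAM? Ignores intermediate state, so True is necessary but
    not sufficient."""
    weights_gib = num_params * bytes_per_param / 2**30
    return weights_gib < device_memory_gib

# A 1.5B-parameter model needs ~2.8 GiB of FP16 weights -> fits in 8 GiB.
print(weights_fit_on_device(1.5e9))  # True
```

Models that exceed a single device's memory can be split across NeuronCores and devices via the NeuronCore Pipeline technology, which is one motivation for the NeuronLink-v1 interconnect described above.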