This document is relevant for: Inf1

AWS Inf1 Architecture

On this page, we provide an architectural overview of AWS Inf1 instances and the Inferentia NeuronDevices that power them (referred to simply as Inferentia devices from here on).

Inf1 Architecture

EC2 Inf1 instances are powered by up to 16 Inferentia devices each, and customers can choose between four instance sizes:

| Instance size | # of Inferentia devices | vCPUs | Host Memory (GiB) | FP16/BF16 TFLOPS | INT8 TOPS | Device Memory (GiB) | Device Memory bandwidth (GiB/sec) | NeuronLink-v1 device-to-device bandwidth (GiB/sec/device) | EFA bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|---|---|
| Inf1.xlarge | 1 | 4 | 8 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.2xlarge | 1 | 8 | 16 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.6xlarge | 4 | 24 | 48 | 256 | 512 | 32 | 200 | 32 | 25 |
| Inf1.24xlarge | 16 | 96 | 192 | 1024 | 2048 | 128 | 800 | 32 | 100 |

Inf1 offers a direct device-to-device interconnect called NeuronLink-v1, which enables co-optimizing latency and throughput via the NeuronCore Pipeline technology.
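NeuronCore Pipeline shards a model's layers across NeuronCores so that each core repeatedly executes its own stage. The trade-off this co-optimizes can be sketched with generic pipelining arithmetic (plain Python for illustration only; this is not a Neuron API): end-to-end latency is the sum of the stage times, while steady-state throughput is gated by the slowest stage.

```python
def pipeline_metrics(stage_ms):
    """Latency and steady-state throughput for a pipeline whose stages
    (e.g. groups of model layers, each resident on one NeuronCore)
    take stage_ms milliseconds each."""
    latency_ms = sum(stage_ms)        # one request traverses every stage
    bottleneck = max(stage_ms)        # the slowest stage gates the pipeline
    throughput = 1000.0 / bottleneck  # requests/sec once the pipeline is full
    return latency_ms, throughput
```

For four evenly balanced 2 ms stages, `pipeline_metrics([2.0, 2.0, 2.0, 2.0])` gives 8 ms latency and 500 requests/sec, which is why balancing stage times across cores matters when pipelining.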

[Image: Inf1 server architecture (inf1-server-arch.png)]
