This document is relevant for: Inf1

AWS Inf1 Architecture

This page provides an architectural overview of the AWS Inf1 instances and the Inferentia NeuronDevices that power them (referred to as Inferentia devices from here on).

Inf1 Architecture

EC2 Inf1 instances are powered by up to 16 Inferentia devices, and are available in four instance sizes:

| Instance size | # of Inferentia devices | vCPUs | Host Memory (GiB) | FP16/BF16 TFLOPS | INT8 TOPS | Device Memory (GiB) | Device Memory Bandwidth (GiB/sec) | NeuronLink-v1 device-to-device bandwidth (GiB/sec/device) | EFA bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|---|---|
| Inf1.xlarge | 1 | 4 | 8 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.2xlarge | 1 | 8 | 16 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.6xlarge | 4 | 24 | 48 | 256 | 512 | 32 | 200 | 32 | 25 |
| Inf1.24xlarge | 16 | 96 | 192 | 1024 | 2048 | 128 | 800 | 32 | 100 |
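Note that the compute and memory columns are simple multiples of the per-device figures (64 FP16/BF16 TFLOPS, 128 INT8 TOPS, and 8 GiB of device DRAM at 50 GiB/sec per Inferentia device). A minimal sketch that reproduces the table's totals from the device count; the helper name and dictionary are illustrative, not part of the Neuron SDK:

```python
# Per-device Inferentia figures from the table above.
PER_DEVICE = {
    "fp16_bf16_tflops": 64,
    "int8_tops": 128,
    "device_memory_gib": 8,
    "memory_bandwidth_gib_s": 50,
}

def instance_totals(num_devices: int) -> dict:
    """Scale the per-device figures by the number of Inferentia devices."""
    return {key: value * num_devices for key, value in PER_DEVICE.items()}

# Inf1.24xlarge has 16 devices: 1024 TFLOPS, 2048 TOPS, 128 GiB, 800 GiB/sec.
print(instance_totals(16))
```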

Inf1 offers a direct device-to-device interconnect called NeuronLink-v1, which enables co-optimizing latency and throughput via the NeuronCore Pipeline technology.
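With the PyTorch Neuron (torch-neuron) flow, pipelining is requested at compile time by passing the neuron-cc flag --neuroncore-pipeline-cores, which shards the model across that many NeuronCores. A minimal sketch; the toy model and the choice of 4 cores are illustrative:

```python
import torch
import torch_neuron  # PyTorch plugin from the AWS Neuron SDK (Inf1)

# Toy model standing in for a real inference workload.
model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU()).eval()
example = torch.rand(1, 128)

# Compile for NeuronCore Pipeline: weights are sharded across 4 NeuronCores
# and activations stream core-to-core instead of round-tripping through DRAM.
pipelined = torch.neuron.trace(
    model,
    example_inputs=[example],
    compiler_args=["--neuroncore-pipeline-cores", "4"],
)
pipelined.save("model_pipelined.pt")
```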

[Image: Inf1 server architecture (inf1-server-arch.png)]

Inferentia Architecture

At the heart of each Inf1 instance are its Inferentia devices, as depicted below:

[Image: Inferentia device (inferentia-neurondevice.png)]

Each Inferentia device consists of:

  • Compute:
    • 4x NeuronCore-v1 cores, delivering 128 INT8 TOPS and 64 FP16/BF16 TFLOPS (see the data-parallel sketch after this list)

  • Device Memory:
    • 8 GiB of device DRAM (for storing parameters and intermediate state), with 50 GiB/sec of bandwidth

  • NeuronLink:
    • NeuronLink-v1 links for intra-instance device-to-device interconnect, providing 32 GiB/sec per device on the multi-device sizes (Inf1.6xlarge and Inf1.24xlarge)
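The four NeuronCores on a device can also run replicas of a compiled model in parallel. A minimal sketch using torch.neuron.DataParallel, assuming a model already compiled with torch-neuron (the file name and input shape are illustrative):

```python
import torch
import torch_neuron  # PyTorch plugin from the AWS Neuron SDK (Inf1)

# Load a model previously compiled with torch.neuron.trace().
model = torch.jit.load("model_neuron.pt")

# Replicate the model across the available NeuronCores and shard each
# input batch across the replicas for higher throughput.
model_parallel = torch.neuron.DataParallel(model)

batch = torch.rand(8, 128)       # assumed to match the compiled input shape
outputs = model_parallel(batch)  # sub-batches run on separate NeuronCores
```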
