This document is relevant for: Inf1

AWS Inf1 Architecture

This page provides an architectural overview of the AWS Inf1 instances and the Inferentia NeuronDevices that power them (referred to as Inferentia devices from here on).

Inf1 Architecture

EC2 Inf1 instances are powered by up to 16 Inferentia devices, and are available in four instance sizes:

| Instance size | # of Inferentia devices | vCPUs | Host Memory (GiB) | FP16/BF16 TFLOPS | INT8 TOPS | Device Memory (GiB) | Device Memory Bandwidth (GiB/sec) | NeuronLink-v1 device-to-device bandwidth (GiB/sec/device) | EFA bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|---|---|
| Inf1.xlarge | 1 | 4 | 8 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.2xlarge | 1 | 8 | 16 | 64 | 128 | 8 | 50 | N/A | up to 25 |
| Inf1.6xlarge | 4 | 24 | 48 | 256 | 512 | 32 | 200 | 32 | 25 |
| Inf1.24xlarge | 16 | 96 | 192 | 1024 | 2048 | 128 | 800 | 32 | 100 |
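Note that the compute and memory columns are simple multiples of the per-device figures (64 FP16/BF16 TFLOPS, 128 INT8 TOPS, and 8 GiB of device DRAM at 50 GiB/sec per Inferentia device). A minimal sketch that reproduces the table's totals from the device count; the helper name and dictionary are illustrative, not part of the Neuron SDK:

```python
# Per-device Inferentia figures from the table above.
PER_DEVICE = {
    "fp16_bf16_tflops": 64,
    "int8_tops": 128,
    "device_memory_gib": 8,
    "memory_bandwidth_gib_s": 50,
}

def instance_totals(num_devices: int) -> dict:
    """Scale the per-device figures by the number of Inferentia devices."""
    return {key: value * num_devices for key, value in PER_DEVICE.items()}

# Inf1.24xlarge has 16 devices: 1024 TFLOPS, 2048 TOPS, 128 GiB, 800 GiB/sec.
print(instance_totals(16))
```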

Inf1 offers a direct device-to-device interconnect called NeuronLink-v1, which enables co-optimizing latency and throughput via the NeuronCore Pipeline technology.
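With the PyTorch Neuron (torch-neuron) flow, pipelining is requested at compile time by passing the neuron-cc flag --neuroncore-pipeline-cores, which shards the model across that many NeuronCores. A minimal sketch; the toy model and the choice of 4 cores are illustrative:

```python
import torch
import torch_neuron  # PyTorch plugin from the AWS Neuron SDK (Inf1)

# Toy model standing in for a real inference workload.
model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU()).eval()
example = torch.rand(1, 128)

# Compile for NeuronCore Pipeline: weights are sharded across 4 NeuronCores
# and activations stream core-to-core instead of round-tripping through DRAM.
pipelined = torch.neuron.trace(
    model,
    example_inputs=[example],
    compiler_args=["--neuroncore-pipeline-cores", "4"],
)
pipelined.save("model_pipelined.pt")
```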

[Image: Inf1 server architecture (inf1-server-arch.png)]

Inferentia Architecture

At the heart of each Inf1 instance are its Inferentia devices, as depicted below:

[Image: Inferentia device (inferentia-neurondevice.png)]

Each Inferentia device consists of:

  • Compute:
    • 4x NeuronCore-v1 cores, delivering 128 INT8 TOPS and 64 FP16/BF16 TFLOPS (see the data-parallel sketch after this list)

  • Device Memory:
    • 8 GiB of device DRAM (for storing parameters and intermediate state), with 50 GiB/sec of bandwidth

  • NeuronLink:
    • NeuronLink-v1 links for intra-instance device-to-device interconnect, providing 32 GiB/sec per device on the multi-device sizes (Inf1.6xlarge and Inf1.24xlarge)
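The four NeuronCores on a device can also run replicas of a compiled model in parallel. A minimal sketch using torch.neuron.DataParallel, assuming a model already compiled with torch-neuron (the file name and input shape are illustrative):

```python
import torch
import torch_neuron  # PyTorch plugin from the AWS Neuron SDK (Inf1)

# Load a model previously compiled with torch.neuron.trace().
model = torch.jit.load("model_neuron.pt")

# Replicate the model across the available NeuronCores and shard each
# input batch across the replicas for higher throughput.
model_parallel = torch.neuron.DataParallel(model)

batch = torch.rand(8, 128)       # assumed to match the compiled input shape
outputs = model_parallel(batch)  # sub-batches run on separate NeuronCores
```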
