This document is relevant for: Inf1

Inferentia Architecture

At the heart of each Inf1 instance are up to sixteen Inferentia devices, each with four NeuronCore-v1 cores, as depicted below:


Each Inferentia device consists of:


Compute

Four NeuronCore-v1 cores, together delivering 128 INT8 TOPS and 64 FP16/BF16 TFLOPS

Device Memory

8 GiB of device DRAM (for storing parameters and intermediate state), with 50 GiB/sec of bandwidth


NeuronLink

Enables co-optimization of latency and throughput via the Neuron Core Pipeline technology
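The per-device figures above can be scaled to a whole instance with simple arithmetic. The following is a minimal sketch, not part of the Neuron SDK: the names `InferentiaDevice`, `instance_totals`, and `min_dram_read_seconds` are hypothetical, and the code assumes that peak figures scale linearly with the number of devices and that the quoted DRAM bandwidth is sustained, which real workloads may not achieve.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass(frozen=True)
class InferentiaDevice:
    """Per-device peak figures as quoted in this document (hypothetical helper type)."""
    neuron_cores: int = 4
    int8_tops: float = 128.0
    fp16_bf16_tflops: float = 64.0
    dram_gib: int = 8
    dram_bandwidth_gib_per_s: float = 50.0


def instance_totals(device: InferentiaDevice, device_count: int) -> dict:
    """Scale per-device figures to `device_count` devices (assumes linear scaling)."""
    if device_count < 1:
        raise ValueError("device_count must be at least 1")
    return {
        "neuron_cores": device.neuron_cores * device_count,
        "int8_tops": device.int8_tops * device_count,
        "fp16_bf16_tflops": device.fp16_bf16_tflops * device_count,
        "dram_gib": device.dram_gib * device_count,
    }


def min_dram_read_seconds(num_bytes: int, device: InferentiaDevice) -> float:
    """Lower bound on the time to read `num_bytes` from one device's DRAM at peak bandwidth."""
    return num_bytes / (device.dram_bandwidth_gib_per_s * GIB)


if __name__ == "__main__":
    device = InferentiaDevice()
    # Sixteen devices is the count this document quotes for an Inf1 instance.
    for name, value in instance_totals(device, 16).items():
        print(f"{name}: {value}")
    # Time to sweep the full 8 GiB of one device's DRAM once, at best.
    print(f"full DRAM read: {min_dram_read_seconds(device.dram_gib * GIB, device):.2f} s")
```

With sixteen devices this gives 64 NeuronCore-v1 cores and 128 GiB of device DRAM in total. The DRAM read estimate is only a lower bound, useful for a rough check of whether a model is likely to be limited by memory bandwidth.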
