This document is relevant for: Inf1

Inferentia Architecture

At the heart of each Inf1 instance are up to sixteen Inferentia chips (the count depends on the instance size), each with four NeuronCore-v1 cores, as depicted below:

../../../_images/inferentia-neurondevice.png

Each Inferentia chip consists of:

Compute

Four NeuronCore-v1 cores, delivering 128 INT8 TOPS and 64 FP16/BF16 TFLOPS

Device Memory

8 GiB of device DRAM (for storing parameters and intermediate state), with 50 GiB/sec of bandwidth

NeuronLink

A chip-to-chip interconnect that enables co-optimization of latency and throughput via the NeuronCore Pipeline technology
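
To make the per-chip figures above concrete, the following minimal sketch multiplies them out for an instance with a given number of chips. It is illustrative only: the names `InferentiaChip` and `instance_totals` are hypothetical and not part of any Neuron SDK API, and the totals are peak theoretical values that assume linear scaling across chips. Real throughput depends on the model and on how it is compiled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferentiaChip:
    """Per-chip figures taken from the list above (hypothetical helper type)."""
    neuron_cores: int = 4
    int8_tops: float = 128.0
    fp16_bf16_tflops: float = 64.0
    dram_gib: float = 8.0
    dram_bandwidth_gib_s: float = 50.0


def instance_totals(num_chips: int, chip: InferentiaChip = InferentiaChip()) -> dict:
    """Return peak aggregate resources for an instance with `num_chips` chips.

    Assumes simple linear scaling, which is an upper bound, not a measurement.
    """
    if num_chips < 1:
        raise ValueError("num_chips must be at least 1")
    return {
        "neuron_cores": num_chips * chip.neuron_cores,
        "int8_tops": num_chips * chip.int8_tops,
        "fp16_bf16_tflops": num_chips * chip.fp16_bf16_tflops,
        "dram_gib": num_chips * chip.dram_gib,
        # Per-core share of a single chip's peak compute.
        "int8_tops_per_core": chip.int8_tops / chip.neuron_cores,
        "fp16_bf16_tflops_per_core": chip.fp16_bf16_tflops / chip.neuron_cores,
    }


if __name__ == "__main__":
    # Sixteen chips is the largest configuration mentioned in this document.
    for key, value in instance_totals(16).items():
        print(f"{key}: {value}")
```

Under these assumptions, a sixteen-chip instance exposes 64 NeuronCores and 128 GiB of device DRAM in total, and each NeuronCore-v1 accounts for a quarter of its chip's peak compute.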
