This document is relevant for: Inf1
Inferentia Architecture
At the heart of each Inf1 instance are up to sixteen Inferentia chips, each with four NeuronCore-v1 cores.
Each Inferentia chip consists of:
Compute | Four NeuronCore-v1 cores, delivering 128 INT8 TOPS and 64 FP16/BF16 TFLOPS
Device Memory | 8 GiB of device DRAM memory (for storing parameters and intermediate state), with 50 GiB/sec of bandwidth
NeuronLink | Enables co-optimization of latency and throughput via the Neuron Core Pipeline technology
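To put the per-chip figures in context, the sketch below scales them to a sixteen-chip instance. It is a back-of-the-envelope calculation, not a Neuron API: the `InferentiaChip` class and `instance_peak` function are hypothetical names introduced here, the totals assume peak figures add linearly across chips, and the per-core number assumes a chip's compute is split evenly across its four cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferentiaChip:
    """Per-chip figures quoted in the table above (hypothetical helper class)."""
    neuron_cores: int = 4
    int8_tops: float = 128.0
    fp16_tflops: float = 64.0
    dram_gib: float = 8.0


def instance_peak(num_chips: int, chip: InferentiaChip = InferentiaChip()) -> dict:
    """Scale the per-chip peak figures linearly to num_chips chips."""
    if num_chips < 1:
        raise ValueError("num_chips must be at least 1")
    return {
        "neuron_cores": num_chips * chip.neuron_cores,
        "int8_tops": num_chips * chip.int8_tops,
        "fp16_tflops": num_chips * chip.fp16_tflops,
        "dram_gib": num_chips * chip.dram_gib,
    }


# Assumption: a chip's compute is divided evenly among its NeuronCores.
chip = InferentiaChip()
per_core_int8_tops = chip.int8_tops / chip.neuron_cores

# Totals for a sixteen-chip instance.
totals = instance_peak(16)
print(f"INT8 TOPS per NeuronCore: {per_core_int8_tops}")
for name, value in totals.items():
    print(f"{name}: {value}")
```

These are theoretical peaks. Achieved throughput depends on the model, the batch size, and how the workload is partitioned across NeuronCores.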