Inferentia Architecture
This document is relevant for: Inf1
At the heart of the Inf1 instance are up to 16 Inferentia devices, each of which includes 4 NeuronCore-v1 cores.

Each Inferentia device consists of:
- Compute:
4x NeuronCore-v1 cores, delivering 128 INT8 TOPS and 64 FP16/BF16 TFLOPS
- Device Memory:
8 GB of device DRAM (for storing parameters and intermediate state), with 50 GB/sec of bandwidth
- NeuronLink:
Enables co-optimization of latency and throughput via the NeuronCore Pipeline technology
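
The per-device figures above can be turned into rough per-core and per-instance numbers with simple arithmetic. The following is a minimal sketch, not part of the Neuron SDK: the `InferentiaDevice` class and the `per_core` and `instance_totals` helpers are hypothetical names introduced here for illustration, and the totals assume peak figures add up linearly across devices, which real workloads will not necessarily achieve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferentiaDevice:
    """Per-device figures as listed in this document (peak values)."""
    neuron_cores: int = 4
    int8_tops: float = 128.0
    fp16_bf16_tflops: float = 64.0
    dram_gb: float = 8.0
    dram_bandwidth_gb_per_sec: float = 50.0


def per_core(device: InferentiaDevice) -> dict:
    """Split the device-level compute figures evenly across its NeuronCores."""
    return {
        "int8_tops": device.int8_tops / device.neuron_cores,
        "fp16_bf16_tflops": device.fp16_bf16_tflops / device.neuron_cores,
    }


def instance_totals(num_devices: int, device: InferentiaDevice = InferentiaDevice()) -> dict:
    """Sum peak figures over all devices (assumption: linear aggregation)."""
    return {
        "neuron_cores": num_devices * device.neuron_cores,
        "int8_tops": num_devices * device.int8_tops,
        "fp16_bf16_tflops": num_devices * device.fp16_bf16_tflops,
        "dram_gb": num_devices * device.dram_gb,
    }


if __name__ == "__main__":
    device = InferentiaDevice()
    print("Per NeuronCore-v1:", per_core(device))
    # 16 devices corresponds to the largest configuration described above.
    print("16-device instance:", instance_totals(16, device))
```

Under these assumptions, a 16-device instance exposes 64 NeuronCore-v1 cores and 128 GB of device DRAM in total; device memory stays attached to each device, so the sum is a capacity figure rather than a single shared pool.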