This document is relevant for: Inf1, Inf2, Trn1, Trn2

Neuron Glossary#

Terms#

Neuron Devices (Accelerated Machine Learning chips)#

Inferentia#

AWS's first-generation accelerated machine learning chip, supporting inference only

Trainium/Inferentia2#

AWS's second-generation accelerated machine learning chips, supporting both training and inference

Trainium2#

AWS's third-generation accelerated machine learning chip, supporting both training and inference

Neuron Device#

Accelerated machine learning chip (e.g. Inferentia or Trainium)

Neuron-powered instances#

Inf1#

Inferentia-powered accelerated compute EC2 instance

Trn1#

Trainium-powered accelerated compute EC2 instance

Inf2#

Inferentia2-powered accelerated compute EC2 instance

Trn2#

Trainium2-powered accelerated compute EC2 instance

NeuronCore terms#

NeuronCore#

The machine learning compute cores within Inferentia/Trainium

NeuronCore-v1#

NeuronCore within Inferentia

NeuronCore-v2#

NeuronCore within Trainium1/Inferentia2

NeuronCore-v3#

NeuronCore within Trainium2

Tensor Engine#

2D systolic array (within the NeuronCore), used for matrix computations

Scalar Engine#

A scalar engine within each NeuronCore, which accelerates element-wise operations (e.g. GELU, ReLU, reciprocal)

Vector Engine#

A vector engine within each NeuronCore, which accelerates spatial operations (e.g. LayerNorm, TopK, pooling)

GPSIMD Engine#

Embedded general-purpose SIMD cores within each NeuronCore, used to accelerate custom operators

Sync Engine#

The synchronization engine integrated inside each NeuronCore, used for synchronization and DMA triggering

Collective Communication Engine#

A dedicated engine for collective communication, which allows computation and communication to overlap

High Bandwidth Memory#

High-bandwidth memory, used as device memory for NeuronCore-v2 and beyond

State Buffer#

The main software-managed on-chip memory in NeuronCore-v1 and beyond.

Partial Sum Buffer#

A second software-managed on-chip memory in NeuronCore-v1 and beyond, with near-memory accumulation support for TensorE output data.

NeuronLink#

Interconnect between NeuronCores

NeuronLink-v1#

Interconnect between NeuronCores in an Inferentia device

NeuronLink-v2#

Interconnect between NeuronCores in a Trainium1/Inferentia2 device

NeuronLink-v3#

Interconnect between NeuronCores in a Trainium2 device

Neuron SDK terms#

Neuron Kernel Interface#

A bare-metal language and compiler for directly programming Neuron devices, available on Trainium/Inferentia2 and later devices

Abbreviations#

NxD Core#

NeuronX Distributed Core Library

NxD Training#

NeuronX Distributed Training Library

NxD Inference#

NeuronX Distributed Inference Library

NC#

Neuron Core

NeuronCore#

Neuron Core

ND#

Neuron Device

NeuronDevice#

Neuron Device

TensorE#

Tensor Engine

ScalarE#

Scalar Engine

VectorE#

Vector Engine

GpSimdE#

GPSIMD Engine

CCE#

Collective Communication Engine

HBM#

High Bandwidth Memory

SBUF#

State Buffer

PSUM#

Partial Sum Buffer

FP32#

Float32

TF32#

TensorFloat32

FP16#

Float16

BF16#

Bfloat16

cFP8#

Configurable Float8

RNE#

Round to Nearest Even

SR#

Stochastic Rounding
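
The numeric-format and rounding abbreviations above can be made concrete. Below is a minimal Python sketch (not part of the Neuron SDK; function names are illustrative only) that converts an FP32 value to BF16 using Round to Nearest Even (RNE): BF16 keeps FP32's 8 exponent bits but only the top 7 mantissa bits, so the conversion truncates the low 16 bits and rounds on them.

```python
import struct

def fp32_to_bf16_rne(x: float) -> int:
    """Convert an FP32 value to BF16 bits (top 16 bits of the FP32
    encoding), rounding the discarded low 16 bits to nearest even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lower = bits & 0xFFFF      # mantissa bits to be discarded
    upper = bits >> 16         # candidate BF16 bits
    # RNE: round up when the discarded part is more than half an ulp,
    # or exactly half and the kept LSB is odd (ties go to even).
    if lower > 0x8000 or (lower == 0x8000 and (upper & 1)):
        upper += 1
    return upper & 0xFFFF

def bf16_to_fp32(b: int) -> float:
    """Expand BF16 bits back to FP32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

# 1.0 is exactly representable; a half-ulp tie (1 + 2**-8) rounds to the
# even neighbor 1.0, while 1 + 3*2**-8 rounds up to 1 + 2*2**-7.
print(hex(fp32_to_bf16_rne(1.0)))
print(bf16_to_fp32(fp32_to_bf16_rne(1.00390625)))
print(bf16_to_fp32(fp32_to_bf16_rne(1.01171875)))
```

Stochastic Rounding (SR), by contrast, would round the discarded bits up with probability proportional to their magnitude, which keeps long accumulations of small updates unbiased.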

NKI#

Neuron Kernel Interface

CustomOps#

Custom Operators

RT#

Neuron Runtime

DP#

Data Parallel

DPr#

Data Parallel degree

TP#

Tensor Parallel

TPr#

Tensor Parallel degree

PP#

Pipeline Parallel

PPr#

Pipeline Parallel degree
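
The three parallelism degrees compose multiplicatively: a job's total worker count is DPr × TPr × PPr, since each data-parallel replica is split into pipeline stages, and each stage is sharded across tensor-parallel workers. A minimal sketch (the helper name is hypothetical, not a Neuron SDK API):

```python
def world_size(dpr: int, tpr: int, ppr: int) -> int:
    """Total number of workers implied by the parallelism degrees:
    DPr data-parallel replicas, each split into PPr pipeline stages,
    each stage sharded across TPr tensor-parallel workers."""
    return dpr * tpr * ppr

# e.g. a job with DPr=4, TPr=8, PPr=1 occupies 4 * 8 * 1 = 32 workers
print(world_size(4, 8, 1))
```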
