This document is relevant for: Inf1, Inf2, Trn1, Trn2
# Neuron Glossary

## Terms
### Neuron Devices (accelerated machine learning chips)

Term | Description
---|---
Inferentia | AWS first-generation accelerated machine learning chip, supporting inference only
Trainium | AWS second-generation accelerated machine learning chip, supporting training and inference
Inferentia2 | AWS second-generation accelerated machine learning chip, supporting training and inference
Neuron Device | Accelerated machine learning chip (e.g. Inferentia or Trainium)
### Neuron powered Instances

Term | Description
---|---
Inf1 | Inferentia-powered accelerated-compute EC2 instance
Trn1 | Trainium-powered accelerated-compute EC2 instance
Inf2 | Inferentia2-powered accelerated-compute EC2 instance
Trn2 | Trainium2-powered accelerated-compute EC2 instance
### NeuronCore terms

Term | Description
---|---
NeuronCore | The machine learning compute cores within Inferentia/Trainium
NeuronCore-v1 | NeuronCore within Inferentia
NeuronCore-v2 | NeuronCore within Trainium1/Inferentia2
NeuronCore-v3 | NeuronCore within Trainium2
Tensor Engine | 2D systolic array (within the NeuronCore), used for matrix computations
Scalar Engine | A scalar engine within each NeuronCore, which accelerates element-wise operations (e.g. GELU, ReLU, reciprocal)
Vector Engine | A vector engine within each NeuronCore, which accelerates spatial operations (e.g. layerNorm, TopK, pooling)
GpSimd Engine | Embedded general-purpose SIMD cores within each NeuronCore, used to accelerate custom operators
Sync Engine | The SP engine, integrated inside the NeuronCore; used for synchronization and DMA triggering
Collective Communication Engine | Dedicated engine for collective communication, which allows computation and communication to overlap
HBM | High Bandwidth Memory, used as device memory for NeuronCore-v2 and beyond
State Buffer (SBUF) | The main software-managed on-chip memory, in NeuronCore-v1 and beyond
Partial Sum Buffer (PSUM) | A second software-managed on-chip memory, in NeuronCore-v1 and beyond, with near-memory accumulation support for Tensor Engine output data
NeuronLink | Interconnect between NeuronCores
NeuronLink-v1 | Interconnect between NeuronCores in the Inferentia device
NeuronLink-v2 | Interconnect between NeuronCores in the Trainium1/Inferentia2 device
NeuronLink-v3 | Interconnect between NeuronCores in the Trainium2 device
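The near-memory accumulation that PSUM provides for Tensor Engine output can be pictured with a plain tiled matrix multiply: partial products along the contraction dimension are summed into a separate accumulator buffer. The following is an illustrative NumPy sketch, not Neuron code; the function name and tile size are invented for the example.

```python
import numpy as np

def tiled_matmul(a, b, tile=2):
    """Multiply a @ b in tiles along the contraction (K) dimension,
    accumulating partial products into an accumulator buffer, analogous
    to how Tensor Engine output tiles are accumulated in PSUM."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    psum = np.zeros((m, n), dtype=np.float32)  # plays the role of PSUM
    for k0 in range(0, k, tile):
        # Each step contributes one partial product; on the hardware this
        # accumulation happens near the PSUM memory itself.
        psum += a[:, k0:k0 + tile] @ b[k0:k0 + tile, :]
    return psum

a = np.arange(16, dtype=np.float32).reshape(4, 4)
b = np.eye(4, dtype=np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b)
```

Because addition is associative, the tiled accumulation produces the same result as a single full-width multiply, which is what lets the hardware stream K-dimension tiles through the systolic array.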
### Neuron SDK terms

Term | Description
---|---
Neuron Kernel Interface (NKI) | A bare-metal language and compiler for directly programming Neuron devices, available on AWS Trainium/Inferentia2 and later devices
## Abbreviations

Abbreviation | Description
---|---
NxD | NeuronX Distributed Core Library
NxDT | NeuronX Distributed Training Library
NxDI | NeuronX Distributed Inference Library
NC | Neuron Core
NeuronCore | Neuron Core
ND | Neuron Device
NeuronDevice | Neuron Device
TensorE | Tensor Engine
ScalarE | Scalar Engine
VectorE | Vector Engine
GpSimdE | GpSimd Engine
CCE | Collective Communication Engine
HBM | High Bandwidth Memory
SBUF | State Buffer
PSUM | Partial Sum Buffer
FP32 | Float32
TF32 | TensorFloat32
FP16 | Float16
BF16 | Bfloat16
cFP8 | Configurable Float8
RNE | Round Nearest Even
SR | Stochastic Rounding
NKI | Neuron Kernel Interface
CustomOps | Custom Operators
NRT | Neuron Runtime
DP | Data Parallel
dp_degree | Data Parallel degree
TP | Tensor Parallel
tp_degree | Tensor Parallel degree
PP | Pipeline Parallel
pp_degree | Pipeline Parallel degree
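The two rounding modes in the table differ in how they distribute rounding error: Round Nearest Even is deterministic and breaks ties toward the even neighbor, while Stochastic Rounding rounds up or down with probability proportional to proximity, making the rounded value unbiased in expectation (useful when accumulating many low-precision updates). A minimal sketch on the integer grid rather than a floating-point grid; Python's built-in `round` already implements round-half-to-even, and the `stochastic_round` helper is invented for illustration.

```python
import math
import random

def stochastic_round(x, rng):
    """Round x to an adjacent integer, rounding up with probability equal
    to the fractional part, so that E[stochastic_round(x)] == x."""
    lo = math.floor(x)
    frac = x - lo
    return lo + (1 if rng.random() < frac else 0)

# RNE: Python's round() sends ties to the even neighbor
assert round(0.5) == 0 and round(1.5) == 2 and round(2.5) == 2

# SR: individual results vary, but the mean converges to the input
rng = random.Random(0)
mean = sum(stochastic_round(0.3, rng) for _ in range(100_000)) / 100_000
assert abs(mean - 0.3) < 0.01
```

Under RNE, repeatedly adding a value smaller than half a rounding step to an accumulator can lose the update entirely; stochastic rounding avoids that systematic bias at the cost of per-element noise.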