This document is relevant for: Inf1

Neuron Features

Neuron features describe the capabilities that deliver high performance and make it easier to develop and deploy deep learning workloads on Inferentia- and Trainium-based instances.

Collective communication

High-performance communication primitives for distributed training and inference across multiple devices.

Custom C++ operators

Framework for implementing custom operators in C++ to extend Neuron’s built-in operation support.

Data types

Supported numerical data types including FP32, FP16, BF16, and INT8 for efficient model execution.

Logical NeuronCore configuration

Configuration options for grouping and managing NeuronCores as logical units for workload distribution.

Neuron persistent cache

Persistent caching system for compiled models to reduce compilation time across sessions.
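As a rough illustration of the idea behind a persistent compilation cache, the sketch below keys compiled artifacts on a hash of the model bytes and the compiler flags, so a second run with the same inputs skips compilation. This is not the Neuron implementation: `cache_key`, `compile_with_cache`, and the `compile_fn` callback are hypothetical names chosen for the example, and the on-disk layout is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Callable, Sequence


def cache_key(model_bytes: bytes, compiler_flags: Sequence[str]) -> str:
    """Hypothetical cache key: SHA-256 over the model and each compiler flag."""
    digest = hashlib.sha256()
    digest.update(model_bytes)
    for flag in compiler_flags:
        # A separator byte keeps ["ab", "c"] distinct from ["a", "bc"].
        digest.update(b"\0")
        digest.update(flag.encode("utf-8"))
    return digest.hexdigest()


def compile_with_cache(
    model_bytes: bytes,
    compiler_flags: Sequence[str],
    cache_dir: str,
    compile_fn: Callable[[bytes, Sequence[str]], bytes],
) -> bytes:
    """Return the cached artifact if present, otherwise compile and store it."""
    artifact_path = Path(cache_dir) / f"{cache_key(model_bytes, compiler_flags)}.bin"
    if artifact_path.exists():
        return artifact_path.read_bytes()  # cache hit: no compilation
    artifact = compile_fn(model_bytes, compiler_flags)  # cache miss
    artifact_path.parent.mkdir(parents=True, exist_ok=True)
    artifact_path.write_bytes(artifact)
    return artifact
```

Because the key covers both the model and the flags, changing either one produces a new cache entry rather than reusing a stale artifact.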

NeuronCore batching

Batching strategies to maximize throughput by processing multiple inputs simultaneously on NeuronCores.
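The following minimal sketch shows one generic batching strategy: grouping inputs into fixed-size batches and padding the last one. It assumes the model runs at a fixed batch size, which is an assumption made for the example. `make_batches` and `pad_value` are hypothetical names, not part of any Neuron API.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_batches(
    inputs: Sequence[T], batch_size: int, pad_value: T
) -> List[Tuple[List[T], int]]:
    """Split inputs into fixed-size batches, padding the final batch.

    Each result is (batch, real_count) so the caller can discard outputs
    that correspond to padding entries.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = []
    for start in range(0, len(inputs), batch_size):
        chunk = list(inputs[start:start + batch_size])
        real_count = len(chunk)
        chunk.extend([pad_value] * (batch_size - real_count))
        batches.append((chunk, real_count))
    return batches


# Five inputs at batch size 2 give three batches; the last has one padded slot.
print(make_batches([1, 2, 3, 4, 5], batch_size=2, pad_value=0))
# prints [([1, 2], 2), ([3, 4], 2), ([5, 0], 1)]
```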

NeuronCore pipeline

Pipeline execution model that overlaps computation and data movement for improved performance.

Rounding modes

Configurable numerical rounding modes for controlling precision and accuracy in computations.
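To make the effect of a rounding mode concrete, the sketch below converts FP32 values to BF16 precision in three common ways: truncation, round-to-nearest-even, and stochastic rounding. It is a generic bit-level illustration using only the Python standard library. It makes no claim about which modes Neuron exposes or how they are configured, the function names are hypothetical, and it handles finite inputs only.

```python
import random
import struct


def fp32_bits(x: float) -> int:
    """Reinterpret x (rounded to FP32) as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_fp32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an FP32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_bf16_truncate(x: float) -> float:
    """Drop the low 16 bits of the FP32 encoding (round toward zero)."""
    return bits_to_fp32(fp32_bits(x) & 0xFFFF0000)


def to_bf16_rne(x: float) -> float:
    """Round to nearest BF16 value, with ties going to the even mantissa."""
    bits = fp32_bits(x)
    lsb = (bits >> 16) & 1  # lowest bit that survives in BF16
    return bits_to_fp32((bits + 0x7FFF + lsb) & 0xFFFF0000)


def to_bf16_stochastic(x: float, rng: random.Random) -> float:
    """Round up with probability proportional to the discarded fraction."""
    bits = fp32_bits(x)
    return bits_to_fp32((bits + rng.randrange(0x10000)) & 0xFFFF0000)


print(to_bf16_truncate(1.005))  # prints 1.0
print(to_bf16_rne(1.005))       # prints 1.0078125
print(to_bf16_rne(1.00390625))  # prints 1.0 (exact tie, rounds to even)
```

Truncation always moves toward zero, so its errors accumulate in one direction. Round-to-nearest-even halves the worst-case error, and stochastic rounding is unbiased on average at the cost of non-deterministic results unless the random generator is seeded.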
