This document is relevant for: Inf1, Inf2, Trn1, Trn2, Trn3
Neuron Features
The Neuron features listed here are capabilities that enable high performance and make it easier to develop and deploy deep learning workloads on Inferentia- and Trainium-based instances.
Custom C++ operators
Framework for implementing custom operators in C++ to extend Neuron’s built-in operation support.
Data types
Supported numerical data types including FP32, FP16, BF16, and INT8 for efficient model execution.
Logical NeuronCore configuration
Configuration options for grouping and managing NeuronCores as logical units for workload distribution.
Neuron persistent cache
Persistent caching system for compiled models to reduce compilation time across sessions.
NeuronCore batching
Batching strategies to maximize throughput by processing multiple inputs simultaneously on NeuronCores.
NeuronCore pipeline
Pipeline execution model that overlaps computation and data movement for improved performance.
Rounding modes
Configurable numerical rounding modes for controlling precision and accuracy in computations.
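This page does not detail the rounding modes listed above. As a rough illustration of why the choice matters when FP32 values are narrowed to BF16, the sketch below compares round-to-nearest-even with stochastic rounding in plain Python. It is not Neuron's implementation or API: the function names are hypothetical, and only finite inputs are handled.

```python
import random
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Interpret a 32-bit unsigned int as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_bf16_rne(x: float) -> float:
    """Narrow a finite FP32 value to BF16 with round-to-nearest, ties-to-even.

    BF16 keeps the upper 16 bits of the FP32 pattern. Adding 0x7FFF plus the
    lowest kept bit before truncating rounds ties toward an even mantissa.
    """
    bits = f32_bits(x)
    bias = 0x7FFF + ((bits >> 16) & 1)
    return bits_to_f32(((bits + bias) >> 16) << 16)


def to_bf16_stochastic(x: float, rng: random.Random) -> float:
    """Narrow a finite FP32 value to BF16 with stochastic rounding.

    Adding uniform 16-bit noise before truncating rounds away from zero with
    probability proportional to the discarded fraction, so the result is
    unbiased on average.
    """
    bits = f32_bits(x)
    noise = rng.getrandbits(16)
    return bits_to_f32(((bits + noise) >> 16) << 16)


if __name__ == "__main__":
    rng = random.Random(0)
    x = 1.0 + 2.0 ** -8  # exactly halfway between two adjacent BF16 values
    print("round-to-nearest-even:", to_bf16_rne(x))
    samples = [to_bf16_stochastic(x, rng) for _ in range(10_000)]
    print("stochastic mean:", sum(samples) / len(samples))
```

Round-to-nearest-even is deterministic and always maps a given input to the same BF16 value. Stochastic rounding varies from call to call but preserves the expected value, which can matter when many small updates are accumulated in low precision.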