Neuron Driver Release Notes
This document is relevant for: Inf1, Inf2, Trn1, Trn1n
Known issues
Updated: 04/29/2022
In rare cases, multi-process applications running under heavy stress may encounter a model load failure. Reloading the Neuron Driver may be required as a workaround.
Neuron Driver release [2.9.4.0]
Date: 05/01/2023
New in this release
Added dma_buf support, which is needed for future EFA implementations in the Linux kernel.
Added a new IOCTL to get the Neuron Device BDF (PCI bus/device/function), used by the Neuron Runtime.
Added optional support for sysfs notify (off by default). See Neuron Sysfs documentation (under Neuron System Tools) for more details.
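The sysfs notify option builds on the kernel's generic sysfs_notify mechanism: user space reads an attribute once, waits on the open file with poll() for POLLPRI | POLLERR, and re-reads from offset 0 when woken. A minimal Python sketch of that generic pattern, assuming Linux. The attribute path is supplied by the caller; which Neuron attributes actually send notifications, and how to turn the feature on, are assumptions to check against the Neuron Sysfs documentation.

```python
import os
import select
from typing import Optional


def wait_for_sysfs_change(path: str, timeout_ms: int = -1) -> Optional[str]:
    """Wait for a sysfs_notify() on the attribute at `path` and return its new value.

    Returns None if `timeout_ms` elapses first; a negative timeout waits forever.
    Which Neuron attributes emit notifications is an assumption, not confirmed here.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # The usual sysfs pattern consumes the current value before polling.
        os.read(fd, 4096)
        poller = select.poll()
        # sysfs_notify() wakes pollers with POLLPRI | POLLERR rather than POLLIN.
        poller.register(fd, select.POLLPRI | select.POLLERR)
        if not poller.poll(timeout_ms):
            return None  # timed out with no notification
        # Seek back to the start and read the updated value.
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 4096).decode().strip()
    finally:
        os.close(fd)
```

A caller watching several attributes would register each descriptor on a single poll object instead of calling this function once per file.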
Bug Fixes
Fixed the maximum DMA queue size constant. The previous, incorrect size could lead to DMA aborts (execution timeouts).
Neuron Driver release [2.8.4.0]
Date: 03/28/2023
New in this release
Added support for both Trn1n and Inf2 instance types.
Renamed NEURON_ARCH_INFERENTIA to NEURON_ARCH_V1 and NEURON_ARCH_TRN to NEURON_ARCH_V2.
Under sysfs nodes, the following changes were made:
Changed “infer” metrics to “execute” metrics
Added peak memory usage metric
Removed empty dynamic metrics directory
Removed refresh rate metric
Fixed arch type names in sysfs
Bug Fixes
Fixed a minor memory leak when closing the Neuron Runtime.
Fixed memory leaks on error paths in the Neuron Driver.
Added a workaround to resolve hangs when a NeuronCore reset is run while another core is performing DMA operations.
Neuron Driver release [2.7.33.0]
Date: 02/24/2023
Bug Fixes
Added a retry mechanism to mitigate possible data copy failures during a NeuronCore reset. When a retry is attempted, the driver emits an info-level log message. For example:
kernel: [726415.485022] neuron:ndma_memcpy_wait_for_completion: DMA completion timeout for UDMA_ENG_33 q0
kernel: [726415.491744] neuron:ndma_memcpy_offset_move: Failed to copy memory during a NeuronCore reset: nd 0, src 0x100154480000, dst 0x100154500000, size 523264. Retrying the copy.
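When triaging, these retries can be filtered out of kernel log output (for example, from dmesg or journalctl). A minimal Python sketch; its regular expression is modeled only on the example message in this note, so the exact wording in other driver versions is an assumption, and CopyRetry and parse_copy_retry are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Modeled on the example retry message; other driver versions may word it differently.
_RETRY_RE = re.compile(
    r"neuron:ndma_memcpy_offset_move: Failed to copy memory during a NeuronCore reset: "
    r"nd (?P<nd>\d+), src (?P<src>0x[0-9a-fA-F]+), dst (?P<dst>0x[0-9a-fA-F]+), "
    r"size (?P<size>\d+)\. Retrying the copy\."
)


@dataclass
class CopyRetry:
    device: int  # Neuron device index ("nd")
    src: int     # source address of the copy
    dst: int     # destination address of the copy
    size: int    # number of bytes being copied


def parse_copy_retry(line: str) -> Optional[CopyRetry]:
    """Return the fields of a reset-time copy-retry message, or None for other lines."""
    m = _RETRY_RE.search(line)
    if m is None:
        return None
    return CopyRetry(
        device=int(m["nd"]),
        src=int(m["src"], 16),
        dst=int(m["dst"], 16),
        size=int(m["size"]),
    )
```

Counting non-None results over a log dump gives a rough measure of how often resets needed a retry.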
Neuron Driver release [2.7.15.0]
Date: 02/08/2023
New in this release
Added Neuron sysfs metrics under /sys/devices/virtual/neuron_device/neuron{0,1, ...}/metrics/
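A short Python sketch that enumerates the per-device metrics directories at this path, ordered by device index. It assumes devices appear as neuron<N> directories, as the path pattern suggests; the metric files inside each directory are described in the Neuron Sysfs documentation and are not assumed here. The function name is hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Base path from the release note; `root` can be overridden (e.g. for testing).
DEFAULT_ROOT = "/sys/devices/virtual/neuron_device"


def neuron_metrics_dirs(root: str = DEFAULT_ROOT) -> List[Path]:
    """Return the neuron<N>/metrics directories under `root`, ordered by N."""
    base = Path(root)
    if not base.is_dir():
        return []  # no Neuron devices visible (driver not loaded, or not a Neuron host)
    found = []
    for dev in base.glob("neuron*"):
        m = re.fullmatch(r"neuron(\d+)", dev.name)
        metrics = dev / "metrics"
        if m and metrics.is_dir():
            found.append((int(m.group(1)), metrics))
    # Sort numerically so that neuron10 comes after neuron2.
    return [path for _, path in sorted(found)]
```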
Neuron Driver release [2.6.26.0]
Date: 11/07/2022
New in this release
Minor bug fixes and improvements.
Neuron Driver release [2.5.38.0]
Date: 10/10/2022
The Neuron Driver now supports Inf1 and Trn1 EC2 instance types. The driver package name changed from aws-neuron-dkms to aws-neuronx-dkms; remove the older driver package before installing the new one.
New in this release
Added support for EC2 Trn1 instance types and ML training workloads.
Added the missing GPL2 LICENSE file.
Changed the package name to aws-neuronx-dkms (previously aws-neuron-dkms).
Security update: blocked user space access to control registers and DMA control queues intended for use by the Neuron Driver only.
Added support for DMA aborts to avoid hangs.
Added support for TPB reset.
Added sysfs entries for triggering resets and reading core counts.
Added write combining on BAR4.
Added a PCI device ID update as part of installation.
Added handling for a known duplicate device ID error.
Bug Fixes
Fixed a null pointer free scenario.
Fixed an installation issue when installing without internet connectivity.
Neuron Driver release [2.3.26.0]
Date: 08/02/2022
Bug Fixes
Security update: blocked user space access to control registers and DMA control queues intended for use by the Neuron Driver only. We recommend that all customers upgrade.
Neuron Driver release [2.3.11.0]
Date: 05/27/2022
New in this release
This driver is required to support future releases of the Neuron Runtime. The release includes a bug fix that avoids a kernel crash and an expanded compatibility range for future versions of the Neuron Runtime.
Bug Fixes
Corrected the huge aligned memory allocation/freeing logic, which was previously susceptible to kernel crashes that would bring down the OS. We recommend that all customers upgrade.
Neuron Driver release [2.3.3.0]
Date: 04/29/2022
New in this release
Minor performance improvements for inference and model loading.
Bug Fixes
Reduced host CPU usage when reading the hw_counters metric from neuron-monitor.
Minor bug fixes.
Neuron Driver release [2.2.6.0]
Date: 10/27/2021
New in this release
Memory improvements to ensure that all allocations are 4K-aligned.
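A 4K-aligned allocation starts at an address that is a multiple of 4096 bytes. The alignment arithmetic is sketched below in Python for reference; it illustrates the rule only and is not the driver's allocator code, and align_up and is_aligned are hypothetical helper names.

```python
PAGE_4K = 4096  # 4 KiB


def align_up(n: int, alignment: int = PAGE_4K) -> int:
    """Round a non-negative n up to the next multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding alignment - 1 then clearing the low bits rounds up without division.
    return (n + alignment - 1) & ~(alignment - 1)


def is_aligned(addr: int, alignment: int = PAGE_4K) -> bool:
    """Return True if addr is a multiple of `alignment` (a power of two)."""
    return (addr & (alignment - 1)) == 0
```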
Resolved issues
Removed a 1-second delay per NeuronDevice when closing Neuron Tools applications.
Fixed an Ubuntu 20 build issue.
Neuron Driver release [2.1]
Added support for Neuron Runtime 2.x (libnrt.so). Driver 2.x releases continue to support previous releases of Neuron Runtime 1.x.