This document is relevant for: Inf1, Inf2, Trn1, Trn1n
Neuron System Tools#
Neuron Tools [2.10.1.0]#
Date: 05/01/2023
New in the release:
Added a new Neuron Collectives benchmarking tool, nccom-test, to enable benchmarking sweeps on various Neuron Collective Communication operations. See the new nccom-test documentation under System Tools for more details.
Expanded support for Neuron profiling to include runtime setup/teardown times and collapsed execution of NeuronCore engines and DMA. See the TensorBoard release notes and tutorial for more details.
Neuron Tools [2.9.5.0]#
Date: 03/28/2023
New in the release:
Updated neuron-top to show effective FLOPs across all NeuronCores.
Neuron Tools [2.8.2.0]#
Date: 02/24/2023
New in the release:
Updated neuron-top to show aggregated utilization/FLOPs across all NeuronCores.
Neuron Tools [2.7.2.0]#
Date: 02/08/2023
New in the release:
Added support for model FLOPS metrics in both neuron-monitor and neuron-top. More details can be found in the Neuron Tools documentation.
Neuron Tools [2.6.0.0]#
Date: 12/09/2022
This release adds support for profiling with the Neuron Plugin for TensorBoard on TRN1. For details, see the Neuron Plugin for TensorBoard (Trn1) documentation.
New in the release:
Updated profile post-processing for workloads executed on TRN1
Neuron Tools [2.5.16.0]#
Date: 10/26/2022
New in the release:
New neuron-monitor and neuron-top feature: memory utilization breakdown. This feature provides more details on how memory is currently being used on the Neuron Devices as well as on the host instance.
neuron-top’s UI layout has been updated to accommodate the new memory utilization breakdown feature.
neuron-monitor’s inference_stats metric group was renamed to execution_stats. While the previous release still supported inference_stats, starting with this release the name inference_stats is considered deprecated and can no longer be used.
Note
For more details on the new memory utilization breakdown feature in neuron-monitor and neuron-top, check out the full user guides: Neuron Monitor User Guide and Neuron Top User Guide.
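For configurations that still reference the old group name, a small migration script may help. The Python sketch below assumes a neuron-monitor style JSON config in which each metric group is an object with a "type" field. That layout, the sample config, and the helper name rename_metric_group are assumptions made for illustration, so check them against the Neuron Monitor User Guide before relying on them.

```python
import json

OLD_NAME = "inference_stats"
NEW_NAME = "execution_stats"


def rename_metric_group(node):
    """Return a copy of node with "type": "inference_stats" renamed."""
    if isinstance(node, dict):
        return {
            key: (NEW_NAME if key == "type" and value == OLD_NAME
                  else rename_metric_group(value))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_metric_group(item) for item in node]
    return node  # Scalars (strings, numbers, booleans) are returned unchanged.


# Hypothetical config fragment; the exact schema is an assumption.
SAMPLE_CONFIG = {
    "period": "1s",
    "neuron_runtimes": [
        {
            "tag_filter": ".*",
            "metrics": [
                {"type": "neuroncore_counters"},
                {"type": "inference_stats"},
            ],
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(rename_metric_group(SAMPLE_CONFIG), indent=2))
```

The function builds a new structure instead of editing in place, so the original config stays available for comparison.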
Bug Fixes:
Fix a rare crash in neuron-top when the instance is under heavy CPU load.
Fix process names on the bottom tab bar of neuron-top sometimes disappearing for smaller terminal window sizes.
Neuron Tools [2.4.6.0]#
Date: 10/10/2022
This release adds support for both EC2 INF1 and TRN1 platforms. The package name changed from aws-neuron-tools to aws-neuronx-tools. Please remove the old package before installing the new one.
New in the release:
Added support for ECC counters on Trn1
Added version number output to neuron-top
Expanded support for longer process tags in neuron-monitor.
Removed hardware counters from the default neuron-monitor config to avoid sending repeated errors; they will be added back in a future release.
neuron-ls - Added option neuron-ls --topology with ASCII graphics output showing the connectivity between Neuron Devices on an instance. This feature aims to help in understanding pathways between Neuron Devices and in exploiting code or data locality.
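As a minimal sketch of how a script might capture that output, the Python snippet below runs neuron-ls --topology and returns the text it prints. The helper name show_topology is hypothetical, and the snippet assumes nothing about the output format beyond it being text on standard output.

```python
import shutil
import subprocess

# The option name comes from the release note above.
TOPOLOGY_CMD = ["neuron-ls", "--topology"]


def show_topology():
    """Return the topology text, or None if neuron-ls is not on PATH."""
    if shutil.which(TOPOLOGY_CMD[0]) is None:
        return None
    result = subprocess.run(TOPOLOGY_CMD, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    text = show_topology()
    print(text if text is not None else "neuron-ls not found")
```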
Bug Fixes:
Fix neuron-monitor and neuron-top to show the correct Neuron Device when running in a container where not all devices are present.
Neuron Tools [2.0.790.0]#
Date: 03/25/2022
neuron-monitor: fixed a floating point error when calculating CPU utilization.
Neuron Tools [2.0.623.0]#
Date: 01/20/2022
New in the release:
neuron-top - Added “all” tab that aggregates all running Neuron processes into a single view.
neuron-top - Improved startup time to approximately 1.5 seconds in most cases.
neuron-ls - Removed header message about updating tools from neuron-ls output.
Bug fixes:
neuron-top - Reduced single CPU core usage down to 0.7% from 80% on inf1.xlarge when running neuron-top by switching to an event-driven approach for screen updates.
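To illustrate the general idea behind that change (this is not neuron-top's actual implementation), the Python sketch below has a render thread block until new data arrives instead of waking up repeatedly to poll, so it uses no CPU while idle. All names here (render_loop, run_demo, the None sentinel) are hypothetical.

```python
import queue
import threading


def render_loop(updates, rendered):
    """Redraw only when an update arrives; block while idle."""
    while True:
        frame = updates.get()  # Blocks until a producer posts data.
        if frame is None:      # Sentinel value: stop the loop.
            break
        rendered.append("draw: " + frame)  # Stand-in for a real screen redraw.


def run_demo():
    updates = queue.Queue()
    rendered = []
    worker = threading.Thread(target=render_loop, args=(updates, rendered))
    worker.start()
    for frame in ("util=10%", "util=35%"):
        updates.put(frame)  # Each new sample triggers exactly one redraw.
    updates.put(None)
    worker.join()
    return rendered


if __name__ == "__main__":
    print(run_demo())
```

A polling design would instead redraw on a fixed timer whether or not anything changed, which is the kind of busy work an event-driven loop avoids.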
Neuron Tools [2.0.494.0]#
Date: 12/27/2021
Security updates related to log4j vulnerabilities.
Neuron Tools [2.0.327.0]#
Date: 11/05/2021
Updated Neuron Runtime (which is integrated within this package) to libnrt 2.2.18.0 to fix a container issue that was preventing the use of containers when /dev/neuron0 was not present. See details in neuron-runtime-release-notes.
Neuron Tools [2.0.277.0]#
Date: 10/27/2021
New in this release:
Tools now support applications built with Neuron Runtime 2.x (libnrt.so).
Important
You must update to the latest Neuron Driver (aws-neuron-dkms version 2.1 or newer) for proper functionality of the new runtime library.
Read the Introducing Neuron Runtime 2.x (libnrt.so) application note, which describes in detail why we are making this change and how it will affect the Neuron SDK.
Read Migrate your application to Neuron Runtime 2.x (libnrt.so) for detailed information on how to migrate your application.
Updates have been made to neuron-ls and neuron-top to significantly improve the interface and utility of information provided.
Expands neuron-monitor to include additional information when used to monitor the latest Frameworks released with Neuron 1.16.0.
neuron_hardware_info: Contains basic information about the Neuron hardware.
"neuron_hardware_info": {
    "neuron_device_count": 16,
    "neuroncore_per_device_count": 4,
    "error": ""
}
neuron_device_count: number of available Neuron Devices
neuroncore_per_device_count: number of NeuronCores present on each Neuron Device
error: will contain an error string if any occurred when getting this information (usually due to the Neuron Driver not being installed or not running).
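As a rough illustration of how this group might be consumed, the Python sketch below parses a fragment shaped like the sample above and derives the total NeuronCore count. The helper name total_neuroncores is hypothetical, and wrapping the fragment in a top-level JSON object is an assumption about how the group appears in a full report.

```python
import json

# Sample fragment wrapped in braces to form a valid JSON object.
# The values mirror the example above; real reports contain more groups.
SAMPLE_REPORT = """
{
  "neuron_hardware_info": {
    "neuron_device_count": 16,
    "neuroncore_per_device_count": 4,
    "error": ""
  }
}
"""


def total_neuroncores(report_text):
    """Return the total NeuronCore count, or raise if the group reports an error."""
    info = json.loads(report_text)["neuron_hardware_info"]
    if info["error"]:
        # A non-empty string usually means the driver is missing or not running.
        raise RuntimeError("neuron_hardware_info error: " + info["error"])
    return info["neuron_device_count"] * info["neuroncore_per_device_count"]


if __name__ == "__main__":
    print(total_neuroncores(SAMPLE_REPORT))  # 16 devices * 4 cores each = 64
```

Checking the error field first avoids computing a count from values that may be meaningless when the driver is unavailable.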
neuron-cli is entering maintenance mode, as its use is no longer relevant when using ML Frameworks with an integrated Neuron Runtime (libnrt.so). See 10/27/2021 - Neuron support for Apache MXNet 1.5 enters maintenance mode for more information.
For more information, visit Neuron Tools.