This document is relevant for: Inf1
Using the Neuron GatherInfo tool to collect debug and support information#
Overview#
The Neuron GatherInfo tool, neuron-gatherinfo.py, helps automate the collection and packaging of information from Neuron SDK tools that is useful to both you and AWS when resolving issues. The tool gathers log files and other system information. When it is used to supply that information to AWS, the tool redacts proprietary and confidential information. The GatherInfo tool is supplied in source code form and is available here: Neuron Gatherinfo
The tool gathers compiler logs as well as inference/runtime logs. It is most commonly used from within one of the supported ML frameworks that have been integrated with Neuron, capturing information from the compile and runtime environments of those frameworks.
Steps Overview:#
Obtain a copy of neuron-gatherinfo.py from Neuron Gatherinfo
Install it into a directory on your $PATH, or into any location from which you can launch the script
Use it with your compile and/or runtime environments
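The installation step can be scripted. The following is a minimal Python sketch, assuming you have already downloaded neuron-gatherinfo.py and want to install it into ~/bin (the location used in the examples on this page). The helper name install_script is hypothetical and is not part of the Neuron SDK.

```python
import os
import shutil
import stat
from pathlib import Path


def install_script(src: str, dest_dir: str = "~/bin") -> Path:
    """Copy a script into dest_dir, mark it executable, and warn if
    dest_dir is not on $PATH. Returns the installed path.

    Hypothetical helper for illustration; not part of the Neuron SDK.
    """
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / Path(src).name
    shutil.copy2(src, target)
    # Add execute permission for user, group, and others (like chmod +x).
    mode = target.stat().st_mode
    target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Compare resolved directories so that "~/bin" and "/home/user/bin" match.
    path_dirs = {
        Path(p).expanduser().resolve()
        for p in os.environ.get("PATH", "").split(os.pathsep)
        if p
    }
    if dest.resolve() not in path_dirs:
        print(f"Note: {dest} is not on $PATH; launch the script by its full path.")
    return target
```

For example, install_script("neuron-gatherinfo.py") copies the script from the current directory into ~/bin and prints a note if ~/bin is not on your $PATH.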
Neuron-CC information gathering#
Step 1: Re-run the compile steps for your workload with increased verbosity or debug levels#
For TensorFlow-Neuron, change the Python code as shown below. The call to the compile function must be augmented with the verbose option (passed in compiler_args) and the compiler_workdir argument. The compiler work directory ('./compiler-workdir' here) should be empty so that files from other runs do not interfere with the information gathering. In addition, capture the stdout messages in a file (for example, by redirecting stdout to a file); this is the file you later pass to neuron-gatherinfo.py with the -s option.
import tensorflow.neuron as tfn
tfn.saved_model.compile(model_dir, compiled_model_dir, compiler_args=['--verbose', '2', '--pipeline', 'compile', 'SaveTemps'], compiler_workdir='./compiler-workdir')
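Because the compiler work directory should start out empty, it can help to check it before each compile run. This is a minimal sketch, assuming the './compiler-workdir' path from the example above; prepare_compiler_workdir is a hypothetical helper, not part of TensorFlow-Neuron. It stops with an error rather than deleting files, so results from an earlier run are never removed by accident.

```python
from pathlib import Path


def prepare_compiler_workdir(path: str = "./compiler-workdir") -> Path:
    """Create the compiler work directory if needed and make sure it is empty.

    Raises RuntimeError instead of deleting anything, so files from an
    earlier run are never removed silently.
    """
    workdir = Path(path)
    if workdir.exists() and not workdir.is_dir():
        raise RuntimeError(f"{workdir} exists but is not a directory")
    workdir.mkdir(parents=True, exist_ok=True)
    leftovers = sorted(p.name for p in workdir.iterdir())
    if leftovers:
        raise RuntimeError(
            f"{workdir} is not empty ({len(leftovers)} entries, e.g. {leftovers[0]}); "
            "remove or rename it before compiling"
        )
    return workdir
```

The returned path can then be passed as compiler_workdir=str(workdir) in the compile call.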
For Neuron Apache MXNet, add the compiler arguments shown below and run the compilation from an empty working directory:
import mxnet as mx
import os
from packaging import version

# MXNet 1.8 and later use the separate mx_neuron package;
# earlier releases provide Neuron support as mxnet.contrib.neuron.
mxnet_version = version.parse(mx.__version__)
if mxnet_version >= version.parse("1.8"):
    import mx_neuron as neuron
else:
    from mxnet.contrib import neuron
...
os.environ['SUBGRAPH_INFO'] = '1'
# Increased verbosity plus SaveTemps, as needed for information gathering.
compile_args = { '--verbose' : 2, '--pipeline' : 'compile', 'flags' : ['SaveTemps'] }
csym, cargs, cauxs = neuron.compile(sym, args, auxs, inputs=inputs, **compile_args)
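The compiler messages you need to save may be written by child processes as well as by Python itself, and Python-level redirection such as contextlib.redirect_stdout does not capture output that a child process writes directly to file descriptor 1. This is a minimal sketch, assuming a POSIX system, that redirects the process-level stdout (and optionally stderr) into a file around the compile call; capture_fd_output is a hypothetical helper. Redirecting from the shell (for example, python compile.py > stdout.out 2>&1) achieves the same result without any code changes.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def capture_fd_output(filename: str, include_stderr: bool = True):
    """Redirect file descriptors 1 (and optionally 2) into `filename`.

    Works at the OS level, so output from child processes that inherit
    these descriptors is captured too. Assumes a POSIX system.
    """
    fds = [1, 2] if include_stderr else [1]
    # Flush Python buffers so earlier output does not end up in the file.
    sys.stdout.flush()
    sys.stderr.flush()
    saved = {fd: os.dup(fd) for fd in fds}
    # Append mode, so several compile calls can share one log file.
    with open(filename, "ab") as out:
        try:
            for fd in fds:
                os.dup2(out.fileno(), fd)
            yield
        finally:
            # Flush anything buffered while redirected, then restore the fds.
            sys.stdout.flush()
            sys.stderr.flush()
            for fd, copy in saved.items():
                os.dup2(copy, fd)
                os.close(copy)
```

Usage: wrap the compile call, for example with capture_fd_output("stdout-from-compile_resnet50.out"): followed by the neuron.compile(...) line, and pass that file to neuron-gatherinfo.py with -s.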
Step 2: Execute inference steps for your workload with increased verbosity or debug levels#
For runtime information, neuron-gatherinfo.py uses the neuron-dump.py tool to gather the data. Make sure that the Neuron tools package (aws-neuron-tools) is installed.
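Before collecting runtime information, you can confirm that the tools neuron-gatherinfo.py relies on are reachable. This is a minimal sketch: the tool names neuron-dump.py and neuron-ls come from this page, while the extra search directory /opt/aws/neuron/bin is an assumption about where aws-neuron-tools installs its executables, so adjust it for your system. missing_tools is a hypothetical helper.

```python
import os
import shutil
from typing import Iterable, List

# Assumption: aws-neuron-tools places its executables here; adjust if needed.
EXTRA_DIRS = ("/opt/aws/neuron/bin",)


def missing_tools(names: Iterable[str], extra_dirs: Iterable[str] = EXTRA_DIRS) -> List[str]:
    """Return the tool names that are found neither on $PATH nor in extra_dirs."""
    extra_path = os.pathsep.join(extra_dirs)
    missing = []
    for name in names:
        # shutil.which searches $PATH by default, or a custom search path.
        found = shutil.which(name) or shutil.which(name, path=extra_path)
        if found is None:
            missing.append(name)
    return missing
```

For example, missing_tools(["neuron-dump.py", "neuron-ls"]) returns an empty list when both tools are available, and otherwise names the ones to install.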
Tool Usage Reference#
Run neuron-gatherinfo.py with the --help option:
bash $ ~/bin/neuron-gatherinfo.py --help
usage: neuron-gatherinfo.py [-h] [--additionalfileordir ADDFLDIR] [-c CCDIR]
[-i] [-f FILTERFILE] [-m] -o OUTDIR [-r RTDIR] -s
STDOUT [-v]
Usage: /home/user/bin/neuron-gatherinfo.py [options]
This program is used to gather information from this system for analysis
and debugging
optional arguments:
-h, --help show this help message and exit
--additionalfileordir ADDFLDIR
Additional file or directory that the user wants to
provide in the archive. The user can sanitize this
file or directory before sharing
-c CCDIR, --compileroutdir CCDIR
Location of the neuron-cc generated files
-i, --include By default, only the lines containing (grep) patterns
like 'nrtd|neuron|kernel:' from the syslog are copied.
Other lines are excluded. Using this option allows the
timestamp section of other lines to be included. The
rest of the contents of the line itself are elided.
Providing the timestamp section may provide time
continuity while viewing the copied syslog file
-f FILTERFILE, --filter FILTERFILE
-m, --modeldata By using this option, the entire compiler work
directory's contents will be included (excluding the
.pb files, unless an additional option is used). This
would include model information, etc. The files that
are included, by default, are these: graph_def.neuron-
cc.log, all_metrics.csv, hh-tr-operand-
tensortensor.json
-o OUTDIR, --out OUTDIR
The output directory where all the files and other
information will be stored. The output will be stored
as an archive as well as the actual directory where
all the contents are copied. This will allow a simple
audit of the files, if necessary. *** N O T E ***:
Make sure that this directory has enough space to hold
the files and resulting archive
-r RTDIR, --runtimeoutdir RTDIR
Location of the neuron runtime generated files
-s STDOUT, --stdout STDOUT
The file where the stdout of the compiler run was
saved
-v, --verbose Verbose mode displays commands executed and any
additional information which may be useful in
debugging the tool itself
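The syslog filtering described for the -i option can be pictured with a short sketch. This is an illustration only, not the tool's actual implementation: it keeps lines matching the pattern 'nrtd|neuron|kernel:' quoted in the help text and, when include_timestamps is true, keeps only the leading timestamp of every other line. It assumes the traditional syslog timestamp format 'Mon DD HH:MM:SS'; filter_syslog is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# Pattern quoted in the --include help text.
KEEP = re.compile(r"nrtd|neuron|kernel:")
# Assumed traditional syslog prefix, e.g. "Mar  3 14:02:11".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}")


def filter_syslog(lines: Iterable[str], include_timestamps: bool = False) -> Iterator[str]:
    """Yield matching lines whole; optionally yield only the timestamp of others."""
    for line in lines:
        line = line.rstrip("\n")
        if KEEP.search(line):
            yield line
        elif include_timestamps:
            m = TIMESTAMP.match(line)
            if m:
                # Keep the time for continuity, elide the rest of the line.
                yield m.group(0) + " ..."
```

Keeping the bare timestamps preserves the time continuity the help text mentions without copying unrelated log content.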
Examples#
Example 1: no ML model information gathered (default behavior)#
In this case, the tool archives only the default information:
bash $ sudo ~/bin/neuron-gatherinfo.py -o compile-and-run-info-for-debugging-no-model-info -i --verbose -s stdout-from-compile_resnet50.out -c compiler-workdir
Running cmd: lscpu and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-lscpu.txt
Running cmd: lshw and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-lshw.txt
Running cmd: lspci | grep -i Amazon and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-lspci.txt
Running cmd: neuron-cc --version and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-neuron-cc.txt
Running cmd: neuron-ls and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-neuron-ls.txt
<SNIP>
******
Archive created at:
/home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo.tar.gz
From directory:
/home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo
******
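The help text notes that the output directory is kept alongside the archive so the files can be audited before sharing. This is a minimal sketch for listing what the .tar.gz actually contains, using Python's standard tarfile module; list_archive and largest_members are hypothetical helpers.

```python
import tarfile
from typing import List, Tuple


def list_archive(path: str) -> List[str]:
    """Return the member names of a gzip-compressed tar archive, sorted."""
    with tarfile.open(path, "r:gz") as tar:
        return sorted(member.name for member in tar.getmembers())


def largest_members(path: str, n: int = 5) -> List[Tuple[int, str]]:
    """Return (size_in_bytes, name) for the n largest regular files."""
    with tarfile.open(path, "r:gz") as tar:
        files = [(m.size, m.name) for m in tar.getmembers() if m.isfile()]
    return sorted(files, reverse=True)[:n]
```

For Example 1, you would call list_archive("compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo.tar.gz") from the directory where the tool was run.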
Example 2: ML model information gathered using the --modeldata option#
In this case, the tool archives the compiler work directory in addition to the default information:
bash $ sudo ~/bin/neuron-gatherinfo.py -o compile-and-run-info-for-debugging -i --verbose -s stdout-from-compile_resnet50.out -c compiler-workdir --modeldata
<SNIP>
Running cmd: lscpu and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-lscpu.txt
Running cmd: lshw and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-lshw.txt
Running cmd: lspci | grep -i Amazon and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-lspci.txt
Running cmd: neuron-cc --version and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-neuron-cc.txt
Running cmd: neuron-ls and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-neuron-ls.txt
<SNIP>
******
Archive created at:
/home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo.tar.gz
From directory:
/home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo
******
**************************
Based on your command line option, we're also packaging these files:
graph_def.neuron-cc.log
all_metrics.csv
hh-tr-operand-tensortensor.json
And this directory: /home/user/tutorials-3/compiler-workdir
**************************
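If you gather information repeatedly, the commands from Examples 1 and 2 can be wrapped in a script. This is a minimal sketch that builds the command line from options shown in the --help output, checks free space on the filesystem that will hold the output directory (per the note on -o), and then runs the tool. The default paths mirror the examples; build_gatherinfo_cmd, enough_space, run_gatherinfo, and the 1 GiB threshold are hypothetical choices.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_gatherinfo_cmd(outdir: str, stdout_file: str,
                         compiler_dir: Optional[str] = None,
                         include_timestamps: bool = True,
                         model_data: bool = False,
                         verbose: bool = True,
                         script: str = "~/bin/neuron-gatherinfo.py") -> List[str]:
    """Assemble the argument list using only flags shown in --help."""
    # Expand ~ here, as the shell does before sudo runs in the examples.
    cmd = ["sudo", os.path.expanduser(script), "-o", outdir, "-s", stdout_file]
    if compiler_dir:
        cmd += ["-c", compiler_dir]
    if include_timestamps:
        cmd.append("-i")
    if verbose:
        cmd.append("--verbose")
    if model_data:
        cmd.append("--modeldata")  # also package the compiler work directory
    return cmd


def enough_space(path: str, min_free_bytes: int) -> bool:
    """Check free space on the filesystem that will hold `path`."""
    parent = os.path.dirname(os.path.abspath(path))
    return shutil.disk_usage(parent).free >= min_free_bytes


def run_gatherinfo(cmd: List[str], outdir: str, min_free_bytes: int = 1 << 30) -> None:
    """Refuse to start if free space is low, then run the tool."""
    if not enough_space(outdir, min_free_bytes):
        raise RuntimeError(f"less than {min_free_bytes} bytes free for {outdir}")
    subprocess.run(cmd, check=True)
```

Setting model_data=True reproduces Example 2; leaving it False reproduces the default behavior of Example 1.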