This document is relevant for: Inf1, Inf2, Trn1, Trn1n

Neuron Packager User Guide#

Overview#

neuron-packager is a tool to create, inspect, optimize, and unpack NEFF (Neuron Executable File Format) files.

Using neuron-packager#

neuron-packager CLI

neuron-packager [options] [subcommand] [subcommand-options]#

Available subcommands include create, info, optimize, and unpack. See each subcommand below for its options.

  • -v, --version: show version and exit

neuron-packager create [create-options]#

Packages a NEFF from a tarball containing all NEFF files.

  • -i, --input (string): input NEFF tarball

  • -v, --tool-version (string) default=<tool version>: packaging version number

  • -k, --neff-version (string) default=0.4: NEFF version number (Maj.Min)

  • -r, --header-version (int) default=2: NEFF header version

  • -n, --name (string): name of the compiled model

  • -t, --num-nc (int) default=1: number of NeuronCores required to run the graph

  • -o, --output (string) default=NN: output file name (<output>.neff)

  • -e, --enable-feature (string): supported NEFF features; choose any combination of [bin-weights | collectives-offset | custom-ops | coalesced-cc] by passing this flag once per feature

    • bin-weights: supports binary weight files instead of numpy files

    • collectives-offset: supports specifying an offset for collective operations in the compiled Neuron instruction binary

    • custom-ops: supports custom operators

    • coalesced-cc: supports coalesced collective compute operations in NEFF interface
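As a minimal sketch of how these options combine, the snippet below assembles a create invocation that enables two features by repeating -e. The tarball path, model name, and output name are placeholders rather than values from this guide, and whether a given feature applies depends on the NEFF contents. The command is printed, and it is executed only when neuron-packager is on the PATH and the tarball exists.

```shell
# Placeholder values (assumptions for illustration only).
input_tarball="neff.tgz"
model_name="my_model"
output_name="my_model"   # create writes <output>.neff

# Start with the documented create options.
set -- neuron-packager create -i "$input_tarball" -n "$model_name" -o "$output_name"

# Pass -e once per feature, as the option description requires.
for feature in bin-weights custom-ops; do
  set -- "$@" -e "$feature"
done

# Show the assembled command line.
create_cmd="$*"
echo "$create_cmd"

# Run it only when the tool and the input tarball are actually available.
if command -v neuron-packager >/dev/null 2>&1 && [ -f "$input_tarball" ]; then
  "$@"
fi
```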

neuron-packager info [info-options]#

Displays the NEFF header as well as information on NeuronCore subgraphs and CPU operators.

  • -w, --show-weights: show weights

  • -s, --show-spills: show spills

  • --json-output: dump JSON formatted output

neuron-packager optimize [optimize-options] [neff-files...]#

Optimizes the NEFF for faster loading.

  • --keep-debug: keep debug information and prettified JSONs

neuron-packager unpack [unpack-options] [neff-file]#

Unpacks the given NEFF file.

  • -o, --output (string) default=basename of NEFF file: output directory

Examples#

The examples below use a compiled NEFF from the torch-neuronx MLP tutorial. For more information, see the Multi-Layer Perceptron Training Tutorial.

The info subcommand displays information about the compiled model, such as the number of NeuronCores needed to run it and the inputs and outputs of each NeuronCore subgraph and CPU operator (if applicable).

$ neuron-packager info MODULE_0_SyncTensorsGraph.305_16554925436865022292_ip-172-31-55-249-6c54106d-25758-5f5ddf7b170ab.neff
NEFF Header:
      Package Version:               2
      Header Size:                   1024      (bytes)
      Data Size:                     48924     (bytes)
      Major Version:                 1
      Minor Version:                 0
      Build Version:
      Number of Neuron cores:        1
      Hash:                          ec4b1b1fa8919a9be2c176fd63269511
      UUID:                          c3275f90b87d11ed80b60e6b4183ae7f
      Network Name:                  compiler_cache/neuron-compile-cache/USER_neuroncc-2.4.0.21+b7621be18/MODULE_16554925436865022292/MODULE_0_SyncTensorsGraph.305_16554925436865022292_ip-172-31-55-249-6c54106d-25758-5f5ddf7b170ab/2afbe37c-93ff-4f66-88d0-0e8e5d98e497/MODULE_0_SyncTensorsGraph
      Enabled Features:              N/A


NEFF Nodes:
    NODE      Executor    Name    Variable    Size    Type    Format     Shape    DataType    TimeSeries
       9    NeuronCore    sg00
                                    input0       4      IN         N       [1]     float32
                                    input1      20      IN         N       [5]     float32
                                    input2     200      IN        NC    [5,10]     float32
                                    input3      40      IN        NC    [1,10]     float32
                                    input4     100      IN        NC     [5,5]     float32
                                    input5      20      IN         N       [5]     float32
                                    input6      40      IN        NC     [2,5]     float32
                                    input7       8      IN         N       [2]     float32
                                    input8       8      IN         N       [2]       int32
                                   output0     200     OUT        NC    [5,10]     float32    false
                                   output1      20     OUT         N       [5]     float32    false
                                  output10      40     OUT        NC     [2,5]     float32    false
                                  output11      20     OUT         N       [5]     float32    false
                                  output12     100     OUT        NC     [5,5]     float32    false
                                  output13      20     OUT         N       [5]     float32    false
                                  output14     200     OUT        NC    [5,10]     float32    false
                                   output2     100     OUT        NC     [5,5]     float32    false
                                   output3      20     OUT         N       [5]     float32    false
                                   output4      40     OUT        NC     [2,5]     float32    false
                                   output5       8     OUT         N       [2]     float32    false
                                   output6      40     OUT        NC    [1,10]     float32    false
                                   output7       8     OUT         N       [2]       int32    false
                                   output8       4     OUT         N       [1]     float32    false
                                   output9       8     OUT         N       [2]     float32    false
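The node table is whitespace-separated text, so standard tools can summarize it. As a rough sketch, the awk filter below totals the Size column separately for input and output tensors. It assumes the column layout shown above (variable name, size, then IN or OUT) and uses a few rows copied from that output as sample data; in practice the output of neuron-packager info would be piped in instead.

```shell
# Sample rows copied from the info output above (stand-in for the real command output).
info_rows='    NODE      Executor    Name    Variable    Size    Type    Format     Shape    DataType    TimeSeries
       9    NeuronCore    sg00
                                    input0       4      IN         N       [1]     float32
                                    input2     200      IN        NC    [5,10]     float32
                                   output0     200     OUT        NC    [5,10]     float32    false
                                   output1      20     OUT         N       [5]     float32    false'

# Tensor rows have IN or OUT in the third field; the header and node rows do not.
totals=$(printf '%s\n' "$info_rows" | awk '
  $3 == "IN"  { in_total  += $2 }
  $3 == "OUT" { out_total += $2 }
  END { printf "IN=%d OUT=%d", in_total, out_total }
')
echo "$totals"
```

For the four sample rows this prints IN=204 OUT=220.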

To inspect the contents of the NEFF, use the unpack subcommand.

$ neuron-packager unpack MODULE_0_SyncTensorsGraph.305_16554925436865022292_ip-172-31-55-249-6c54106d-25758-5f5ddf7b170ab.neff
Unpacking NEFF in "MODULE_0_SyncTensorsGraph.305_16554925436865022292_ip-172-31-55-249-6c54106d-25758-5f5ddf7b170ab" directory...

$ ls -l MODULE_0_SyncTensorsGraph.305_16554925436865022292_ip-172-31-55-249-6c54106d-25758-5f5ddf7b170ab/
total 84
drwxr-xr-x 2 ubuntu ubuntu  4096 Mar  1 22:35 debug_info
-rw-rw-r-- 1 ubuntu ubuntu   249 Mar  1 22:35 hlo_stats.json
-rw-rw-r-- 1 ubuntu ubuntu  1205 Mar  1 22:35 info.json
-rw-rw-r-- 1 ubuntu ubuntu   161 Mar  1 22:35 kelf-0.json
-rw-rw-r-- 1 ubuntu ubuntu   366 Mar  1 22:35 metrics.json
-rw-rw-r-- 1 ubuntu ubuntu 10082 Mar  1 22:35 neff.json
-rw-r--r-- 1 ubuntu ubuntu 48924 Mar  1 22:35 neff.tgz
drwxr-xr-x 2 ubuntu ubuntu  4096 Mar  1 22:35 sg00

The top-level directory contains the high-level information about the model, such as inputs and outputs for NeuronCore subgraphs and CPU operators. Each NeuronCore subgraph has its own subdirectory containing Neuron machine instructions, tensor information, model parameters, and other components that support the subgraph's execution.
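To illustrate this layout, the sketch below defines a small helper that lists the per-subgraph directories of an unpacked NEFF. It assumes subgraph directories are named sg followed by digits, as sg00 is in the listing above; the helper name list_subgraphs and the example path are hypothetical.

```shell
# List NeuronCore subgraph directories inside an unpacked NEFF directory.
# Assumption: subgraph directories are named sg<digits>, like sg00 above.
list_subgraphs() {
  for path in "$1"/sg[0-9]*; do
    # Skip regular files and the unexpanded pattern when nothing matches.
    [ -d "$path" ] && basename "$path"
  done
  return 0
}

# Example with a placeholder path:
#   list_subgraphs ./my_unpacked_neff
```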

The create subcommand re-packages a NEFF. It takes a tarball of the NEFF contents and adds a header to it. After unpacking a NEFF, this tarball is already present as neff.tgz. For NEFF versions greater than 2.0, feature bits indicate which features the Neuron runtime must support in order to execute the NEFF. The runtime rejects any incompatible NEFF when it attempts to load the model.

$ neuron-packager create -i neff.tgz
Successfully generated: NN.neff
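Put together, a round trip might look like the sketch below: unpack a NEFF into a chosen directory, rebuild it from the neff.tgz found there, and compare the two headers with info. The names model.neff, model_unpacked, and repacked are placeholders. Header fields that come from create options (for example the name or NEFF version) may differ from the original unless they are passed explicitly. The commands run only if neuron-packager is installed and the input file exists.

```shell
neff_file="model.neff"        # placeholder input NEFF
unpack_dir="model_unpacked"   # directory passed to unpack -o
output_name="repacked"        # create writes <output>.neff

if command -v neuron-packager >/dev/null 2>&1 && [ -f "$neff_file" ]; then
  # Unpack into a known directory instead of the default basename.
  neuron-packager unpack -o "$unpack_dir" "$neff_file"
  # Re-create the NEFF from the tarball left behind by unpack.
  neuron-packager create -i "$unpack_dir/neff.tgz" -o "$output_name"
  # Print both headers so they can be compared by eye.
  neuron-packager info "$neff_file"
  neuron-packager info "$output_name.neff"
else
  echo "neuron-packager or $neff_file not found; skipping" >&2
fi
```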

The optimize subcommand takes an input NEFF and replaces it with a version optimized for model load time. When invoked, it combines weights into a single file where possible, removes debug information, and updates all other affected files to reflect these changes. In addition, the resulting NEFF is not compressed, so the NEFF size may increase.

$ neuron-packager optimize opt.neff
Successfully generated: opt.neff

Note

Because optimize removes debug information, the output of some Neuron tools may be missing information. For example, neuron-packager info will still display tensor sizes, but the shapes will be unknown. To keep the debug information, use the --keep-debug option.
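Because optimize replaces its input file, it can be worth keeping a copy of the original first. The sketch below does that and passes --keep-debug so that shape information stays available. The .orig suffix is an arbitrary choice for this example, and the commands run only if neuron-packager is installed and the file exists.

```shell
neff_file="opt.neff"   # NEFF to optimize (placeholder path)

if command -v neuron-packager >/dev/null 2>&1 && [ -f "$neff_file" ]; then
  # optimize overwrites the input, so keep a copy of the original.
  cp "$neff_file" "$neff_file.orig"
  # Keep debug information so tools such as info can still report shapes.
  neuron-packager optimize --keep-debug "$neff_file"
  # The optimized NEFF is not compressed; compare the two file sizes.
  ls -l "$neff_file.orig" "$neff_file"
else
  echo "neuron-packager or $neff_file not found; skipping" >&2
fi
```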
