.. _ref-mxnet-neuron-compilation-python-api:

Neuron Apache MXNet Compilation Python API
=======================================================

The MXNet-Neuron compilation Python API provides a method to compile
a model graph for execution on Inferentia.


Description
-----------

Within the graph or subgraph, the compile method selects Neuron-supported
operations, sends them to the Neuron compiler for compilation, and saves
the compiled artifacts in the graph. Operations that cannot be compiled are
kept as their original operations for framework execution.

The compiled graph can be saved using MXNet's ``save_checkpoint`` and
served using MXNet Model Serving. See
:ref:`mxnet-neuron-model-serving` for more information about exporting
a saved model and serving it with MXNet Model Serving.

Options can be passed to the Neuron compiler via the compile function. For
example, the ``--neuroncore-pipeline-cores`` option directs the Neuron compiler
to compile each subgraph to fit in the specified number of NeuronCores.
This number can be less than the total number of NeuronCores available on an
Inf1 instance. See :ref:`neuron-compiler-cli-reference` for more information
about compiler options.

To debug compilation, set the ``SUBGRAPH_INFO=1`` environment variable before
calling the compilation script. The extracted subgraphs are preserved as hidden
files in the run directory. For more information, see :ref:`neuron_gatherinfo`.

**MXNet 1.5**
-------------

Method
------


.. code:: python

  from mxnet.contrib import neuron
  neuron.compile(sym, args, aux, inputs, **compile_args)


Arguments
---------

-  **sym** - Symbol object loaded from symbol.json file
-  **args** - args/params dictionary loaded from params file
-  **aux** - aux/params dictionary loaded from params file
-  **inputs** - a dictionary with key/value mappings for input name to
   input numpy arrays
-  **compile_args** (optional) - key/value mappings for
   MXNet-Neuron compilation and Neuron Compiler options.

   -  For example, to limit the number of NeuronCores per subgraph, use
      ``compile_args={'--neuroncore-pipeline-cores' : N}`` where N is an integer
      representing the maximum number of NeuronCores per subgraph.
   -  Additional compiler flags can be passed using
      ``'flags' : [<flags>]`` where ``<flags>`` is a comma-separated list of
      strings. See :ref:`neuron_gatherinfo` for an example of passing debug
      flags to the compiler.
   -  Advanced option to exclude node names:
      ``compile_args={'excl_node_names' : [<node names>]}`` where
      ``<node names>`` is a comma-separated list of node name strings.

Returns
-------

-  **sym** - new partitioned symbol
-  **args** - modified args/params
-  **auxs** - modified aux/params

Example Usage: Compilation
--------------------------

The following example compiles a model with the default compilation
arguments:


.. code:: python

  from mxnet.contrib import neuron
  ...
  neuron.compile(sym, args, aux, inputs={'data' : img})


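With non-default options, the call could look like the sketch below.
``build_compile_args`` and ``compile_with_options`` are hypothetical helpers,
not part of MXNet-Neuron, and the value ``4`` and the node name ``softmax``
are placeholders chosen only for illustration. The sketch assumes the options
are forwarded as keyword arguments, as the ``**compile_args`` signature
suggests.

.. code:: python

  def build_compile_args(pipeline_cores=None, flags=None, excl_node_names=None):
      """Hypothetical helper: collect the documented options into one dict."""
      compile_args = {}
      if pipeline_cores is not None:
          # Maximum number of NeuronCores per subgraph
          compile_args['--neuroncore-pipeline-cores'] = int(pipeline_cores)
      if flags:
          # Additional compiler flags, given as a list of strings
          compile_args['flags'] = list(flags)
      if excl_node_names:
          # Node names to exclude (advanced option)
          compile_args['excl_node_names'] = list(excl_node_names)
      return compile_args

  def compile_with_options(sym, args, aux, inputs, compile_args):
      """Hypothetical wrapper: forward the option dict to neuron.compile."""
      # Imported here so build_compile_args can be tried without MXNet-Neuron.
      from mxnet.contrib import neuron
      return neuron.compile(sym, args, aux, inputs, **compile_args)

  # Placeholder values for illustration only
  compile_args = build_compile_args(pipeline_cores=4, excl_node_names=['softmax'])
  print(compile_args)

The returned triple from ``compile_with_options`` corresponds to the
``sym``, ``args`` and ``auxs`` values listed under Returns.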

**MXNet 1.8**
-------------


Method
------


.. code:: python

  import mx_neuron as neuron
  neuron.compile(obj, args=None, aux=None, inputs=None, **compile_args)


Arguments
---------

-  **obj** - Symbol object loaded from symbol.json file, or a gluon.HybridBlock object
-  **args** (optional) - args/params dictionary loaded from params file. Only needed for a Symbol object
-  **aux** (optional) - aux/params dictionary loaded from params file. Only needed for a Symbol object
-  **inputs** - a dictionary with key/value mappings for input name to
   input numpy arrays.
-  **compile_args** (optional) - key/value mappings for
   MXNet-Neuron compilation and Neuron Compiler options.

   -  For example, to limit the number of NeuronCores per subgraph, use
      ``compile_args={'--neuroncore-pipeline-cores' : N}`` where N is an integer
      representing the maximum number of NeuronCores per subgraph.
   -  Additional compiler flags can be passed using
      ``'flags' : [<flags>]`` where ``<flags>`` is a comma-separated list of
      strings. See :ref:`neuron_gatherinfo` for an example of passing debug
      flags to the compiler.
   -  Advanced option to exclude node names:
      ``compile_args={'excl_node_names' : [<node names>]}`` where
      ``<node names>`` is a comma-separated list of node name strings.
   -  ``work_dir``: relative or absolute path for storing compiler artifacts
      (including params and JSON files) generated during compilation when
      ``SUBGRAPH_INFO=1``.

Returns
-------
- **(sym, args, auxs)** - for a Symbol object as input. sym, args and auxs are the new partitioned symbol, the modified args/params and the modified aux/params, respectively.
- **(obj)** - for a gluon.HybridBlock object as input. obj is the partitioned and optimized gluon.HybridBlock object for the Neuron backend.


Example Usage: Compilation
--------------------------

The following example compiles a Symbol object with the default
compilation arguments:


.. code:: python

  import mx_neuron as neuron
  ...  
  neuron.compile(sym, args, aux, inputs={'data' : img})


The following example compiles a gluon.HybridBlock object with the default
compilation arguments (only supported in MXNet-Neuron 1.8):

.. code:: python

  import mx_neuron as neuron
  ...  
  neuron.compile(obj, inputs={'data' : img})

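To keep compiler artifacts for debugging, ``SUBGRAPH_INFO=1`` and the
``work_dir`` option can be combined. The sketch below is one possible way to
do this from inside the script. ``prepare_debug_compile`` and
``compile_with_debug_artifacts`` are hypothetical helpers, and the sketch
assumes that setting the variable in the process environment before the
compile call is equivalent to exporting it in the shell before launching the
script. Creating the directory up front is a precaution, not a documented
requirement.

.. code:: python

  import os

  def prepare_debug_compile(work_dir):
      """Hypothetical helper: enable SUBGRAPH_INFO and build the compile options."""
      # Create the artifact directory if it does not exist yet (precaution)
      os.makedirs(work_dir, exist_ok=True)
      # Assumption: the variable is read when compilation runs
      os.environ['SUBGRAPH_INFO'] = '1'
      return {'work_dir': work_dir}

  def compile_with_debug_artifacts(sym, args, aux, inputs, work_dir):
      """Hypothetical wrapper: compile a Symbol object and keep the artifacts."""
      compile_args = prepare_debug_compile(work_dir)
      # Imported here so the helper above can be tried without MXNet-Neuron.
      import mx_neuron as neuron
      return neuron.compile(sym, args, aux, inputs=inputs, **compile_args)

After the compile call returns, the params and JSON files described for
``work_dir`` should be found under the chosen directory.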

Example Usage: Extract Compilation Statistics
---------------------------------------------

To extract operation counts, insert the following code after the compile
step (assuming ``sym`` is the original MXNet symbol and ``csym`` is the
compiled MXNet symbol):

.. code:: python

   import json

   # Return list of nodes from MXNet symbol
   def sym_nodes(sym):
     return json.loads(sym.tojson())['nodes']

   # Return number of operations in node list  
   def count_ops(graph_nodes):
     return len([x['op'] for x in graph_nodes if x['op'] != 'null'])

   # Return triplet of compile statistics
   # - count of operations in symbol database
   # - number of Neuron subgraphs
   # - number of operations compiled to Neuron runtime  
   def get_compile_stats(sym):
     cnt = count_ops(sym_nodes(sym))
     neuron_subgraph_cnt = 0
     neuron_compiled_cnt = 0
     for g in sym_nodes(sym):
       if g['op'] == '_neuron_subgraph_op':
         neuron_subgraph_cnt += 1
         for sg in g['subgraphs']:
           neuron_compiled_cnt += count_ops(sg['nodes'])
     return (cnt, neuron_subgraph_cnt, neuron_compiled_cnt)

   original_cnt = count_ops(sym_nodes(sym))
   post_compile_cnt, neuron_subgraph_cnt, neuron_compiled_cnt = get_compile_stats(csym)
   print("INFO:mxnet: Number of operations in original model: ", original_cnt)
   print("INFO:mxnet: Number of operations in compiled model: ", post_compile_cnt)
   print("INFO:mxnet: Number of Neuron subgraphs in compiled model: ", neuron_subgraph_cnt)
   print("INFO:mxnet: Number of operations placed on Neuron runtime: ", neuron_compiled_cnt)

The code prints output similar to the following:

.. code:: bash

   INFO:mxnet: Number of operations in original model:  67
   INFO:mxnet: Number of operations in compiled model:  4
   INFO:mxnet: Number of Neuron subgraphs in compiled model:  2
   INFO:mxnet: Number of operations placed on Neuron runtime:  65

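The counting logic depends only on the JSON node structure, so it can be
checked without MXNet on a hand-written node list. The sketch below repeats
the same counting on a hypothetical three-node graph. ``stats_from_nodes``,
the node names and the operator choices are made up for illustration and are
not the output of a real compilation.

.. code:: python

   # Count nodes that are real operations (inputs and params have op == 'null')
   def count_ops(graph_nodes):
       return sum(1 for node in graph_nodes if node['op'] != 'null')

   # Same statistics as get_compile_stats, but taking a plain node list
   def stats_from_nodes(nodes):
       neuron_subgraph_cnt = 0
       neuron_compiled_cnt = 0
       for node in nodes:
           if node['op'] == '_neuron_subgraph_op':
               neuron_subgraph_cnt += 1
               for sg in node['subgraphs']:
                   neuron_compiled_cnt += count_ops(sg['nodes'])
       return (count_ops(nodes), neuron_subgraph_cnt, neuron_compiled_cnt)

   # Hypothetical compiled graph: one input, one Neuron subgraph, one framework op
   toy_nodes = [
       {'op': 'null', 'name': 'data'},
       {'op': '_neuron_subgraph_op', 'name': 'neuron0',
        'subgraphs': [{'nodes': [
            {'op': 'null', 'name': 'data'},
            {'op': 'Convolution', 'name': 'conv0'},
            {'op': 'Activation', 'name': 'relu0'},
        ]}]},
       {'op': 'softmax', 'name': 'out'},
   ]

   print(stats_from_nodes(toy_nodes))  # (2, 1, 2)

Here the compiled graph has two top-level operations (the Neuron subgraph
node and ``softmax``), one Neuron subgraph, and two operations placed inside
that subgraph.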