.. _tfneuronx-ref-neuron-tracing-api:

TensorFlow 2.x (``tensorflow-neuronx``) Tracing API
===================================================

The Neuron tracing API enables tracing TensorFlow 2.x models for deployment
on AWS Trn1 and Inf2 machine learning accelerators.

Method
------

``tensorflow_neuronx.trace``

Description
-----------

Trace a ``keras.Model`` or a Python callable that can be decorated by
``tf.function``, and return an AWS-Neuron-optimized ``keras.Model`` that
can execute on AWS Trn1 and Inf2 machine learning accelerators. Tracing is
ideal for a ``keras.Model`` that accepts a list of ``tf.Tensor`` objects and
returns a list of ``tf.Tensor`` objects. Users are expected to provide
example inputs; the ``trace`` function executes ``func`` symbolically on
those inputs and converts it to a ``keras.Model``.

The returned ``keras.Model`` will support inference only. Attributes or
variables held by the original function or ``keras.Model`` will be dropped.

The returned ``keras.Model`` can be exported as a SavedModel and served
using TensorFlow Serving. Please see :ref:`tensorflow-serving` for more
information about exporting a SavedModel and serving it with TensorFlow
Serving.

The returned ``keras.Model`` has an ``.on_neuron_ratio`` attribute
that shows the percentage of ops mapped to Neuron hardware. The calculation
ignores ``PlaceholderOp``, ``IdentityOp``, ``ReadVariableOp``, and ``NoOp``.

Options can be passed to the Neuron compiler via the environment variable
``NEURON_CC_FLAGS``. For example, setting
``NEURON_CC_FLAGS="--workdir ./artifacts"`` directs the Neuron compiler to
dump artifacts in the ``./artifacts`` directory for debugging. See
:ref:`neuron-compiler-cli-reference-guide` for more information about
compiler options.
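
As a concrete sketch, the variable can be exported in the shell before
launching whatever script calls ``tensorflow_neuronx.trace`` (the script name
below is hypothetical):

.. code:: shell

    # Compiler options reach the Neuron compiler through this environment
    # variable, which the tracing API reads at compile time.
    export NEURON_CC_FLAGS="--workdir ./artifacts"

    # Any script that calls tensorflow_neuronx.trace now dumps compiler
    # artifacts into ./artifacts (script name is hypothetical):
    # python compile_model.py

    echo "$NEURON_CC_FLAGS"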

Arguments
---------

-   **func:** The ``keras.Model`` or function to be traced.
-   **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of
    ``tf.Tensor`` objects for tracing the function. When ``example_inputs``
    is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect
    ``func`` to have calling signature ``func(example_inputs)``. Otherwise,
    the expectation is that inference on ``func`` is done by calling
    ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``,
    or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``.
    The case where ``func`` accepts mixed positional and keyword arguments
    is currently unsupported.
-   **subgraph_builder_function:** (Optional) A callable with signature

    ``subgraph_builder_function(node : NodeDef) -> bool``
    (``NodeDef`` is defined in ``tensorflow/core/framework/node_def.proto``)

    that is used as a callback function to determine which part of
    the TensorFlow ``GraphDef`` obtained by tracing ``func`` will be placed
    on Machine Learning Accelerators.

    If ``subgraph_builder_function`` is not provided, then ``trace`` will
    automatically place operations on Machine Learning Accelerators or
    on CPU to maximize the execution efficiency.

    If it is provided, and ``subgraph_builder_function(node)`` returns
    ``True``, and placing ``node`` on Machine Learning Accelerators
    will not cause deadlocks during execution, then ``trace`` will place
    ``node`` on Machine Learning Accelerators. If
    ``subgraph_builder_function(node)`` returns ``False``, then ``trace``
    will place ``node`` on CPU.

.. _tensorflow-neuronx-special-flags:

Special Flags
-------------

These are flags that get passed directly to the Neuron tracing API
(rather than the Neuron Compiler). The flags are still passed
via the environment variable ``NEURON_CC_FLAGS``.

-   **workdir:** example usage - ``NEURON_CC_FLAGS='--workdir ./artifacts'``
    creates a folder named ``artifacts`` in the current directory and
    saves artifacts there that can be used for debugging.
-   **dynamic-batch-size:** example usage -
    ``NEURON_CC_FLAGS='--dynamic-batch-size'`` allows Neuron graphs to
    consume variable-sized batches of data. Dynamic sizing is restricted to
    the 0th dimension of a tensor.
-   **extract-weights (Beta):** example usage -
    ``NEURON_CC_FLAGS='--extract-weights trn1.2xlarge'`` reduces the compiled
    model's protobuf size by taking the weights out of the protobuf, which is
    useful for compiling large models that would otherwise exceed the 2GB
    protobuf size limit. This feature is in beta. Model performance is not
    guaranteed, and the flag does not work in combination with
    ``--neuroncore-pipeline-cores``, ``--dynamic-batch-size``, models with
    multiple NEFFs, or models that are 16GB or greater. The flag compiles
    the model for the Neuron instance type passed as its argument; all trn1
    and inf2 instance types are supported except trn1n.
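
Because both kinds of flags travel through the same variable, a tracing-API
flag and a compiler option can be combined in a single assignment (the
combination shown is illustrative):

.. code:: shell

    # --dynamic-batch-size is consumed by the tracing API, while --workdir
    # is forwarded to the Neuron compiler; both share NEURON_CC_FLAGS.
    export NEURON_CC_FLAGS="--dynamic-batch-size --workdir ./artifacts"
    echo "$NEURON_CC_FLAGS"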

Returns
-------

-  An AWS-Neuron-optimized ``keras.Model``.


Example Usage
-------------

.. code:: python

    import tensorflow as tf
    import tensorflow_neuronx as tfnx

    input0 = tf.keras.layers.Input(3)
    dense0 = tf.keras.layers.Dense(3)(input0)
    model = tf.keras.Model(inputs=[input0], outputs=[dense0])
    example_inputs = tf.random.uniform([1, 3])
    model_neuron = tfnx.trace(model, example_inputs)  # trace
    # check to see how much of the model was compiled successfully
    print(model_neuron.on_neuron_ratio) 

    model_dir = './model_neuron'
    model_neuron.save(model_dir)
    model_neuron_reloaded = tf.keras.models.load_model(model_dir)


Example Usage with Manual Device Placement Using ``subgraph_builder_function``
------------------------------------------------------------------------------

.. code:: python

    import tensorflow as tf
    import tensorflow_neuronx as tfnx

    input0 = tf.keras.layers.Input(3)
    dense0 = tf.keras.layers.Dense(3)(input0)
    reshape0 = tf.keras.layers.Reshape([1, 3])(dense0)
    output0 = tf.keras.layers.Dense(2)(reshape0)
    model = tf.keras.Model(inputs=[input0], outputs=[output0])
    example_inputs = tf.random.uniform([1, 3])

    def subgraph_builder_function(node):
        return node.op == 'MatMul'

    model_neuron = tfnx.trace(
        model, example_inputs,
        subgraph_builder_function=subgraph_builder_function,
    )
