.. _tf-neuronx-ref-auto-replication-python-api:

TensorFlow 2.x (``tensorflow-neuronx``) Auto Multicore Replication (Beta)
===========================================================================

The Neuron auto multicore replication Python API modifies TensorFlow 2.x
models traced by ``tensorflow_neuronx.trace`` so that they are automatically replicated across multiple NeuronCores.

.. contents:: Table of contents
   :local:
   :depth: 1

TensorFlow 2.x (``tensorflow-neuronx``) Auto Multicore Replication Python API (Beta)
------------------------------------------------------------------------------------------

Method
^^^^^^

``tensorflow.neuron.auto_multicore``
on models traced by
``tensorflow_neuronx.trace``

Description
^^^^^^^^^^^

Converts an existing AWS-Neuron-optimized ``keras.Model`` and returns an auto-replication tagged
AWS-Multicore-Neuron-optimized ``keras.Model`` that can execute on AWS Machine Learning Accelerators.
Like the traced model, the returned ``keras.Model`` will support inference only. Attributes or
variables held by the original function or ``keras.Model`` will be dropped.

The auto model replication feature in TensorFlow-Neuron lets you create a model
once and have it replicated across NeuronCores automatically. The desired number
of cores must be at least 1 and at most the total number of NeuronCores available
on a trn1 or inf2 instance. Because you do not load the same model multiple times
manually, this reduces framework memory usage. Successive calls to the returned
model are dispatched to the cores in round-robin fashion.

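Conceptually, the behavior described above (one model, ``num_cores`` replicas, calls dispatched round-robin) can be sketched in plain Python. This is an illustrative model only, not the TensorFlow-Neuron implementation: ``RoundRobinReplicas``, ``replica_factory``, and ``available_cores`` are hypothetical names, and each replica is an ordinary Python callable standing in for the model loaded on one NeuronCore.

.. code :: python

        import itertools


        class RoundRobinReplicas:
            """Hypothetical stand-in for a model replicated across NeuronCores."""

            def __init__(self, replica_factory, num_cores, available_cores):
                # Documented constraint: at least 1 core and no more than
                # the NeuronCores available on the instance.
                if not 1 <= num_cores <= available_cores:
                    raise ValueError(
                        f"num_cores must be between 1 and {available_cores}, got {num_cores}"
                    )
                # One replica per core, all built from the same model.
                self.replicas = [replica_factory(core_id) for core_id in range(num_cores)]
                # Cycle over core indices 0, 1, ..., num_cores - 1, 0, 1, ...
                self._core_ids = itertools.cycle(range(num_cores))

            def __call__(self, *args, **kwargs):
                # Send each call to the next core in round-robin order and
                # report which core handled it.
                core_id = next(self._core_ids)
                return core_id, self.replicas[core_id](*args, **kwargs)


        # Two "cores" (as on a trn1.2xlarge); every replica doubles its input.
        model = RoundRobinReplicas(lambda core_id: (lambda x: 2 * x), num_cores=2, available_cores=2)
        results = [model(x) for x in range(4)]
        print(results)  # [(0, 0), (1, 2), (0, 4), (1, 6)]

In the real feature, replication and dispatch are handled by TensorFlow-Neuron; the sketch only models the order in which calls reach the replicas.
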
The returned ``keras.Model`` can be exported as a SavedModel and served using
TensorFlow Serving. See :ref:`tensorflow-serving` for more
information about exporting to SavedModel and serving using TensorFlow
Serving.

Note that automatic replication only works on models compiled with a pipeline size of 1,
that is, with ``--neuroncore-pipeline-cores=1``. If auto replication is not enabled, the model
defaults to replicating on up to 4 cores.

See :ref:`neuron-compiler-cli-reference-guide` for more information about compiler options.

Arguments
^^^^^^^^^

-   **func:** The AWS-Neuron-optimized ``keras.Model`` or function to be replicated, for example a model returned by ``tensorflow_neuronx.trace``.
-   **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of
    ``tf.Tensor`` objects for tracing the function. When ``example_inputs``
    is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect
    ``func`` to have calling signature ``func(example_inputs)``. Otherwise,
    the expectation is that inference on ``func`` is done by calling
    ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``,
    or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``.
    The case where ``func`` accepts mixed positional and keyword arguments
    is currently unsupported.
-   **num_cores:** The desired number of NeuronCores across which the model will be automatically
    replicated.

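The calling convention implied by ``example_inputs`` can be restated as a short plain-Python sketch. The helper ``call_with_example_inputs`` is hypothetical and only encodes the rules listed above; ``FakeTensor`` is a hypothetical stand-in for ``tf.Tensor`` so the sketch runs without TensorFlow.

.. code :: python

        class FakeTensor:
            """Hypothetical stand-in for tf.Tensor so the sketch runs without TensorFlow."""

            def __init__(self, value):
                self.value = value


        def call_with_example_inputs(func, example_inputs):
            """Call func the way the documented rules describe (illustrative only)."""
            if isinstance(example_inputs, (FakeTensor, list)):
                # A single tensor or a list of tensors is passed as one argument.
                return func(example_inputs)
            if isinstance(example_inputs, tuple):
                # A tuple is unpacked into positional arguments.
                return func(*example_inputs)
            if isinstance(example_inputs, dict):
                # A dict is unpacked into keyword arguments.
                return func(**example_inputs)
            raise TypeError(f"unsupported example_inputs type: {type(example_inputs).__name__}")


        a, b = FakeTensor(1), FakeTensor(2)
        print(call_with_example_inputs(lambda x: x.value, a))                    # 1
        print(call_with_example_inputs(lambda xs: len(xs), [a, b]))              # 2
        print(call_with_example_inputs(lambda x, y: x.value + y.value, (a, b)))  # 3
        print(call_with_example_inputs(lambda x, y: y.value, {"x": a, "y": b}))  # 2

There is deliberately no branch for a mix of positional and keyword arguments, matching the documented limitation.
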
Returns
^^^^^^^

-  An AWS-Multicore-Neuron-optimized ``keras.Model``.


Example Python API Usage for TF2.x traced models:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code :: python

        import tensorflow as tf
        import tensorflow.neuron as tfn
        import tensorflow_neuronx as tfnx

        # Build a small Keras model with a single dense layer
        input0 = tf.keras.layers.Input(3)
        dense0 = tf.keras.layers.Dense(3)(input0)
        inputs = [input0]
        outputs = [dense0]
        model = tf.keras.Model(inputs=inputs, outputs=outputs)

        # Trace (compile) the model for Neuron using an example input
        input0_tensor = tf.random.uniform([1, 3])
        model_neuron = tfnx.trace(model, input0_tensor)

        # A trn1.2xlarge has 2 NeuronCores
        num_cores = 2
        # Replicate the traced model across num_cores NeuronCores
        multicore_model = tfn.auto_multicore(model_neuron, input0_tensor, num_cores=num_cores)
        # Successive calls are dispatched to the cores in round-robin fashion
        multicore_model(input0_tensor)

Example Python API Usage for TF2.x saved models:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code :: python

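        import tensorflow as tf
        import tensorflow.neuron as tfn

        # Directory of a SavedModel exported from a model traced with
        # tensorflow_neuronx.trace (hypothetical path; replace with your own)
        model_dir = './model_neuron'
        input0_tensor = tf.random.uniform([1, 3])
        num_cores = 4
        reload_model = tf.saved_model.load(model_dir)
        multicore_model = tfn.auto_multicore(reload_model, input0_tensor, num_cores=num_cores)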

.. _tensorflow-ref-auto-replication-cli-api:

TensorFlow Neuron TF2.x (``tensorflow-neuronx TF2.x``) Auto Multicore Replication CLI (Beta)
---------------------------------------------------------------------------------------------------------------

The Neuron auto multicore replication CLI modifies TensorFlow 2.x
traced SavedModels so that they are automatically replicated across multiple NeuronCores.
Because it operates on TensorFlow SavedModels, the replicated models can be served with
TensorFlow Serving without significant code changes.

Method
^^^^^^

``tf-neuron-auto-multicore MODEL_DIR --num_cores NUM_CORES --new_model_dir NEW_MODEL_DIR``

Arguments
^^^^^^^^^

-   **MODEL_DIR:** The directory of a saved AWS-Neuron-optimized ``keras.Model``.
-   **NUM_CORES:** The desired number of NeuronCores across which the model will be automatically
    replicated.
-   **NEW_MODEL_DIR:** The directory where the AWS-Multicore-Neuron-optimized
    ``keras.Model`` will be saved.

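When the conversion is one step of a larger Python deployment script, the CLI can be invoked through the standard ``subprocess`` module. This sketch assumes only the command syntax shown under Method; ``build_auto_multicore_command`` is a hypothetical helper, and the command runs only if ``tf-neuron-auto-multicore`` is found on ``PATH``.

.. code :: python

        import shutil
        import subprocess


        def build_auto_multicore_command(model_dir, num_cores, new_model_dir):
            """Hypothetical helper: build argv for the documented CLI syntax."""
            if num_cores < 1:
                raise ValueError("num_cores must be at least 1")
            return [
                "tf-neuron-auto-multicore",
                str(model_dir),
                "--num_cores",
                str(num_cores),
                "--new_model_dir",
                str(new_model_dir),
            ]


        cmd = build_auto_multicore_command("./resnet", 8, "./modified_resnet")
        print(" ".join(cmd))
        # tf-neuron-auto-multicore ./resnet --num_cores 8 --new_model_dir ./modified_resnet

        # Only run the conversion where the Neuron CLI is installed.
        if shutil.which(cmd[0]) is not None:
            subprocess.run(cmd, check=True)

Passing an argument list instead of a shell string avoids quoting problems when directory names contain spaces.
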
Example CLI Usage for TensorFlow Serving saved models:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code :: bash

        tf-neuron-auto-multicore ./resnet --num_cores 8 --new_model_dir ./modified_resnet