This document is relevant for: Inf1

TensorFlow 1.x (tensorflow-neuron) Compilation API#

The Neuron compilation API for TensorFlow 1.x enables compilation of a SavedModel for execution on an Inferentia target.

Method#

tensorflow.neuron.saved_model.compile

Description#

Within a graph or subgraph, the compile method selects Neuron-supported operations, sends them to the Neuron compiler for compilation, and saves the compiled artifacts in the graph. Operations that cannot be compiled are kept as original operations for framework execution.

The compiled graph can be exported as a SavedModel and served using TensorFlow Serving. Please see tensorflow-serving for more information about exporting a SavedModel and serving it with TensorFlow Serving.

Options can be passed to the Neuron compiler via the compile function. For example, the “--neuroncore-pipeline-cores” option directs the Neuron compiler to compile each subgraph to fit within the specified number of NeuronCores. This number can be less than the total number of NeuronCores available on an Inf1 instance. See Neuron compiler CLI Reference Guide (neuron-cc) for more information about compiler options.
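As a minimal sketch of passing a compiler option through the compile function (the SavedModel paths here are hypothetical, and tensorflow-neuron must be installed for the call to run):

```python
# Compiler options are forwarded to neuron-cc via compiler_args; here each
# subgraph is compiled to fit in 4 NeuronCores.
compiler_args = ["--neuroncore-pipeline-cores", "4"]

try:
    import tensorflow.neuron as tfn

    tfn.saved_model.compile(
        "resnet50_saved_model",  # hypothetical original SavedModel path
        "resnet50_neuron",       # hypothetical output SavedModel path
        compiler_args=compiler_args,
    )
except ImportError:
    # tensorflow-neuron is only available in Neuron-enabled environments.
    pass
```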

Arguments#

  • model_dir: The path of the original SavedModel.

  • new_model_dir: The path to which the Neuron-optimized SavedModel will be stored.

  • batch_size: (Optional) Positive integer representing batch size used in inference. The default value is 1.

  • model_shape_feed_dict: (Optional) Dictionary {str: list} used for inferring tensor shapes. Keys should match model input names. Values are lists of positive integers representing model input tensor shapes.

  • model_feed_dict: (Optional) Dictionary {str: numpy.array} used for inference. Useful for inferring tensor shapes. Keys should match model input names. Values are numpy arrays that can be fed as inputs to the SavedModel.

  • tags: (Optional) Iterable of strings to identify the required MetaGraphDef. These should correspond to the tags used when saving the variables using the SavedModel save() API. Default is to use the first tag_set available in the SavedModel.

  • signature_def_key: (Optional) String specifying the signature_def to use. Default is to use ‘serving_default’ or the first signature_def corresponding to tags.

  • minimum_segment_size: (Optional) Integer indicating the minimum number of operations in a NeuronOp.

  • no_fuse_ops: (Optional) None or iterable of strings (unordered) representing names of operations that are forcibly placed on CPU.

  • compiler_args: (Optional) List of strings representing neuron-cc compiler arguments. Note that these arguments apply to all subgraphs generated by whitelist partitioning. For example, use compiler_args=['--neuroncore-pipeline-cores', '4'] to set the number of NeuronCores per subgraph to 4. See Neuron compiler CLI Reference Guide (neuron-cc) for more information about compiler options.

  • compiler_workdir: (Optional) String representing work directory of the neuron-cc compiler.
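A sketch combining several of the optional arguments above (the input tensor name, op name, and paths are hypothetical and must match the real SavedModel; tensorflow-neuron must be installed for the call to run):

```python
import numpy as np

# Hypothetical input tensor name and shape; the key must match an input
# name in the SavedModel's signature_def.
model_feed_dict = {"input_1:0": np.zeros([1, 224, 224, 3], dtype=np.float32)}

try:
    import tensorflow.neuron as tfn

    result = tfn.saved_model.compile(
        "original_model",                 # hypothetical model_dir
        "neuron_model",                   # hypothetical new_model_dir
        model_feed_dict=model_feed_dict,  # used for inferring tensor shapes
        minimum_segment_size=10,          # at least 10 operations per NeuronOp
        no_fuse_ops=["bad_op"],           # hypothetical op name forced onto CPU
        compiler_workdir="./neuron_workdir",
    )
except ImportError:
    # tensorflow-neuron is only available in Neuron-enabled environments.
    result = None
```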

Returns#

  • Dictionary with operator counts before/after optimization.

  • Operator count statistics are also logged, showing the original operation count, the post-optimization count, and the number of operations placed on the Neuron runtime. For example:

INFO:tensorflow:Number of operations in TensorFlow session: 3978
INFO:tensorflow:Number of operations after tf.neuron optimizations: 555
INFO:tensorflow:Number of operations placed on Neuron runtime: 554

Example Usage#

import shutil
import tensorflow.neuron as tfn

saved_model_path = "<saved model path>"
compiled_saved_model_path = "<compiled saved model path>"

# Remove any previous compilation output before compiling.
shutil.rmtree(compiled_saved_model_path, ignore_errors=True)
tfn.saved_model.compile(saved_model_path, compiled_saved_model_path)