This document is relevant for: Inf1
TensorFlow 2.x (tensorflow-neuron) Tracing API#
The Neuron tracing API enables tracing TensorFlow 2.x models for deployment on AWS Machine Learning Accelerators.
Method#
tensorflow.neuron.trace
Description#
Trace a keras.Model or a Python callable that can be decorated by tf.function, and return an AWS-Neuron-optimized keras.Model that can execute on AWS Machine Learning Accelerators. Tracing is ideal for a keras.Model that accepts a list of tf.Tensor objects and returns a list of tf.Tensor objects. It is expected that users will provide example inputs, and the trace function will execute func symbolically and convert it to a keras.Model.
The returned keras.Model supports inference only. Attributes or variables held by the original function or keras.Model will be dropped.
The returned keras.Model can be exported as a SavedModel and served using TensorFlow Serving. Please see tensorflow-serving for more information about exporting to a SavedModel and serving with TensorFlow Serving.
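As a minimal sketch of that workflow, a traced model can be saved into a numbered version directory, which is the directory layout TensorFlow Serving expects. The model and paths below are illustrative:
import tensorflow as tf
import tensorflow.neuron as tfn

model = tf.keras.Sequential([tf.keras.layers.Dense(3, input_shape=(3,))])
model_neuron = tfn.trace(model, tf.random.uniform([1, 3]))
# TensorFlow Serving looks for saved models under <base_path>/<version>/,
# so '1' below acts as the model version number.
model_neuron.save('./dense_model/1')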
The returned keras.Model has an .on_neuron_ratio attribute, which shows the percentage of ops mapped to Neuron hardware. This calculation ignores PlaceholderOp, IdentityOp, ReadVariableOp, and NoOp.
Options can be passed to the Neuron compiler via the environment variable NEURON_CC_FLAGS. For example, the syntax env NEURON_CC_FLAGS="--neuroncore-pipeline-cores=4" directs the Neuron compiler to compile each subgraph to fit in the specified number of NeuronCores. This number can be less than the total available NeuronCores on an Inf1 instance. See the Neuron compiler CLI Reference Guide (neuron-cc) for more information about compiler options.
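The same flags can also be set from inside a Python script before tracing. A minimal sketch, assuming NEURON_CC_FLAGS is read at the time tfn.trace is called:
import os
# Set compiler flags before calling tfn.trace so the Neuron compiler sees them.
os.environ['NEURON_CC_FLAGS'] = '--neuroncore-pipeline-cores=4'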
Arguments#
- func: The keras.Model or function to be traced.
- example_inputs: A tf.Tensor or a tuple/list/dict of tf.Tensor objects for tracing the function. When example_inputs is a tf.Tensor or a list of tf.Tensor objects, we expect func to have calling signature func(example_inputs). Otherwise, the expectation is that inference on func is done by calling func(*example_inputs) when example_inputs is a tuple, or func(**example_inputs) when example_inputs is a dict (see the sketch after this list). The case where func accepts mixed positional and keyword arguments is currently unsupported.
- subgraph_builder_function: (Optional) A callable with signature subgraph_builder_function(node: NodeDef) -> bool (NodeDef is defined in tensorflow/core/framework/node_def.proto) that is used as a callback function to determine which part of the TensorFlow GraphDef obtained by tracing func will be placed on Machine Learning Accelerators. If subgraph_builder_function is not provided, then trace will automatically place operations on Machine Learning Accelerators or on CPU to maximize execution efficiency. If it is provided, and subgraph_builder_function(node) returns True, and placing node on Machine Learning Accelerators will not cause deadlocks during execution, then trace will place node on Machine Learning Accelerators. If subgraph_builder_function(node) returns False, then trace will place node on CPU.
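The sketch below illustrates the tuple calling convention described above; the two-argument function is illustrative:
import tensorflow as tf
import tensorflow.neuron as tfn

# A callable with two positional tensor arguments, decoratable by tf.function.
def func(a, b):
    return tf.matmul(a, b)

a = tf.random.uniform([2, 3])
b = tf.random.uniform([3, 4])

# example_inputs is a tuple, so inference is performed as func(*example_inputs).
func_neuron = tfn.trace(func, (a, b))
output = func_neuron(a, b)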
Special Flags#
These are flags that get passed directly to the Neuron tracing API (rather than the Neuron compiler). The flags are still passed via the environment variable NEURON_CC_FLAGS.
- workdir: example usage - NEURON_CC_FLAGS='--workdir ./artifacts' will create a folder named artifacts in the current directory and save artifacts there that can be used for debugging.
- dynamic-batch-size: example usage - NEURON_CC_FLAGS='--dynamic-batch-size'. This flag allows Neuron graphs to consume variable-sized batches of data. Dynamic sizing is restricted to the 0th dimension of a tensor (see the sketch after this list).
- extract-weights (Beta): example usage - NEURON_CC_FLAGS='--extract-weights inf1.2xlarge' will reduce the compiled model's protobuf size by taking the weights out of the protobuf. This is useful for compiling large models that would otherwise exceed the 2GB protobuf size limit. This feature is in beta: model performance is not guaranteed, and the flag does not work in combination with --neuroncore-pipeline-cores, --dynamic-batch-size, models with multiple NEFFs, or models that are 4GB or greater. The flag compiles models for different Neuron instances depending on the instance type passed; all inf1 instance types are supported.
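For instance, a model traced with the dynamic-batch-size flag can later consume a batch size different from the one used for tracing. A brief sketch, assuming the flag is set before tracing:
import os
import tensorflow as tf
import tensorflow.neuron as tfn

os.environ['NEURON_CC_FLAGS'] = '--dynamic-batch-size'

input0 = tf.keras.layers.Input(3)
dense0 = tf.keras.layers.Dense(3)(input0)
model = tf.keras.Model(inputs=[input0], outputs=[dense0])

# Trace with batch size 1, then run inference with batch size 8.
model_neuron = tfn.trace(model, tf.random.uniform([1, 3]))
output = model_neuron(tf.random.uniform([8, 3]))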
Returns#
An AWS-Neuron-optimized keras.Model.
Example Usage#
import tensorflow as tf
import tensorflow.neuron as tfn
input0 = tf.keras.layers.Input(3)
dense0 = tf.keras.layers.Dense(3)(input0)
model = tf.keras.Model(inputs=[input0], outputs=[dense0])
example_inputs = tf.random.uniform([1, 3])
model_neuron = tfn.trace(model, example_inputs) # trace
# check to see how much of the model was compiled successfully
print(model_neuron.on_neuron_ratio)
model_dir = './model_neuron'
model_neuron.save(model_dir)
model_neuron_reloaded = tf.keras.models.load_model(model_dir)
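The traced model and the reloaded model can then be invoked like any keras.Model, for example:
output = model_neuron(example_inputs)
output_reloaded = model_neuron_reloaded(example_inputs)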
Example Usage with Manual Device Placement Using subgraph_builder_function#
import tensorflow as tf
import tensorflow.neuron as tfn
input0 = tf.keras.layers.Input(3)
dense0 = tf.keras.layers.Dense(3)(input0)
reshape0 = tf.keras.layers.Reshape([1, 3])(dense0)
output0 = tf.keras.layers.Dense(2)(reshape0)
model = tf.keras.Model(inputs=[input0], outputs=[output0])
example_inputs = tf.random.uniform([1, 3])
def subgraph_builder_function(node):
return node.op == 'MatMul'
model_neuron = tfn.trace(
model, example_inputs,
subgraph_builder_function=subgraph_builder_function,
)
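Here subgraph_builder_function returns True only for MatMul nodes, so trace will place the MatMul operations from the two Dense layers on Machine Learning Accelerators (subject to the deadlock check described above) and leave the remaining operations, such as Reshape and the bias additions, on CPU.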
Important
Although the old API tensorflow.neuron.saved_model.compile is still available under tensorflow-neuron 2.x, it supports only a limited subset of the capabilities of tensorflow.neuron.trace and will be deprecated in future releases.
This document is relevant for: Inf1