This document is relevant for: Inf1, Inf2, Trn1, Trn2
NeuronPerf Compile Guide
If you wish to compile multiple configurations of your model at once, NeuronPerf provides a simplified and uniform API across frameworks. The output is a neuronperf_model_index that tracks the artifacts produced, and it can be passed directly to the benchmark routine for a streamlined end-to-end process.
You can manually specify the model index filename by passing filename, or let NeuronPerf generate one and return it for you. Compiled artifacts will be placed in a local models directory.
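For example, a minimal sketch (the filename here is an illustrative choice, and model and inputs are assumed to be defined as in the examples below):

import neuronperf as npf
import neuronperf.torch

# Compile with an explicit index filename (illustrative name);
# compiled artifacts are written to the local models directory
index = npf.torch.compile(model, inputs, filename="my_model_index.json")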
How does compile know which instance type to compile for?
NeuronPerf will assume that the instance type you are currently on is also the compile target. However, you may compile on a non-Neuron instance or choose to target a different instance type. In that case, you can pass compiler_target to the compile call.
For example:
import neuronperf as npf
import neuronperf.torch
npf.torch.compile(model, inputs) # compile for current instance type
npf.torch.compile(model, inputs, compiler_target="inf2") # compile for inf2
Compiling multiple variants
If you provide multiple pipeline sizes, batch sizes, and/or cast modes, NeuronPerf will compile all of them.
import torch

# Select a few batch sizes and pipeline configurations to test
batch_sizes = [1, 5, 10]
pipeline_sizes = [1, 2, 4]

# Construct one example input per batch size
example_inputs = [torch.zeros([batch_size, 3, 224, 224], dtype=torch.float16) for batch_size in batch_sizes]

# Compile all configurations
index = npf.torch.compile(
    model,
    example_inputs,
    batch_sizes=batch_sizes,
    pipeline_sizes=pipeline_sizes,
)
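The returned index can then be handed directly to the benchmark routine. A minimal sketch, assuming the index and inputs from above:

# Benchmark every configuration tracked by the index
reports = npf.torch.benchmark(index, example_inputs)
npf.print_reports(reports)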
If you wish to benchmark only specific subsets of configurations, you can compile those configurations independently and later combine the results into a single index, as shown below.
# Compile with pipeline size 1 and vary the batch dimension
batch_index = npf.torch.compile(
    model,
    example_inputs,
    batch_sizes=batch_sizes,
    pipeline_sizes=1,
)

# Compile with batch size 1 and vary the pipeline dimension
pipeline_index = npf.torch.compile(
    model,
    example_inputs[0],
    batch_sizes=1,
    pipeline_sizes=pipeline_sizes,
)

# Combine both indexes and save the result
index = npf.model_index.append(batch_index, pipeline_index)
npf.model_index.save(index, 'model_index.json')
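Because the saved index is an ordinary file, it can be reused in a later session without recompiling. A minimal sketch, again assuming the inputs above; the filename itself is passed to benchmark:

# Benchmark directly from the saved index file
reports = npf.torch.benchmark('model_index.json', example_inputs)
npf.print_reports(reports)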
The compile function supports batch_sizes, pipeline_sizes, cast_modes, and custom compiler_args. If there is an error during compilation for a requested configuration, the error is logged and compilation continues without terminating. (This is to support long-running compile jobs with many configurations.)
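As a sketch of how these arguments fit together (the cast mode string and compiler flag below are illustrative assumptions; consult your framework and compiler documentation for the values valid on your target):

# Compile several batch sizes with a cast mode and extra compiler flags.
# "fp16" and "--fast-math none" are illustrative values only; valid cast
# modes and flags depend on your framework and compiler version.
index = npf.torch.compile(
    model,
    example_inputs,
    batch_sizes=batch_sizes,
    cast_modes=["fp16"],
    compiler_args=["--fast-math", "none"],
)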