This document is relevant for: Inf1, Inf2, Trn1, Trn1n
NeuronPerf Examples#
This page walks through several examples of using NeuronPerf, starting with the simplest case: benchmarking a model you have already compiled. We will also see how NeuronPerf can be used to run a hyperparameter search and to manage the artifacts and results it produces.
Benchmark a Compiled Model#
This example assumes you have already compiled your model for Neuron and saved it to disk. You will need to adapt the batch size, input shape, and filename for your model.
import torch # or tensorflow, mxnet
import neuronperf as npf
import neuronperf.torch # or tensorflow, mxnet
# Construct dummy inputs
batch_sizes = 1
input_shape = (batch_sizes, 3, 224, 224)
inputs = torch.ones(input_shape) # or numpy array for TF, MX
# Benchmark and save results
reports = npf.torch.benchmark("your_model_file.pt", inputs, batch_sizes)
npf.print_reports(reports)
npf.write_json(reports)
INFO:neuronperf.benchmarking - Benchmarking 'your_model_file.pt', ~8.0 minutes remaining.
throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size workers_per_model batch_size model_filename
296766.5 0.003 0.003 1 1 1 1 your_model_file.pt
3616109.75 0.005 0.008 24 1 1 1 your_model_file.pt
56801.0 0.035 0.04 1 1 2 1 your_model_file.pt
3094419.4 0.005 0.051 24 1 2 1 your_model_file.pt
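The call returns one report per benchmarked configuration. As a rough sketch, assuming each report behaves like a dictionary keyed by the column names shown above (an assumption; see the NeuronPerf Benchmark Guide for the exact structure), you could pick out the highest-throughput configuration like this:
# Hedged sketch: assumes `reports` is a list of dict-like entries keyed by the
# column names printed above.
best = max(reports, key=lambda report: report["throughput_avg"])
print(best["n_models"], best["workers_per_model"], best["throughput_avg"])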
Now suppose you only wish to test two specific configurations: 1 model with 1 worker thread, and 1 model with 2 worker threads, each for 15 seconds. The call to benchmark becomes:
reports = npf.torch.benchmark(filename, inputs, batch_sizes, n_models=1, workers_per_model=[1, 2], duration=15)
You can also add a custom model name to reports.
reports = npf.torch.benchmark(..., model_name="MyFancyModel")
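Several of these arguments accept either a single value or a list (as workers_per_model does above), which is how a small hyperparameter search can be expressed in one call. The sketch below assumes n_models accepts a list in the same way as workers_per_model; treat that as an assumption rather than the authoritative API:
# Sketch of a small configuration sweep; assumes n_models accepts a list,
# mirroring workers_per_model above.
reports = npf.torch.benchmark(
    "your_model_file.pt",
    inputs,
    batch_sizes,
    n_models=[1, 4],
    workers_per_model=[1, 2],
    duration=15,
)
npf.print_reports(reports)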
See the NeuronPerf Benchmark Guide for further details.
Benchmark a Model from Source#
In this example, we define, compile, and benchmark a simple (dummy) model using PyTorch. The model is traced with a batch size of 1 and an input shape of (3, 224, 224), then saved as model_neuron_b1.pt before benchmarking.
import torch
import torch.neuron

import neuronperf as npf
import neuronperf.torch


# Define a simple model
class Model(torch.nn.Module):
    def forward(self, x):
        x = x * 3
        return x + 1


# Instantiate
model = Model()
model.eval()

# Define some inputs
batch_sizes = [1]
inputs = [torch.ones((batch_size, 3, 224, 224)) for batch_size in batch_sizes]

# Compile for Neuron
model_neuron = torch.neuron.trace(model, inputs)
model_neuron.save("model_neuron_b1.pt")

# Benchmark
reports = npf.torch.benchmark("model_neuron_b1.pt", inputs, batch_sizes)

# View and save results
npf.print_reports(reports)
npf.write_csv(reports, "model_neuron_b1.csv")
(aws_neuron_pytorch_p36) ubuntu@ip-172-31-11-122:~/tmp$ python test_simple_pt.py
INFO:neuronperf.benchmarking - Benchmarking 'model_neuron_b1.pt', ~8.0 minutes remaining.
throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size workers_per_model batch_size model_filename
296766.5 0.003 0.003 1 1 1 1 model_neuron_b1.pt
3616109.75 0.005 0.008 24 1 1 1 model_neuron_b1.pt
56801.0 0.035 0.04 1 1 2 1 model_neuron_b1.pt
3094419.4 0.005 0.051 24 1 2 1 model_neuron_b1.pt
Great! Here is what a default csv file looks like.
n_models | workers_per_model | pipeline_size | batch_size | throughput_avg | throughput_peak | latency_ms_p0 | latency_ms_p50 | latency_ms_p90 | latency_ms_p95 | latency_ms_p99 | latency_ms_p100 | load_avg_ms | warmup_avg_ms | e2e_avg_ms | input_avg_ms | preprocess_avg_ms | postprocess_avg_ms | infer_avg_ms | worker_avg_s | total_infs | total_s | status | model_filename | multiprocess | multiinterpreter | device_type | instance_type
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 1 | 1 | 1 | 31346.0 | 31408.0 | 0.03 | 0.03 | 0.031 | 0.032 | 0.037 | 0.732 | 62.217 | 2.625 | 0.031 | 0.001 | 0.0 | 0.0 | 0.028 | 4.93 | 154704 | 5.0 | finished | model_neuron_b1.pt | True | False | neuron | inf1.6xlarge
16 | 16 | 1 | 1 | 380604.75 | 380923.0 | 0.03 | 0.032 | 0.054 | 0.054 | 0.057 | 0.938 | 293.806 | 3.266 | 0.043 | 0.001 | 0.0 | 0.0 | 0.039 | 4.7 | 1799549 | 5.0 | finished | model_neuron_b1.pt | True | False | neuron | inf1.6xlarge
1 | 2 | 1 | 1 | 51178.0 | 51319.0 | 0.035 | 0.036 | 0.037 | 0.039 | 0.047 | 1.13 | 114.118 | 2.713 | 0.037 | 0.001 | 0.0 | 0.0 | 0.033 | 4.88 | 248984 | 5.0 | finished | model_neuron_b1.pt | True | False | neuron | inf1.6xlarge
16 | 32 | 1 | 1 | 381098.75 | 383905.0 | 0.03 | 0.058 | 0.067 | 0.073 | 0.121 | 48.07 | 303.916 | 4.42 | 0.08 | 0.001 | 0.0 | 0.0 | 0.074 | 4.69 | 1804925 | 5.0 | finished | model_neuron_b1.pt | True | False | neuron | inf1.6xlarge
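If you want to slice these results further, the CSV can be loaded with any standard tool. For example, with pandas (a hedged sketch, assuming the file written by npf.write_csv above is plain comma-separated data with the column names shown):
import pandas as pd

# Load the CSV written by npf.write_csv and sort configurations by average throughput.
df = pd.read_csv("model_neuron_b1.csv")
print(df.sort_values("throughput_avg", ascending=False)[
    ["n_models", "workers_per_model", "batch_size", "throughput_avg", "latency_ms_p50"]
])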
Compile and Benchmark a Model#
Here is an end-to-end example of compiling and benchmarking a ResNet-50 model from torchvision.
import torch
import torch_neuron

import neuronperf as npf
import neuronperf.torch

from torchvision import models


# Load a pretrained ResNet-50 model
model = models.resnet50(pretrained=True)

# Select a few batch sizes to test and a filename for the compiled artifacts
filename = 'resnet50.json'
batch_sizes = [5, 6, 7]

# Construct example inputs
inputs = [torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32) for batch_size in batch_sizes]

# Compile
npf.torch.compile(
    model,
    inputs,
    batch_sizes=batch_sizes,
    filename=filename,
)

# Benchmark
reports = npf.torch.benchmark(filename, inputs)

# View and save results
npf.print_reports(reports)
npf.write_csv(reports, 'resnet50_results.csv')
npf.write_json(reports, 'resnet50_results.json')
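As with the earlier example, the saved report files can be post-processed with standard tooling. The sketch below assumes resnet50_results.json contains the same list of per-configuration reports that npf.print_reports displays; check the file contents before relying on this structure:
import json

# Hedged sketch: load the saved reports and compare batch sizes by throughput.
with open("resnet50_results.json") as f:
    saved_reports = json.load(f)

for report in saved_reports:
    print(report["batch_size"], report["throughput_avg"])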
Benchmark on CPU or GPU#
When benchmarking on CPU or GPU, the API is slightly different. There is no compiled model artifact to load, so instead you pass a reference to your model class, which NeuronPerf will instantiate for you.
Note
GPU benchmarking is currently only available for PyTorch.
CPU:
cpu_reports = npf.cpu.benchmark(YourModelClass, ...)
GPU:
gpu_reports = npf.torch.benchmark(YourModelClass, ..., device_type="gpu")
Please refer to Benchmark on CPU or GPU for details and an example of providing your model class.
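In the meantime, here is a minimal sketch of the CPU path, reusing the dummy model from earlier. The submodule import and keyword arguments mirror the torch examples above and should be treated as assumptions; the linked guide has the authoritative example:
import torch
import neuronperf as npf
import neuronperf.cpu  # assumed submodule import, mirroring neuronperf.torch above


class Model(torch.nn.Module):
    def forward(self, x):
        return x * 3 + 1


batch_sizes = [1]
inputs = [torch.ones((batch_size, 3, 224, 224)) for batch_size in batch_sizes]

# Pass the class itself (not an instance); NeuronPerf instantiates it per worker.
cpu_reports = npf.cpu.benchmark(Model, inputs, batch_sizes, duration=15)
npf.print_reports(cpu_reports)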
This document is relevant for: Inf1, Inf2, Trn1, Trn1n