This document is relevant for: Inf1, Inf2, Trn1, Trn1n

NeuronPerf Examples#

This page walks through several examples of using NeuronPerf, starting with the simplest case: benchmarking a model that has already been compiled. We will also see how NeuronPerf can be used to perform a hyperparameter search and to manage the artifacts and results it produces.

Benchmark a Compiled Model#

This example assumes you have already compiled your model for Neuron and saved it to disk. You will need to adapt the batch size, input shape, and filename for your model.

import torch  # or tensorflow, mxnet

import neuronperf as npf
import neuronperf.torch  # or tensorflow, mxnet

# Construct dummy inputs
batch_sizes = 1
input_shape = (batch_sizes, 3, 224, 224)
inputs = torch.ones(input_shape)  # or numpy array for TF, MX

# Benchmark and save results
reports = npf.torch.benchmark("your_model_file.pt", inputs, batch_sizes)
npf.print_reports(reports)
npf.write_json(reports)
INFO:neuronperf.benchmarking - Benchmarking 'your_model_file.pt', ~8.0 minutes remaining.
throughput_avg    latency_ms_p50    latency_ms_p99    n_models          pipeline_size     workers_per_model batch_size        model_filename
296766.5          0.003             0.003             1                 1                 1                 1                 your_model_file.pt
3616109.75        0.005             0.008             24                1                 1                 1                 your_model_file.pt
56801.0           0.035             0.04              1                 1                 2                 1                 your_model_file.pt
3094419.4         0.005             0.051             24                1                 2                 1                 your_model_file.pt
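
If you want to post-process the results yourself rather than only printing or saving them, here is a minimal sketch. It assumes each entry in reports is a dictionary keyed by the column names shown above (such as throughput_avg and workers_per_model); the selection logic itself is not part of NeuronPerf.

# Minimal sketch (not part of NeuronPerf): pick the configuration with the
# highest average throughput, assuming each report is a dict keyed by the
# column names printed above.
best = max(reports, key=lambda report: report["throughput_avg"])
print(best["n_models"], best["workers_per_model"], best["throughput_avg"])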

Let’s suppose you only wish to test two specific configurations: 1 model with 1 worker thread, and 1 model with 2 worker threads, benchmarking each for 15 seconds. The call to benchmark becomes:

reports = npf.torch.benchmark(filename, inputs, batch_sizes, n_models=1, workers_per_model=[1, 2], duration=15)

You can also add a custom model name to the reports.

reports = npf.torch.benchmark(..., model_name="MyFancyModel")

See the NeuronPerf Benchmark Guide for further details.

Benchmark a Model from Source#

In this example, we define, compile, and benchmark a simple (dummy) model using PyTorch.

The model is traced with a batch size of 1 and an input shape of (3, 224, 224), then saved to disk as model_neuron_b1.pt before benchmarking.

import torch
import torch.neuron

import neuronperf as npf
import neuronperf.torch


# Define a simple model
class Model(torch.nn.Module):
    def forward(self, x):
        x = x * 3
        return x + 1


# Instantiate
model = Model()
model.eval()

# Define some inputs
batch_sizes = [1]
inputs = [torch.ones((batch_size, 3, 224, 224)) for batch_size in batch_sizes]

# Compile for Neuron
model_neuron = torch.neuron.trace(model, inputs)
model_neuron.save("model_neuron_b1.pt")

# Benchmark
reports = npf.torch.benchmark("model_neuron_b1.pt", inputs, batch_sizes)

# View and save results
npf.print_reports(reports)
npf.write_csv(reports, "model_neuron_b1.csv")
(aws_neuron_pytorch_p36) ubuntu@ip-172-31-11-122:~/tmp$ python test_simple_pt.py
INFO:neuronperf.benchmarking - Benchmarking 'model_neuron_b1.pt', ~8.0 minutes remaining.
throughput_avg    latency_ms_p50    latency_ms_p99    n_models          pipeline_size     workers_per_model batch_size        model_filename
296766.5          0.003             0.003             1                 1                 1                 1                 model_neuron_b1.pt
3616109.75        0.005             0.008             24                1                 1                 1                 model_neuron_b1.pt
56801.0           0.035             0.04              1                 1                 2                 1                 model_neuron_b1.pt
3094419.4         0.005             0.051             24                1                 2                 1                 model_neuron_b1.pt

Great! Here is what a default CSV file looks like.

n_models,workers_per_model,pipeline_size,batch_size,throughput_avg,throughput_peak,latency_ms_p0,latency_ms_p50,latency_ms_p90,latency_ms_p95,latency_ms_p99,latency_ms_p100,load_avg_ms,warmup_avg_ms,e2e_avg_ms,input_avg_ms,preprocess_avg_ms,postprocess_avg_ms,infer_avg_ms,worker_avg_s,total_infs,total_s,status,model_filename,multiprocess,multiinterpreter,device_type,instance_type
1,1,1,1,31346.0,31408.0,0.03,0.03,0.031,0.032,0.037,0.732,62.217,2.625,0.031,0.001,0.0,0.0,0.028,4.93,154704,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge
16,16,1,1,380604.75,380923.0,0.03,0.032,0.054,0.054,0.057,0.938,293.806,3.266,0.043,0.001,0.0,0.0,0.039,4.7,1799549,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge
1,2,1,1,51178.0,51319.0,0.035,0.036,0.037,0.039,0.047,1.13,114.118,2.713,0.037,0.001,0.0,0.0,0.033,4.88,248984,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge
16,32,1,1,381098.75,383905.0,0.03,0.058,0.067,0.073,0.121,48.07,303.916,4.42,0.08,0.001,0.0,0.0,0.074,4.69,1804925,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge
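
If you want to analyze a results CSV programmatically, here is a rough sketch using pandas (pandas is not part of NeuronPerf; the column names are taken from the file above):

# Rough sketch: load the CSV written by npf.write_csv and rank configurations
# by average throughput. Requires pandas, which is not a NeuronPerf dependency.
import pandas as pd

df = pd.read_csv("model_neuron_b1.csv")
print(df.sort_values("throughput_avg", ascending=False)[
    ["n_models", "workers_per_model", "batch_size", "throughput_avg", "latency_ms_p99"]
])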

Compile and Benchmark a Model#

Here is an end-to-end example of compiling and benchmarking a ResNet-50 model from torchvision.

import torch
import torch_neuron

import neuronperf as npf
import neuronperf.torch

from torchvision import models


# Load a pretrained ResNet50 model
model = models.resnet50(pretrained=True)

# Choose an output filename and select a few batch sizes to test
filename = 'resnet50.json'
batch_sizes = [5, 6, 7]

# Construct example inputs
inputs = [torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32) for batch_size in batch_sizes]

# Compile
npf.torch.compile(
    model,
    inputs,
    batch_sizes=batch_sizes,
    filename=filename,
)

# Benchmark
reports = npf.torch.benchmark(filename, inputs)

# View and save results
npf.print_reports(reports)
npf.write_csv(reports, 'resnet50_results.csv')
npf.write_json(reports, 'resnet50_results.json')
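
Since this example compiles and benchmarks several batch sizes, you may want to compare them directly from the in-memory reports. A minimal sketch, assuming the same report dictionary fields used earlier (batch_size, throughput_avg, latency_ms_p99):

# Minimal sketch (not part of NeuronPerf): print a throughput/latency summary
# per batch size, assuming reports is a list of dicts with these fields.
for report in sorted(reports, key=lambda r: r["batch_size"]):
    print(report["batch_size"], report["throughput_avg"], report["latency_ms_p99"])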

Benchmark on CPU or GPU#

When benchmarking on CPU or GPU, the API is slightly different. Because there is no compiled model artifact to benchmark, you instead pass a reference to your model class directly, and NeuronPerf instantiates it for you.

Note

GPU benchmarking is currently only available for PyTorch.

CPU:

cpu_reports = npf.cpu.benchmark(YourModelClass, ...)

GPU:

gpu_reports = npf.torch.benchmark(YourModelClass, ..., device_type="gpu")

Please refer to Benchmark on CPU or GPU for details and an example of providing your model class.
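
As a minimal, self-contained sketch of the CPU path, assuming a trivial torch.nn.Module (the exact keyword arguments accepted by npf.cpu.benchmark may differ; the guide linked above has the authoritative example):

import torch

import neuronperf as npf
import neuronperf.cpu


# A trivial model used only for illustration.
class MyModel(torch.nn.Module):
    def forward(self, x):
        return x * 3 + 1


batch_sizes = [1]
inputs = [torch.ones((batch_size, 3, 224, 224)) for batch_size in batch_sizes]

# Pass the class itself, not an instance; NeuronPerf instantiates it for you.
cpu_reports = npf.cpu.benchmark(MyModel, inputs, batch_sizes)
npf.print_reports(cpu_reports)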

This document is relevant for: Inf1, Inf2, Trn1, Trn1n