This document is relevant for: Inf1, Inf2, Trn1, Trn1n

NeuronPerf Examples#

This page walks through several examples of using NeuronPerf, starting with the simplest case: benchmarking a model that has already been compiled. We will also see how NeuronPerf can be used to perform a hyperparameter search and to manage the artifacts and results it produces.

Benchmark a Compiled Model#

This example assumes you have already compiled your model for Neuron and saved it to disk. You will need to adapt the batch size, input shape, and filename for your model.

import torch  # or tensorflow, mxnet

import neuronperf as npf
import neuronperf.torch  # or tensorflow, mxnet

# Construct dummy inputs
batch_sizes = 1
input_shape = (batch_sizes, 3, 224, 224)
inputs = torch.ones(input_shape)  # or numpy array for TF, MX

# Benchmark and save results
reports = npf.torch.benchmark("your_model_file.pt", inputs, batch_sizes)
npf.print_reports(reports)
npf.write_json(reports)
INFO:neuronperf.benchmarking - Benchmarking 'your_model_file.pt', ~8.0 minutes remaining.
throughput_avg    latency_ms_p50    latency_ms_p99    n_models          pipeline_size     workers_per_model batch_size        model_filename
296766.5          0.003             0.003             1                 1                 1                 1                 your_model_file.pt
3616109.75        0.005             0.008             24                1                 1                 1                 your_model_file.pt
56801.0           0.035             0.04              1                 1                 2                 1                 your_model_file.pt
3094419.4         0.005             0.051             24                1                 2                 1                 your_model_file.pt
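
If you want to post-process the results yourself rather than only printing or saving them, here is a minimal sketch. It assumes each entry in reports is a dictionary keyed by the column names shown above (such as throughput_avg and workers_per_model); the selection logic itself is not part of NeuronPerf.

# Minimal sketch (not part of NeuronPerf): pick the configuration with the
# highest average throughput, assuming each report is a dict keyed by the
# column names printed above.
best = max(reports, key=lambda report: report["throughput_avg"])
print(best["n_models"], best["workers_per_model"], best["throughput_avg"])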

Let’s suppose you only wish to test two specific configurations: 1 model with 1 worker thread, and 1 model with 2 worker threads, benchmarking each for 15 seconds. The call to benchmark becomes:

reports = npf.torch.benchmark(filename, inputs, batch_sizes, n_models=1, workers_per_model=[1, 2], duration=15)

You can also add a custom model name to the reports.

reports = npf.torch.benchmark(..., model_name="MyFancyModel")

See the NeuronPerf Benchmark Guide for further details.

Benchmark a Model from Source#

In this example, we define, compile, and benchmark a simple (dummy) model using PyTorch.

The model is traced with a batch size of 1 and an input shape of (3, 224, 224), then saved to disk as model_neuron_b1.pt before benchmarking.

import torch
import torch.neuron

import neuronperf as npf
import neuronperf.torch


# Define a simple model
class Model(torch.nn.Module):
    def forward(self, x):
        x = x * 3
        return x + 1


# Instantiate
model = Model()
model.eval()

# Define some inputs
batch_sizes = [1]
inputs = [torch.ones((batch_size, 3, 224, 224)) for batch_size in batch_sizes]

# Compile for Neuron
model_neuron = torch.neuron.trace(model, inputs)
model_neuron.save("model_neuron_b1.pt")

# Benchmark
reports = npf.torch.benchmark("model_neuron_b1.pt", inputs, batch_sizes)

# View and save results
npf.print_reports(reports)
npf.write_csv(reports, "model_neuron_b1.csv")
(aws_neuron_pytorch_p36) ubuntu@ip-172-31-11-122:~/tmp$ python test_simple_pt.py
INFO:neuronperf.benchmarking - Benchmarking 'model_neuron_b1.pt', ~8.0 minutes remaining.
throughput_avg    latency_ms_p50    latency_ms_p99    n_models          pipeline_size     workers_per_model batch_size        model_filename
296766.5          0.003             0.003             1                 1                 1                 1                 model_neuron_b1.pt
3616109.75        0.005             0.008             24                1                 1                 1                 model_neuron_b1.pt
56801.0           0.035             0.04              1                 1                 2                 1                 model_neuron_b1.pt
3094419.4         0.005             0.051             24                1                 2                 1                 model_neuron_b1.pt

Great! Here is what a default CSV file looks like.

n_models,workers_per_model,pipeline_size,batch_size,throughput_avg,throughput_peak,latency_ms_p0,latency_ms_p50,latency_ms_p90,latency_ms_p95,latency_ms_p99,latency_ms_p100,load_avg_ms,warmup_avg_ms,e2e_avg_ms,input_avg_ms,preprocess_avg_ms,postprocess_avg_ms,infer_avg_ms,worker_avg_s,total_infs,total_s,status,model_filename,multiprocess,multiinterpreter,device_type,instance_type
1,1,1,1,31346.0,31408.0,0.03,0.03,0.031,0.032,0.037,0.732,62.217,2.625,0.031,0.001,0.0,0.0,0.028,4.93,154704,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge
16,16,1,1,380604.75,380923.0,0.03,0.032,0.054,0.054,0.057,0.938,293.806,3.266,0.043,0.001,0.0,0.0,0.039,4.7,1799549,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge
1,2,1,1,51178.0,51319.0,0.035,0.036,0.037,0.039,0.047,1.13,114.118,2.713,0.037,0.001,0.0,0.0,0.033,4.88,248984,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge
16,32,1,1,381098.75,383905.0,0.03,0.058,0.067,0.073,0.121,48.07,303.916,4.42,0.08,0.001,0.0,0.0,0.074,4.69,1804925,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge
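
If you want to analyze a results CSV programmatically, here is a rough sketch using pandas (pandas is not part of NeuronPerf; the column names are taken from the file above):

# Rough sketch: load the CSV written by npf.write_csv and rank configurations
# by average throughput. Requires pandas, which is not a NeuronPerf dependency.
import pandas as pd

df = pd.read_csv("model_neuron_b1.csv")
print(df.sort_values("throughput_avg", ascending=False)[
    ["n_models", "workers_per_model", "batch_size", "throughput_avg", "latency_ms_p99"]
])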

Compile and Benchmark a Model#

Here is an end-to-end example of compiling and benchmarking a ResNet-50 model from torchvision.

import torch
import torch_neuron

import neuronperf as npf
import neuronperf.torch

from torchvision import models


# Load a pretrained ResNet50 model
model = models.resnet50(pretrained=True)

# Choose an output filename and select a few batch sizes to test
filename = 'resnet50.json'
batch_sizes = [5, 6, 7]

# Construct example inputs
inputs = [torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32) for batch_size in batch_sizes]

# Compile
npf.torch.compile(
    model,
    inputs,
    batch_sizes=batch_sizes,
    filename=filename,
)

# Benchmark
reports = npf.torch.benchmark(filename, inputs)

# View and save results
npf.print_reports(reports)
npf.write_csv(reports, 'resnet50_results.csv')
npf.write_json(reports, 'resnet50_results.json')
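
Since this example compiles and benchmarks several batch sizes, you may want to compare them directly from the in-memory reports. A minimal sketch, assuming the same report dictionary fields used earlier (batch_size, throughput_avg, latency_ms_p99):

# Minimal sketch (not part of NeuronPerf): print a throughput/latency summary
# per batch size, assuming reports is a list of dicts with these fields.
for report in sorted(reports, key=lambda r: r["batch_size"]):
    print(report["batch_size"], report["throughput_avg"], report["latency_ms_p99"])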

Benchmark on CPU or GPU#

When benchmarking on CPU or GPU, the API is slightly different. Because there is no compiled model artifact to benchmark, you instead pass a reference to your model class directly, and NeuronPerf instantiates it for you.

Note

GPU benchmarking is currently only available for PyTorch.

CPU:

cpu_reports = npf.cpu.benchmark(YourModelClass, ...)

GPU:

gpu_reports = npf.torch.benchmark(YourModelClass, ..., device_type="gpu")

Please refer to Benchmark on CPU or GPU for details and an example of providing your model class.
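
As a minimal, self-contained sketch of the CPU path, assuming a trivial torch.nn.Module (the exact keyword arguments accepted by npf.cpu.benchmark may differ; the guide linked above has the authoritative example):

import torch

import neuronperf as npf
import neuronperf.cpu


# A trivial model used only for illustration.
class MyModel(torch.nn.Module):
    def forward(self, x):
        return x * 3 + 1


batch_sizes = [1]
inputs = [torch.ones((batch_size, 3, 224, 224)) for batch_size in batch_sizes]

# Pass the class itself, not an instance; NeuronPerf instantiates it for you.
cpu_reports = npf.cpu.benchmark(MyModel, inputs, batch_sizes)
npf.print_reports(cpu_reports)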

This document is relevant for: Inf1, Inf2, Trn1, Trn1n