This document is relevant for: Inf1, Inf2, Trn1, Trn2

NeuronPerf FAQ#

When should I use NeuronPerf?#

When you want to measure the highest achievable performance for your model with Neuron.

When should I not use NeuronPerf?#

When measuring end-to-end performance that includes your network serving stack. Instead, your should compare your e2e numbers to those obtained by NeuronPerf to optimize your serving overhead.

Which frameworks does NeuronPerf support?#

See NeuronPerf Framework Notes.

Which Neuron instance types does NeuronPerf support?#

PyTorch and TensorFlow support all instance types. MXNet support is limited to inf1.

Is NeuronPerf Open Source?#

Yes. You can download the source here.

What is the secret to obtaining the best numbers?#

There is no secret sauce. NeuronPerf follows best practices.

What are the “best practices” that NeuronPerf uses?#

These vary slightly by framework and how your model was compiled
For a model compiled for a single NeuronCore (DataParallel):
- To maximize throughput, for N models, use 2 * N worker threads
- To minimize latency, use 1 worker thread per model
Use a new Python process for each model to avoid GIL contention
Ensure you benchmark long enough for your numbers to stabilize
Ignore outliers at the start and end of inference benchmarking