NeuronPerf FAQ#
When should I use NeuronPerf?#
When you want to measure the highest achievable performance for your model with Neuron.
When should I not use NeuronPerf?#
When measuring end-to-end performance that includes your network serving stack. Instead, you should compare your end-to-end numbers to those obtained by NeuronPerf to identify and reduce your serving overhead.
Which frameworks does NeuronPerf support?#
Which Neuron instance types does NeuronPerf support?#
PyTorch and TensorFlow support all instance types. MXNet support is limited to inf1.
Is NeuronPerf Open Source?#
Yes. You can download the source here.
What is the secret to obtaining the best numbers?#
There is no secret sauce. NeuronPerf follows best practices.
What are the “best practices” that NeuronPerf uses?#
These vary slightly by framework and by how your model was compiled.
For a model compiled for a single NeuronCore (DataParallel):
To maximize throughput, for N models, use 2 * N worker threads
To minimize latency, use 1 worker thread per model
Use a new Python process for each model to avoid GIL contention
Ensure you benchmark long enough for your numbers to stabilize
Ignore outliers at the start and end of inference benchmarking
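As a rough illustration of the practices listed above, the following sketch shows the thread-count rule and the trimming of start and end samples in plain Python. It does not use NeuronPerf's own API: `worker_count`, `trim_edges`, and `run_workers` are hypothetical helper names, the 10% trim fraction is an arbitrary assumption, and `infer` stands in for whatever callable runs your compiled model.

```python
import threading
import time


def worker_count(n_models, goal="throughput"):
    """Total worker threads for n_models single-NeuronCore models."""
    if goal == "throughput":
        return 2 * n_models  # 2 * N worker threads for N models
    if goal == "latency":
        return n_models      # 1 worker thread per model
    raise ValueError(f"unknown goal: {goal!r}")


def trim_edges(samples, fraction=0.1):
    """Drop `fraction` of the samples from each end of a time-ordered run.

    The fraction is an assumed value, not one taken from NeuronPerf.
    """
    k = int(len(samples) * fraction)
    return list(samples[k:len(samples) - k]) if k else list(samples)


def run_workers(infer, n_workers, duration_s):
    """Call infer() in a loop on n_workers threads for about duration_s seconds.

    Returns one time-ordered list of latencies (in seconds) per thread.
    """
    results = [[] for _ in range(n_workers)]
    deadline = time.perf_counter() + duration_s

    def loop(out):
        while time.perf_counter() < deadline:
            start = time.perf_counter()
            infer()
            out.append(time.perf_counter() - start)

    threads = [threading.Thread(target=loop, args=(r,)) for r in results]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A throughput-oriented run for one model would call `run_workers(infer, worker_count(1), duration_s)` and apply `trim_edges` to each thread's latencies before computing statistics. Running each model in its own Python process, as the list recommends, is outside the scope of this sketch.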