This document is relevant for: Inf1, Inf2, Trn1, Trn1n
NeuronPerf API#
Due to a bug in Sphinx, some of the type annotations may be incomplete. You can download the source code here
. In the future, the source will be hosted in a more browsable way.
- compile(compile_fn, model, inputs, batch_sizes: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, performance_levels: Union[str, List[int]] = None, models_dir: str = 'models', filename: str = None, compiler_args: dict = None, verbosity: int = 1, *args, **kwargs) str: #
Compiles the provided model with each provided example input, pipeline size, and performance level. Any additional compiler_args passed will be forwarded to the compiler on every invocation.
- Parameters:
model – The model to compile.
inputs (list) – A list of example inputs.
batch_sizes – A list of batch sizes that correspond to the example inputs.
pipeline_sizes – A list of pipeline sizes to use. See NeuronCore Pipeline.
performance_levels – A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See Mixed precision and performance-accuracy tuning (neuron-cc).
models_dir (str) – The directory where compilation artifacts will be stored.
model_name (str) – An optional model name tag to apply to compiled artifacts.
filename (str) – The name of the model index to write out. If not provided, a name will be generated and returned.
compiler_args (dict) – Additional compiler arguments to be forwarded with every compilation.
verbosity (int) – 0 = error, 1 = info, 2 = debug
- Returns:
A model index filename. If a configuration fails to compile, it will not be included in the index and an error will be logged.
- Return type:
str
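Because compile builds one artifact per combination of example input, pipeline size, and performance level, the number of compilations grows multiplicatively. The sweep can be sketched in plain Python (the dictionary keys below are illustrative, not NeuronPerf internals):

```python
from itertools import product

def compilation_sweep(batch_sizes, pipeline_sizes, performance_levels):
    """Enumerate the configurations compile() would build: one artifact
    per (batch size, pipeline size, performance level) combination."""
    return [
        {"batch_size": b, "pipeline_size": p, "performance_level": lvl}
        for b, p, lvl in product(batch_sizes, pipeline_sizes, performance_levels)
    ]

# 2 batch sizes x 1 pipeline size x 2 performance levels = 4 compilations
configs = compilation_sweep([1, 8], [1], [0, 3])
```

Trimming `performance_levels` or `pipeline_sizes` is the easiest way to keep compilation time down when iterating.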
- benchmark(load_fn: Callable[[str, int], Any], model_filename: str, inputs: Any, batch_sizes: Union[int, List[int]] = None, duration: float = BENCHMARK_SECS, n_models: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, cast_modes: Union[str, List[str]] = None, workers_per_model: Union[int, None] = None, env_setup_fn: Callable[[int, Dict], None] = None, setup_fn: Callable[[int, Dict, Any], None] = None, preprocess_fn: Callable[[Any], Any] = None, postprocess_fn: Callable[[Any], Any] = None, dataset_loader_fn: Callable[[Any, int], Any] = None, verbosity: int = 1, multiprocess: bool = True, multiinterpreter: bool = False, return_timers: bool = False, device_type: str = 'neuron') List[Dict]: #
Benchmarks the model index or individual model using the provided inputs. If a model index is provided, additional fields such as pipeline_sizes and performance_levels can be used to filter the models to benchmark. The default behavior is to benchmark all configurations in the model index.
- Parameters:
load_fn – A function that accepts a model filename and device id, and returns a loaded model. This is automatically passed through the subpackage calls (e.g. neuronperf.torch.benchmark).
model_filename (str) – A path to a model index from compile or a path to an individual model. For CPU benchmarking, a class should be passed that can be instantiated with a default constructor (e.g. MyModelClass).
inputs (list) – A list of example inputs. If the list contains tuples, they will be destructured on inference to support multiple arguments.
batch_sizes – A list of ints indicating batch sizes that correspond to the inputs. Assumes 1 if not provided.
duration (float) – The number of seconds to benchmark each model.
n_models – The number of models to run in parallel. Default behavior runs 1 model and the max number of models possible, determined by a best effort from device_type, instance size, or other environment state.
pipeline_sizes – A list of pipeline sizes to use. See NeuronCore Pipeline.
performance_levels – A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See Mixed precision and performance-accuracy tuning (neuron-cc).
workers_per_model – The number of workers to use per model loaded. If None, this is automatically selected.
env_setup_fn – A custom environment setup function to run in each subprocess before model loading. It will receive the benchmarker id and config.
setup_fn – A function that receives the benchmarker id, config, and model to perform last minute configuration before inference.
preprocess_fn – A custom preprocessing function to perform on each input before inference.
postprocess_fn – A custom postprocessing function to perform on each output after inference.
multiprocess (bool) – When True, model loading is dispatched to forked subprocesses. Should be left alone unless debugging.
multiinterpreter (bool) – When True, benchmarking is performed in a new python interpreter per model. All parameters must be serializable. Overrides multiprocess.
return_timers (bool) – When True, the return of this function is a list of tuples (config, results) with detailed information. This can be converted to reports with get_reports(results).
stats_interval (float) – Collection interval (in seconds) for metrics during benchmarking, such as CPU and memory usage.
device_type (str) – This will be set automatically to one of the SUPPORTED_DEVICE_TYPES.
cost_per_hour (float) – The price of this device per hour. Used to estimate the cost per 1 million inferences in reports.
model_name (str) – A friendly name for the model to use in reports.
model_class_name (str) – Internal use.
model_class_file (str) – Internal use.
verbosity (int) – 0 = error, 1 = info, 2 = debug
- Returns:
A list of benchmarking results.
- Return type:
List[Dict]
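The preprocess_fn and postprocess_fn hooks wrap every inference call. The hook ordering can be illustrated with a minimal pure-Python loop (this is a sketch of the documented per-inference flow, not the actual benchmarker):

```python
def run_inferences(model_fn, inputs, preprocess_fn=None, postprocess_fn=None):
    """Apply the documented hook order: preprocess -> inference -> postprocess."""
    outputs = []
    for x in inputs:
        if preprocess_fn:
            x = preprocess_fn(x)
        y = model_fn(x)
        if postprocess_fn:
            y = postprocess_fn(y)
        outputs.append(y)
    return outputs

# Example: a stand-in "model" that doubles its input.
results = run_inferences(
    lambda x: 2 * x,
    [1, 2, 3],
    preprocess_fn=lambda x: x + 1,
    postprocess_fn=lambda y: y - 1,
)
# results == [3, 5, 7]
```

Note that because both hooks run inside the timed loop, heavy preprocessing will show up in the reported latencies.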
- get_reports(results)#
Summarizes and combines the detailed results from neuronperf.benchmark, when run with return_timers=True. One report dictionary is produced per model configuration benchmarked. The list of reports can be fed directly to other reporting utilities, such as neuronperf.write_csv.
- Parameters:
results – Results from neuronperf.benchmark, run with return_timers=True.
- Returns:
A list of dictionaries that summarize the results for each model configuration.
- Return type:
list
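To give a sense of what the summarization produces, here is a sketch that reduces raw per-inference latencies into the kind of fields a report row exposes. The field names match the print_reports example output; the computation itself is illustrative, not NeuronPerf's:

```python
import statistics

def summarize(latencies_ms):
    """Collapse raw per-inference latencies (in ms) into summary metrics
    resembling a NeuronPerf report row (illustrative only)."""
    lat_sorted = sorted(latencies_ms)
    p50 = statistics.median(lat_sorted)
    p99 = lat_sorted[min(len(lat_sorted) - 1, int(0.99 * len(lat_sorted)))]
    mean_s = statistics.mean(lat_sorted) / 1000.0
    return {
        "throughput_avg": 1.0 / mean_s,  # inferences per second, one worker
        "latency_ms_p50": p50,
        "latency_ms_p99": p99,
    }

report = summarize([4.0, 5.0, 6.0, 100.0])
```

A tail percentile like p99 surfaces outliers (here, the 100 ms inference) that an average would hide, which is why reports carry both.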
- print_reports(reports, cols=SUMMARY_COLS, sort_by='throughput_peak', reverse=False)#
Print a report to the terminal. Example of default behavior:
>>> neuronperf.print_reports(reports)
throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size workers_per_model batch_size model_filename
329.667        6.073          6.109          1        1             2                 1          models/model_b1_p1_83bh3hhs.pt
- Parameters:
reports – Results from get_reports.
cols – The columns in the report to be displayed.
sort_by – Sort the report rows by the specified column key.
reverse – When True, reverse the sort order.
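The sort_by and reverse parameters behave like the key and reverse arguments of Python's built-in sorted. A small sketch with hypothetical report rows:

```python
# Hypothetical report rows (only the fields needed for sorting).
reports = [
    {"model_filename": "a.pt", "throughput_peak": 120.0},
    {"model_filename": "b.pt", "throughput_peak": 310.0},
]

# Equivalent of sort_by="throughput_peak", reverse=False (the defaults):
# rows ordered by ascending peak throughput.
ordered = sorted(reports, key=lambda r: r["throughput_peak"], reverse=False)
```

Passing reverse=True would put the highest-throughput configuration first, which is usually what you want when scanning for the best config.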
- write_csv(reports: list[dict], filename: str = None, cols=REPORT_COLS)#
Writes benchmarking reports to a CSV file.
- write_json(reports: list[dict], filename: str = None)#
Writes benchmarking reports to a JSON file.
- param list[dict] reports:
Results from neuronperf.get_reports.
- param str filename:
Filename to write. If not provided, generated from model_name in report and current timestamp.
- return:
The filename written.
- rtype:
str
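When no filename is given, write_json derives one from the report's model_name and the current timestamp. A sketch of that convention (the exact naming format NeuronPerf uses is not documented here, so the pattern below is an assumption):

```python
import json
import os
import tempfile
import time

def write_reports_json(reports, filename=None):
    """Write reports to JSON; derive a filename from model_name plus a
    timestamp when none is supplied (naming pattern is illustrative)."""
    if filename is None:
        model_name = reports[0].get("model_name", "model")
        filename = f"{model_name}_{time.strftime('%Y%m%d-%H%M%S')}.json"
    with open(filename, "w") as f:
        json.dump(reports, f, indent=2)
    return filename

out = write_reports_json(
    [{"model_name": "demo", "throughput_avg": 329.667}],
    filename=os.path.join(tempfile.gettempdir(), "neuronperf_demo.json"),
)
```

Returning the filename matters because, as with the real API, the caller may never have chosen a name and needs it back to locate the output.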
- model_index.append(*model_indexes: Union[str, dict]) dict: #
Appends the model indexes non-destructively into a new model index, without modifying any of the internal data.
This is useful if you have benchmarked multiple related models and wish to combine their respective model indexes into a single index.
Model name will be taken from the first index provided. Duplicate configs will be filtered.
- Parameters:
model_indexes – Model indexes or paths to model indexes to combine.
- Returns:
A new dictionary representing the combined model index.
- Return type:
dict
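The merge semantics described above (name taken from the first index, duplicate configs filtered) can be sketched over plain dicts. The layout here, a name plus a list of config dicts, is a simplified stand-in for the real index format:

```python
def append_indexes(*indexes):
    """Non-destructively merge model indexes: keep the first index's name
    and drop duplicate model configs (simplified index layout)."""
    merged = {"model_name": indexes[0]["model_name"], "models": []}
    seen = set()
    for idx in indexes:
        for cfg in idx["models"]:
            key = tuple(sorted(cfg.items()))
            if key not in seen:
                seen.add(key)
                merged["models"].append(dict(cfg))
    return merged

a = {"model_name": "resnet50", "models": [{"batch_size": 1, "pipeline_size": 1}]}
b = {"model_name": "other", "models": [{"batch_size": 1, "pipeline_size": 1},
                                       {"batch_size": 8, "pipeline_size": 1}]}
combined = append_indexes(a, b)
```

Because the inputs are copied rather than mutated, both original indexes remain usable afterwards, matching the "non-destructively" guarantee.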
- model_index.copy(old_index: Union[str, dict], new_index: str, new_dir: str) str: #
Copy an index to a new location. Will rename old_index to new_index and copy all model files into new_dir, updating the index paths. This is useful for pulling individual models out of a pool.
Returns the path to the new index.
- model_index.create(filename, input_idx=0, batch_size=1, pipeline_size=1, cast_mode=DEFAULT_CAST, compile_s=None)#
Create a new model index from a pre-compiled model.
- Parameters:
filename (str) – The path to the compiled model.
input_idx (int) – The index in your inputs that this model should be run on.
batch_size (int) – The batch size at compilation for this model.
pipeline_size (int) – The pipeline size used at compilation for this model.
cast_mode (str) – The casting option this model was compiled with.
compile_s (float) – Seconds spent compiling.
- Returns:
A new dictionary representing a model index.
- Return type:
dict
- model_index.delete(filename: str):
Deletes the model index and all associated models referenced by the index.
- model_index.filter(index: Union[str, dict], **kwargs) dict: #
Filters provided model index on provided criteria and returns a new index. Each kwarg is a standard (k, v) pair, where k is treated as a filter name and v may be one or more values used to filter model configs.
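The matching rule, each keyword's value being either a single value or a list of acceptable values, can be sketched over a simplified index layout (a name plus a list of config dicts; illustrative, not the real format):

```python
def filter_index(index, **kwargs):
    """Keep configs whose value for every filter key is among the allowed
    values; a scalar filter value is treated as a one-element list.
    (Simplified stand-in for the real index layout.)"""
    def allowed(v):
        return v if isinstance(v, (list, tuple, set)) else [v]
    models = [
        cfg for cfg in index["models"]
        if all(cfg.get(k) in allowed(v) for k, v in kwargs.items())
    ]
    return {**index, "models": models}

index = {"model_name": "demo", "models": [
    {"batch_size": 1, "pipeline_size": 1},
    {"batch_size": 8, "pipeline_size": 1},
    {"batch_size": 8, "pipeline_size": 4},
]}
small = filter_index(index, batch_size=8, pipeline_size=[1, 4])
```

Filters combine conjunctively: a config must satisfy every keyword to survive, so adding filters can only narrow the result.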
- model_index.load(filename) dict: #
Load a NeuronPerf model index from a file.
- model_index.move(old_index: str, new_index: str, new_dir: str) str: #
This is the same as copy followed by delete on the old index.
- model_index.save(model_index, filename: str = None, root_dir=None) str: #
Save a NeuronPerf model index to a file.