This document is relevant for: Inf1, Inf2, Trn1, Trn1n

NeuronPerf API#

Due to a bug in Sphinx, some of the type annotations may be incomplete. You can download the source code here. In the future, the source will be hosted in a more browsable way.

compile(compile_fn, model, inputs, batch_sizes: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, performance_levels: Union[str, List[int]] = None, models_dir: str = 'models', filename: str = None, compiler_args: dict = None, verbosity: int = 1, *args, **kwargs) str:#

Compiles the provided model with each provided example input, pipeline size, and performance level. Any additional compiler_args passed will be forwarded to the compiler on every invocation.

Parameters:
  • model – The model to compile.

  • inputs (list) – A list of example inputs.

  • batch_sizes – A list of batch sizes that correspond to the example inputs.

  • pipeline_sizes – A list of pipeline sizes to use. See NeuronCore Pipeline.

  • performance_levels – A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See Mixed precision and performance-accuracy tuning (neuron-cc).

  • models_dir (str) – The directory where compilation artifacts will be stored.

  • model_name (str) – An optional model name tag to apply to compiled artifacts.

  • filename (str) – The name of the model index to write out. If not provided, a name will be generated and returned.

  • compiler_args (dict) – Additional compiler arguments to be forwarded with every compilation.

  • verbosity (int) – 0 = error, 1 = info, 2 = debug

Returns:

A model index filename. If a configuration fails to compile, it will not be included in the index and an error will be logged.

Return type:

str
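
For illustration, a minimal sketch of a compile call through the PyTorch subpackage (which supplies compile_fn automatically). The toy model, input shape, and index filename below are assumptions, not part of the API:

import torch
import neuronperf as npf
import neuronperf.torch as npf_torch  # subpackage supplies compile_fn

# Toy model and a matching example input (batch size 1); both are placeholders.
model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())
example = torch.ones((1, 8))

# Compile for batch size 1; the return value is the model index filename.
index_filename = npf_torch.compile(
    model,
    [example],                     # inputs
    batch_sizes=[1],
    filename="model_index.json",   # optional; auto-generated if omitted
)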

benchmark(load_fn: Callable[[str, int], Any], model_filename: str, inputs: Any, batch_sizes: Union[int, List[int]] = None, duration: float = BENCHMARK_SECS, n_models: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, cast_modes: Union[str, List[str]] = None, workers_per_model: Union[int, None] = None, env_setup_fn: Callable[[int, Dict], None] = None, setup_fn: Callable[[int, Dict, Any], None] = None, preprocess_fn: Callable[[Any], Any] = None, postprocess_fn: Callable[[Any], Any] = None, dataset_loader_fn: Callable[[Any, int], Any] = None, verbosity: int = 1, multiprocess: bool = True, multiinterpreter: bool = False, return_timers: bool = False, device_type: str = 'neuron') List[Dict]:#

Benchmarks the model index or individual model using the provided inputs. If a model index is provided, additional fields such as pipeline_sizes and performance_levels can be used to filter the models to benchmark. The default behavior is to benchmark all configurations in the model index.

Parameters:
  • load_fn – A function that accepts a model filename and device id, and returns a loaded model. This is automatically passed through the subpackage calls (e.g. neuronperf.torch.benchmark).

  • model_filename (str) – A path to a model index from compile or path to an individual model. For CPU benchmarking, a class should be passed that can be instantiated with a default constructor (e.g. MyModelClass).

  • inputs (list) – A list of example inputs. If the list contains tuples, they will be destructured on inference to support multiple arguments.

  • batch_sizes – A list of ints indicating batch sizes that correspond to the inputs. Assumes 1 if not provided.

  • duration (float) – The number of seconds to benchmark each model.

  • n_models – The number of models to run in parallel. By default, two configurations are benchmarked: 1 model, and the maximum number of models possible, determined on a best-effort basis from device_type, instance size, and other environment state.

  • pipeline_sizes – A list of pipeline sizes to use. See NeuronCore Pipeline.

  • performance_levels – A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See Mixed precision and performance-accuracy tuning (neuron-cc).

  • workers_per_model – The number of workers to use per model loaded. If None, this is automatically selected.

  • env_setup_fn – A custom environment setup function to run in each subprocess before model loading. It will receive the benchmarker id and config.

  • setup_fn – A function that receives the benchmarker id, config, and model to perform last-minute configuration before inference.

  • preprocess_fn – A custom preprocessing function to perform on each input before inference.

  • postprocess_fn – A custom postprocessing function to perform on each output after inference.

  • multiprocess (bool) – When True, model loading is dispatched to forked subprocesses. Should be left alone unless debugging.

  • multiinterpreter (bool) – When True, benchmarking is performed in a new Python interpreter per model. All parameters must be serializable. Overrides multiprocess.

  • return_timers (bool) – When True, the return of this function is a list of tuples (config, results) with detailed information. This can be converted to reports with get_reports(results).

  • stats_interval (float) – Collection interval (in seconds) for metrics during benchmarking, such as CPU and memory usage.

  • device_type (str) – This will be set automatically to one of the SUPPORTED_DEVICE_TYPES.

  • cost_per_hour (float) – The price of this device per hour. Used to estimate the cost per 1 million inferences in reports.

  • model_name (str) – A friendly name for the model to use in reports.

  • model_class_name (str) – Internal use.

  • model_class_file (str) – Internal use.

  • verbosity (int) – 0 = error, 1 = info, 2 = debug

Returns:

A list of benchmarking results.

Return type:

list[dict]
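
For illustration, a minimal sketch of benchmarking a model index through the PyTorch subpackage (which supplies load_fn automatically). The index filename, input shape, and duration are assumptions:

import torch
import neuronperf as npf
import neuronperf.torch as npf_torch  # subpackage supplies load_fn

example = torch.ones((1, 8))           # must match the shape used at compile time

# Benchmark every configuration recorded in the index for ~60 seconds each.
reports = npf_torch.benchmark(
    "model_index.json",                # index produced by compile (placeholder)
    [example],
    batch_sizes=[1],
    duration=60.0,
)
npf.print_reports(reports)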

get_reports(results)#

Summarizes and combines the detailed results from neuronperf.benchmark, when run with return_timers=True. One report dictionary is produced per model configuration benchmarked. The list of reports can be fed directly to other reporting utilities, such as neuronperf.write_csv.

Parameters:
  • results (list[tuple]) – The list of results from neuronperf.benchmark.

  • batch_sizes (list[int]) – The batch sizes that correspond to the inputs provided to compile and benchmark. Used to correct throughput values in the reports.

Returns:

A list of dictionaries that summarize the results for each model configuration.

Return type:

list[dict]
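
As a sketch, the detailed-results flow: run benchmark with return_timers=True, summarize with get_reports, and hand the reports to a reporting utility. The filenames and input shape are assumptions:

import torch
import neuronperf as npf
import neuronperf.torch as npf_torch

example = torch.ones((1, 8))

# Keep the detailed (config, results) tuples instead of summary dictionaries.
results = npf_torch.benchmark(
    "model_index.json", [example], batch_sizes=[1], return_timers=True
)

# Summarize into one report dictionary per benchmarked configuration.
reports = npf.get_reports(results)
csv_file = npf.write_csv(reports)      # reports feed directly into write_csv / write_json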

print_reports(reports, cols=..., sort_by=..., reverse=...)#

Print a report to the terminal. Example of default behavior:

>>> neuronperf.print_reports(reports)
throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size  workers_per_model batch_size model_filename
329.667        6.073          6.109          1        1              2                 1          models/model_b1_p1_83bh3hhs.pt
Parameters:
  • reports – Results from get_reports.

  • cols – The columns in the report to be displayed.

  • sort_by – The column key to sort the reports by.

  • reverse – Reverse the sort order.

write_csv(reports: list[dict], filename: str = None, cols=REPORT_COLS)#

Writes benchmarking reports to a CSV file.

Parameters:
  • reports (list[dict]) – Results from neuronperf.get_reports.

  • filename (str) – Filename to write. If not provided, generated from model_name in report and current timestamp.

  • cols (list[str]) – The columns in the report to be kept.

Returns:

The filename written.

Return type:

str

write_json(reports: list[dict], filename: str = None)#

Writes benchmarking reports to a JSON file.

Parameters:
  • reports (list[dict]) – Results from neuronperf.get_reports.

  • filename (str) – Filename to write. If not provided, generated from model_name in report and current timestamp.

Returns:

The filename written.

Return type:

str

model_index.append(*model_indexes: Union[str, dict]) dict:#

Appends the provided model indexes into a new model index without modifying the originals.

This is useful if you have benchmarked multiple related models and wish to combine their respective model indexes into a single index.

Model name will be taken from the first index provided. Duplicate configs will be filtered.

Parameters:

model_indexes – Model indexes or paths to model indexes to combine.

Returns:

A new dictionary representing the combined model index.

Return type:

dict
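
As a sketch, combining the indexes from two separate compile runs into one (the index paths are hypothetical):

import neuronperf as npf

# Merge two previously generated indexes; the originals are left untouched.
combined = npf.model_index.append("resnet50_index.json", "resnet101_index.json")
npf.model_index.save(combined, filename="combined_index.json")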

model_index.copy(old_index: Union[str, dict], new_index: str, new_dir: str) str:#

Copy an index to a new location. Will rename old_index to new_index and copy all model files into new_dir, updating the index paths.

This is useful for pulling individual models out of a pool.

Returns the path to the new index.

model_index.create(filename, input_idx=0, batch_size=1, pipeline_size=1, cast_mode=DEFAULT_CAST, compile_s=None)#

Create a new model index from a pre-compiled model.

Parameters:
  • filename (str) – The path to the compiled model.

  • input_idx (int) – The index in your inputs that this model should be run on.

  • batch_size (int) – The batch size at compilation for this model.

  • pipeline_size (int) – The pipeline size used at compilation for this model.

  • cast_mode (str) – The casting option this model was compiled with.

  • compile_s (float) – Seconds spent compiling.

Returns:

A new dictionary representing a model index.

Return type:

dict
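
As a sketch, wrapping a pre-compiled artifact in a fresh index so it can be passed to benchmark. The model path below is the placeholder from the example output above:

import neuronperf as npf

# Describe an already-compiled model so benchmark can pick it up via an index.
index = npf.model_index.create(
    "models/model_b1_p1_83bh3hhs.pt",  # placeholder path to a compiled model
    input_idx=0,
    batch_size=1,
    pipeline_size=1,
)
index_filename = npf.model_index.save(index)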

model_index.delete(filename: str):#

Deletes the model index and all associated models referenced by the index.

model_index.filter(index: Union[str, dict], **kwargs) dict:#

Filters provided model index on provided criteria and returns a new index. Each kwarg is a standard (k, v) pair, where k is treated as a filter name and v may be one or more values used to filter model configs.
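
As a sketch, keeping only the configs compiled for a single batch size. This assumes batch_size is a valid filter name; the index path is hypothetical:

import neuronperf as npf

# Produce a new index containing only the batch-size-1 configs.
small_batch_index = npf.model_index.filter("model_index.json", batch_size=1)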

model_index.load(filename) dict:#

Load a NeuronPerf model index from a file.

model_index.move(old_index: str, new_index: str, new_dir: str) str:#

This is the same as copy followed by delete on the old index.

model_index.save(model_index, filename: str = None, root_dir=None) str:#

Save a NeuronPerf model index to a file.
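
As a sketch, a load/save round trip (the filenames are placeholders):

import neuronperf as npf

# Load an existing index from disk and write it back out under a new name.
index = npf.model_index.load("model_index.json")
new_filename = npf.model_index.save(index, filename="model_index_copy.json")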

This document is relevant for: Inf1, Inf2, Trn1, Trn1n