This document is relevant for: Inf1, Inf2, Trn1, Trn1n

NeuronPerf Evaluate Guide#

NeuronPerf has a new API for evaluating model accuracy on Neuron hardware. This API is currently only available for PyTorch.

You can access the API through the standard benchmark() call by passing an additional kwarg, eval_metrics.

For example:

reports = npf.torch.benchmark(
    model_index_or_path,
    dataset,
    n_models=1,
    workers_per_model=2,
    duration=0,
    eval_metrics=['accuracy', 'precision']
)

In this example, we fix n_models and workers_per_model because replicating the same model does not affect accuracy. We also set duration=0 so that benchmarking runs untimed through all of the dataset examples.

Because this call can be tedious to type, a convenience function is provided:

reports = npf.torch.evaluate(model_index_or_path, dataset, metrics=['accuracy', 'precision'])

The dataset can be any iterable object that produces tuples of the form (*INPUTS, TARGET).
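
For example, a small classification dataset could simply be a list of (input, target) tuples. The sketch below is illustrative only; the tensor shapes are placeholders and depend on your model:

import torch

# Illustrative dataset: a list of (input, target) tuples.
# Shapes are placeholders; use whatever your model expects.
dataset = [
    (torch.rand(1, 3, 224, 224), torch.tensor([i % 10]))
    for i in range(100)
]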

If TARGET does not appear in the last column for your dataset, you can customize this by passing eval_target_col.

For example:

reports = npf.torch.evaluate(model_index_or_path, dataset, metrics='accuracy', eval_target_col=1)

You can list the currently available metrics with npf.list_metrics():

>>> npf.list_metrics()
Name                     Description
Accuracy                 (TP + TN) / (TP + TN + FP + FN)
TruePositiveRate         TP / (TP + FN)
Sensitivity              Alias for TruePositiveRate
Recall                   Alias for TruePositiveRate
Hit Rate                 Alias for TruePositiveRate
TrueNegativeRate         TN / (TN + FP)
Specificity              Alias for TrueNegativeRate
Selectivity              Alias for TrueNegativeRate
PositivePredictiveValue  TP / (TP + FP)
Precision                Alias for PositivePredictiveValue
NegativePredictiveValue  TN / (TN + FN)
FalseNegativeRate        FN / (FN + TP)
FalsePositiveRate        FP / (FP + TN)
FalseDiscoveryRate       FP / (FP + TP)
FalseOmissionRate        FN / (FN + TN)
PositiveLikelihoodRatio  TPR / FPR
NegativeLikelihoodRatio  FNR / TNR
PrevalenceThreshold      sqrt(FPR) / (sqrt(FPR) + sqrt(TPR))
ThreatScore              TP / (TP + FN + FP)
F1Score                  2TP / (2TP + FN + FP)
MeanAbsoluteError        sum(|y - x|) / n
MeanSquaredError         sum((y - x)^2) / n

New metrics may appear in the list after importing a submodule. For example, importing neuronperf.torch will register a new topk metric.
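
A minimal sketch: importing the framework submodule before listing makes the additional registrations visible.

import neuronperf as npf
import neuronperf.torch  # registers torch-specific metrics such as topk

npf.list_metrics()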

Custom Metrics#

Simple Variants#

If you wish to register a variant of an existing metric with different init args, you can use register_metric_from_existing():

npf.register_metric_from_existing("topk", "topk_3", k=3)

This example registers a new metric, topk_3, from the existing metric topk, passing k=3 at init time.
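
Once registered, topk_3 can be requested by name like any built-in metric. A sketch, reusing the model and dataset from the earlier examples:

reports = npf.torch.evaluate(model_index_or_path, dataset, metrics=['topk_3'])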

New Metrics#

You can register your own metrics using register_metric().

Your metric must extend BaseEvalMetric:

class BaseEvalMetric(ABC):
    """
    Abstract base class BaseEvalMetric from which other metrics inherit.
    """

    @abstractmethod
    def process_record(self, output: Any = None, target: Any = None) -> None:
        """Process an individual record and return the result."""
        pass

    @staticmethod
    def aggregate(metrics: Iterable["BaseEvalMetric"]) -> Any:
        """Combine a sequence of metrics into a single result."""
        raise NotImplementedError

For example:

import neuronperf as npf

class MyCustomMetric(npf.BaseEvalMetric):
    def __init__(self):
        super().__init__()
        # Running counts for this metric instance.
        self.passing = 0
        self.processed = 0

    def process_record(self, outputs, target):
        # Count a record as passing when the output matches the target exactly.
        self.processed += 1
        if outputs == target:
            self.passing += 1

    @staticmethod
    def aggregate(metrics):
        # Combine the counts from all metric instances into a single pass rate.
        passing = 0
        processed = 0
        for metric in metrics:
            passing += metric.passing
            processed += metric.processed
        return passing / processed if processed else 0


npf.register_metric("MyCustomMetric", MyCustomMetric)
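
Once registered, the custom metric can be requested by its registered name, just like the built-ins. A sketch, reusing the model and dataset from the earlier examples:

reports = npf.torch.evaluate(model_index_or_path, dataset, metrics=['MyCustomMetric'])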

This document is relevant for: Inf1, Inf2, Trn1, Trn1n