Model performance metrics

Difference in Positive Proportions in Predicted Labels (DPPL)

DPPL (Difference in Positive Proportions in Predicted Labels) is a metric used to evaluate the performance of machine learning models in imbalanced datasets. It measures the difference in the proportion of positive predictions made by the model between the minority class and the majority class.

A low DPPL value indicates that the model makes positive predictions for minority-class instances at a rate similar to that for majority-class instances, which is desirable in imbalanced datasets where it is important to ensure that the minority class is not overlooked.

DPPL is a classification-specific metric: it applies only to models that perform binary or multi-class classification. It assesses whether the model makes positive predictions for minority-class instances at a rate comparable to that for majority-class instances, which matters in imbalanced datasets where the minority class is often under-represented.

DPPL is also a threshold-independent metric, which means that it does not depend on the specific threshold used to make binary predictions. This makes it useful in situations where the threshold used to make predictions may need to be adjusted depending on the specific use case.

The formula for the DPPL metric is as follows:

$$ DPPL = \lvert p_m - p_M \rvert $$

where:

  • $p_m$ is the proportion of positive predictions made by the model for instances of the minority class

  • $p_M$ is the proportion of positive predictions made by the model for instances of the majority class

The metric is the absolute difference between the proportion of positive predictions the model makes for the minority class and the proportion it makes for the majority class.

    import pandas as pd

    # assume `data` is a pandas DataFrame with binary (0/1) columns 'label' and 'prediction'
    # identify the minority and majority classes from the observed label distribution
    label_counts = data['label'].value_counts()
    minority_class = label_counts.idxmin()
    majority_class = label_counts.idxmax()

    # proportion of positive predictions for instances of the minority class
    p_minority = data.loc[data['label'] == minority_class, 'prediction'].mean()
    # proportion of positive predictions for instances of the majority class
    p_majority = data.loc[data['label'] == majority_class, 'prediction'].mean()

    # DPPL is the absolute difference between the two proportions
    DPPL = abs(p_minority - p_majority)
    print("DPPL:", DPPL)
    
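As a quick check, the same computation can be run end to end on a small hand-made DataFrame; the label and prediction values below are purely illustrative.

    import pandas as pd

    # toy data: class 1 is the minority, class 0 the majority (illustrative values only)
    data = pd.DataFrame({
        'label':      [0, 0, 0, 0, 0, 0, 1, 1, 1],
        'prediction': [1, 0, 1, 0, 1, 0, 1, 0, 0],
    })

    p_minority = data.loc[data['label'] == 1, 'prediction'].mean()  # 1/3
    p_majority = data.loc[data['label'] == 0, 'prediction'].mean()  # 1/2
    print("DPPL:", abs(p_minority - p_majority))  # ~0.17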

Model quality

  • Model quality in machine learning can be measured using various metrics depending on the type of problem you are trying to solve. Here are some common metrics for evaluating model quality:
    • Accuracy: This metric measures the percentage of correct predictions made by the model on the test data.
    • Precision: This metric measures the proportion of true positive predictions out of all the positive predictions made by the model.
    • Recall: This metric measures the proportion of true positive predictions out of all the actual positive instances in the test data.
    • F1 score: This metric is the harmonic mean of precision and recall and is used when both precision and recall are important.
    • Mean Squared Error (MSE): This metric is used to evaluate the performance of regression models. It measures the average of the squared differences between the predicted values and the actual values.
    • Mean Absolute Error (MAE): This metric is used to evaluate the performance of regression models. It measures the average of the absolute differences between the predicted values and the actual values.
    • Area Under the Receiver Operating Characteristic Curve (AUC-ROC): This metric is used to evaluate the performance of binary classification models. It summarizes, across all classification thresholds, the trade-off between the true positive rate and the false positive rate.
    • Mean Average Precision (mAP): This metric is used to evaluate the performance of object detection and image segmentation models. It measures the average precision across all classes.
  • When selecting a metric to measure model quality, consider the problem type, the desired outcome, and the specific needs of your project. Keep in mind that a single metric may not be sufficient to evaluate a model’s performance, so it is a good idea to look at several metrics together, as in the sketch after this list.
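Below is a minimal sketch of how several of the metrics above can be computed with scikit-learn, assuming ground-truth labels, hard predictions, predicted probabilities, and regression outputs are already available; the arrays are illustrative only, and mAP is omitted because it is typically computed by task-specific detection or segmentation tooling.

    from sklearn.metrics import (
        accuracy_score, precision_score, recall_score, f1_score,
        roc_auc_score, mean_squared_error, mean_absolute_error,
    )

    # illustrative classification outputs: true labels, hard predictions,
    # and predicted probabilities for the positive class
    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
    y_prob = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]

    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall   :", recall_score(y_true, y_pred))
    print("F1 score :", f1_score(y_true, y_pred))
    print("AUC-ROC  :", roc_auc_score(y_true, y_prob))

    # illustrative regression outputs for MSE and MAE
    y_true_reg = [3.0, 2.5, 4.0, 5.5]
    y_pred_reg = [2.8, 3.0, 4.2, 5.0]
    print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
    print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))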