Calculate metrics for model evaluation — performance

Calculate some representative metrics for binary classification model evaluation.

performance_metric(
  pred,
  actual,
  positive,
  metric = c("ZeroOneLoss", "Accuracy", "Precision", "Recall", "Sensitivity",
    "Specificity", "F1_Score", "Fbeta_Score", "LogLoss", "AUC", "Gini", "PRAUC",
    "LiftAUC", "GainAUC", "KS_Stat", "ConfusionMatrix"),
  cutoff = 0.5,
  beta = 1
)

Arguments

pred: numeric. Predicted probabilities of the positive class of the target variable.
actual: factor. Actual (observed) values of the target variable.
positive: character. The level of the positive class in the binary classification.
metric: character. The performance metrics you want to calculate. See Value for the available metrics and Details for which of them use cutoff.
cutoff: numeric. Threshold for classifying predicted probability values into positive and negative classes.
beta: numeric. Weight of recall relative to precision in the weighted harmonic mean for F-Beta Score. With beta = 1, the F-Beta Score equals the F1 Score.
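
As a rough illustration of how beta shifts the balance between precision and recall, the following sketch uses the standard F-Beta definition. The helper fbeta() and the precision and recall values are hypothetical and not part of the package; the package's own implementation may differ in details.

```r
# Hypothetical helper (not part of the documented package):
# F-Beta as the weighted harmonic mean of precision and recall.
fbeta <- function(precision, recall, beta = 1) {
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}

# Illustrative values only: a model with low precision and perfect recall.
precision <- 0.5
recall    <- 1.0

f1  <- fbeta(precision, recall, beta = 1)    # balanced harmonic mean
f2  <- fbeta(precision, recall, beta = 2)    # beta > 1 favours recall
f05 <- fbeta(precision, recall, beta = 0.5)  # beta < 1 favours precision

print(c(F1 = f1, F2 = f2, F0.5 = f05))
```

Because recall is the stronger of the two values here, the score rises as beta grows and falls as beta shrinks.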

Value

numeric or table object. "ConfusionMatrix" returns a table object; all other metrics return a numeric value. The performance metrics calculated are as follows:

ZeroOneLoss : Normalized Zero-One Loss (Classification Error Loss).
Accuracy : Accuracy.
Precision : Precision.
Recall : Recall.
Sensitivity : Sensitivity.
Specificity : Specificity.
F1_Score : F1 Score.
Fbeta_Score : F-Beta Score.
LogLoss : Log loss / Cross-Entropy Loss.
AUC : Area Under the Receiver Operating Characteristic Curve (ROC AUC).
Gini : Gini Coefficient.
PRAUC : Area Under the Precision-Recall Curve (PR AUC).
LiftAUC : Area Under the Lift Chart.
GainAUC : Area Under the Gain Chart.
KS_Stat : Kolmogorov-Smirnov Statistic.
ConfusionMatrix : Confusion Matrix.
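
Several of the metrics listed above are computed directly from the predicted probabilities, without a cutoff. The sketch below shows common textbook definitions of LogLoss, AUC, Gini and KS_Stat in base R. The toy data and variable names are hypothetical, and the formulas are assumptions about the usual definitions rather than a copy of the package's code.

```r
# Toy data (hypothetical values, for illustration only).
prob   <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)
actual <- factor(c("present", "present", "absent", "present", "absent", "absent"),
                 levels = c("absent", "present"))
y <- as.integer(actual == "present")   # 1 = positive class, 0 = negative class

# Log loss: mean negative log-likelihood of the observed classes.
log_loss <- -mean(y * log(prob) + (1 - y) * log(1 - prob))

# ROC AUC via the rank-sum (Mann-Whitney) formulation.
n_pos <- sum(y == 1)
n_neg <- sum(y == 0)
auc <- (sum(rank(prob)[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Gini coefficient, assuming the common relation Gini = 2 * AUC - 1.
gini <- 2 * auc - 1

# KS statistic: largest gap between the score distributions of the two classes.
ks <- max(abs(ecdf(prob[y == 1])(prob) - ecdf(prob[y == 0])(prob)))

print(c(LogLoss = log_loss, AUC = auc, Gini = gini, KS_Stat = ks))
```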

Details

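The cutoff argument applies only if the metric argument is one of "ZeroOneLoss", "Accuracy", "Precision", "Recall", "Sensitivity", "Specificity", "F1_Score", "Fbeta_Score" or "ConfusionMatrix". The remaining metrics are computed from the predicted probabilities directly and ignore cutoff.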
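
To make the role of cutoff concrete, here is a minimal base R sketch of how probabilities might be turned into classes before a cutoff-dependent metric is computed. The helper classify(), the toy data, and the choice of `>=` at the boundary are assumptions for illustration; the package may treat values exactly equal to the cutoff differently.

```r
# Toy data (hypothetical values, for illustration only).
prob   <- c(0.95, 0.60, 0.52, 0.30, 0.10)
actual <- factor(c("present", "absent", "present", "absent", "absent"),
                 levels = c("absent", "present"))

# Hypothetical helper: label a case positive when its probability reaches the cutoff.
classify <- function(prob, cutoff, positive = "present", negative = "absent") {
  factor(ifelse(prob >= cutoff, positive, negative), levels = c(negative, positive))
}

# Confusion matrices for two cutoffs.
cm_050 <- table(predict = classify(prob, 0.50), actual = actual)
cm_055 <- table(predict = classify(prob, 0.55), actual = actual)

# Accuracy is the share of cases on the diagonal of the confusion matrix.
acc <- function(cm) sum(diag(cm)) / sum(cm)

print(cm_050)
print(cm_055)
print(c(acc_050 = acc(cm_050), acc_055 = acc(cm_055)))
```

Changing the cutoff alters the result only when at least one predicted probability lies between the old and the new threshold, as the value 0.52 does here.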

Examples

library(dplyr)
library(alookr)  # assumed: the package providing split_by(), run_models() and performance_metric()

# Split the data into a train data set and a test data set.
sb <- rpart::kyphosis %>%
  split_by(Kyphosis)

# Extract the train data set from original data set.
train <- sb %>%
  extract_set(set = "train")

# Extract the test data set from original data set.
test <- sb %>%
  extract_set(set = "test")

# Resample the unbalanced train data set using SMOTE (synthetic minority over-sampling technique).
train <- sb %>%
  sampling_target(seed = 1234L, method = "ubSMOTE")

# Clean the train data set.
train <- train %>%
  cleanse
#> ── Checking unique value ─────────────────────────── unique value is one ──
#> No variables that unique value is one.
#> 
#> ── Checking unique rate ─────────────────────────────── high unique rate ──
#> No variables that high unique rate.
#> 
#> ── Checking character variables ─────────────────────── categorical data ──
#> No character variables.
#> 
#> 

# Run the model fitting.
result <- run_models(.data = train, target = "Kyphosis", positive = "present")
result
#> # A tibble: 7 × 7
#>   step     model_id     target   is_factor positive negative fitted_model
#>   <chr>    <chr>        <chr>    <lgl>     <chr>    <chr>    <list>      
#> 1 1.Fitted logistic     Kyphosis TRUE      present  absent   <glm>       
#> 2 1.Fitted rpart        Kyphosis TRUE      present  absent   <rpart>     
#> 3 1.Fitted ctree        Kyphosis TRUE      present  absent   <BinaryTr>  
#> 4 1.Fitted randomForest Kyphosis TRUE      present  absent   <rndmFrs.>  
#> 5 1.Fitted ranger       Kyphosis TRUE      present  absent   <ranger>    
#> 6 1.Fitted xgboost      Kyphosis TRUE      present  absent   <xgb.Bstr>  
#> 7 1.Fitted lasso        Kyphosis TRUE      present  absent   <lognet>    

# Predict on the test data set with each fitted model.
pred <- run_predict(result, test)
pred
#> # A tibble: 7 × 8
#>   step       model_id target is_factor positive negative fitted_model predicted 
#>   <chr>      <chr>    <chr>  <lgl>     <chr>    <chr>    <list>       <list>    
#> 1 2.Predict… logistic Kypho… TRUE      present  absent   <glm>        <prdct_cl>
#> 2 2.Predict… rpart    Kypho… TRUE      present  absent   <rpart>      <prdct_cl>
#> 3 2.Predict… ctree    Kypho… TRUE      present  absent   <BinaryTr>   <prdct_cl>
#> 4 2.Predict… randomF… Kypho… TRUE      present  absent   <rndmFrs.>   <prdct_cl>
#> 5 2.Predict… ranger   Kypho… TRUE      present  absent   <ranger>     <prdct_cl>
#> 6 2.Predict… xgboost  Kypho… TRUE      present  absent   <xgb.Bstr>   <prdct_cl>
#> 7 2.Predict… lasso    Kypho… TRUE      present  absent   <lognet>     <prdct_cl>

# Calculate Accuracy.
performance_metric(attr(pred$predicted[[1]], "pred_prob"), test$Kyphosis,
  "present", "Accuracy")
#> [1] 0.5833333
# Calculate Confusion Matrix.
performance_metric(attr(pred$predicted[[1]], "pred_prob"), test$Kyphosis,
  "present", "ConfusionMatrix")
#>          actual
#> predict   absent present
#>   absent       9       0
#>   present     10       5
# Calculate Confusion Matrix with cutoff = 0.55.
performance_metric(attr(pred$predicted[[1]], "pred_prob"), test$Kyphosis,
  "present", "ConfusionMatrix", cutoff = 0.55)
#>          actual
#> predict   absent present
#>   absent       9       0
#>   present     10       5
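
As a cross-check, the counts in the confusion matrix printed above can be used to derive several of the cutoff-dependent metrics by hand. This is a base R sketch using the usual definitions; the variable names are illustrative, and only the four counts are taken from the output above.

```r
# Rebuild the confusion matrix shown above (rows = predicted, columns = actual).
cm <- matrix(c(9, 10, 0, 5), nrow = 2,
             dimnames = list(predict = c("absent", "present"),
                             actual  = c("absent", "present")))

tp <- cm["present", "present"]  # predicted present, actually present
fp <- cm["present", "absent"]   # predicted present, actually absent
fn <- cm["absent",  "present"]  # predicted absent, actually present
tn <- cm["absent",  "absent"]   # predicted absent, actually absent

accuracy    <- (tp + tn) / sum(cm)
zero_one    <- 1 - accuracy     # share of misclassified cases
precision   <- tp / (tp + fp)
recall      <- tp / (tp + fn)   # also called sensitivity
specificity <- tn / (tn + fp)

print(c(Accuracy = accuracy, ZeroOneLoss = zero_one, Precision = precision,
        Recall = recall, Specificity = specificity))
```

The accuracy obtained this way, 14 / 24, agrees with the 0.5833333 reported by the "Accuracy" call above.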