Calculate representative metrics for evaluating binary classification models.
performance_metric(
pred,
actual,
positive,
metric = c("ZeroOneLoss", "Accuracy", "Precision", "Recall", "Sensitivity",
"Specificity", "F1_Score", "Fbeta_Score", "LogLoss", "AUC", "Gini", "PRAUC",
"LiftAUC", "GainAUC", "KS_Stat", "ConfusionMatrix"),
cutoff = 0.5,
beta = 1
)
pred : numeric. Predicted probabilities of the positive class of the target variable.
actual : factor. The actual values of the target variable.
positive : character. The level of the positive class in the binary classification.
metric : character. The performance metrics to calculate. See details.
cutoff : numeric. Threshold for classifying predicted probabilities into the positive and negative classes.
beta : numeric. Weight of precision in the harmonic mean for the F-Beta Score.
numeric or table object. The confusion matrix is returned as a table object; every other metric is returned as a numeric value.
The performance metrics calculated are as follows:
ZeroOneLoss : Normalized Zero-One Loss (Classification Error Loss).
Accuracy : Accuracy.
Precision : Precision.
Recall : Recall.
Sensitivity : Sensitivity.
Specificity : Specificity.
F1_Score : F1 Score.
Fbeta_Score : F-Beta Score.
LogLoss : Log loss / Cross-Entropy Loss.
AUC : Area Under the Receiver Operating Characteristic Curve (ROC AUC).
Gini : Gini Coefficient.
PRAUC : Area Under the Precision-Recall Curve (PR AUC).
LiftAUC : Area Under the Lift Chart.
GainAUC : Area Under the Gain Chart.
KS_Stat : Kolmogorov-Smirnov Statistic.
ConfusionMatrix : Confusion Matrix.
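Several of the metrics listed above are computed directly from the predicted probabilities rather than from predicted classes. The following is a minimal base R sketch of common textbook definitions on made-up toy data. The variable names are illustrative, and the package's own computations may differ in detail (for example in how Gini is derived or whether KS is scaled to a percentage).

```r
# Toy data, invented for illustration (not taken from the kyphosis example).
prob <- c(0.91, 0.62, 0.55, 0.48, 0.30, 0.12)  # predicted P(positive)
y    <- c(1, 1, 0, 1, 0, 0)                    # 1 = positive, 0 = negative

# Log loss (cross-entropy): penalizes confident wrong probabilities.
log_loss <- -mean(y * log(prob) + (1 - y) * log(1 - prob))

# ROC AUC via the Mann-Whitney rank-sum identity: the share of
# (positive, negative) pairs in which the positive case scores higher.
n_pos <- sum(y == 1)
n_neg <- sum(y == 0)
auc <- (sum(rank(prob)[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Gini coefficient under the common convention Gini = 2 * AUC - 1
# (an assumption; other normalizations exist).
gini <- 2 * auc - 1

# Kolmogorov-Smirnov statistic: largest gap between the empirical
# score distributions of the two classes, on a 0-1 scale.
grid <- sort(unique(prob))
ks <- max(abs(ecdf(prob[y == 1])(grid) - ecdf(prob[y == 0])(grid)))
```

None of these quantities involves a classification threshold, so they depend only on how well the probabilities rank and calibrate the two classes.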
The cutoff argument applies only when the metric argument is one of "ZeroOneLoss", "Accuracy", "Precision", "Recall", "Sensitivity", "Specificity", "F1_Score", "Fbeta_Score", or "ConfusionMatrix".
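To illustrate what a cutoff does for the threshold-based metrics, here is a minimal base R sketch on made-up toy data. The helpers `confusion_at()` and `fbeta_from()` are hypothetical and not part of the package. Treating a probability equal to the cutoff as positive is an assumption, and the package may handle that tie differently.

```r
# Toy data, invented for illustration (not taken from the kyphosis example).
prob   <- c(0.91, 0.62, 0.55, 0.48, 0.30, 0.12)
actual <- factor(c("present", "present", "absent", "present", "absent", "absent"),
                 levels = c("absent", "present"))
positive <- "present"
negative <- setdiff(levels(actual), positive)

# Hypothetical helper: turn probabilities into classes at a cutoff and
# cross-tabulate them against the actual classes.
confusion_at <- function(prob, actual, positive, negative, cutoff = 0.5) {
  predicted <- factor(ifelse(prob >= cutoff, positive, negative),
                      levels = levels(actual))
  table(predict = predicted, actual = actual)
}

# Hypothetical helper: F-beta score from a confusion matrix whose rows are
# predicted classes and whose columns are actual classes.
fbeta_from <- function(cm, positive, beta = 1) {
  tp        <- cm[positive, positive]
  precision <- tp / sum(cm[positive, ])   # TP / all predicted positive
  recall    <- tp / sum(cm[, positive])   # TP / all actual positive
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}

cm_050 <- confusion_at(prob, actual, positive, negative, cutoff = 0.5)
cm_060 <- confusion_at(prob, actual, positive, negative, cutoff = 0.6)

# Accuracy is the share of cases on the diagonal of the confusion matrix.
accuracy_050 <- sum(diag(cm_050)) / sum(cm_050)
accuracy_060 <- sum(diag(cm_060)) / sum(cm_060)

# With beta = 1 the F-beta score reduces to the F1 score.
f1_050 <- fbeta_from(cm_050, positive, beta = 1)
```

Raising the cutoff from 0.5 to 0.6 moves the case scored 0.55 from predicted positive to predicted negative, which changes the confusion matrix and every metric derived from it. A change of cutoff that no predicted probability falls across leaves the confusion matrix unchanged.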
library(dplyr)
# Split the data into a train set and a test set.
sb <- rpart::kyphosis %>%
split_by(Kyphosis)
# Extract the train data set from the original data set.
train <- sb %>%
extract_set(set = "train")
# Extract the test data set from the original data set.
test <- sb %>%
extract_set(set = "test")
# Oversample the unbalanced data set using SMOTE (synthetic minority over-sampling technique).
train <- sb %>%
sampling_target(seed = 1234L, method = "ubSMOTE")
# Clean the train set.
train <- train %>%
cleanse
#> ── Checking unique value ─────────────────────────── unique value is one ──
#> No variables that unique value is one.
#>
#> ── Checking unique rate ─────────────────────────────── high unique rate ──
#> No variables that high unique rate.
#>
#> ── Checking character variables ─────────────────────── categorical data ──
#> No character variables.
#>
#>
# Fit the models.
result <- run_models(.data = train, target = "Kyphosis", positive = "present")
result
#> # A tibble: 7 × 7
#> step model_id target is_factor positive negative fitted_model
#> <chr> <chr> <chr> <lgl> <chr> <chr> <list>
#> 1 1.Fitted logistic Kyphosis TRUE present absent <glm>
#> 2 1.Fitted rpart Kyphosis TRUE present absent <rpart>
#> 3 1.Fitted ctree Kyphosis TRUE present absent <BinaryTr>
#> 4 1.Fitted randomForest Kyphosis TRUE present absent <rndmFrs.>
#> 5 1.Fitted ranger Kyphosis TRUE present absent <ranger>
#> 6 1.Fitted xgboost Kyphosis TRUE present absent <xgb.Bstr>
#> 7 1.Fitted lasso Kyphosis TRUE present absent <lognet>
# Predict on the test set with the fitted models.
pred <- run_predict(result, test)
pred
#> # A tibble: 7 × 8
#> step model_id target is_factor positive negative fitted_model predicted
#> <chr> <chr> <chr> <lgl> <chr> <chr> <list> <list>
#> 1 2.Predict… logistic Kypho… TRUE present absent <glm> <prdct_cl>
#> 2 2.Predict… rpart Kypho… TRUE present absent <rpart> <prdct_cl>
#> 3 2.Predict… ctree Kypho… TRUE present absent <BinaryTr> <prdct_cl>
#> 4 2.Predict… randomF… Kypho… TRUE present absent <rndmFrs.> <prdct_cl>
#> 5 2.Predict… ranger Kypho… TRUE present absent <ranger> <prdct_cl>
#> 6 2.Predict… xgboost Kypho… TRUE present absent <xgb.Bstr> <prdct_cl>
#> 7 2.Predict… lasso Kypho… TRUE present absent <lognet> <prdct_cl>
# Calculate Accuracy.
performance_metric(attr(pred$predicted[[1]], "pred_prob"), test$Kyphosis,
"present", "Accuracy")
#> [1] 0.5833333
# Calculate Confusion Matrix.
performance_metric(attr(pred$predicted[[1]], "pred_prob"), test$Kyphosis,
"present", "ConfusionMatrix")
#> actual
#> predict absent present
#> absent 9 0
#> present 10 5
# Calculate Confusion Matrix with cutoff = 0.55.
performance_metric(attr(pred$predicted[[1]], "pred_prob"), test$Kyphosis,
"present", "ConfusionMatrix", cutoff = 0.55)
#> actual
#> predict absent present
#> absent 9 0
#> present 10 5