Fit binary classification model — run

Fit some representative binary classification models.

run_models(
  .data,
  target,
  positive,
  models = c("logistic", "rpart", "ctree", "randomForest", "ranger", "xgboost", "lasso")
)

Arguments

.data: A train_df. Train data to fit the model. It also supports tbl_df, tbl, and data.frame objects.
target: character. Name of target variable.
positive: character. Level of positive class of binary classification.
models: character. Algorithm types of the models to fit. See Details. The default value is c("logistic", "rpart", "ctree", "randomForest", "ranger", "xgboost", "lasso").
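The positive argument names one level of the binary target, and the negative column of the result holds the other level. The helper below, derive_negative(), is hypothetical and not part of the package. It is a minimal sketch that assumes the negative class is simply the remaining level of a two-level target.

```r
# Hypothetical helper (not part of the package): derive the negative level
# as whichever level of a binary target is not the positive one.
derive_negative <- function(data, target, positive) {
  y <- data[[target]]
  # Use factor levels when available, otherwise the sorted distinct values.
  lev <- if (is.factor(y)) levels(y) else sort(unique(as.character(y)))
  if (length(lev) != 2L) {
    stop("target must have exactly two levels")
  }
  if (!positive %in% lev) {
    stop("positive must be one of: ", paste(lev, collapse = ", "))
  }
  setdiff(lev, positive)
}

# rpart ships the kyphosis data; Kyphosis has levels "absent" and "present".
derive_negative(rpart::kyphosis, target = "Kyphosis", positive = "present")
#> [1] "absent"
```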

Value

model_df. Results of the fitted models. A model_df is a tbl_df that contains the following variables:

step : character. The current stage in the model fitting process. For the result of run_models(), this is "1.Fitted".
model_id : character. Type of the fitted model.
target : character. Name of target variable.
is_factor : logical. Indicates whether the target variable is a factor.
positive : character. Level of positive class of binary classification.
negative : character. Level of negative class of binary classification.
fitted_model : list. Fitted model object.
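Because fitted_model is a list column, a single model can be pulled out by matching on model_id. The sketch below builds a hypothetical stand-in for a model_df with base R, glm(), and rpart() so that it runs without the package; get_fitted_model() is likewise a hypothetical helper. Applied to a real run_models() result, the same indexing would use result$model_id and result$fitted_model, assuming the columns are as listed above.

```r
# Hypothetical stand-in for a model_df: a data frame whose fitted_model
# column is a list of fitted model objects. These glm()/rpart() calls are
# illustrative only and need not match how run_models() fits its models.
kyphosis <- rpart::kyphosis
model_df <- data.frame(
  step = "1.Fitted",
  model_id = c("logistic", "rpart"),
  target = "Kyphosis",
  stringsAsFactors = FALSE
)
model_df$fitted_model <- list(
  glm(Kyphosis ~ ., data = kyphosis, family = binomial),
  rpart::rpart(Kyphosis ~ ., data = kyphosis)
)

# Hypothetical helper: return the fitted model whose model_id matches `id`.
get_fitted_model <- function(model_df, id) {
  idx <- match(id, model_df$model_id)
  if (is.na(idx)) stop("no model with model_id '", id, "'")
  model_df$fitted_model[[idx]]
}

logit_fit <- get_fitted_model(model_df, "logistic")
class(logit_fit)
#> [1] "glm" "lm"
```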

Details

The supported models are fitted with functions from representative modeling packages in R. The following binary classification models are supported:

"logistic" : logistic regression by glm() in stats package.
"rpart" : recursive partitioning tree model by rpart() in rpart package.
"ctree" : conditional inference tree model by ctree() in party package.
"randomForest" : random forest model by randomForest() in randomForest package.
"ranger" : random forest model by ranger() in ranger package.
"xgboost" : extreme gradient boosting (XGBoost) model by xgboost() in xgboost package.
"lasso" : lasso model by glmnet() in glmnet package.
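The list above can be kept as a lookup table from model id to the package that provides the fitting function, for example to validate a custom models vector or to check which backing packages are installed. model_packages and models_to_packages() are hypothetical names; the mapping is copied from the list, with stats listed for "logistic" because glm() lives there.

```r
# Hypothetical lookup table: model id -> package providing the fitting
# function, copied from the list of supported models.
model_packages <- c(
  logistic     = "stats",
  rpart        = "rpart",
  ctree        = "party",
  randomForest = "randomForest",
  ranger       = "ranger",
  xgboost      = "xgboost",
  lasso        = "glmnet"
)

# Hypothetical helper: reject unsupported ids and return the packages
# needed for the selected models.
models_to_packages <- function(models) {
  bad <- setdiff(models, names(model_packages))
  if (length(bad) > 0L) {
    stop("unsupported model id(s): ", paste(bad, collapse = ", "))
  }
  unique(unname(model_packages[models]))
}

pkgs <- models_to_packages(c("logistic", "rpart", "lasso"))

# TRUE/FALSE per package, depending on what is installed in this library.
vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
```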

run_models() fits the models in parallel. However, parallel execution is not supported on MS Windows or in the RStudio environment.
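A session can be checked against the two environments named above before expecting parallel fitting. parallel_unsupported() is a hypothetical helper, and detecting RStudio through the RSTUDIO environment variable is an assumption about how RStudio marks its sessions, not a description of how run_models() itself decides.

```r
# Hypothetical helper: TRUE when the session is on Windows or (assumed to be)
# inside RStudio, the two environments where parallel fitting is unsupported.
parallel_unsupported <- function() {
  on_windows <- .Platform$OS.type == "windows"
  # Assumption: RStudio sessions set the RSTUDIO environment variable to "1".
  in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
  on_windows || in_rstudio
}

if (parallel_unsupported()) {
  message("Parallel fitting in run_models() is not supported in this session.")
}

# Number of CPU cores R can see (may be NA on some platforms).
parallel::detectCores()
```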

Examples

library(dplyr)

# Split the data into a train set and a test set.
sb <- rpart::kyphosis %>%
  split_by(Kyphosis)

# Extract the train set from the original data set.
train <- sb %>%
  extract_set(set = "train")

# Extract the test set from the original data set.
test <- sb %>%
  extract_set(set = "test")

# Sample the unbalanced train set using SMOTE (synthetic minority over-sampling technique).
train <- sb %>%
  sampling_target(seed = 1234L, method = "ubSMOTE")

# Clean the train set.
train <- train %>%
  cleanse()
#> ── Checking unique value ─────────────────────────── unique value is one ──
#> No variables that unique value is one.
#> 
#> ── Checking unique rate ─────────────────────────────── high unique rate ──
#> No variables that high unique rate.
#> 
#> ── Checking character variables ─────────────────────── categorical data ──
#> No character variables.
#> 
#> 

# Fit the models.
result <- run_models(.data = train, target = "Kyphosis", positive = "present")
result
#> # A tibble: 7 × 7
#>   step     model_id     target   is_factor positive negative fitted_model
#>   <chr>    <chr>        <chr>    <lgl>     <chr>    <chr>    <list>      
#> 1 1.Fitted logistic     Kyphosis TRUE      present  absent   <glm>       
#> 2 1.Fitted rpart        Kyphosis TRUE      present  absent   <rpart>     
#> 3 1.Fitted ctree        Kyphosis TRUE      present  absent   <BinaryTr>  
#> 4 1.Fitted randomForest Kyphosis TRUE      present  absent   <rndmFrs.>  
#> 5 1.Fitted ranger       Kyphosis TRUE      present  absent   <ranger>    
#> 6 1.Fitted xgboost      Kyphosis TRUE      present  absent   <xgb.Bstr>  
#> 7 1.Fitted lasso        Kyphosis TRUE      present  absent   <lognet>    

# Fit several kinds of models using the dplyr pipe.
train %>%
  run_models(target = "Kyphosis", positive = "present")
#> # A tibble: 7 × 7
#>   step     model_id     target   is_factor positive negative fitted_model
#>   <chr>    <chr>        <chr>    <lgl>     <chr>    <chr>    <list>      
#> 1 1.Fitted logistic     Kyphosis TRUE      present  absent   <glm>       
#> 2 1.Fitted rpart        Kyphosis TRUE      present  absent   <rpart>     
#> 3 1.Fitted ctree        Kyphosis TRUE      present  absent   <BinaryTr>  
#> 4 1.Fitted randomForest Kyphosis TRUE      present  absent   <rndmFrs.>  
#> 5 1.Fitted ranger       Kyphosis TRUE      present  absent   <ranger>    
#> 6 1.Fitted xgboost      Kyphosis TRUE      present  absent   <xgb.Bstr>  
#> 7 1.Fitted lasso        Kyphosis TRUE      present  absent   <lognet>