Fit some representative binary classification models.

run_models(
  .data,
  target,
  positive,
  models = c("logistic", "rpart", "ctree", "randomForest", "ranger", "xgboost", "lasso")
)

Arguments

.data

A train_df. Training data used to fit the models. tbl_df, tbl, and data.frame objects are also supported.

target

character. Name of target variable.

positive

character. Level of positive class of binary classification.

models

character. Algorithms of the models to fit. See Details. The default value is c("logistic", "rpart", "ctree", "randomForest", "ranger", "xgboost", "lasso").

Value

model_df. Results of the fitted models. A model_df is built on tbl_df and contains the following variables:

  • step : character. The current stage in the model fitting process. Results returned by run_models() have the value "1.Fitted".

  • model_id : character. Type of fitted model.

  • target : character. Name of target variable.

  • is_factor : logical. Indicates whether the target variable is a factor.

  • positive : character. Level of positive class of binary classification.

  • negative : character. Level of negative class of binary classification.

  • fitted_model : list. Fitted model object.
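
As a minimal sketch of working with the returned model_df, the block below repeats the preparation steps from the Examples section, fits only two models through the models argument, and pulls one fitted object out of the fitted_model list column. The object names fits and glm_fit are illustrative, and the row order of the result is not assumed.

```r
library(alookr)
library(dplyr)

# Prepare the training data as in the Examples section.
train <- rpart::kyphosis %>%
  split_by(Kyphosis) %>%
  sampling_target(seed = 1234L, method = "ubSMOTE") %>%
  cleanse()

# Fit only two of the supported models.
fits <- run_models(
  .data = train,
  target = "Kyphosis",
  positive = "present",
  models = c("logistic", "rpart")
)

# One row per fitted model; model_id identifies the algorithm.
fits$model_id

# fitted_model is a list column, so use [[ ]] to extract a single model object.
glm_fit <- fits$fitted_model[[which(fits$model_id == "logistic")]]
summary(glm_fit)
```

The extracted object is the model returned by the underlying fitting function, so the usual methods for that class (summary(), predict(), and so on) should apply to it.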

Details

Each supported model is fitted with a function from a widely used R modeling package. The following binary classification models are supported:

  • "logistic" : logistic regression by glm() in stats package.

  • "rpart" : recursive partitioning tree model by rpart() in rpart package.

  • "ctree" : conditional inference tree model by ctree() in party package.

  • "randomForest" : random forest model by randomForest() in randomForest package.

  • "ranger" : random forest model by ranger() in ranger package.

  • "xgboost" : XGBoost (gradient boosting) model by xgboost() in xgboost package.

  • "lasso" : lasso model by glmnet() in glmnet package.
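
For orientation, the sketch below calls two of the listed fitting functions directly on the kyphosis data from the rpart package. The formula, the family and method arguments, and the object names are assumptions made for illustration; the arguments that run_models() passes internally are not documented here and may differ.

```r
# Direct calls to two of the underlying fitting functions.
# All arguments below are illustrative assumptions, not those of run_models().
data(kyphosis, package = "rpart")

# "logistic": logistic regression with glm() from stats.
fit_glm <- glm(Kyphosis ~ ., data = kyphosis, family = binomial)

# "rpart": recursive partitioning tree with rpart() from rpart.
fit_rpart <- rpart::rpart(Kyphosis ~ ., data = kyphosis, method = "class")

# Predicted probability of the "present" class from each model.
# glm() treats the first factor level ("absent") as failure, so the
# response-scale prediction is the probability of "present".
p_glm <- predict(fit_glm, newdata = kyphosis, type = "response")
p_rpart <- predict(fit_rpart, newdata = kyphosis, type = "prob")[, "present"]
```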

run_models() fits the models in parallel. However, parallel processing is not supported on the MS-Windows operating system or in the RStudio environment.

Examples

library(dplyr)

# Split the data into a train data set and a test data set.
sb <- rpart::kyphosis %>%
  split_by(Kyphosis)

# Extract the train data set from original data set.
train <- sb %>%
  extract_set(set = "train")

# Extract the test data set from original data set.
test <- sb %>%
  extract_set(set = "test")

# Oversample the unbalanced data set using SMOTE (synthetic minority over-sampling technique).
train <- sb %>%
  sampling_target(seed = 1234L, method = "ubSMOTE")

# Clean the data set.
train <- train %>%
  cleanse()
#> ── Checking unique value ─────────────────────────── unique value is one ──
#> No variables that unique value is one.
#> 
#> ── Checking unique rate ─────────────────────────────── high unique rate ──
#> No variables that high unique rate.
#> 
#> ── Checking character variables ─────────────────────── categorical data ──
#> No character variables.
#> 
#> 

# Run the model fitting.
result <- run_models(.data = train, target = "Kyphosis", positive = "present")
result
#> # A tibble: 7 × 7
#>   step     model_id     target   is_factor positive negative fitted_model
#>   <chr>    <chr>        <chr>    <lgl>     <chr>    <chr>    <list>      
#> 1 1.Fitted logistic     Kyphosis TRUE      present  absent   <glm>       
#> 2 1.Fitted rpart        Kyphosis TRUE      present  absent   <rpart>     
#> 3 1.Fitted ctree        Kyphosis TRUE      present  absent   <BinaryTr>  
#> 4 1.Fitted randomForest Kyphosis TRUE      present  absent   <rndmFrs.>  
#> 5 1.Fitted ranger       Kyphosis TRUE      present  absent   <ranger>    
#> 6 1.Fitted xgboost      Kyphosis TRUE      present  absent   <xgb.Bstr>  
#> 7 1.Fitted lasso        Kyphosis TRUE      present  absent   <lognet>    

# Fit the same set of models using the pipe operator.
train %>%
  run_models(target = "Kyphosis", positive = "present")
#> # A tibble: 7 × 7
#>   step     model_id     target   is_factor positive negative fitted_model
#>   <chr>    <chr>        <chr>    <lgl>     <chr>    <chr>    <list>      
#> 1 1.Fitted logistic     Kyphosis TRUE      present  absent   <glm>       
#> 2 1.Fitted rpart        Kyphosis TRUE      present  absent   <rpart>     
#> 3 1.Fitted ctree        Kyphosis TRUE      present  absent   <BinaryTr>  
#> 4 1.Fitted randomForest Kyphosis TRUE      present  absent   <rndmFrs.>  
#> 5 1.Fitted ranger       Kyphosis TRUE      present  absent   <ranger>    
#> 6 1.Fitted xgboost      Kyphosis TRUE      present  absent   <xgb.Bstr>  
#> 7 1.Fitted lasso        Kyphosis TRUE      present  absent   <lognet>