Fit some representative binary classification models.
run_models(
  .data,
  target,
  positive,
  models = c("logistic", "rpart", "ctree", "randomForest", "ranger", "xgboost", "lasso")
)
.data : A train_df. Training data used to fit the models. tbl_df, tbl, and data.frame objects are also supported.
target : character. Name of the target variable.
positive : character. Level of the positive class in the binary classification.
models : character. Algorithm types of the models to fit. See details. The default is c("logistic", "rpart", "ctree", "randomForest", "ranger", "xgboost", "lasso").
model_df. Results of the fitted models. A model_df is built on tbl_df and contains the following variables:
step : character. The current stage in the model fitting process. For the result of run_models(), the value is "1.Fitted".
model_id : character. Type of the fitted model.
target : character. Name of target variable.
is_factor : logical. Indicates whether the target variable is a factor.
positive : character. Level of the positive class in the binary classification.
negative : character. Level of the negative class in the binary classification.
fitted_model : list. Fitted model object.
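Because fitted_model is a list column, each fitted object has to be pulled out with `[[` before it can be used. The following is a minimal sketch of that access pattern. It uses a hand-built, single-row stand-in with the same columns as a model_df, fitted on `mtcars` with base R only, so the data, the `cars` copy, and the variable names are assumptions for illustration and not output of run_models().

```r
# Hand-built stand-in for a model_df (assumption: real objects come from run_models()).
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# glm() treats the first factor level as failure, so this models P(am == "manual").
fit <- glm(am ~ wt, data = cars, family = binomial)

model_df <- data.frame(
  step = "1.Fitted",
  model_id = "logistic",
  target = "am",
  is_factor = is.factor(cars$am),
  positive = "manual",
  negative = "automatic",
  stringsAsFactors = FALSE
)
# Store the fitted object in a list column, one element per row.
model_df$fitted_model <- list(fit)

# Select the row by model_id, then extract the model object itself with [[ ]].
logit_fit <- model_df$fitted_model[[which(model_df$model_id == "logistic")]]

# Predicted probability of the positive class for each observation.
prob_positive <- predict(logit_fit, newdata = cars, type = "response")
```

The same `[[` extraction applies to any row, whatever class the stored model has.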
The supported models are fitted with functions from modeling packages that are widely used in R. The following binary classification models are supported:
"logistic" : logistic regression by glm() in stats package.
"rpart" : recursive partitioning tree model by rpart() in rpart package.
"ctree" : conditional inference tree model by ctree() in party package.
"randomForest" : random forest model by randomForest() in randomForest package.
"ranger" : random forest model by ranger() in ranger package.
"xgboost" : XGBoost model by xgboost() in xgboost package.
"lasso" : lasso model by glmnet() in glmnet package.
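The models argument takes any subset of the identifiers listed above. The source does not say how run_models() handles an unknown identifier, so the following is only a sketch of how a caller might check a selection before fitting. `check_models()` and `supported_models` are hypothetical names introduced here and are not part of the package.

```r
# Hypothetical helper, not part of the package API.
# The identifiers are copied from the list of supported models above.
supported_models <- c("logistic", "rpart", "ctree", "randomForest",
                      "ranger", "xgboost", "lasso")

check_models <- function(models = supported_models) {
  # Anything not in the supported list is reported in a single error.
  unknown <- setdiff(models, supported_models)
  if (length(unknown) > 0) {
    stop("Unsupported model type(s): ", paste(unknown, collapse = ", "))
  }
  unique(models)
}

# A valid subset passes through unchanged.
selected <- check_models(c("logistic", "rpart", "ranger"))

# An unknown identifier raises an error; capture its message for inspection.
rejected <- tryCatch(check_models("svm"), error = function(e) conditionMessage(e))
```

A checked vector such as `selected` could then be passed as the models argument of run_models().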
run_models() fits the models in parallel. However, parallel execution is not supported on the MS-Windows operating system or in the RStudio environment.
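The source does not describe how run_models() implements this. As a rough illustration of the general pattern, the sketch below fits two models with `parallel::mclapply()` and falls back to a single core on Windows or inside RStudio. Using fork-based parallelism, detecting RStudio through the `RSTUDIO` environment variable, and the `mtcars` formulas are all assumptions made for this example.

```r
library(parallel)

# Assumption: fork-based parallelism is unavailable on Windows and is
# avoided inside RStudio, so use one core in those environments.
on_windows <- .Platform$OS.type == "windows"
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
n_cores <- if (on_windows || in_rstudio) 1L else 2L

# Two illustrative logistic regressions on a built-in data set.
formulas <- list(wt = am ~ wt, hp = am ~ hp)

# With mc.cores = 1 this runs sequentially, like lapply().
fits <- mclapply(
  formulas,
  function(f) glm(f, data = mtcars, family = binomial),
  mc.cores = n_cores
)

names(fits)
```

The result is a named list of fitted objects regardless of how many cores were used, so downstream code does not need to know whether the fits ran in parallel.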
library(alookr)
library(dplyr)
# Split the data into a train set and a test set.
sb <- rpart::kyphosis %>%
  split_by(Kyphosis)
# Extract the train set from the split data.
train <- sb %>%
  extract_set(set = "train")
# Extract the test set from the split data.
test <- sb %>%
  extract_set(set = "test")
# Rebalance the unbalanced data set using SMOTE (synthetic minority over-sampling technique).
train <- sb %>%
  sampling_target(seed = 1234L, method = "ubSMOTE")
# Clean the data set.
train <- train %>%
  cleanse()
#> ── Checking unique value ─────────────────────────── unique value is one ──
#> No variables that unique value is one.
#>
#> ── Checking unique rate ─────────────────────────────── high unique rate ──
#> No variables that high unique rate.
#>
#> ── Checking character variables ─────────────────────── categorical data ──
#> No character variables.
#>
#>
# Run the model fitting.
result <- run_models(.data = train, target = "Kyphosis", positive = "present")
result
#> # A tibble: 7 × 7
#> step model_id target is_factor positive negative fitted_model
#> <chr> <chr> <chr> <lgl> <chr> <chr> <list>
#> 1 1.Fitted logistic Kyphosis TRUE present absent <glm>
#> 2 1.Fitted rpart Kyphosis TRUE present absent <rpart>
#> 3 1.Fitted ctree Kyphosis TRUE present absent <BinaryTr>
#> 4 1.Fitted randomForest Kyphosis TRUE present absent <rndmFrs.>
#> 5 1.Fitted ranger Kyphosis TRUE present absent <ranger>
#> 6 1.Fitted xgboost Kyphosis TRUE present absent <xgb.Bstr>
#> 7 1.Fitted lasso Kyphosis TRUE present absent <lognet>
# Fit the same set of models using the pipe operator.
train %>%
  run_models(target = "Kyphosis", positive = "present")
#> # A tibble: 7 × 7
#> step model_id target is_factor positive negative fitted_model
#> <chr> <chr> <chr> <lgl> <chr> <chr> <list>
#> 1 1.Fitted logistic Kyphosis TRUE present absent <glm>
#> 2 1.Fitted rpart Kyphosis TRUE present absent <rpart>
#> 3 1.Fitted ctree Kyphosis TRUE present absent <BinaryTr>
#> 4 1.Fitted randomForest Kyphosis TRUE present absent <rndmFrs.>
#> 5 1.Fitted ranger Kyphosis TRUE present absent <ranger>
#> 6 1.Fitted xgboost Kyphosis TRUE present absent <xgb.Bstr>
#> 7 1.Fitted lasso Kyphosis TRUE present absent <lognet>