split_by() splits a data.frame or tbl_df into a training set and a test set.
split_by(.data, ...)
# S3 method for data.frame
split_by(.data, target, ratio = 0.7, seed = NULL, ...)
.data : a data.frame or a tbl_df.
... : further arguments passed to or from other methods.
target : unquoted expression or variable name. The name of the target variable.
ratio : numeric. The proportion of rows assigned to the training set. Default is 0.7.
seed : random seed used for splitting. Default is NULL.
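To make the roles of ratio and seed concrete, here is a minimal base R sketch of a seeded random split. It is not the package's implementation: flag_rows() is a hypothetical helper, the toy data is made up, and it draws a simple random sample without considering the target variable, which may differ from what split_by() does internally.

```r
# Hypothetical helper (not part of the package): label each row "train" or "test".
flag_rows <- function(df, ratio = 0.7, seed = NULL) {
  # Fix the random number generator only when a seed is supplied
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(df)
  # Draw round(n * ratio) row indices for the training set
  train_idx <- sample.int(n, size = round(n * ratio))
  df$split_flag <- ifelse(seq_len(n) %in% train_idx, "train", "test")
  df
}

# Made-up toy data with an imbalanced binary target
toy <- data.frame(
  default = rep(c("No", "Yes"), times = c(90, 10)),
  x = seq_len(100)
)

flagged <- flag_rows(toy, ratio = 0.7, seed = 123)
table(flagged$split_flag)
```

Calling the helper again with the same seed reproduces the same assignment, which is the reason for exposing a seed argument at all.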
An object of class split_df.
A split_df object is created, which contains the split information and the criteria used to separate the training set from the test set.
The attributes of the split_df class are as follows:
split_seed : integer. random seed used for splitting
target : character. the name of the target variable
binary : logical. whether the target variable is binary class
minority : character. the name of the minority class
majority : character. the name of the majority class
minority_rate : numeric. the rate of the minority class
majority_rate : numeric. the rate of the majority class
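The meaning of the class-balance attributes can be illustrated with a short base R sketch. It assumes the values are stored as ordinary R attributes readable with attr(), and it computes them on a made-up toy data frame; how the package derives them internally is not shown here.

```r
# Made-up toy data: 90 "No" and 10 "Yes" in the target column
toy <- data.frame(default = factor(rep(c("No", "Yes"), times = c(90, 10))))

# Class counts of the target variable
counts <- table(toy$default)

# Attach attributes that mirror the names listed above (illustration only)
attr(toy, "target") <- "default"
attr(toy, "binary") <- length(counts) == 2
attr(toy, "minority") <- names(counts)[which.min(counts)]
attr(toy, "majority") <- names(counts)[which.max(counts)]
attr(toy, "minority_rate") <- as.numeric(min(counts) / sum(counts))
attr(toy, "majority_rate") <- as.numeric(max(counts) / sum(counts))

# Read individual attributes back
attr(toy, "minority")
attr(toy, "minority_rate")
```

Under the same assumption, a call such as attr(sb, "minority") on the object returned by split_by() would read the corresponding value.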
library(dplyr)
library(alookr)  # split_by() is provided by the alookr package
# Credit Card Default Data
head(ISLR::Default)
#> default student balance income
#> 1 No No 729.5265 44361.625
#> 2 No Yes 817.1804 12106.135
#> 3 No No 1073.5492 31767.139
#> 4 No No 529.2506 35704.494
#> 5 No No 785.6559 38463.496
#> 6 No Yes 919.5885 7491.559
# Generate data for the example
sb <- ISLR::Default %>%
split_by(default)
sb
#> # A tibble: 10,000 × 5
#> # Groups: split_flag [2]
#> default student balance income split_flag
#> <fct> <fct> <dbl> <dbl> <chr>
#> 1 No No 730. 44362. train
#> 2 No Yes 817. 12106. train
#> 3 No No 1074. 31767. train
#> 4 No No 529. 35704. train
#> 5 No No 786. 38463. train
#> 6 No Yes 920. 7492. train
#> 7 No No 826. 24905. test
#> 8 No Yes 809. 17600. train
#> 9 No No 1161. 37469. test
#> 10 No No 0 29275. test
#> # ℹ 9,990 more rows
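Given a split_flag column like the one printed above, the two subsets can be pulled apart with ordinary filtering. The sketch below uses a small made-up data frame in place of sb so that it runs with base R alone; the column name split_flag and the values "train" and "test" are taken from the output above, everything else is illustrative.

```r
# Made-up stand-in for the flagged data (values are illustrative only)
flagged <- data.frame(
  default = c("No", "No", "Yes", "No", "Yes", "No"),
  balance = c(730, 817, 1074, 529, 786, 920),
  split_flag = c("train", "train", "train", "test", "train", "test"),
  stringsAsFactors = FALSE
)

# Keep the rows of each subset and drop the flag column afterwards
train <- subset(flagged, split_flag == "train", select = -split_flag)
test  <- subset(flagged, split_flag == "test",  select = -split_flag)

nrow(train)
nrow(test)
```

The printed result is a grouped tibble (Groups: split_flag), so when filtering the real object with dplyr it may be worth calling ungroup() first to avoid carrying the grouping into later steps.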