Split Data into Train and Test Set — split

split_by() splits a data.frame or tbl_df into a train set and a test set.

split_by(.data, ...)

# S3 method for data.frame
split_by(.data, target, ratio = 0.7, seed = NULL, ...)

Arguments

.data: a data.frame or a tbl_df.
...: further arguments passed to or from other methods.
target: unquoted expression or variable name. The name of the target variable.
ratio: numeric. The proportion of observations assigned to the train set. Default is 0.7.
seed: random seed used for splitting. Default is NULL.
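
To make the roles of ratio and seed concrete, here is a minimal base R sketch of a seeded ratio split. The helper split_flag_sketch() is hypothetical and not part of the package; it assumes simple random sampling without stratification, which may differ from what split_by() does internally.

```r
# Hypothetical helper (not part of the package): label each of n rows
# as "train" or "test" using simple random sampling.
split_flag_sketch <- function(n, ratio = 0.7, seed = NULL) {
  # Fix the random number generator only when a seed is supplied
  if (!is.null(seed)) set.seed(seed)

  # Draw round(n * ratio) row indices for the train set
  idx_train <- sample(seq_len(n), size = round(n * ratio))

  # Every row starts as "test"; the sampled rows become "train"
  flag <- rep("test", n)
  flag[idx_train] <- "train"
  flag
}

flag <- split_flag_sketch(nrow(iris), ratio = 0.7, seed = 123)
table(flag)
```

With the 150 rows of iris and ratio = 0.7, 105 rows are labeled train and 45 are labeled test. Repeating the call with the same seed gives the same labels, which is the reason to supply one.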

Value

An object of class split_df.

Details

split_by() creates an object of class split_df, which stores the split information and the criteria used to separate the training set from the test set.

Attributes of split_df

The attributes of the split_df class are as follows:

split_seed : integer. random seed used for splitting
target : character. the name of the target variable
binary : logical. whether the target variable is binary class
minority : character. the name of the minority class
majority : character. the name of the majority class
minority_rate : numeric. the rate of the minority class
majority_rate : numeric. the rate of the majority class
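
As an illustration of what the class-related attributes describe, the following base R sketch derives comparable values from a small, made-up binary factor. The variable names mirror the attribute names above, but the computation is an assumption for illustration and not the package's implementation.

```r
# Made-up target variable for illustration only
target <- factor(c("No", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"))

# Frequency of each class
tab <- table(target)

# binary: TRUE when the target has exactly two classes
binary <- length(tab) == 2

# minority / majority: the least and most frequent class names
minority <- names(tab)[which.min(tab)]
majority <- names(tab)[which.max(tab)]

# minority_rate / majority_rate: share of each class among all rows
minority_rate <- min(tab) / sum(tab)
majority_rate <- max(tab) / sum(tab)

list(binary = binary, minority = minority, majority = majority,
     minority_rate = minority_rate, majority_rate = majority_rate)
```

On an actual split_df object, the stored values can be read with base R's attr(), for example attr(sb, "target") for the object sb created in the Examples.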

Examples

library(dplyr)

# Credit Card Default Data
head(ISLR::Default)
#>   default student   balance    income
#> 1      No      No  729.5265 44361.625
#> 2      No     Yes  817.1804 12106.135
#> 3      No      No 1073.5492 31767.139
#> 4      No      No  529.2506 35704.494
#> 5      No      No  785.6559 38463.496
#> 6      No     Yes  919.5885  7491.559

# Generate data for the example
sb <- ISLR::Default %>%
  split_by(default)

sb
#> # A tibble: 10,000 × 5
#> # Groups:   split_flag [2]
#>    default student balance income split_flag
#>    <fct>   <fct>     <dbl>  <dbl> <chr>     
#>  1 No      No         730. 44362. train     
#>  2 No      Yes        817. 12106. train     
#>  3 No      No        1074. 31767. train     
#>  4 No      No         529. 35704. train     
#>  5 No      No         786. 38463. train     
#>  6 No      Yes        920.  7492. train     
#>  7 No      No         826. 24905. test      
#>  8 No      Yes        809. 17600. train     
#>  9 No      No        1161. 37469. test      
#> 10 No      No           0  29275. test      
#> # ℹ 9,990 more rows