Preface

To develop a classification model, the original data must be divided into a train set and a test set. The typical workflow is:

  • Cleanse the data set
  • Split the data into a train set and a test set
    • Split the data.frame or tbl_df into a train set and a test set
    • Compare the train set and the test set
      • Comparison of categorical variables
      • Comparison of numeric variables
      • Diagnosis of the train set and the test set
    • Extract the train/test data sets
      • Extract the train set or the test set
      • Extract the data used to fit the model
  • Model, evaluate, and predict

The alookr package makes these steps fast and easy.
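As a rough illustration of the split, compare, and extract steps, here is a minimal sketch. It assumes the alookr, dplyr, and ISLR packages are installed and uses the `ISLR::Default` credit card data with `default` as the target variable; the dataset and the seed value are illustrative choices, not requirements. The function names (`split_by()`, `compare_target_category()`, `compare_target_numeric()`, `compare_diag()`, `extract_set()`) are alookr exports, but check the package reference for their full argument lists.

```r
# Assumes alookr, dplyr, and ISLR are installed
library(alookr)
library(dplyr)

# Illustrative data: credit card default data with a binary target `default`
Default <- ISLR::Default

# Split into a train set and a test set, using `default` as the target;
# the seed makes the split reproducible
sb <- Default %>%
  split_by(default, seed = 6534)

# Compare the two sets before modeling
sb %>% compare_target_category()   # categorical variables
sb %>% compare_target_numeric()    # numeric variables
sb %>% compare_diag()              # diagnosis of the train set and test set

# Extract each set as an ordinary data frame to fit and evaluate a model
train <- sb %>% extract_set(set = "train")
test  <- sb %>% extract_set(set = "test")
```

Every row of the original data ends up in exactly one of `train` or `test`, so the two row counts add up to the size of the original data.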

How to split the data

Refer to the following website for information on splitting the data into a train and test set.