Preface

A data set created to develop a classification model must be cleansed before it is used for modeling. After creating the data set, work through the following steps:

  • Cleanse the data set
    • Optionally remove variables that contain missing values
    • Remove variables that have only one unique value
    • Remove categorical variables with a large number of levels
    • Convert character variables to categorical variables
  • Split the data into a train set and a test set
  • Fit models, predict, and evaluate
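
The cleansing and splitting steps listed above can be sketched in a language-neutral way. The following is a minimal plain-Python illustration under stated assumptions, not alookr's implementation: the function names `cleanse_columns` and `split_train_test`, the `uniq_ratio` threshold, the integer encoding of categories, and the sample data are all hypothetical and chosen only to make the rules concrete.

```python
import random


def cleanse_columns(columns, uniq_ratio=0.1, drop_missing=False):
    """Apply the cleansing rules to a dict of {variable name: list of values}.

    Missing values are represented by None. Returns the kept variables and,
    for each converted text variable, its sorted list of levels.
    """
    if not columns:
        return {}, {}
    n_rows = len(next(iter(columns.values())))
    kept, levels = {}, {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        distinct = set(present)
        # Optional rule: drop variables that contain missing values.
        if drop_missing and len(present) < n_rows:
            continue
        # Drop variables with only one unique value (or no values at all).
        if len(distinct) <= 1:
            continue
        is_text = all(isinstance(v, str) for v in present)
        # Drop text variables whose number of levels is large relative
        # to the number of rows (assumed threshold: uniq_ratio).
        if is_text and len(distinct) / n_rows > uniq_ratio:
            continue
        # Convert the remaining text variables to categorical codes.
        if is_text:
            levels[name] = sorted(distinct)
            code = {level: i for i, level in enumerate(levels[name])}
            values = [None if v is None else code[v] for v in values]
        kept[name] = values
    return kept, levels


def split_train_test(n_rows, test_ratio=0.3, seed=0):
    """Return (train_indices, test_indices) from a seeded random shuffle."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_ratio)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Hypothetical sample data with ten rows.
data = {
    "id": [f"row{i}" for i in range(10)],      # ten levels in ten rows: removed
    "year": ["2018"] * 10,                     # one unique value: removed
    "count": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],   # numeric: kept unchanged
    "flag": ["Y", "N", "N", "N", "Y", "N", "N", "N", "N", "N"],  # kept, encoded
}

cleaned, levels = cleanse_columns(data, uniq_ratio=0.5)
train_idx, test_idx = split_train_test(10, test_ratio=0.3, seed=1)
```

In this sketch `cleaned` keeps only `count` and `flag`, and the row indices are divided seven to three between the train and test sets. The modeling, prediction, and evaluation step is left out because it depends on the chosen model.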

The alookr package makes these steps fast and easy.

How to cleanse the data set

Refer to the following website for information on how to cleanse the data set.