binning_by() finds intervals for a numerical variable using optimal binning. Optimal binning categorizes a numeric characteristic into bins for later use in scoring modeling.

binning_by(.data, y, x, p = 0.05, ordered = TRUE, labels = NULL)

Arguments

.data

a data frame.

y

character. name of the binary response variable (0, 1). The variable must contain only the integers 0 and 1 as elements. However, a factor with two levels is also accepted; it is converted to 0 and 1 during the calculation.

x

character. name of the continuous characteristic variable. It must have at least 5 different values, and Inf is not allowed.

p

numeric. percentage of records per bin. Default 5% (0.05). This parameter only accepts values greater than 0.00 (0%) and lower than 0.50 (50%).

ordered

logical. whether to build an ordered factor or not.

labels

character. the label names to use for each of the bins.
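
The constraints on y, x and p described above can be checked before calling the function. The following is a minimal base R sketch, not dlookr code: check_binning_inputs() and the toy data frame are hypothetical names introduced only for illustration, and ignoring missing values during the checks is an assumption.

```r
# Hypothetical helper, not part of dlookr: mirrors the documented constraints.
check_binning_inputs <- function(.data, y, x, p = 0.05) {
  stopifnot(is.data.frame(.data), y %in% names(.data), x %in% names(.data))

  yv <- .data[[y]]
  xv <- .data[[x]]

  # y: only 0 and 1, or a factor with exactly two levels
  # (assumption: missing values are ignored for this check)
  y_ok <- if (is.factor(yv)) {
    nlevels(yv) == 2
  } else {
    all(stats::na.omit(yv) %in% c(0, 1))
  }

  # x: numeric, at least 5 different values, no Inf
  x_ok <- is.numeric(xv) &&
    length(unique(stats::na.omit(xv))) >= 5 &&
    !any(is.infinite(xv))

  # p: strictly between 0 and 0.5
  p_ok <- is.numeric(p) && length(p) == 1 && p > 0 && p < 0.5

  c(y = y_ok, x = x_ok, p = p_ok)
}

# Made-up data for illustration
toy <- data.frame(
  y = factor(c("No", "Yes", "No", "Yes", "No", "Yes")),
  x = c(0.5, 0.9, 1.1, 1.8, 2.4, 9.4)
)

check_binning_inputs(toy, "y", "x")           # every check passes
check_binning_inputs(toy, "y", "x", p = 0.5)  # the p check fails
```

The helper returns a named logical vector rather than stopping, so the caller can see which constraint is violated.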

Value

an object of "optimal_bins" class. Attributes of the "optimal_bins" class are as follows.

  • class : "optimal_bins".

  • type : binning type, "optimal".

  • breaks : numeric. the break points at which x is cut into intervals.

  • levels : character. levels of binned value.

  • raw : numeric. raw data, x argument value.

  • ivtable : data.frame. information value table.

  • iv : numeric. information value.

  • target : integer. binary response variable.
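
To give a feel for what the iv attribute summarizes, here is a minimal base R sketch of the usual textbook information value calculation on made-up data. It is an illustration only: the bin labels, the target vector and the column names are assumptions, and the table returned in ivtable may use a different layout or sign convention for the weight of evidence. The total information value does not depend on that sign.

```r
# Toy data (made up for illustration): a bin label and a 0/1 target per record.
bins <- factor(
  c("low", "low", "low", "low", "mid", "mid", "mid", "high", "high", "high"),
  levels = c("low", "mid", "high")
)
target <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 0)

tab  <- table(bins, target)             # counts per bin and target value
pct0 <- tab[, "0"] / sum(tab[, "0"])    # share of all 0s that fall in each bin
pct1 <- tab[, "1"] / sum(tab[, "1"])    # share of all 1s that fall in each bin

woe       <- log(pct1 / pct0)           # weight of evidence per bin
iv_by_bin <- (pct1 - pct0) * woe        # contribution of each bin
iv        <- sum(iv_by_bin)             # total information value

data.frame(bin = levels(bins), n0 = tab[, "0"], n1 = tab[, "1"], woe, iv_by_bin)
iv
```

Each bin's contribution is non-negative, so a larger total indicates that the bins separate the two target classes more strongly.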

Details

This function is useful when used with the mutate()/transmute() functions of the dplyr package. It is implemented using the smbinning() function of the smbinning package.

Attributes of "optimal_bins" class

The attributes of the "optimal_bins" class are as follows.

  • class : "optimal_bins".

  • levels : character. factor or ordered factor levels

  • type : character. binning method

  • breaks : numeric. breaks for binning

  • raw : numeric. the raw data before binning

  • ivtable : data.frame. information value table

  • iv : numeric. information value

  • target : integer. binary response variable

See vignette("transformation") for an introduction to these concepts.

Examples

library(dlookr)
library(dplyr)

# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "creatinine"] <- NA

# optimal binning, passing the variable names as character strings
bin <- binning_by(heartfailure2, "death_event", "creatinine")
#> Warning: The factor y has been changed to a numeric vector consisting of 0 and 1.
#> 'Yes' changed to 1 (positive) and 'No' changed to 0 (negative).

# optimal binning, passing the variable names unquoted
bin <- binning_by(heartfailure2, death_event, creatinine)
#> Warning: The factor y has been changed to a numeric vector consisting of 0 and 1.
#> 'Yes' changed to 1 (positive) and 'No' changed to 0 (negative).
bin
#> binned type: optimal
#> number of bins: 3
#> x
#> [0.5,0.9] (0.9,1.8] (1.8,9.4]      <NA> 
#>        78       168        48         5
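
The interval labels printed above use the same notation as base R's cut(): intervals are closed on the right, and the first interval is also closed on the left. The sketch below reproduces that labelling with cut() on made-up values, using break points read off the printed output. It does not claim to show how binning_by() builds its result internally.

```r
# Break points read off the printed output above.
breaks <- c(0.5, 0.9, 1.8, 9.4)

# A few made-up values, including one missing value.
x <- c(0.5, 0.9, 1.0, 1.8, 2.5, 9.4, NA)

# Right-closed intervals; include.lowest = TRUE also closes the first
# interval on the left, so the minimum value 0.5 is not dropped.
binned <- cut(x, breaks = breaks, include.lowest = TRUE, ordered_result = TRUE)

levels(binned)
#> [1] "[0.5,0.9]" "(0.9,1.8]" "(1.8,9.4]"

# Counts per bin; the missing value is reported separately.
table(binned, useNA = "ifany")
```

A value that sits exactly on a break point, such as 0.9 or 1.8, falls into the interval that ends at that value.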