Optimal Binning for Scoring Modeling

binning_by() finds intervals for a numerical variable using optimal binning. Optimal binning categorizes a numeric characteristic into bins for later use in scoring modeling.

binning_by(.data, y, x, p = 0.05, ordered = TRUE, labels = NULL)

Arguments

.data: a data frame.
y: character. name of the binary response variable (0, 1). The variable must contain only the integers 0 and 1. A factor with two levels is also accepted; it is converted to 0 and 1 during the calculation, with a warning.
x: character. name of the continuous characteristic variable. It must have at least 5 distinct values, and Inf is not allowed.
p: numeric. percentage of records per bin. Default 5% (0.05). This parameter only accepts values greater than 0.00 (0%) and lower than 0.50 (50%).
ordered: logical. whether to build an ordered factor or not.
labels: character. the label names to use for each of the bins.
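
The constraints on x and p listed above can be checked before calling binning_by(). The following is a minimal sketch in base R; check_binning_input() is a hypothetical helper written for illustration and is not part of dlookr, and it treats NA values in x as permitted because the example on this page bins a column that contains NA.

```r
# Hypothetical helper (not part of dlookr): checks the documented
# constraints on x and p before calling binning_by().
check_binning_input <- function(x, p = 0.05) {
  # Inf is not allowed in x
  no_inf <- !any(is.infinite(x))
  # x needs at least 5 distinct non-missing values
  enough_values <- length(unique(x[!is.na(x)])) >= 5
  # p must lie strictly between 0 and 0.5
  valid_p <- is.numeric(p) && length(p) == 1 && p > 0 && p < 0.5
  c(no_inf = no_inf, enough_values = enough_values, valid_p = valid_p)
}

# Passes all three checks
check_binning_input(c(0.5, 0.9, 1.1, 1.8, 9.4, NA), p = 0.05)

# Fails all three checks: contains Inf, too few distinct values, p too large
check_binning_input(c(1, 2, Inf), p = 0.5)
```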

Value

An object of class "optimal_bins". The attributes of the "optimal_bins" class are as follows.

class : "optimal_bins".
type : binning type, "optimal".
breaks : numeric. the break points at which x is cut into intervals.
levels : character. levels of binned value.
raw : numeric. raw data, x argument value.
ivtable : data.frame. information value table.
iv : numeric. information value.
target : integer. binary response variable.
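
To make the ivtable and iv attributes more concrete, here is a minimal base R sketch of how an information value is commonly computed from a binned variable and a 0/1 target. The data are made up for illustration, and the weight-of-evidence convention used (which class goes in the numerator) is an assumption; the table produced by smbinning has its own columns and may use a different sign or scaling convention. The total information value is the same under either sign convention.

```r
# Toy data, made up for illustration: bin membership and a 0/1 target
bins <- factor(
  c("low", "low", "low", "low", "mid", "mid", "mid", "high", "high", "high"),
  levels = c("low", "mid", "high")
)
target <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 0)

# Cross-tabulate: rows are bins, columns are the target values "0" and "1"
counts <- table(bins, target)

# Share of all 0s and share of all 1s that fall in each bin
dist0 <- counts[, "0"] / sum(counts[, "0"])
dist1 <- counts[, "1"] / sum(counts[, "1"])

# Weight of evidence per bin (assumed convention: log of dist1 over dist0)
woe <- log(dist1 / dist0)

# Per-bin contribution and total information value
iv_by_bin <- (dist1 - dist0) * woe
iv <- sum(iv_by_bin)

data.frame(bin = levels(bins), dist0, dist1, woe, iv = iv_by_bin)
iv
```

Every bin in this toy data contains both classes, so the logarithm is finite. With real data, a bin that holds only one class needs separate handling.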

Details

This function is useful in combination with the mutate() and transmute() functions of the dplyr package. It is implemented using the smbinning() function of the smbinning package.

attributes of "optimal_bins" class

The attributes of the "optimal_bins" class are as follows.

class : "optimal_bins".
levels : character. factor or ordered factor levels
type : character. binning method
breaks : numeric. breaks for binning
raw : numeric. the raw data before binning
ivtable : data.frame. information value table
iv : numeric. information value
target : integer. binary response variable

See vignette("transformation") for an introduction to these concepts.
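
The relationship between the breaks and levels attributes can be sketched with base R's cut(). The break points below are taken from the example output on this page. Whether binning_by() calls cut() internally with exactly these arguments is an assumption; the sketch only shows that such break points reproduce the displayed interval labels.

```r
# Illustrative values; the break points come from the example output
x <- c(0.5, 0.8, 0.9, 1.2, 1.8, 2.5, 9.4, NA)
breaks <- c(0.5, 0.9, 1.8, 9.4)

# include.lowest = TRUE closes the first interval on the left,
# ordered_result = TRUE returns an ordered factor
binned <- cut(x, breaks = breaks, include.lowest = TRUE, ordered_result = TRUE)

levels(binned)
#> [1] "[0.5,0.9]" "(0.9,1.8]" "(1.8,9.4]"

# Missing values stay NA and are counted separately
table(binned, useNA = "ifany")
```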

Examples

library(dlookr)  # provides binning_by() and the heartfailure data set
library(dplyr)

# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "creatinine"] <- NA

# optimal binning, passing the column names as character strings
bin <- binning_by(heartfailure2, "death_event", "creatinine")
#> Warning: The factor y has been changed to a numeric vector consisting of 0 and 1.
#> 'Yes' changed to 1 (positive) and 'No' changed to 0 (negative).

# optimal binning, passing the column names unquoted
bin <- binning_by(heartfailure2, death_event, creatinine)
#> Warning: The factor y has been changed to a numeric vector consisting of 0 and 1.
#> 'Yes' changed to 1 (positive) and 'No' changed to 0 (negative).
bin
#> binned type: optimal
#> number of bins: 3
#> x
#> [0.5,0.9] (0.9,1.8] (1.8,9.4]      <NA> 
#>        78       168        48         5
