pps() computes the PPS (Predictive Power Score) for exploratory data analysis.
pps(.data, ...)
# S3 method for data.frame
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)
# S3 method for target_df
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)
.data: a target_df or data.frame.
...: one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, pps() automatically starts with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.
cv_folds: integer. Number of cross-validation folds.
do_parallel: logical. Whether to perform score calls in parallel.
n_cores: integer. Number of cores to use; the default (-1) means the maximum number of cores minus 1.
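The selection rule for `...` can be illustrated with integer positions. The sketch below uses a hypothetical helper, `resolve_positions()`, written only for this page (it is not part of dlookr, which resolves the unquoted expressions itself). It assumes that selections are applied left to right and that an empty `...` means all variables, as in the `pps(categ)` example.

```r
# Hypothetical helper, not a dlookr function: resolve column positions
# using the rule described for `...` (applied left to right).
resolve_positions <- function(n_cols, positions) {
  # No expressions: every variable is used.
  if (length(positions) == 0) return(seq_len(n_cols))
  # A negative first expression starts from all variables;
  # otherwise the selection starts empty.
  selected <- if (positions[1] < 0) seq_len(n_cols) else integer(0)
  for (p in positions) {
    if (p > 0) {
      selected <- union(selected, p)     # positive value: add the variable
    } else {
      selected <- setdiff(selected, -p)  # negative value: drop the variable
    }
  }
  as.integer(selected)
}

# iris has 5 columns; drop column 2 (Sepal.Width)
resolve_positions(5, -2)
# keep only columns 3 and 4 (Petal.Length, Petal.Width)
resolve_positions(5, c(3, 4))
```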
An object of class pps, with the following attributes:
type : type of PPS, either "generic" or "target_by"
target : name of target variable
predictor : name of predictor
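The attributes listed above can be read with base R's `attr()`. So that the snippet runs without dlookr, it builds a mock object by hand; the attribute names follow the list above, but the mock and its values are illustrative assumptions, not output of `pps()`.

```r
# Mock object standing in for a pps() result (illustrative values only).
res <- data.frame(x = c("Petal.Length", "Petal.Width"),
                  y = "Species",
                  pps = c(0.9159525, 0.9355494))
attr(res, "type") <- "target_by"
attr(res, "target") <- "Species"
attr(res, "predictor") <- c("Petal.Length", "Petal.Width")

# Read the attributes back, as you would on a real pps object.
attr(res, "type")
attr(res, "target")
attr(res, "predictor")
```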
The PPS is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two variables. The score ranges from 0 (no predictive power) to 1 (perfect predictive power).
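In the reference article cited below, the score is obtained by normalizing the model's cross-validated score against a naive baseline. For regression (MAE, lower is better) PPS = 1 - MAE_model / MAE_naive; for classification (weighted F1, higher is better) PPS = (F1_model - F1_naive) / (1 - F1_naive); a model that does not beat the baseline scores 0. The sketch below encodes those two formulas and is not dlookr's code. Plugging in the averaged baseline_score and model_score columns from the examples gives values close to, but not identical with, the reported pps (about 0.532 versus 0.528 for Petal.Length predicting Sepal.Length), which suggests the score is normalized per fold before averaging; treat that as an assumption.

```r
# Sketch of the PPS normalization from the reference article
# (not dlookr's implementation).
normalize_pps <- function(model_score, baseline_score, model_type) {
  pps <- if (model_type == "regression") {
    # MAE: lower is better, so compare the error ratio with 1
    1 - model_score / baseline_score
  } else {
    # weighted F1: higher is better, so rescale the gain over the baseline
    (model_score - baseline_score) / (1 - baseline_score)
  }
  # A model no better than the naive baseline scores 0
  max(pps, 0)
}

# Averaged scores from rows 3 and 21 of pps_generic in the examples
normalize_pps(0.3219946, 0.6874444, "regression")      # about 0.532
normalize_pps(0.7053681, 0.2506355, "classification")  # about 0.607
```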
Each row of the PPS result contains the following columns:
x : the name of the predictor variable
y : the name of the target variable
result_type : text showing how to interpret the resulting score
pps : the predictive power score
metric : the evaluation metric used to compute the PPS
baseline_score : the score of a naive model on the evaluation metric
model_score : the score of the predictive model on the evaluation metric
cv_folds : how many cross-validation folds were used
seed : the seed that was set
algorithm : text showing which algorithm was used
model_type : text showing whether classification or regression was used
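Because every row carries these columns, ordinary data frame tools apply. The sketch below uses base R on a few rows copied from the pps_generic output in the examples: it drops the rows where predictor and target are the same and keeps the strongest predictor for each target. It assumes a real pps result can be handled like a data.frame, as its printed form suggests.

```r
# A few rows copied from pps_generic (assumes the pps result behaves
# like a data.frame).
res <- data.frame(
  x   = c("Sepal.Length", "Petal.Length", "Petal.Width", "Species",
          "Sepal.Length", "Petal.Length", "Petal.Width", "Species"),
  y   = rep(c("Petal.Width", "Species"), each = 4),
  pps = c(0.48754118, 0.74123250, 1.00000000, 0.75318927,
          0.60444796, 0.91595246, 0.93554935, 1.00000000)
)
res$result_type <- ifelse(res$x == res$y,
                          "predictor and target are the same",
                          "predictive power score")

# Drop the trivial rows where predictor and target coincide
scored <- res[res$result_type == "predictive power score", ]

# Strongest predictor for each target
best <- do.call(rbind, lapply(split(scored, scored$y), function(d) {
  d[which.max(d$pps), c("x", "y", "pps")]
}))
best
```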
Florian Wetschoreck. RIP correlation. Introducing the Predictive Power Score.
https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598
# \donttest{
library(dplyr)
# pps type is generic =======================================
pps_generic <- pps(iris)
pps_generic
#> x y result_type pps
#> 1 Sepal.Length Sepal.Length predictor and target are the same 1.00000000
#> 2 Sepal.Width Sepal.Length predictive power score 0.05700280
#> 3 Petal.Length Sepal.Length predictive power score 0.52848518
#> 4 Petal.Width Sepal.Length predictive power score 0.43360037
#> 5 Species Sepal.Length predictive power score 0.40586730
#> 6 Sepal.Length Sepal.Width predictive power score 0.09404167
#> 7 Sepal.Width Sepal.Width predictor and target are the same 1.00000000
#> 8 Petal.Length Sepal.Width predictive power score 0.25083405
#> 9 Petal.Width Sepal.Width predictive power score 0.24666074
#> 10 Species Sepal.Width predictive power score 0.21699768
#> 11 Sepal.Length Petal.Length predictive power score 0.61917579
#> 12 Sepal.Width Petal.Length predictive power score 0.17961947
#> 13 Petal.Length Petal.Length predictor and target are the same 1.00000000
#> 14 Petal.Width Petal.Length predictive power score 0.78151887
#> 15 Species Petal.Length predictive power score 0.79350708
#> 16 Sepal.Length Petal.Width predictive power score 0.48754118
#> 17 Sepal.Width Petal.Width predictive power score 0.14674017
#> 18 Petal.Length Petal.Width predictive power score 0.74123250
#> 19 Petal.Width Petal.Width predictor and target are the same 1.00000000
#> 20 Species Petal.Width predictive power score 0.75318927
#> 21 Sepal.Length Species predictive power score 0.60444796
#> 22 Sepal.Width Species predictive power score 0.36017909
#> 23 Petal.Length Species predictive power score 0.91595246
#> 24 Petal.Width Species predictive power score 0.93554935
#> 25 Species Species predictor and target are the same 1.00000000
#> metric baseline_score model_score cv_folds seed algorithm
#> 1 <NA> NA NA NA NA <NA>
#> 2 MAE 0.6874444 0.6625728 5 1 tree
#> 3 MAE 0.6874444 0.3219946 5 1 tree
#> 4 MAE 0.6874444 0.3869010 5 1 tree
#> 5 MAE 0.6874444 0.4069932 5 1 tree
#> 6 MAE 0.3398667 0.3117519 5 1 tree
#> 7 <NA> NA NA NA NA <NA>
#> 8 MAE 0.3398667 0.2549879 5 1 tree
#> 9 MAE 0.3398667 0.2553413 5 1 tree
#> 10 MAE 0.3398667 0.2671294 5 1 tree
#> 11 MAE 1.5653222 0.5901695 5 1 tree
#> 12 MAE 1.5653222 1.2810047 5 1 tree
#> 13 <NA> NA NA NA NA <NA>
#> 14 MAE 1.5653222 0.3352507 5 1 tree
#> 15 MAE 1.5653222 0.3202118 5 1 tree
#> 16 MAE 0.6587222 0.3298965 5 1 tree
#> 17 MAE 0.6587222 0.5623788 5 1 tree
#> 18 MAE 0.6587222 0.1677181 5 1 tree
#> 19 <NA> NA NA NA NA <NA>
#> 20 MAE 0.6587222 0.1589949 5 1 tree
#> 21 F1_weighted 0.2506355 0.7053681 5 1 tree
#> 22 F1_weighted 0.2506355 0.5209557 5 1 tree
#> 23 F1_weighted 0.2506355 0.9396379 5 1 tree
#> 24 F1_weighted 0.2506355 0.9524872 5 1 tree
#> 25 <NA> NA NA NA NA <NA>
#> model_type
#> 1 <NA>
#> 2 regression
#> 3 regression
#> 4 regression
#> 5 regression
#> 6 regression
#> 7 <NA>
#> 8 regression
#> 9 regression
#> 10 regression
#> 11 regression
#> 12 regression
#> 13 <NA>
#> 14 regression
#> 15 regression
#> 16 regression
#> 17 regression
#> 18 regression
#> 19 <NA>
#> 20 regression
#> 21 classification
#> 22 classification
#> 23 classification
#> 24 classification
#> 25 <NA>
# summarize the pps object
mat <- summary(pps_generic)
#> * PPS type : generic
#> * Matrix of Predictive Power Score
#> - Columns : target
#> - Rows : predictors
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> Sepal.Length 1.00000000 0.0570028 0.5284852 0.4336004 0.4058673
#> Sepal.Width 0.09404167 1.0000000 0.2508341 0.2466607 0.2169977
#> Petal.Length 0.61917579 0.1796195 1.0000000 0.7815189 0.7935071
#> Petal.Width 0.48754118 0.1467402 0.7412325 1.0000000 0.7531893
#> Species 0.60444796 0.3601791 0.9159525 0.9355494 1.0000000
mat
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> Sepal.Length 1.00000000 0.0570028 0.5284852 0.4336004 0.4058673
#> Sepal.Width 0.09404167 1.0000000 0.2508341 0.2466607 0.2169977
#> Petal.Length 0.61917579 0.1796195 1.0000000 0.7815189 0.7935071
#> Petal.Width 0.48754118 0.1467402 0.7412325 1.0000000 0.7531893
#> Species 0.60444796 0.3601791 0.9159525 0.9355494 1.0000000
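# The matrix above is a pivot of the long result: predictors (x) become
# rows and targets (y) become columns. A base R sketch of that pivot on
# a few rows copied from pps_generic (illustrative; not necessarily how
# dlookr builds the matrix):
long <- data.frame(
  x   = c("Petal.Length", "Petal.Width", "Petal.Length", "Petal.Width"),
  y   = c("Petal.Length", "Petal.Length", "Petal.Width", "Petal.Width"),
  pps = c(1.00000000, 0.78151887, 0.74123250, 1.00000000)
)
pivot <- tapply(long$pps, list(predictor = long$x, target = long$y), mean)
pivot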
# visualize the pps object
plot(pps_generic)
# pps type is target_by =====================================
##-----------------------------------------------------------
# If the target variable is a categorical variable
categ <- target_by(iris, Species)
# compute PPS using all variables as predictors
pps_cat <- pps(categ)
pps_cat
#> x y result_type pps metric
#> 1 Sepal.Length Species predictive power score 0.6044480 F1_weighted
#> 2 Sepal.Width Species predictive power score 0.3601791 F1_weighted
#> 3 Petal.Length Species predictive power score 0.9159525 F1_weighted
#> 4 Petal.Width Species predictive power score 0.9355494 F1_weighted
#> 5 Species Species predictor and target are the same 1.0000000 <NA>
#> baseline_score model_score cv_folds seed algorithm model_type
#> 1 0.2506355 0.7053681 5 1 tree classification
#> 2 0.2506355 0.5209557 5 1 tree classification
#> 3 0.2506355 0.9396379 5 1 tree classification
#> 4 0.2506355 0.9524872 5 1 tree classification
#> 5 NA NA NA NA <NA> <NA>
# compute PPS using only Petal.Length and Petal.Width as predictors
pps_cat <- pps(categ, Petal.Length, Petal.Width)
pps_cat
#> x y result_type pps metric
#> 1 Petal.Length Species predictive power score 0.9159525 F1_weighted
#> 2 Petal.Width Species predictive power score 0.9355494 F1_weighted
#> 3 Species Species predictor and target are the same 1.0000000 <NA>
#> baseline_score model_score cv_folds seed algorithm model_type
#> 1 0.2506355 0.9396379 5 1 tree classification
#> 2 0.2506355 0.9524872 5 1 tree classification
#> 3 NA NA NA NA <NA> <NA>
# Using dplyr
pps_cat <- iris %>%
target_by(Species) %>%
pps()
pps_cat
#> x y result_type pps metric
#> 1 Sepal.Length Species predictive power score 0.6044480 F1_weighted
#> 2 Sepal.Width Species predictive power score 0.3601791 F1_weighted
#> 3 Petal.Length Species predictive power score 0.9159525 F1_weighted
#> 4 Petal.Width Species predictive power score 0.9355494 F1_weighted
#> 5 Species Species predictor and target are the same 1.0000000 <NA>
#> baseline_score model_score cv_folds seed algorithm model_type
#> 1 0.2506355 0.7053681 5 1 tree classification
#> 2 0.2506355 0.5209557 5 1 tree classification
#> 3 0.2506355 0.9396379 5 1 tree classification
#> 4 0.2506355 0.9524872 5 1 tree classification
#> 5 NA NA NA NA <NA> <NA>
# Using parallel processing
# pps_cat <- iris %>%
# target_by(Species) %>%
# pps(do_parallel = TRUE)
#
# pps_cat
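# The n_cores default (-1) is documented as "maximum cores - 1". A sketch
# of that count with base R's parallel package; this is an assumption
# about how the default resolves, not dlookr's code:
n_avail <- parallel::detectCores()
if (is.na(n_avail)) n_avail <- 1L  # detectCores() can return NA
n_workers <- max(1L, n_avail - 1L)
n_workers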
# summarize the pps object
tab <- summary(pps_cat)
#> * PPS type : target_by
#> * Target variable : Species
#> * Model type : classification
#> * Information of Predictive Power Score
#> predictors target pps
#> 1 Species Species 1.0000000
#> 2 Petal.Width Species 0.9355494
#> 3 Petal.Length Species 0.9159525
#> 4 Sepal.Length Species 0.6044480
#> 5 Sepal.Width Species 0.3601791
tab
#> predictors target pps
#> 1 Species Species 1.0000000
#> 2 Petal.Width Species 0.9355494
#> 3 Petal.Length Species 0.9159525
#> 4 Sepal.Length Species 0.6044480
#> 5 Sepal.Width Species 0.3601791
# visualize the pps object
plot(pps_cat)
##-----------------------------------------------------------
# If the target variable is a numerical variable
num <- target_by(iris, Petal.Length)
pps_num <- pps(num)
pps_num
#> x y result_type pps metric
#> 1 Sepal.Length Petal.Length predictive power score 0.6191758 MAE
#> 2 Sepal.Width Petal.Length predictive power score 0.1796195 MAE
#> 3 Petal.Length Petal.Length predictor and target are the same 1.0000000 <NA>
#> 4 Petal.Width Petal.Length predictive power score 0.7815189 MAE
#> 5 Species Petal.Length predictive power score 0.7935071 MAE
#> baseline_score model_score cv_folds seed algorithm model_type
#> 1 1.565322 0.5901695 5 1 tree regression
#> 2 1.565322 1.2810047 5 1 tree regression
#> 3 NA NA NA NA <NA> <NA>
#> 4 1.565322 0.3352507 5 1 tree regression
#> 5 1.565322 0.3202118 5 1 tree regression
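# As the outputs show, a factor target (Species) is scored as
# classification with F1_weighted, and a numeric target (Petal.Length)
# as regression with MAE. A hypothetical helper (not part of dlookr)
# mirroring that choice, assuming numeric means regression and anything
# else means classification:
pick_model_type <- function(target) {
  if (is.numeric(target)) {
    list(model_type = "regression", metric = "MAE")
  } else {
    list(model_type = "classification", metric = "F1_weighted")
  }
}
pick_model_type(iris$Petal.Length)$model_type
pick_model_type(iris$Species)$model_type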
# summarize the pps object
tab <- summary(pps_num)
#> * PPS type : target_by
#> * Target variable : Petal.Length
#> * Model type : regression
#> * Information of Predictive Power Score
#> predictors target pps
#> 1 Petal.Length Petal.Length 1.0000000
#> 2 Species Petal.Length 0.7935071
#> 3 Petal.Width Petal.Length 0.7815189
#> 4 Sepal.Length Petal.Length 0.6191758
#> 5 Sepal.Width Petal.Length 0.1796195
tab
#> predictors target pps
#> 1 Petal.Length Petal.Length 1.0000000
#> 2 Species Petal.Length 0.7935071
#> 3 Petal.Width Petal.Length 0.7815189
#> 4 Sepal.Length Petal.Length 0.6191758
#> 5 Sepal.Width Petal.Length 0.1796195
# visualize the pps object
plot(pps_num)
# }