pps() computes the Predictive Power Score (PPS) for exploratory data analysis.
pps(.data, ...)
# S3 method for data.frame
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)
# S3 method for target_df
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)
.data : a target_df or data.frame.
... : one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, pps() will automatically start with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.
cv_folds : integer. number of cross-validation folds.
do_parallel : logical. whether to perform score calls in parallel.
n_cores : integer. number of cores to use. The default (-1) uses the maximum number of cores minus 1.
An object of class pps. The attributes of the pps class are as follows.
type : type of pps, either "generic" or "target_by"
target : name of target variable
predictor : name of predictor
The PPS is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two variables. The score ranges from 0 (no predictive power) to 1 (perfect predictive power).
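The asymmetry can be seen by swapping predictor and target. The sketch below is only an illustration: it uses base R and two PPS values copied (rounded) from the iris example output on this page, and the names `scores` and `pps_matrix` are hypothetical.

```r
# PPS values copied (rounded) from the iris example output on this page.
scores <- data.frame(
  x   = c("Sepal.Length", "Petal.Length"),  # predictor
  y   = c("Petal.Length", "Sepal.Length"),  # target
  pps = c(0.6161, 0.5491)
)

# Cross-tabulate into a predictor-by-target matrix (rows = x, columns = y).
pps_matrix <- xtabs(pps ~ x + y, data = scores)

# Unlike a correlation coefficient, the score depends on the direction.
forward  <- pps_matrix["Sepal.Length", "Petal.Length"]
backward <- pps_matrix["Petal.Length", "Sepal.Length"]
print(c(forward = forward, backward = backward))
```

In this example Sepal.Length predicts Petal.Length somewhat better than the reverse, which a symmetric measure such as a correlation coefficient could not express.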
The information of PPS is as follows.
x : the name of the predictor variable
y : the name of the target variable
result_type : text showing how to interpret the resulting score
pps : the predictive power score
metric : the evaluation metric used to compute the PPS
baseline_score : the score of a naive model on the evaluation metric
model_score : the score of the predictive model on the evaluation metric
cv_folds : how many cross-validation folds were used
seed : the seed that was set
algorithm : text showing what algorithm was used
model_type : text showing whether classification or regression was used
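The relationship between baseline_score, model_score and pps can be sketched as follows. This is a minimal sketch that assumes the usual PPS normalisation (for MAE, one minus the ratio of model error to baseline error; for weighted F1, the gain over the baseline scaled by the maximum possible gain). The function names are hypothetical and are not part of the package. The input values are taken from the iris example output on this page; the results are close to, but not identical to, the reported pps values, presumably because the reported score is averaged over the cross-validation folds.

```r
# Regression (MAE, lower is better): share of the baseline error removed.
pps_regression <- function(model_score, baseline_score) {
  max(0, 1 - model_score / baseline_score)
}

# Classification (weighted F1, higher is better): gain over the baseline,
# scaled by the largest gain that is possible.
pps_classification <- function(model_score, baseline_score) {
  max(0, (model_score - baseline_score) / (1 - baseline_score))
}

# Petal.Length -> Sepal.Length (reported pps is about 0.549)
reg_example <- pps_regression(model_score = 0.3100867, baseline_score = 0.6893222)

# Petal.Length -> Species (reported pps is about 0.917)
clf_example <- pps_classification(model_score = 0.9404972, baseline_score = 0.3176487)

print(c(regression = reg_example, classification = clf_example))
```

Both helpers clip at 0, so a model that does no better than the naive baseline gets a score of 0 rather than a negative value.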
RIP correlation. Introducing the Predictive Power Score - by Florian Wetschoreck
https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598
library(dplyr)
# If you want to use this feature, you need to install the 'ppsr' package.
if (!requireNamespace("ppsr", quietly = TRUE)) {
  cat("If you want to use this feature, you need to install the 'ppsr' package.\n")
}
# pps type is generic =======================================
pps_generic <- pps(iris)
pps_generic
#> x y result_type pps
#> 1 Sepal.Length Sepal.Length predictor and target are the same 1.00000000
#> 2 Sepal.Width Sepal.Length predictive power score 0.04632352
#> 3 Petal.Length Sepal.Length predictive power score 0.54913985
#> 4 Petal.Width Sepal.Length predictive power score 0.41276679
#> 5 Species Sepal.Length predictive power score 0.40754872
#> 6 Sepal.Length Sepal.Width predictive power score 0.06790301
#> 7 Sepal.Width Sepal.Width predictor and target are the same 1.00000000
#> 8 Petal.Length Sepal.Width predictive power score 0.23769911
#> 9 Petal.Width Sepal.Width predictive power score 0.21746588
#> 10 Species Sepal.Width predictive power score 0.20128762
#> 11 Sepal.Length Petal.Length predictive power score 0.61608360
#> 12 Sepal.Width Petal.Length predictive power score 0.24263851
#> 13 Petal.Length Petal.Length predictor and target are the same 1.00000000
#> 14 Petal.Width Petal.Length predictive power score 0.79175121
#> 15 Species Petal.Length predictive power score 0.79049070
#> 16 Sepal.Length Petal.Width predictive power score 0.48735314
#> 17 Sepal.Width Petal.Width predictive power score 0.20124105
#> 18 Petal.Length Petal.Width predictive power score 0.74378445
#> 19 Petal.Width Petal.Width predictor and target are the same 1.00000000
#> 20 Species Petal.Width predictive power score 0.75611126
#> 21 Sepal.Length Species predictive power score 0.55918638
#> 22 Sepal.Width Species predictive power score 0.31344008
#> 23 Petal.Length Species predictive power score 0.91675800
#> 24 Petal.Width Species predictive power score 0.93985320
#> 25 Species Species predictor and target are the same 1.00000000
#> metric baseline_score model_score cv_folds seed algorithm
#> 1 <NA> NA NA NA NA <NA>
#> 2 MAE 0.6893222 0.6620058 5 1 tree
#> 3 MAE 0.6893222 0.3100867 5 1 tree
#> 4 MAE 0.6893222 0.4040123 5 1 tree
#> 5 MAE 0.6893222 0.4076661 5 1 tree
#> 6 MAE 0.3372222 0.3184796 5 1 tree
#> 7 <NA> NA NA NA NA <NA>
#> 8 MAE 0.3372222 0.2564258 5 1 tree
#> 9 MAE 0.3372222 0.2631636 5 1 tree
#> 10 MAE 0.3372222 0.2677963 5 1 tree
#> 11 MAE 1.5719667 0.5971445 5 1 tree
#> 12 MAE 1.5719667 1.1945031 5 1 tree
#> 13 <NA> NA NA NA NA <NA>
#> 14 MAE 1.5719667 0.3265152 5 1 tree
#> 15 MAE 1.5719667 0.3280552 5 1 tree
#> 16 MAE 0.6623556 0.3377682 5 1 tree
#> 17 MAE 0.6623556 0.5315834 5 1 tree
#> 18 MAE 0.6623556 0.1684906 5 1 tree
#> 19 <NA> NA NA NA NA <NA>
#> 20 MAE 0.6623556 0.1608119 5 1 tree
#> 21 F1_weighted 0.3176487 0.7028029 5 1 tree
#> 22 F1_weighted 0.3176487 0.5377587 5 1 tree
#> 23 F1_weighted 0.3176487 0.9404972 5 1 tree
#> 24 F1_weighted 0.3176487 0.9599148 5 1 tree
#> 25 <NA> NA NA NA NA <NA>
#> model_type
#> 1 <NA>
#> 2 regression
#> 3 regression
#> 4 regression
#> 5 regression
#> 6 regression
#> 7 <NA>
#> 8 regression
#> 9 regression
#> 10 regression
#> 11 regression
#> 12 regression
#> 13 <NA>
#> 14 regression
#> 15 regression
#> 16 regression
#> 17 regression
#> 18 regression
#> 19 <NA>
#> 20 regression
#> 21 classification
#> 22 classification
#> 23 classification
#> 24 classification
#> 25 <NA>
# pps type is target_by =====================================
##-----------------------------------------------------------
# If the target variable is a categorical variable
categ <- target_by(iris, Species)
# compute all variables
pps_cat <- pps(categ)
pps_cat
#> x y result_type pps metric
#> 1 Sepal.Length Species predictive power score 0.5591864 F1_weighted
#> 2 Sepal.Width Species predictive power score 0.3134401 F1_weighted
#> 3 Petal.Length Species predictive power score 0.9167580 F1_weighted
#> 4 Petal.Width Species predictive power score 0.9398532 F1_weighted
#> 5 Species Species predictor and target are the same 1.0000000 <NA>
#> baseline_score model_score cv_folds seed algorithm model_type
#> 1 0.3176487 0.7028029 5 1 tree classification
#> 2 0.3176487 0.5377587 5 1 tree classification
#> 3 0.3176487 0.9404972 5 1 tree classification
#> 4 0.3176487 0.9599148 5 1 tree classification
#> 5 NA NA NA NA <NA> <NA>
# compute only the Petal.Length and Petal.Width variables
pps_cat <- pps(categ, Petal.Length, Petal.Width)
pps_cat
#> x y result_type pps metric
#> 1 Petal.Length Species predictive power score 0.9167580 F1_weighted
#> 2 Petal.Width Species predictive power score 0.9398532 F1_weighted
#> 3 Species Species predictor and target are the same 1.0000000 <NA>
#> baseline_score model_score cv_folds seed algorithm model_type
#> 1 0.3176487 0.9404972 5 1 tree classification
#> 2 0.3176487 0.9599148 5 1 tree classification
#> 3 NA NA NA NA <NA> <NA>
# Using dplyr
pps_cat <- iris %>%
  target_by(Species) %>%
  pps()
pps_cat
#> x y result_type pps metric
#> 1 Sepal.Length Species predictive power score 0.5591864 F1_weighted
#> 2 Sepal.Width Species predictive power score 0.3134401 F1_weighted
#> 3 Petal.Length Species predictive power score 0.9167580 F1_weighted
#> 4 Petal.Width Species predictive power score 0.9398532 F1_weighted
#> 5 Species Species predictor and target are the same 1.0000000 <NA>
#> baseline_score model_score cv_folds seed algorithm model_type
#> 1 0.3176487 0.7028029 5 1 tree classification
#> 2 0.3176487 0.5377587 5 1 tree classification
#> 3 0.3176487 0.9404972 5 1 tree classification
#> 4 0.3176487 0.9599148 5 1 tree classification
#> 5 NA NA NA NA <NA> <NA>
##-----------------------------------------------------------
# If the target variable is a numerical variable
num <- target_by(iris, Petal.Length)
pps_num <- pps(num)
pps_num
#> x y result_type pps metric
#> 1 Sepal.Length Petal.Length predictive power score 0.6160836 MAE
#> 2 Sepal.Width Petal.Length predictive power score 0.2426385 MAE
#> 3 Petal.Length Petal.Length predictor and target are the same 1.0000000 <NA>
#> 4 Petal.Width Petal.Length predictive power score 0.7917512 MAE
#> 5 Species Petal.Length predictive power score 0.7904907 MAE
#> baseline_score model_score cv_folds seed algorithm model_type
#> 1 1.571967 0.5971445 5 1 tree regression
#> 2 1.571967 1.1945031 5 1 tree regression
#> 3 NA NA NA NA <NA> <NA>
#> 4 1.571967 0.3265152 5 1 tree regression
#> 5 1.571967 0.3280552 5 1 tree regression
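A common next step is to rank the predictors by score. The sketch below is a base R illustration only: it rebuilds the x, y and pps columns of the output above as a plain data.frame (the name `res` is hypothetical) so that it runs without the package. It is assumed, but not verified here, that the same subsetting would work directly on the object returned by pps().

```r
# Columns x, y and pps copied from the pps_num output above.
res <- data.frame(
  x   = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"),
  y   = "Petal.Length",
  pps = c(0.6160836, 0.2426385, 1.0000000, 0.7917512, 0.7904907),
  stringsAsFactors = FALSE
)

# Drop the trivial row where predictor and target are the same variable.
ranked <- res[res$x != res$y, ]

# Sort the remaining predictors from strongest to weakest.
ranked <- ranked[order(ranked$pps, decreasing = TRUE), c("x", "pps")]
print(ranked)
```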