Find the skewed numerical variables in a data.frame or in an object that inherits from data.frame.
find_skewness(.data, index = TRUE, value = FALSE, thres = NULL)
.data: a data.frame or a tbl_df.
index: logical. Specifies whether skewed variables are reported by position or by name. Returns column indices if TRUE, or variable names if FALSE.
value: logical. If TRUE, returns the skewness value of each skewed variable.
thres: numeric. Skewness threshold. Only variables whose absolute skewness is greater than thres are returned. The default is NULL, which ignores the threshold. However, if value = TRUE, thres defaults to 0.5.
Information on the skewed variables: their column indices, their names, or their skewness values, depending on index and value.
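The selection rule described by the arguments (keep numeric columns whose absolute skewness exceeds thres) can be sketched in base R. This is only an illustration under stated assumptions: skew_g1() and find_skewed_sketch() are hypothetical helpers, not part of the package, and the moment-based estimator used here may differ from the one find_skewness() applies internally, so the values need not match exactly.

```r
# Hypothetical helper: moment-based sample skewness, g1 = m3 / m2^(3/2).
# Assumption: the package may use a different estimator.
skew_g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical sketch of the selection logic, for illustration only.
find_skewed_sketch <- function(df, index = TRUE, value = FALSE, thres = 0.5) {
  # Positions of the numeric columns (named by column).
  num_idx <- which(vapply(df, is.numeric, logical(1)))
  # Skewness of each numeric column.
  skew <- vapply(df[num_idx], skew_g1, numeric(1))
  # Keep columns whose absolute skewness exceeds the threshold.
  keep <- abs(skew) > thres
  if (value) return(round(skew[keep], 3))
  if (index) unname(num_idx[keep]) else names(num_idx)[keep]
}

toy <- data.frame(
  sym   = c(-2, -1, 0, 1, 2),   # symmetric, skewness 0
  right = c(1, 1, 1, 2, 10),    # long right tail
  label = c("a", "b", "c", "d", "e")
)

find_skewed_sketch(toy)                 # position of the skewed column: 2
find_skewed_sketch(toy, index = FALSE)  # its name: "right"
```

Taking the absolute value means both left-skewed and right-skewed columns are selected; the sign only becomes visible when the skewness values themselves are returned.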
find_skewness(heartfailure)
#> [1] 3 5 7 8 9
find_skewness(heartfailure, index = FALSE)
#> [1] "cpk_enzyme"        "ejection_fraction" "platelets"
#> [4] "creatinine"        "sodium"
find_skewness(heartfailure, thres = 0.1)
#> [1] 3 5 7 8 9
find_skewness(heartfailure, value = TRUE)
#>        cpk_enzyme ejection_fraction         platelets        creatinine
#>             4.441             0.553             1.455             4.434
#>            sodium
#>            -1.043
find_skewness(heartfailure, value = TRUE, thres = 0.1)
#>        cpk_enzyme ejection_fraction         platelets        creatinine
#>             4.441             0.553             1.455             4.434
#>            sodium
#>            -1.043
## using dplyr -------------------------------------
library(dplyr)
# Perform simple data quality diagnosis of skewed variables
heartfailure %>%
  select(find_skewness(.)) %>%
  diagnose()
#> # A tibble: 5 × 6
#>   variables         types missing_count missing_percent unique_count unique_rate
#>   <chr>             <chr>         <int>           <dbl>        <int>       <dbl>
#> 1 cpk_enzyme        nume…             0               0          208      0.696
#> 2 ejection_fraction nume…             0               0           17      0.0569
#> 3 platelets         nume…             0               0          176      0.589
#> 4 creatinine        nume…             0               0           40      0.134
#> 5 sodium            nume…             0               0           27      0.0903
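The named vector returned with value = TRUE can be post-processed with ordinary base R. The sketch below does not call the package. It copies the skewness values printed in the examples above into a literal vector, then ranks the variables by absolute skewness and splits them by direction (positive values indicate a longer right tail, negative values a longer left tail). The names sk, ranked, right_skewed and left_skewed are illustrative.

```r
# Skewness values copied from the find_skewness(heartfailure, value = TRUE)
# output shown in the examples.
sk <- c(cpk_enzyme = 4.441, ejection_fraction = 0.553, platelets = 1.455,
        creatinine = 4.434, sodium = -1.043)

# Rank variables from most to least skewed, ignoring direction.
ranked <- sk[order(abs(sk), decreasing = TRUE)]
names(ranked)

# Split by direction: positive = right-skewed, negative = left-skewed.
right_skewed <- names(sk)[sk > 0]
left_skewed  <- names(sk)[sk < 0]
```

In practice, sk would be assigned directly from the function call rather than typed in by hand.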