R/combination.R
diagnose_sparese.data.frame.Rd
diagnose_sparese() finds the combinations of levels of categorical variables that never occur in the data, out of all possible combinations of those levels.
diagnose_sparese(.data, ...)

# S3 method for data.frame
diagnose_sparese(
  .data,
  ...,
  type = c("all", "sparse")[2],
  add_character = FALSE,
  limit = 500
)
.data | a data.frame or a |
---|---|
... | one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, diagnose_sparese() automatically starts with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing. |
type | a character string specifying how results are extracted. "all" returns every possible combination of levels, together with the frequency of each case. The default, "sparse", returns only the sparse level combinations. |
add_character | logical. Whether to include character variables in the diagnosis of categorical data. The default value is FALSE, which excludes character variables; set it to TRUE to include them. |
limit | integer. Upper bound on the number of level combinations to check. If the number of all possible combinations exceeds the limit, the calculation stops and NULL is returned. |
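To make the role of limit concrete, here is a minimal base R sketch of how the number of all possible combinations can be counted. It assumes the count is simply the product of the number of levels of each factor column; how the package counts internally is not stated here, and n_combinations() and the toy data are hypothetical names used only for illustration.

```r
# Hypothetical helper for illustration; not part of the package.
# Assumes the number of combinations is the product of the level counts.
n_combinations <- function(df) {
  is_cat <- vapply(df, is.factor, logical(1))
  prod(vapply(df[is_cat], nlevels, integer(1)))
}

# Toy data: six two-level factors, the same shape as the categorical
# columns that appear in the heartfailure example output.
toy <- as.data.frame(
  setNames(
    replicate(6, factor(c("No", "Yes")), simplify = FALSE),
    paste0("v", 1:6)
  )
)

n_comb <- n_combinations(toy)
n_comb > 50    # compare against a small limit
n_comb > 500   # compare against the default limit
```

Under this assumption, six two-level factors yield 2^6 combinations, which is consistent with the examples: the call with limit = 50 returns NULL, while the default limit of 500 produces a result.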
An object of class data.frame.
The information derived from the sparse levels diagnosis is as follows.
variables : levels of the categorical variables, one column per variable.
n_case : number of observations with that combination. (optional; returned when type = "all")
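The idea behind the diagnosis can be sketched in base R. This is an illustration of the concept under stated assumptions, not the package's implementation: the toy data frame is hypothetical, and the n_case column name is borrowed from the example output.

```r
# Hypothetical toy data; not one of the package's datasets.
toy <- data.frame(
  sex     = factor(c("F", "F", "M", "M", "M")),
  smoking = factor(c("No", "No", "No", "Yes", "Yes"))
)

# Cross-tabulate all factor columns. Converting the table to a
# data.frame keeps the cells whose count is zero.
counts <- as.data.frame(table(toy), responseName = "n_case")

# Level combinations that never occur in the data.
sparse <- counts[counts$n_case == 0, names(toy), drop = FALSE]

print(counts)  # analogous to type = "all"
print(sparse)  # analogous to type = "sparse"
```

Here counts plays the role of the type = "all" result (every combination with its frequency), and sparse keeps only the combinations with no observations.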
#>
#> NULL

# Character type is also included in the combination variable
diagnose_sparese(jobchange, add_character = TRUE)
#>
#> NULL

# Combination of two variables
jobchange %>%
  diagnose_sparese(education_level, major_discipline)
#> # A tibble: 13 x 2
#>    education_level major_discipline
#>    <fct>           <fct>
#>  1 Primary School  Arts
#>  2 High School     Arts
#>  3 Primary School  Business Degree
#>  4 High School     Business Degree
#>  5 Primary School  Humanities
#>  6 High School     Humanities
#>  7 Primary School  No Major
#>  8 High School     No Major
#>  9 Phd             No Major
#> 10 Primary School  Other
#> 11 High School     Other
#> 12 Primary School  STEM
#> 13 High School     STEM

# Remove two categorical variables from combination
jobchange %>%
  diagnose_sparese(-city, -education_level)
#>
#> NULL

diagnose_sparese(heartfailure)
#> # A tibble: 14 x 6
#>    anaemia diabetes hblood_pressure sex    smoking death_event
#>    <fct>   <fct>    <fct>           <fct>  <fct>   <fct>
#>  1 No      No       No              Female Yes     No
#>  2 Yes     No       No              Female Yes     No
#>  3 No      Yes      No              Female Yes     No
#>  4 Yes     Yes      No              Female Yes     No
#>  5 Yes     No       Yes             Female Yes     No
#>  6 No      Yes      Yes             Female Yes     No
#>  7 Yes     Yes      Yes             Female Yes     No
#>  8 Yes     Yes      Yes             Male   No      Yes
#>  9 No      No       No              Female Yes     Yes
#> 10 Yes     No       No              Female Yes     Yes
#> 11 No      Yes      No              Female Yes     Yes
#> 12 No      No       Yes             Female Yes     Yes
#> 13 Yes     Yes      Yes             Female Yes     Yes
#> 14 Yes     Yes      Yes             Male   Yes     Yes

# Adjust the threshold of limit to calculate
diagnose_sparese(heartfailure, limit = 50)
#>
#> NULL

# List all combinations, including sparse cases
diagnose_sparese(heartfailure, type = "all")
#>    anaemia diabetes hblood_pressure    sex smoking death_event n_case
#>  1      No       No              No Female      No          No     10
#>  2     Yes       No              No Female      No          No     11
#>  3      No      Yes              No Female      No          No     14
#>  4     Yes      Yes              No Female      No          No      9
#>  5      No       No             Yes Female      No          No      8
#>  6     Yes       No             Yes Female      No          No      6
#>  7      No      Yes             Yes Female      No          No      6
#>  8     Yes      Yes             Yes Female      No          No      6
#>  9      No       No              No   Male      No          No     12
#> 10     Yes       No              No   Male      No          No      7
#> 11      No      Yes              No   Male      No          No     13
#> 12     Yes      Yes              No   Male      No          No     11
#> 13      No       No             Yes   Male      No          No      8
#> 14     Yes       No             Yes   Male      No          No      8
#> 15      No      Yes             Yes   Male      No          No      5
#> 16     Yes      Yes             Yes   Male      No          No      3
#> 17      No       No              No Female     Yes          No      0
#> 18     Yes       No              No Female     Yes          No      0
#> 19      No      Yes              No Female     Yes          No      0
#> 20     Yes      Yes              No Female     Yes          No      0
#> 21      No       No             Yes Female     Yes          No      1
#> 22     Yes       No             Yes Female     Yes          No      0
#> 23      No      Yes             Yes Female     Yes          No      0
#> 24     Yes      Yes             Yes Female     Yes          No      0
#> 25      No       No              No   Male     Yes          No     26
#> 26     Yes       No              No   Male     Yes          No     12
#> 27      No      Yes              No   Male     Yes          No      8
#> 28     Yes      Yes              No   Male     Yes          No      4
#> 29      No       No             Yes   Male     Yes          No      5
#> 30     Yes       No             Yes   Male     Yes          No      4
#> 31      No      Yes             Yes   Male     Yes          No      4
#> 32     Yes      Yes             Yes   Male     Yes          No      2
#> 33      No       No              No Female      No         Yes      5
#> 34     Yes       No              No Female      No         Yes      2
#> 35      No      Yes              No Female      No         Yes      3
#> 36     Yes      Yes              No Female      No         Yes      6
#> 37      No       No             Yes Female      No         Yes      2
#> 38     Yes       No             Yes Female      No         Yes      4
#> 39      No      Yes             Yes Female      No         Yes      3
#> 40     Yes      Yes             Yes Female      No         Yes      6
#> 41      No       No              No   Male      No         Yes      8
#> 42     Yes       No              No   Male      No         Yes     10
#> 43      No      Yes              No   Male      No         Yes      4
#> 44     Yes      Yes              No   Male      No         Yes      3
#> 45      No       No             Yes   Male      No         Yes      4
#> 46     Yes       No             Yes   Male      No         Yes      3
#> 47      No      Yes             Yes   Male      No         Yes      3
#> 48     Yes      Yes             Yes   Male      No         Yes      0
#> 49      No       No              No Female     Yes         Yes      0
#> 50     Yes       No              No Female     Yes         Yes      0
#> 51      No      Yes              No Female     Yes         Yes      0
#> 52     Yes      Yes              No Female     Yes         Yes      1
#> 53      No       No             Yes Female     Yes         Yes      0
#> 54     Yes       No             Yes Female     Yes         Yes      1
#> 55      No      Yes             Yes Female     Yes         Yes      1
#> 56     Yes      Yes             Yes Female     Yes         Yes      0
#> 57      No       No              No   Male     Yes         Yes      6
#> 58     Yes       No              No   Male     Yes         Yes      3
#> 59      No      Yes              No   Male     Yes         Yes      4
#> 60     Yes      Yes              No   Male     Yes         Yes      2
#> 61      No       No             Yes   Male     Yes         Yes      3
#> 62     Yes       No             Yes   Male     Yes         Yes      5
#> 63      No      Yes             Yes   Male     Yes         Yes      4
#> 64     Yes      Yes             Yes   Male     Yes         Yes      0

# collaboration with dplyr
heartfailure %>%
  diagnose_sparese(type = "all") %>%
  arrange(desc(n_case)) %>%
  mutate(percent = round(n_case / sum(n_case) * 100, 1))
#>    anaemia diabetes hblood_pressure    sex smoking death_event n_case percent
#>  1      No       No              No   Male     Yes          No     26     8.7
#>  2      No      Yes              No Female      No          No     14     4.7
#>  3      No      Yes              No   Male      No          No     13     4.3
#>  4      No       No              No   Male      No          No     12     4.0
#>  5     Yes       No              No   Male     Yes          No     12     4.0
#>  6     Yes       No              No Female      No          No     11     3.7
#>  7     Yes      Yes              No   Male      No          No     11     3.7
#>  8      No       No              No Female      No          No     10     3.3
#>  9     Yes       No              No   Male      No         Yes     10     3.3
#> 10     Yes      Yes              No Female      No          No      9     3.0
#> 11      No       No             Yes Female      No          No      8     2.7
#> 12      No       No             Yes   Male      No          No      8     2.7
#> 13     Yes       No             Yes   Male      No          No      8     2.7
#> 14      No      Yes              No   Male     Yes          No      8     2.7
#> 15      No       No              No   Male      No         Yes      8     2.7
#> 16     Yes       No              No   Male      No          No      7     2.3
#> 17     Yes       No             Yes Female      No          No      6     2.0
#> 18      No      Yes             Yes Female      No          No      6     2.0
#> 19     Yes      Yes             Yes Female      No          No      6     2.0
#> 20     Yes      Yes              No Female      No         Yes      6     2.0
#> 21     Yes      Yes             Yes Female      No         Yes      6     2.0
#> 22      No       No              No   Male     Yes         Yes      6     2.0
#> 23      No      Yes             Yes   Male      No          No      5     1.7
#> 24      No       No             Yes   Male     Yes          No      5     1.7
#> 25      No       No              No Female      No         Yes      5     1.7
#> 26     Yes       No             Yes   Male     Yes         Yes      5     1.7
#> 27     Yes      Yes              No   Male     Yes          No      4     1.3
#> 28     Yes       No             Yes   Male     Yes          No      4     1.3
#> 29      No      Yes             Yes   Male     Yes          No      4     1.3
#> 30     Yes       No             Yes Female      No         Yes      4     1.3
#> 31      No      Yes              No   Male      No         Yes      4     1.3
#> 32      No       No             Yes   Male      No         Yes      4     1.3
#> 33      No      Yes              No   Male     Yes         Yes      4     1.3
#> 34      No      Yes             Yes   Male     Yes         Yes      4     1.3
#> 35     Yes      Yes             Yes   Male      No          No      3     1.0
#> 36      No      Yes              No Female      No         Yes      3     1.0
#> 37      No      Yes             Yes Female      No         Yes      3     1.0
#> 38     Yes      Yes              No   Male      No         Yes      3     1.0
#> 39     Yes       No             Yes   Male      No         Yes      3     1.0
#> 40      No      Yes             Yes   Male      No         Yes      3     1.0
#> 41     Yes       No              No   Male     Yes         Yes      3     1.0
#> 42      No       No             Yes   Male     Yes         Yes      3     1.0
#> 43     Yes      Yes             Yes   Male     Yes          No      2     0.7
#> 44     Yes       No              No Female      No         Yes      2     0.7
#> 45      No       No             Yes Female      No         Yes      2     0.7
#> 46     Yes      Yes              No   Male     Yes         Yes      2     0.7
#> 47      No       No             Yes Female     Yes          No      1     0.3
#> 48     Yes      Yes              No Female     Yes         Yes      1     0.3
#> 49     Yes       No             Yes Female     Yes         Yes      1     0.3
#> 50      No      Yes             Yes Female     Yes         Yes      1     0.3
#> 51      No       No              No Female     Yes          No      0     0.0
#> 52     Yes       No              No Female     Yes          No      0     0.0
#> 53      No      Yes              No Female     Yes          No      0     0.0
#> 54     Yes      Yes              No Female     Yes          No      0     0.0
#> 55     Yes       No             Yes Female     Yes          No      0     0.0
#> 56      No      Yes             Yes Female     Yes          No      0     0.0
#> 57     Yes      Yes             Yes Female     Yes          No      0     0.0
#> 58     Yes      Yes             Yes   Male      No         Yes      0     0.0
#> 59      No       No              No Female     Yes         Yes      0     0.0
#> 60     Yes       No              No Female     Yes         Yes      0     0.0
#> 61      No      Yes              No Female     Yes         Yes      0     0.0
#> 62      No       No             Yes Female     Yes         Yes      0     0.0
#> 63     Yes      Yes             Yes Female     Yes         Yes      0     0.0
#> 64     Yes      Yes             Yes   Male     Yes         Yes      0     0.0
# }