diagnose_sparese() checks, among all possible combinations of levels of the categorical variables, for the combinations that never appear in the data.

diagnose_sparese(.data, ...)

# S3 method for data.frame
diagnose_sparese(
  .data,
  ...,
  type = c("all", "sparse")[2],
  add_character = FALSE,
  limit = 500
)

Arguments

.data

a data.frame or a tbl_df.

...

one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, diagnose_sparese() will automatically start with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.

type

a character string specifying how results are extracted. "all" returns every possible combination of levels, together with the frequency of each combination. "sparse" (the default) returns only the sparse level combinations.

add_character

logical. Whether to include character variables in the diagnosis of categorical data. The default is FALSE, which excludes character variables; set it to TRUE to include them.

limit

integer. Upper bound on the number of level combinations to check. If the number of all possible combinations exceeds the limit, the calculation stops. The default is 500.
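
The number of possible combinations that is compared against limit can be reasoned about as the product of the number of levels of each categorical variable. The following is a minimal base R sketch of that arithmetic, not the package's actual implementation; the toy data frame and the threshold of 10 are hypothetical.

```r
# Toy data with three categorical variables (hypothetical, for illustration)
toy <- data.frame(
  sex     = factor(c("Female", "Male", "Male", "Female")),
  smoking = factor(c("No", "Yes", "No", "No")),
  grade   = factor(c("A", "B", "C", "A"))
)

# Number of levels per variable: 2, 2 and 3
n_levels <- vapply(toy, nlevels, integer(1))

# Number of all possible level combinations is the product of the level counts
n_comb <- prod(n_levels)

# Hypothetical threshold playing the role of the limit argument
limit <- 10
exceeds_limit <- n_comb > limit
```

By the same arithmetic, six two-level variables give 2^6 = 64 combinations, which is why the heartfailure example below stops when limit = 50.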

Value

a data.frame, or NULL when the number of possible combinations exceeds limit.

Information of sparse levels

The information derived from the sparse levels diagnosis is as follows.

  • variables : levels of the categorical variables, one column per variable.

  • n_case : number of observations for each combination of levels. Returned only when type = "all".
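
To make the idea of a sparse level combination concrete, the sketch below enumerates every combination with base R and keeps those with no observations. It is an illustration under assumptions, not how diagnose_sparese() is implemented; the toy data is hypothetical, and the n_case column name is chosen only to mirror the example output further down.

```r
# Toy data with two categorical variables (hypothetical, for illustration)
toy <- data.frame(
  sex     = factor(c("Female", "Male", "Male", "Female", "Male")),
  smoking = factor(c("No", "Yes", "No", "No", "Yes"))
)

# table() cross-tabulates every level combination, including empty cells;
# as.data.frame() yields one row per combination with its frequency.
all_comb <- as.data.frame(
  table(toy),
  responseName = "n_case",
  stringsAsFactors = FALSE
)

# Sparse combinations are the ones that never occur in the data
sparse <- all_comb[all_comb$n_case == 0, c("sex", "smoking")]
```

Here all_comb plays the role of the type = "all" result and sparse the role of the type = "sparse" result; in this toy data the only unobserved combination is Female with smoking "Yes".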

Examples

# \donttest{
library(dplyr)

# Examples of too many combinations
diagnose_sparese(jobchange)
#> All possible combinations of categorical variables exceed 500. (Number of combinations: 841,674,240)
#> NULL

# Character type is also included in the combination variable
diagnose_sparese(jobchange, add_character = TRUE)
#> All possible combinations of categorical variables exceed 500. (Number of combinations: 1.61248e+13)
#> NULL

# Combination of two variables
jobchange %>% 
  diagnose_sparese(education_level, major_discipline)
#>    education_level major_discipline
#> 1   Primary School             Arts
#> 2      High School             Arts
#> 3   Primary School  Business Degree
#> 4      High School  Business Degree
#> 5   Primary School       Humanities
#> 6      High School       Humanities
#> 7   Primary School         No Major
#> 8      High School         No Major
#> 9              Phd         No Major
#> 10  Primary School            Other
#> 11     High School            Other
#> 12  Primary School             STEM
#> 13     High School             STEM

# Remove two categorical variables from combination
jobchange %>% 
  diagnose_sparese(-city, -education_level)
#> All possible combinations of categorical variables exceed 500. (Number of combinations: 1,368,576)
#> NULL

diagnose_sparese(heartfailure)
#>    anaemia diabetes hblood_pressure    sex smoking death_event
#> 1       No       No              No Female     Yes          No
#> 2      Yes       No              No Female     Yes          No
#> 3       No      Yes              No Female     Yes          No
#> 4      Yes      Yes              No Female     Yes          No
#> 5      Yes       No             Yes Female     Yes          No
#> 6       No      Yes             Yes Female     Yes          No
#> 7      Yes      Yes             Yes Female     Yes          No
#> 8      Yes      Yes             Yes   Male      No         Yes
#> 9       No       No              No Female     Yes         Yes
#> 10     Yes       No              No Female     Yes         Yes
#> 11      No      Yes              No Female     Yes         Yes
#> 12      No       No             Yes Female     Yes         Yes
#> 13     Yes      Yes             Yes Female     Yes         Yes
#> 14     Yes      Yes             Yes   Male     Yes         Yes

# Adjust the limit threshold used for the calculation
diagnose_sparese(heartfailure, limit = 50)
#> All possible combinations of categorical variables exceed 50. (Number of combinations: 64)
#> NULL

# List all combinations, including sparse cases
diagnose_sparese(heartfailure, type = "all") 
#>    anaemia diabetes hblood_pressure    sex smoking death_event n_case
#> 1       No       No              No Female      No          No     10
#> 2      Yes       No              No Female      No          No     11
#> 3       No      Yes              No Female      No          No     14
#> 4      Yes      Yes              No Female      No          No      9
#> 5       No       No             Yes Female      No          No      8
#> 6      Yes       No             Yes Female      No          No      6
#> 7       No      Yes             Yes Female      No          No      6
#> 8      Yes      Yes             Yes Female      No          No      6
#> 9       No       No              No   Male      No          No     12
#> 10     Yes       No              No   Male      No          No      7
#> 11      No      Yes              No   Male      No          No     13
#> 12     Yes      Yes              No   Male      No          No     11
#> 13      No       No             Yes   Male      No          No      8
#> 14     Yes       No             Yes   Male      No          No      8
#> 15      No      Yes             Yes   Male      No          No      5
#> 16     Yes      Yes             Yes   Male      No          No      3
#> 17      No       No              No Female     Yes          No      0
#> 18     Yes       No              No Female     Yes          No      0
#> 19      No      Yes              No Female     Yes          No      0
#> 20     Yes      Yes              No Female     Yes          No      0
#> 21      No       No             Yes Female     Yes          No      1
#> 22     Yes       No             Yes Female     Yes          No      0
#> 23      No      Yes             Yes Female     Yes          No      0
#> 24     Yes      Yes             Yes Female     Yes          No      0
#> 25      No       No              No   Male     Yes          No     26
#> 26     Yes       No              No   Male     Yes          No     12
#> 27      No      Yes              No   Male     Yes          No      8
#> 28     Yes      Yes              No   Male     Yes          No      4
#> 29      No       No             Yes   Male     Yes          No      5
#> 30     Yes       No             Yes   Male     Yes          No      4
#> 31      No      Yes             Yes   Male     Yes          No      4
#> 32     Yes      Yes             Yes   Male     Yes          No      2
#> 33      No       No              No Female      No         Yes      5
#> 34     Yes       No              No Female      No         Yes      2
#> 35      No      Yes              No Female      No         Yes      3
#> 36     Yes      Yes              No Female      No         Yes      6
#> 37      No       No             Yes Female      No         Yes      2
#> 38     Yes       No             Yes Female      No         Yes      4
#> 39      No      Yes             Yes Female      No         Yes      3
#> 40     Yes      Yes             Yes Female      No         Yes      6
#> 41      No       No              No   Male      No         Yes      8
#> 42     Yes       No              No   Male      No         Yes     10
#> 43      No      Yes              No   Male      No         Yes      4
#> 44     Yes      Yes              No   Male      No         Yes      3
#> 45      No       No             Yes   Male      No         Yes      4
#> 46     Yes       No             Yes   Male      No         Yes      3
#> 47      No      Yes             Yes   Male      No         Yes      3
#> 48     Yes      Yes             Yes   Male      No         Yes      0
#> 49      No       No              No Female     Yes         Yes      0
#> 50     Yes       No              No Female     Yes         Yes      0
#> 51      No      Yes              No Female     Yes         Yes      0
#> 52     Yes      Yes              No Female     Yes         Yes      1
#> 53      No       No             Yes Female     Yes         Yes      0
#> 54     Yes       No             Yes Female     Yes         Yes      1
#> 55      No      Yes             Yes Female     Yes         Yes      1
#> 56     Yes      Yes             Yes Female     Yes         Yes      0
#> 57      No       No              No   Male     Yes         Yes      6
#> 58     Yes       No              No   Male     Yes         Yes      3
#> 59      No      Yes              No   Male     Yes         Yes      4
#> 60     Yes      Yes              No   Male     Yes         Yes      2
#> 61      No       No             Yes   Male     Yes         Yes      3
#> 62     Yes       No             Yes   Male     Yes         Yes      5
#> 63      No      Yes             Yes   Male     Yes         Yes      4
#> 64     Yes      Yes             Yes   Male     Yes         Yes      0

# Use together with dplyr
heartfailure %>% 
  diagnose_sparese(type = "all") %>% 
  arrange(desc(n_case)) %>% 
  mutate(percent = round(n_case / sum(n_case) * 100, 1))
#>    anaemia diabetes hblood_pressure    sex smoking death_event n_case percent
#> 1       No       No              No   Male     Yes          No     26     8.7
#> 2       No      Yes              No Female      No          No     14     4.7
#> 3       No      Yes              No   Male      No          No     13     4.3
#> 4       No       No              No   Male      No          No     12     4.0
#> 5      Yes       No              No   Male     Yes          No     12     4.0
#> 6      Yes       No              No Female      No          No     11     3.7
#> 7      Yes      Yes              No   Male      No          No     11     3.7
#> 8       No       No              No Female      No          No     10     3.3
#> 9      Yes       No              No   Male      No         Yes     10     3.3
#> 10     Yes      Yes              No Female      No          No      9     3.0
#> 11      No       No             Yes Female      No          No      8     2.7
#> 12      No       No             Yes   Male      No          No      8     2.7
#> 13     Yes       No             Yes   Male      No          No      8     2.7
#> 14      No      Yes              No   Male     Yes          No      8     2.7
#> 15      No       No              No   Male      No         Yes      8     2.7
#> 16     Yes       No              No   Male      No          No      7     2.3
#> 17     Yes       No             Yes Female      No          No      6     2.0
#> 18      No      Yes             Yes Female      No          No      6     2.0
#> 19     Yes      Yes             Yes Female      No          No      6     2.0
#> 20     Yes      Yes              No Female      No         Yes      6     2.0
#> 21     Yes      Yes             Yes Female      No         Yes      6     2.0
#> 22      No       No              No   Male     Yes         Yes      6     2.0
#> 23      No      Yes             Yes   Male      No          No      5     1.7
#> 24      No       No             Yes   Male     Yes          No      5     1.7
#> 25      No       No              No Female      No         Yes      5     1.7
#> 26     Yes       No             Yes   Male     Yes         Yes      5     1.7
#> 27     Yes      Yes              No   Male     Yes          No      4     1.3
#> 28     Yes       No             Yes   Male     Yes          No      4     1.3
#> 29      No      Yes             Yes   Male     Yes          No      4     1.3
#> 30     Yes       No             Yes Female      No         Yes      4     1.3
#> 31      No      Yes              No   Male      No         Yes      4     1.3
#> 32      No       No             Yes   Male      No         Yes      4     1.3
#> 33      No      Yes              No   Male     Yes         Yes      4     1.3
#> 34      No      Yes             Yes   Male     Yes         Yes      4     1.3
#> 35     Yes      Yes             Yes   Male      No          No      3     1.0
#> 36      No      Yes              No Female      No         Yes      3     1.0
#> 37      No      Yes             Yes Female      No         Yes      3     1.0
#> 38     Yes      Yes              No   Male      No         Yes      3     1.0
#> 39     Yes       No             Yes   Male      No         Yes      3     1.0
#> 40      No      Yes             Yes   Male      No         Yes      3     1.0
#> 41     Yes       No              No   Male     Yes         Yes      3     1.0
#> 42      No       No             Yes   Male     Yes         Yes      3     1.0
#> 43     Yes      Yes             Yes   Male     Yes          No      2     0.7
#> 44     Yes       No              No Female      No         Yes      2     0.7
#> 45      No       No             Yes Female      No         Yes      2     0.7
#> 46     Yes      Yes              No   Male     Yes         Yes      2     0.7
#> 47      No       No             Yes Female     Yes          No      1     0.3
#> 48     Yes      Yes              No Female     Yes         Yes      1     0.3
#> 49     Yes       No             Yes Female     Yes         Yes      1     0.3
#> 50      No      Yes             Yes Female     Yes         Yes      1     0.3
#> 51      No       No              No Female     Yes          No      0     0.0
#> 52     Yes       No              No Female     Yes          No      0     0.0
#> 53      No      Yes              No Female     Yes          No      0     0.0
#> 54     Yes      Yes              No Female     Yes          No      0     0.0
#> 55     Yes       No             Yes Female     Yes          No      0     0.0
#> 56      No      Yes             Yes Female     Yes          No      0     0.0
#> 57     Yes      Yes             Yes Female     Yes          No      0     0.0
#> 58     Yes      Yes             Yes   Male      No         Yes      0     0.0
#> 59      No       No              No Female     Yes         Yes      0     0.0
#> 60     Yes       No              No Female     Yes         Yes      0     0.0
#> 61      No      Yes              No Female     Yes         Yes      0     0.0
#> 62      No       No             Yes Female     Yes         Yes      0     0.0
#> 63     Yes      Yes             Yes Female     Yes         Yes      0     0.0
#> 64     Yes      Yes             Yes   Male     Yes         Yes      0     0.0
# }