Pareto chart for missing value — plot_na

Visualizes a Pareto chart of the variables that contain missing values.

plot_na_pareto(
  x,
  only_na = FALSE,
  relative = FALSE,
  main = NULL,
  col = "black",
  grade = list(Good = 0.05, OK = 0.1, NotBad = 0.2, Bad = 0.5, Remove = 1),
  plot = TRUE,
  typographic = TRUE,
  base_family = NULL
)

Arguments

x: a data frame, or an object that can be coerced to one.
only_na: logical. The default value is FALSE. If TRUE, only variables containing missing values are selected for visualization. If FALSE, all variables are included.
relative: logical. If TRUE, the left y-axis shows relative frequency. If FALSE, it shows frequency (counts).
main: character. Main title.
col: character. The color of the line that displays the cumulative percentage.
grade: list. Specifies the cut-off to set the grade of the variable according to the ratio of missing values. The default values are Good: [0, 0.05], OK: (0.05, 0.1], NotBad: (0.1, 0.2], Bad: (0.2, 0.5], Remove: (0.5, 1].
plot: logical. If TRUE, the plot is drawn. If FALSE, aggregate information about missing values is returned instead.
typographic: logical. Whether to apply a theme that focuses on typographic elements to the ggplot2 visualization. The default is TRUE, which uses a base theme from the hrbrthemes package.
base_family: character. The name of the base font family to use for the visualization. If not specified, the font defined in dlookr is applied. (See details)
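
The grade cut-offs can be read as the upper bounds of consecutive intervals. The following is a minimal base R sketch of that reading, not dlookr's internal code: it uses cut() to map ratios onto the default grades. The ratio values and variable names are hypothetical and serve only as an illustration.

```r
# Default cut-offs, copied from the `grade` argument
grade <- list(Good = 0.05, OK = 0.1, NotBad = 0.2, Bad = 0.5, Remove = 1)

# Hypothetical missing-value ratios (illustration only)
ratio <- c(a = 0, b = 0.05, c = 0.07, d = 0.161, e = 0.325, f = 0.8)

# Intervals are closed on the right; include.lowest = TRUE makes the
# first interval [0, 0.05] so that a ratio of 0 is graded as well
graded <- cut(
  ratio,
  breaks = c(0, unlist(grade)),
  labels = names(grade),
  include.lowest = TRUE
)

data.frame(variable = names(ratio), ratio = unname(ratio), grade = graded)
```

Under this reading, a custom list such as list(High = 0.1, Middle = 0.6, Low = 1) would define the intervals [0, 0.1], (0.1, 0.6] and (0.6, 1].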

Value

A ggplot2 object. If plot = FALSE, a tibble with aggregate information about missing values is returned instead.

Details

The base_family is selected from "Roboto Condensed", "Liberation Sans Narrow", "NanumSquare", and "Noto Sans Korean". To use a different font, first load it as a Google font with import_google_font().

Examples

# \donttest{
# Generate data for the example
set.seed(123L)
jobchange2 <- jobchange[sample(nrow(jobchange), size = 1000), ]

# Diagnose the data with diagnose() and sort by missing_count
library(dplyr)

jobchange2 %>% 
  diagnose() %>% 
  arrange(desc(missing_count))
#> # A tibble: 14 × 6
#>    variables        types missing_count missing_percent unique_count unique_rate
#>    <chr>            <chr>         <int>           <dbl>        <int>       <dbl>
#>  1 company_type     fact…           325            32.5            7       0.007
#>  2 company_size     orde…           313            31.3            9       0.009
#>  3 gender           fact…           227            22.7            4       0.004
#>  4 major_discipline fact…           161            16.1            7       0.007
#>  5 last_new_job     orde…            24             2.4            7       0.007
#>  6 education_level  orde…            17             1.7            6       0.006
#>  7 enrolled_univer… fact…            11             1.1            4       0.004
#>  8 experience       orde…             3             0.3           23       0.023
#>  9 enrollee_id      char…             0             0           1000       1    
#> 10 city             fact…             0             0             88       0.088
#> 11 city_dev_index   nume…             0             0             72       0.072
#> 12 relevent_experi… fact…             0             0              2       0.002
#> 13 training_hours   inte…             0             0            194       0.194
#> 14 job_chnge        fact…             0             0              2       0.002

# Visualize a Pareto chart of the missing values in each variable
plot_na_pareto(jobchange2)


# Change the color of the cumulative percentage line
plot_na_pareto(jobchange2, col = "blue")


# Visualize only variables containing missing values
plot_na_pareto(jobchange2, only_na = TRUE)


# Display the relative frequency 
plot_na_pareto(jobchange2, relative = TRUE)


# Change the grade
plot_na_pareto(jobchange2, grade = list(High = 0.1, Middle = 0.6, Low = 1))


# Change the main title
plot_na_pareto(jobchange2, relative = TRUE, only_na = TRUE, 
               main = "Pareto Chart for jobchange")

  
# Return the aggregate information about missing values.
plot_na_pareto(jobchange2, only_na = TRUE, plot = FALSE)
#> # A tibble: 8 × 5
#>   variable            frequencies ratio grade  cumulative
#>   <fct>                     <int> <dbl> <fct>       <dbl>
#> 1 company_type                325 0.325 Bad          30.1
#> 2 company_size                313 0.313 Bad          59.0
#> 3 gender                      227 0.227 Bad          80.0
#> 4 major_discipline            161 0.161 NotBad       94.9
#> 5 last_new_job                 24 0.024 Good         97.1
#> 6 education_level              17 0.017 Good         98.7
#> 7 enrolled_university          11 0.011 Good         99.7
#> 8 experience                    3 0.003 Good        100  

# Draw the plot without the typographic theme
plot_na_pareto(jobchange2, typographic = FALSE)

# }
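
The cumulative column in the plot = FALSE output above is consistent with a running share of all missing values, in percent. The sketch below reproduces the printed numbers in base R from the frequencies shown in that output. It is an illustration under that assumption, not dlookr's internal code, and the divisor 1000 is the sample size drawn in the example.

```r
# Missing-value counts copied from the example output
frequencies <- c(
  company_type = 325, company_size = 313, gender = 227,
  major_discipline = 161, last_new_job = 24, education_level = 17,
  enrolled_university = 11, experience = 3
)

# A Pareto chart orders the bars from most to least frequent
frequencies <- sort(frequencies, decreasing = TRUE)

# Share of rows missing in each variable (1000 rows were sampled)
ratio <- frequencies / 1000

# Running share of all missing values, in percent
cumulative <- cumsum(frequencies) / sum(frequencies) * 100

data.frame(
  variable = names(frequencies),
  frequencies = unname(frequencies),
  ratio = unname(ratio),
  cumulative = round(unname(cumulative), 1)
)
```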