Visualize a Pareto chart of the variables that contain missing values.
plot_na_pareto(
  x,
  only_na = FALSE,
  relative = FALSE,
  main = NULL,
  col = "black",
  grade = list(Good = 0.05, OK = 0.1, NotBad = 0.2, Bad = 0.5, Remove = 1),
  plot = TRUE,
  typographic = TRUE,
  base_family = NULL
)
Argument | Description |
---|---|
x | a data frame, or an object to be coerced to one. |
only_na | logical. The default is FALSE. If TRUE, only variables containing missing values are visualized. If FALSE, all variables are included. |
relative | logical. If TRUE, the left y-axis shows relative frequency. If FALSE, it shows frequency (counts). |
main | character. Main title. |
col | character. The color of the line that displays the cumulative percentage. |
grade | list. Specifies the cut-offs used to grade each variable according to its ratio of missing values. The default values are Good: [0, 0.05], OK: (0.05, 0.1], NotBad: (0.1, 0.2], Bad: (0.2, 0.5], Remove: (0.5, 1]. |
plot | logical. If TRUE, draw the plot. If FALSE, return aggregate information about missing values instead. |
typographic | logical. Whether to apply a theme that focuses on typographic elements to the ggplot2 visualization. The default is TRUE, which applies a base theme from the hrbrthemes package. |
base_family | character. The name of the base font family to use for the visualization. If not specified, the font defined in dlookr is applied. (See details) |
The base_family is selected from "Roboto Condensed", "Liberation Sans Narrow", "NanumSquare", and "Noto Sans Korean". To use a different font, first load it as a Google font with import_google_font().
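The interval logic described for the grade argument can be illustrated with base R's cut(). This is a minimal sketch of the documented cut-offs only, not dlookr's internal implementation; the helper name grade_missing is hypothetical.

```r
# Hypothetical helper (not part of dlookr): map missing-value ratios to
# grades using the default cut-offs documented for the `grade` argument.
grade_missing <- function(ratio,
                          grade = list(Good = 0.05, OK = 0.1, NotBad = 0.2,
                                       Bad = 0.5, Remove = 1)) {
  # Breaks start at 0; each grade covers (previous cut-off, own cut-off].
  # include.lowest = TRUE closes the first interval at 0, giving [0, 0.05].
  cut(ratio,
      breaks = c(0, unlist(grade)),
      labels = names(grade),
      include.lowest = TRUE,
      right = TRUE)
}

# Ratios taken from the example output on this page, plus 0.
grades <- grade_missing(c(0.291, 0.137, 0.021, 0))
print(grades)
```

With the default cut-offs, a ratio of 0.291 falls in (0.2, 0.5] and is graded Bad, while 0.137 falls in (0.1, 0.2] and is graded NotBad, which agrees with the grades shown in the example output.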
# \donttest{
# Generate data for the example
set.seed(123L)
jobchange2 <- jobchange[sample(nrow(jobchange), size = 1000), ]

# Diagnose the data with missing_count using diagnose() function
library(dplyr)

jobchange2 %>%
  diagnose %>%
  arrange(desc(missing_count))
#> # A tibble: 14 x 6
#>    variables      types   missing_count missing_percent unique_count unique_rate
#>    <chr>          <chr>           <int>           <dbl>        <int>       <dbl>
#>  1 company_type   factor            291            29.1            7       0.007
#>  2 company_size   ordered           278            27.8            9       0.009
#>  3 gender         factor            236            23.6            4       0.004
#>  4 major_discipl… factor            137            13.7            7       0.007
#>  5 education_lev… ordered            21             2.1            6       0.006
#>  6 last_new_job   ordered            17             1.7            7       0.007
#>  7 enrolled_univ… factor             12             1.2            4       0.004
#>  8 experience     ordered             4             0.4           23       0.023
#>  9 enrollee_id    charac…             0             0           1000       1
#> 10 city           factor              0             0             94       0.094
#> 11 city_dev_index numeric             0             0             73       0.073
#> 12 relevent_expe… factor              0             0              2       0.002
#> 13 training_hours integer             0             0            191       0.191
#> 14 job_chnge      factor              0             0              2       0.002

# Visualize pareto chart for variables with missing value.
plot_na_pareto(jobchange2)

# Visualize pareto chart for variables with missing value.
plot_na_pareto(jobchange2, col = "blue")

# Visualize only variables containing missing values
plot_na_pareto(jobchange2, only_na = TRUE)

# Display the relative frequency
plot_na_pareto(jobchange2, relative = TRUE)

# Change the main title.
plot_na_pareto(jobchange2, relative = TRUE, only_na = TRUE,
               main = "Pareto Chart for jobchange")

# Return the aggregate information about missing values.
plot_na_pareto(jobchange2, only_na = TRUE, plot = FALSE)
#> # A tibble: 8 x 5
#>   variable            frequencies ratio grade  cumulative
#>   <fct>                     <int> <dbl> <fct>       <dbl>
#> 1 company_type                291 0.291 Bad          29.2
#> 2 company_size                278 0.278 Bad          57.1
#> 3 gender                      236 0.236 Bad          80.8
#> 4 major_discipline            137 0.137 NotBad       94.6
#> 5 education_level              21 0.021 Good         96.7
#> 6 last_new_job                 17 0.017 Good         98.4
#> 7 enrolled_university          12 0.012 Good         99.6
#> 8 experience                    4 0.004 Good        100

# Non typographic elements
plot_na_pareto(jobchange2, typographic = FALSE)
# }
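The columns of the aggregate table returned with plot = FALSE can be approximated in base R. The sketch below assumes that ratio is the share of rows missing for a variable and that cumulative is the running percentage of all missing values; this reading is consistent with the output above (291 of 996 missing values is about 29.2%), but it is not taken from dlookr's source. The data frame df is made up for illustration.

```r
# Toy data frame (made up for illustration) with some missing values.
df <- data.frame(
  a = c(1, NA, NA, NA),
  b = c("x", NA, "y", "z"),
  c = c(TRUE, FALSE, TRUE, FALSE),
  d = c(NA, NA, 3, 4)
)

# Count missing values per variable and sort in descending order,
# which is the ordering a Pareto chart uses for its bars.
frequencies <- sort(colSums(is.na(df)), decreasing = TRUE)

na_summary <- data.frame(
  variable    = names(frequencies),
  frequencies = as.integer(frequencies),
  # Share of rows that are missing for each variable.
  ratio       = as.numeric(frequencies) / nrow(df),
  # Running share of all missing values, in percent (assumed definition).
  cumulative  = as.numeric(cumsum(frequencies)) / sum(frequencies) * 100
)

# Keep only variables with missing values, similar to only_na = TRUE.
na_summary <- na_summary[na_summary$frequencies > 0, ]
print(na_summary)
```

The bars of the Pareto chart correspond to frequencies (or ratio when relative = TRUE), and the line corresponds to cumulative.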