Find the variables that contain missing values in a data.frame or in an object that inherits from data.frame.
find_na(.data, index = TRUE, rate = FALSE)
.data: a data.frame or a tbl_df.
index: logical. Specifies how the variables that contain missing values are reported. If TRUE, the column indices are returned; if FALSE, the variable names are returned.
rate: logical. If TRUE, returns the percentage of missing values in each variable.
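Information on the variables that contain missing values: a vector of column indices (or of variable names if index = FALSE), or, if rate = TRUE, a named numeric vector giving the percentage of missing values for every variable.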
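The three return forms described above can be approximated in base R. The following is a minimal sketch, not the package's implementation: the toy data frame `df` and its column names are made up for illustration, and rounding the percentages to three decimals is an assumption based on the printed output in the examples.

```r
# Toy data; the data frame and its column names are hypothetical.
df <- data.frame(
  id     = 1:5,
  gender = c("F", NA, "M", NA, "F"),
  score  = c(1.5, 2.0, NA, 4.0, 5.5),
  city   = c("a", "b", "c", "d", "e")
)

# Number of missing values in each column.
na_count <- colSums(is.na(df))

# Comparable to find_na(df): positions of the columns with at least one NA.
na_index <- unname(which(na_count > 0))

# Comparable to find_na(df, index = FALSE): names of those columns.
na_names <- names(df)[na_index]

# Comparable to find_na(df, rate = TRUE): percentage of NA in every column.
# Rounding to 3 decimals is an assumption, not confirmed package behavior.
na_rate <- round(colMeans(is.na(df)) * 100, 3)

print(na_index)
print(na_names)
print(na_rate)
```

Note that the rate form covers every column, including those without missing values, which matches the shape of the `rate = TRUE` output shown in the examples.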
library(dlookr)

find_na(jobchange)
#> [1]  4  6  7  8  9 10 11 12
find_na(jobchange, index = FALSE)
#> [1] "gender" "enrolled_university" "education_level"
#> [4] "major_discipline" "experience" "company_size"
#> [7] "company_type" "last_new_job"
find_na(jobchange, rate = TRUE)
#>         enrollee_id                city      city_dev_index              gender
#>               0.000               0.000               0.000              23.531
#> relevent_experience enrolled_university     education_level    major_discipline
#>               0.000               2.015               2.401              14.683
#>          experience        company_size        company_type        last_new_job
#>               0.339              30.995              32.049               2.208
#>      training_hours           job_chnge
#>               0.000               0.000
## using dplyr -------------------------------------
library(dplyr)
# Perform simple data quality diagnosis of variables with missing values.
jobchange %>%
  select(find_na(.)) %>%
  diagnose()
#> # A tibble: 8 × 6
#>   variables         types missing_count missing_percent unique_count unique_rate
#>   <chr>             <chr>         <int>           <dbl>        <int>       <dbl>
#> 1 gender            fact…          4508          23.5              4    0.000209
#> 2 enrolled_univers… fact…           386           2.01             4    0.000209
#> 3 education_level   orde…           460           2.40             6    0.000313
#> 4 major_discipline  fact…          2813          14.7              7    0.000365
#> 5 experience        orde…            65           0.339           23    0.00120
#> 6 company_size      orde…          5938          31.0              9    0.000470
#> 7 company_type      fact…          6140          32.0              7    0.000365
#> 8 last_new_job      orde…           423           2.21             7    0.000365
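The pipeline above relies on find_na() returning column positions by default, which select() accepts directly. A rough base R equivalent of that column selection is sketched below; the toy data frame `df` is hypothetical, and the `which(colSums(...))` line is only a stand-in for the call to find_na(), not the package's own code.

```r
# Toy data; the data frame and its column names are hypothetical.
df <- data.frame(
  id = 1:4,
  a  = c(1, NA, NA, 4),
  b  = c("x", "y", NA, "z"),
  ok = c(TRUE, FALSE, TRUE, TRUE)
)

# Stand-in for find_na(df): positions of the columns that contain NA.
idx <- unname(which(colSums(is.na(df)) > 0))

# Keep only those columns; drop = FALSE preserves the data.frame
# even when a single column is selected.
with_na <- df[, idx, drop = FALSE]

# Missing-value count for each retained column.
na_per_col <- vapply(with_na, function(x) sum(is.na(x)), integer(1))

print(names(with_na))
print(na_per_col)
```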