Query basic information to get a general understanding of the data.
overview(.data)
.data : a data.frame or a tbl_df.
An object of overview class. The overview class contains a data.frame and the attributes listed below. The data.frame has the following 3 variables:
division : category of the information.
  size : indicators related to data size
  duplicated : indicators related to duplicated values
  missing : indicators related to missing values
  data_type : indicators related to data types
metrics : name of the metric.
  observations : number of observations (number of rows)
  variables : number of variables (number of columns)
  values : number of values (number of cells, rows * columns)
  memory size : an estimate of the memory used to store the R object.
  duplicate observation : number of duplicate cases (observations).
  complete observation : number of complete cases (observations), i.e., those with no missing values.
  missing observation : number of observations that have missing values.
  missing variables : number of variables that have missing values.
  missing values : number of values (cells) that are missing.
  numerics : number of variables whose data type is numeric.
  integers : number of variables whose data type is integer.
  factors : number of variables whose data type is factor.
  characters : number of variables whose data type is character.
  Dates : number of variables whose data type is Date.
  POSIXcts : number of variables whose data type is POSIXct.
  others : number of variables whose data type is none of the above.
value : value of the metric.
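To make the metric definitions above concrete, the following is a minimal base R sketch that computes several of them for the built-in airquality data. The helper name overview_sketch is hypothetical and is not part of the package; this is not how overview() is implemented, only an illustration of what each metric counts. Classifying variables by the first element of class() is an assumption.

```r
# Hypothetical helper: a base R illustration of the metrics, not the
# package's implementation.
overview_sketch <- function(df) {
  stopifnot(is.data.frame(df))

  n_na_by_row <- rowSums(is.na(df))  # missing cells per observation
  n_na_by_col <- colSums(is.na(df))  # missing cells per variable

  # Assumption: a variable's data type is the first element of class(),
  # e.g. "numeric", "integer", "factor".
  first_class <- vapply(df, function(x) class(x)[1], character(1))

  c(
    observations            = nrow(df),
    variables               = ncol(df),
    values                  = nrow(df) * ncol(df),
    `memory size`           = as.numeric(object.size(df)),
    `duplicate observation` = sum(duplicated(df)),
    `complete observation`  = sum(n_na_by_row == 0),
    `missing observation`   = sum(n_na_by_row > 0),
    `missing variables`     = sum(n_na_by_col > 0),
    `missing values`        = sum(n_na_by_col),
    numerics                = sum(first_class == "numeric"),
    integers                = sum(first_class == "integer")
  )
}

res <- overview_sketch(airquality)
res
```

Note that complete observation and missing observation always sum to observations, which is a quick sanity check on any overview result.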
Attributes of the overview class are as follows:
duplicated : the indices of duplicated observations.
na_col : the data type of the predictor used to replace missing values.
info_class : data.frame. Variable names and class names that describe the data type of each variable. The data.frame has two variables:
  variable : variable names
  class : data type
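As an illustration of the duplicated and info_class attributes, the sketch below derives comparable values in base R. The helper info_class_sketch and the small example data are hypothetical; taking the first element of class() as the data type is an assumption, not a statement about how the package builds the attribute.

```r
# Hypothetical helper: one row per variable with its name and data type.
info_class_sketch <- function(df) {
  stopifnot(is.data.frame(df))
  data.frame(
    variable = names(df),
    # Assumption: use the first class name, so an ordered factor
    # is reported as "ordered" rather than "factor".
    class = unname(vapply(df, function(x) class(x)[1], character(1))),
    stringsAsFactors = FALSE
  )
}

# Small example: the fourth row repeats the first row of iris.
df <- rbind(iris[1:3, ], iris[1, ])

# Row indices of observations that duplicate an earlier observation.
dup_index <- which(duplicated(df))

info <- info_class_sketch(df)
info
```

On an actual overview object, the stored attributes can be read with base R's attr(), for example attr(ov, "info_class").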
overview() creates an object of overview class. The overview class includes general information such as the size of the data, the extent of missing values, and the data types of the variables.
# \donttest{
ov <- overview(jobchange)
ov
#>      division               metrics   value
#>  1       size          observations   19158
#>  2       size             variables      14
#>  3       size                values  268212
#>  4       size           memory size 2318464
#>  5 duplicated duplicate observation       0
#>  6    missing  complete observation    8955
#>  7    missing   missing observation   10203
#>  8    missing     missing variables       8
#>  9    missing        missing values   20733
#> 10  data type              numerics       1
#> 11  data type              integers       1
#> 12  data type       factors/ordered      11
#> 13  data type            characters       1
#> 14  data type                 Dates       0
#> 15  data type              POSIXcts       0
#> 16  data type                others       0
summary(ov)
#> ── Data Scale ──────────────────────────────────────────────
#> • Number of observations : 19,158
#> • Number of variables : 14
#> • Number of values : 268,212
#> • Size of located memory(bytes) : 2,318,464
#>
#> ── Duplicated Data ─────────────────────────────────────────
#> • Number of duplicated observations : 0 (0%)
#>
#> ── Missing Data ────────────────────────────────────────────
#> • Number of completed observations : 8,955
#> • Number of observations with NA : 10,203 (53.26%)
#> • Number of variables with NA : 8
#> • Number of NA : 20,733
#>
#> ── Data Type ───────────────────────────────────────────────
#> • Number of numeric variables : 1
#> • Number of integer variables : 1
#> • Number of factors variables : 11
#> • Number of character variables : 1
#> • Number of Date variables : 0
#> • Number of POSIXct variables : 0
#> • Number of other variables : 0
#>
#> ── Individual variables ────────────────────────────────────
#>              Variables Data Type
#>  1         enrollee_id character
#>  2                city    factor
#>  3      city_dev_index   numeric
#>  4              gender    factor
#>  5 relevent_experience    factor
#>  6 enrolled_university    factor
#>  7     education_level   ordered
#>  8    major_discipline    factor
#>  9          experience   ordered
#> 10        company_size   ordered
#> 11        company_type    factor
#> 12        last_new_job   ordered
#> 13      training_hours   integer
#> 14           job_chnge    factor
plot(ov)
# }