Query basic information to get a general understanding of the data.
overview(.data)
.data | a data.frame or a |
---|
An object of class overview. The overview class contains a data.frame and three attributes. The data.frame has the following 3 variables:
division : division of information.
  size : indicators related to data capacity
  duplicated : indicators related to duplicated values
  missing : indicators related to missing values
  data_type : indicators related to data type
metrics : name of metrics.
  observations : number of observations (number of rows)
  variables : number of variables (number of columns)
  values : number of values (number of cells, rows * columns)
  memory size : an estimate of the memory that is being used to store an R object.
  duplicate observation : number of duplicate cases (observations).
  complete observation : number of complete cases (observations), i.e., observations that have no missing values.
  missing observation : number of observations that have missing values.
  missing variables : number of variables that have missing values.
  missing values : number of values (cells) that are missing.
  numerics : number of variables whose data type is numeric.
  integers : number of variables whose data type is integer.
  factors : number of variables whose data type is factor.
  characters : number of variables whose data type is character.
  Dates : number of variables whose data type is Date.
  POSIXcts : number of variables whose data type is POSIXct.
  others : number of variables whose data type is none of the above.
value : value of metrics.
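The metrics listed above can be approximated with base R alone. The following is a minimal sketch, not the implementation used by overview(): the toy data frame `df`, its column names, and the result name `sketch` are hypothetical and chosen only for illustration, and only a subset of the metrics is computed.

```r
# Toy data; the column names are hypothetical and used only for illustration.
df <- data.frame(
  id    = c(1L, 2L, 2L, 4L),
  score = c(NA, 1.5, 1.5, 2.0),
  group = factor(c("a", "b", "b", NA)),
  note  = c("x", "y", "y", "z"),
  stringsAsFactors = FALSE
)

# First class of each column, e.g. "integer", "numeric", "factor".
classes <- vapply(df, function(x) class(x)[1], character(1))

# Assemble a division / metrics / value table in the spirit of the one
# described above (an assumption about layout, not the package's code).
sketch <- data.frame(
  division = c(rep("size", 3), "duplicated",
               rep("missing", 4), rep("data_type", 4)),
  metrics  = c("observations", "variables", "values",
               "duplicate observation",
               "complete observation", "missing observation",
               "missing variables", "missing values",
               "numerics", "integers", "factors", "characters"),
  value    = c(
    nrow(df),                      # rows
    ncol(df),                      # columns
    nrow(df) * ncol(df),           # cells
    sum(duplicated(df)),           # rows identical to an earlier row
    sum(complete.cases(df)),       # rows without any NA
    sum(!complete.cases(df)),      # rows with at least one NA
    sum(colSums(is.na(df)) > 0),   # columns with at least one NA
    sum(is.na(df)),                # NA cells
    sum(classes == "numeric"),
    sum(classes == "integer"),
    sum(classes == "factor"),
    sum(classes == "character")
  ),
  stringsAsFactors = FALSE
)

print(sketch)
```

The memory size metric is left out of the sketch; in base R, `object.size(df)` gives a comparable estimate in bytes.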
The attributes of the overview class are as follows:
duplicated : the index of duplicated observations.
na_col : the data type of predictor to replace missing value.
info_class : data.frame. Variable names and class names that describe the data types of the variables. The data.frame has two variables:
  variable : variable names
  class : data type
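To illustrate how a data.frame can carry such attributes, here is a minimal base R sketch. It is not the package's implementation: the toy data frame `df`, the result `res`, and the class name `overview_sketch` are hypothetical, and only the duplicated and info_class attributes are mimicked.

```r
# Toy data; the column names are hypothetical and used only for illustration.
df <- data.frame(
  id    = c(1L, 2L, 2L),
  score = c(0.5, 1.5, 1.5),
  label = c("a", "b", "b"),
  stringsAsFactors = FALSE
)

# A small result table standing in for the metrics data.frame.
res <- data.frame(
  metrics = c("observations", "variables"),
  value   = c(nrow(df), ncol(df)),
  stringsAsFactors = FALSE
)

# Attach the index of duplicated rows as an attribute.
attr(res, "duplicated") <- which(duplicated(df))

# Attach a variable / class table as an attribute.
attr(res, "info_class") <- data.frame(
  variable = names(df),
  class    = vapply(df, function(x) class(x)[1], character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Prepend a hypothetical class name so the object stays a data.frame.
class(res) <- c("overview_sketch", class(res))

# Attributes are read back with attr().
attr(res, "duplicated")
attr(res, "info_class")
```

Assuming the attribute names documented above, the same `attr()` calls would apply to an object returned by overview(), for example `attr(ov, "info_class")`.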
overview() creates an object of the `overview` class. The `overview` class includes general information such as the size of the data, the degree of missing values, and the data types of variables.
# \donttest{
ov <- overview(jobchange)
ov
#>      division               metrics   value
#> 1        size          observations   19158
#> 2        size             variables      14
#> 3        size                values  268212
#> 4        size           memory size 2318464
#> 5  duplicated duplicate observation       0
#> 6     missing  complete observation    8955
#> 7     missing   missing observation   10203
#> 8     missing     missing variables       8
#> 9     missing        missing values   20733
#> 10  data type              numerics       1
#> 11  data type              integers       1
#> 12  data type       factors/ordered      11
#> 13  data type            characters       1
#> 14  data type                 Dates       0
#> 15  data type              POSIXcts       0
#> 16  data type                others       0

summary(ov)
#> ── Data Scale ──────────────────────────────────────────────
#> • Number of observations : 19,158
#> • Number of variables : 14
#> • Number of values : 268,212
#> • Size of located memory(bytes) : 2,318,464
#>
#> ── Duplicated Data ─────────────────────────────────────────
#> • Number of duplicated observations : 0 (0%)
#>
#> ── Missing Data ────────────────────────────────────────────
#> • Number of completed observations : 8,955
#> • Number of observations with NA : 10,203 (53.26%)
#> • Number of variables with NA : 8
#> • Number of NA : 20,733
#>
#> ── Data Type ───────────────────────────────────────────────
#> • Number of numeric variables : 1
#> • Number of integer variables : 1
#> • Number of factors variables : 11
#> • Number of character variables : 1
#> • Number of Date variables : 0
#> • Number of POSIXct variables : 0
#> • Number of other variables : 0
#>
#> ── Individual variables ────────────────────────────────────
#>              Variables Data Type
#> 1          enrollee_id character
#> 2                 city    factor
#> 3       city_dev_index   numeric
#> 4               gender    factor
#> 5  relevent_experience    factor
#> 6  enrolled_university    factor
#> 7      education_level   ordered
#> 8     major_discipline    factor
#> 9           experience   ordered
#> 10        company_size   ordered
#> 11        company_type    factor
#> 12        last_new_job   ordered
#> 13      training_hours   integer
#> 14           job_chnge    factor

plot(ov)
# }