Retrieve basic information to get a general understanding of the data.

overview(.data)

Arguments

.data

a data.frame or a tbl_df.

Value

An object of class overview. The overview class consists of a data.frame and several attributes. The data.frame has the following 3 variables:

  • division : category of the information.

    • size : indicators related to data size

    • duplicated : indicators related to duplicated values

    • missing : indicators related to missing values

    • data type : indicators related to data types

  • metrics : name of the metric.

    • observations : number of observations (number of rows)

    • variables : number of variables (number of columns)

    • values : number of values (number of cells, i.e. rows * columns)

    • memory size : an estimate of the memory used to store the R object.

    • duplicate observation : number of duplicate cases (observations).

    • complete observation : number of complete cases (observations), i.e., those with no missing values.

    • missing observation : number of observations that have missing values.

    • missing variables : number of variables that have missing values.

    • missing values : number of values (cells) that are missing.

    • numerics : number of variables whose data type is numeric.

    • integers : number of variables whose data type is integer.

    • factors : number of variables whose data type is factor.

    • characters : number of variables whose data type is character.

    • Dates : number of variables whose data type is Date.

    • POSIXcts : number of variables whose data type is POSIXct.

    • others : number of variables whose data type is none of the above.

  • value : value of the metric.
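
As a rough illustration of what the metrics above measure, the following base R sketch computes several of them for a toy data.frame. The helper sketch_metrics() and the toy data are hypothetical and exist only for this example. This is not how overview() is implemented, and the exact rules it uses (for instance, how duplicates are counted) are assumed here.

```r
# Hypothetical helper, NOT dlookr's implementation: computes a few of the
# metrics described above with base R only.
sketch_metrics <- function(df) {
  n_obs <- nrow(df)
  n_var <- ncol(df)
  complete <- sum(stats::complete.cases(df))
  # First class of each column, e.g. "numeric" or "character"
  classes <- vapply(df, function(x) class(x)[1], character(1))

  data.frame(
    division = c(rep("size", 3), "duplicated",
                 rep("missing", 4), rep("data type", 2)),
    metrics = c("observations", "variables", "values",
                "duplicate observation",
                "complete observation", "missing observation",
                "missing variables", "missing values",
                "numerics", "characters"),
    value = c(n_obs, n_var, n_obs * n_var,
              sum(duplicated(df)),           # assumption: rows repeating an earlier row
              complete, n_obs - complete,
              sum(colSums(is.na(df)) > 0),   # columns with at least one NA
              sum(is.na(df)),                # NA cells
              sum(classes == "numeric"), sum(classes == "character")),
    stringsAsFactors = FALSE
  )
}

# Toy data: row 4 repeats row 1; rows 2 and 3 each contain one NA
toy <- data.frame(
  x = c(1.5, NA, 3.5, 1.5),
  y = c("a", "b", NA, "a"),
  stringsAsFactors = FALSE
)

res <- sketch_metrics(toy)
print(res)
```

The result has the same three-column layout (division, metrics, value), so a single metric can be looked up with ordinary subsetting, for example res$value[res$metrics == "missing values"].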

The attributes of the overview class are as follows:

  • duplicated : the indices of duplicated observations.

  • na_col : the data type of the predictor used to replace missing values.

  • info_class : a data.frame holding the variable names and the class names that describe the data types of the variables.

    • The data.frame has two variables:

      • variable : variable names

      • class : data type
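
Attributes on an R object are read with base R's attr(). The sketch below builds a stand-in object carrying a duplicated index and an info_class table shaped as described above, then reads them back. The toy data and the names obj, info_class and dup_index are hypothetical, and the way the values are derived is an assumption rather than the package's own code.

```r
# Toy data: row 3 repeats row 1
toy <- data.frame(
  id = c("a", "b", "a"),
  score = c(1L, 2L, 1L),
  when = as.Date(c("2024-01-01", "2024-01-02", "2024-01-01")),
  stringsAsFactors = FALSE
)

# Two-column table shaped like the info_class attribute described above
info_class <- data.frame(
  variable = names(toy),
  class = vapply(toy, function(x) class(x)[1], character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Row indices of observations that repeat an earlier row (assumed definition)
dup_index <- which(duplicated(toy))

# Stand-in object: a data.frame with the two attributes attached
obj <- structure(
  data.frame(metrics = "observations", value = nrow(toy)),
  duplicated = dup_index,
  info_class = info_class
)

# Read the attributes back
attr(obj, "duplicated")
attr(obj, "info_class")
```

For an object ov returned by overview(), the same pattern would apply, for example attr(ov, "info_class").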

Details

overview() creates an object of class overview. The object includes general information such as the size of the data, the extent of missing values, and the data types of the variables.

Examples

# \donttest{
ov <- overview(jobchange)
ov
#>      division               metrics   value
#> 1        size          observations   19158
#> 2        size             variables      14
#> 3        size                values  268212
#> 4        size           memory size 2318464
#> 5  duplicated duplicate observation       0
#> 6     missing  complete observation    8955
#> 7     missing   missing observation   10203
#> 8     missing     missing variables       8
#> 9     missing        missing values   20733
#> 10  data type              numerics       1
#> 11  data type              integers       1
#> 12  data type       factors/ordered      11
#> 13  data type            characters       1
#> 14  data type                 Dates       0
#> 15  data type              POSIXcts       0
#> 16  data type                others       0

summary(ov)
#> ── Data Scale ──────────────────────────────────────────────
#> • Number of observations : 19,158
#> • Number of variables : 14
#> • Number of values : 268,212
#> • Size of located memory(bytes) : 2,318,464
#>
#> ── Duplicated Data ─────────────────────────────────────────
#> • Number of duplicated observations : 0 (0%)
#>
#> ── Missing Data ────────────────────────────────────────────
#> • Number of completed observations : 8,955
#> • Number of observations with NA : 10,203 (53.26%)
#> • Number of variables with NA : 8
#> • Number of NA : 20,733
#>
#> ── Data Type ───────────────────────────────────────────────
#> • Number of numeric variables : 1
#> • Number of integer variables : 1
#> • Number of factors variables : 11
#> • Number of character variables : 1
#> • Number of Date variables : 0
#> • Number of POSIXct variables : 0
#> • Number of other variables : 0
#>
#> ── Individual variables ────────────────────────────────────
#>              Variables Data Type
#> 1          enrollee_id character
#> 2                 city    factor
#> 3       city_dev_index   numeric
#> 4               gender    factor
#> 5  relevent_experience    factor
#> 6  enrolled_university    factor
#> 7      education_level   ordered
#> 8     major_discipline    factor
#> 9           experience   ordered
#> 10        company_size   ordered
#> 11        company_type    factor
#> 12        last_new_job   ordered
#> 13      training_hours   integer
#> 14           job_chnge    factor

plot(ov)
# }