Retrieve basic information to get a general understanding of the data.

overview(.data)

Arguments

.data

a data.frame or a tbl_df.

Value

An object of class overview. The overview class consists of a data.frame plus the attributes listed further below. The data.frame has the following 3 variables:

  • division : division of information.

    • size : indicators related to the size of the data

    • duplicated : indicators related to duplicated values

    • missing : indicators related to missing values

    • data type : indicators related to data types

  • metrics : name of metrics.

    • observations : number of observations (number of rows)

    • variables : number of variables (number of columns)

    • values : number of values (number of cells, i.e. rows * columns)

    • memory size : an estimate of the memory that is being used to store an R object.

    • duplicate observation : number of duplicate cases (observations).

    • complete observation : number of complete cases (observations), i.e., observations that have no missing values.

    • missing observation : number of observations that have missing values.

    • missing variables : number of variables that have missing values.

    • missing values : number of values (cells) that are missing.

    • numerics : number of variables whose data type is numeric.

    • integers : number of variables whose data type is integer.

    • factors : number of variables whose data type is factor (ordered factors are counted here as well).

    • characters : number of variables whose data type is character.

    • Dates : number of variables whose data type is Date.

    • POSIXcts : number of variables whose data type is POSIXct.

    • others : number of variables whose data type is none of the above.

  • value : value of metrics.
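
To make the metric definitions above concrete, the following is a minimal sketch in base R that computes a subset of them for a small toy data frame. It is not the implementation used by overview(); the names `df` and `ov_sketch` are hypothetical, and the memory size and data type rows are left out for brevity.

```r
# Toy data: 5 observations, 3 variables.
# Row 3 is an exact duplicate of row 2; rows 2, 3 and 4 contain NA.
df <- data.frame(
  id    = c(1L, 2L, 2L, 4L, 5L),
  score = c(0.5, NA, NA, 2.0, 3.5),
  grp   = factor(c("a", "b", "b", NA, "a"))
)

# Logical vector: TRUE for rows without any missing value.
complete <- complete.cases(df)

# Assemble a division / metrics / value table in the documented layout.
# This mirrors the documented columns only; it is an assumption-based
# sketch, not the code inside overview().
ov_sketch <- data.frame(
  division = c(rep("size", 3), "duplicated", rep("missing", 4)),
  metrics  = c("observations", "variables", "values",
               "duplicate observation",
               "complete observation", "missing observation",
               "missing variables", "missing values"),
  value    = c(
    nrow(df),                      # observations
    ncol(df),                      # variables
    nrow(df) * ncol(df),           # values (cells)
    sum(duplicated(df)),           # rows repeating an earlier row
    sum(complete),                 # rows with no NA
    sum(!complete),                # rows with at least one NA
    sum(colSums(is.na(df)) > 0),   # columns with at least one NA
    sum(is.na(df))                 # NA cells
  ),
  stringsAsFactors = FALSE
)

ov_sketch
```

Note that "missing observation" and "complete observation" always add up to the number of observations, while "missing values" counts individual cells and can therefore exceed "missing observation".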

The attributes of the overview class are as follows:

  • duplicated : the index of duplicated observations.

  • na_col : the data type of the predictor used to replace missing values.

  • info_class : a data.frame of variable names and class names that describe the data type of each variable.

    • The data.frame has two variables.

      • variable : variable names

      • class : data type
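
As an illustration of what the duplicated and info_class attributes describe, here is a minimal base R sketch that derives comparable objects from a toy data frame. It is an assumption about the shape of these attributes based on the descriptions above, not the code used by overview(); `df`, `dup_index` and `info_class` are hypothetical names.

```r
# Toy data: row 3 repeats row 2.
df <- data.frame(
  id  = c(1L, 2L, 2L),
  day = as.Date(c("2024-01-01", "2024-01-02", "2024-01-02")),
  lvl = factor(c("lo", "hi", "hi"), levels = c("lo", "hi"), ordered = TRUE),
  stringsAsFactors = FALSE
)

# Index of observations that duplicate an earlier observation.
dup_index <- which(duplicated(df))

# One row per variable: its name and the first element of its class vector.
# For an ordered factor, class() is c("ordered", "factor"), so "ordered"
# is reported here.
info_class <- data.frame(
  variable = names(df),
  class    = unname(vapply(df, function(x) class(x)[1], character(1))),
  stringsAsFactors = FALSE
)

dup_index
info_class
```

On an actual overview object, attributes can be read with base R's attr(), for example attr(ov, "info_class"), where `ov` stands for the result of overview().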

Details

overview() creates an object of class overview. The overview class holds general information about the data, such as its size, the extent of missing values, and the data types of its variables.

Examples

# \donttest{
ov <- overview(jobchange)
ov
#>      division               metrics   value
#> 1        size          observations   19158
#> 2        size             variables      14
#> 3        size                values  268212
#> 4        size           memory size 2318464
#> 5  duplicated duplicate observation       0
#> 6     missing  complete observation    8955
#> 7     missing   missing observation   10203
#> 8     missing     missing variables       8
#> 9     missing        missing values   20733
#> 10  data type              numerics       1
#> 11  data type              integers       1
#> 12  data type       factors/ordered      11
#> 13  data type            characters       1
#> 14  data type                 Dates       0
#> 15  data type              POSIXcts       0
#> 16  data type                others       0

summary(ov)
#> ── Data Scale ────────────────────────────────────────────── 
#> • Number of observations            :     19,158
#> • Number of variables               :         14
#> • Number of values                  :    268,212
#> • Size of located memory(bytes)     :  2,318,464 
#> 
#> ── Duplicated Data ───────────────────────────────────────── 
#> • Number of duplicated observations :          0 (0%) 
#> 
#> ── Missing Data ──────────────────────────────────────────── 
#> • Number of completed observations  :      8,955
#> • Number of observations with NA    :     10,203 (53.26%)
#> • Number of variables with NA       :          8
#> • Number of NA                      :     20,733 
#> 
#> ── Data Type ─────────────────────────────────────────────── 
#> • Number of numeric variables       :          1
#> • Number of integer variables       :          1
#> • Number of factors variables       :         11
#> • Number of character variables     :          1
#> • Number of Date variables          :          0
#> • Number of POSIXct variables       :          0
#> • Number of other variables         :          0 
#> 
#> ── Individual variables ──────────────────────────────────── 
#>              Variables Data Type
#> 1          enrollee_id character
#> 2                 city    factor
#> 3       city_dev_index   numeric
#> 4               gender    factor
#> 5  relevent_experience    factor
#> 6  enrolled_university    factor
#> 7      education_level   ordered
#> 8     major_discipline    factor
#> 9           experience   ordered
#> 10        company_size   ordered
#> 11        company_type    factor
#> 12        last_new_job   ordered
#> 13      training_hours   integer
#> 14           job_chnge    factor

plot(ov)

# }