eda_report() reports exploratory data analysis (EDA) information for an object that inherits from a DBMS table through tbl_dbi.

# S3 method for tbl_dbi
eda_report(
  .data,
  target = NULL,
  output_format = c("pdf", "html"),
  output_file = NULL,
  font_family = NULL,
  output_dir = tempdir(),
  in_database = FALSE,
  collect_size = Inf,
  ...
)

Arguments

.data

a tbl_dbi.

target

target variable.

output_format

report output type. Choose either "pdf" or "html". "pdf" creates a pdf file with knitr::knit(). "html" creates an html file with rmarkdown::render().

output_file

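name of the generated file. The default is NULL, in which case the file is named EDA_Report with the extension of the chosen format (see Examples).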

font_family

character. font family name for figures in the pdf.

output_dir

name of the directory in which the report file is generated. The default is tempdir().

in_database

Specifies whether to perform in-database operations. If TRUE, most operations are performed in the DBMS. If FALSE, table data is pulled into R and processed in memory. in_database = TRUE is not yet supported.

collect_size

an integer. The number of rows to bring from the DBMS into R. Applies only if in_database = FALSE.

...

arguments to be passed to methods.
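
As a rough illustration of what in_database = FALSE together with collect_size implies, the sketch below pulls a limited number of rows from a DBMS table into an in-memory data frame with dplyr::collect(). It assumes that the rows are fetched this way; the internals of eda_report() are not confirmed here. The table name TB_MTCARS and the built-in mtcars data are stand-ins chosen for the example, and the 'DBI', 'RSQLite' and 'dbplyr' packages must be installed.

```r
library(dplyr)

# In-memory SQLite database used only for this illustration
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, name = "TB_MTCARS", overwrite = TRUE)

# Bring at most 10 rows into R (comparable to collect_size = 10).
# warn_incomplete = FALSE silences the "only first rows retrieved" warning.
local_sample <- tbl(con, "TB_MTCARS") %>%
  collect(n = 10, warn_incomplete = FALSE)
nrow(local_sample)

# n = Inf brings the whole table (comparable to the default collect_size = Inf)
local_all <- tbl(con, "TB_MTCARS") %>%
  collect(n = Inf)
nrow(local_all)

DBI::dbDisconnect(con)
```

Note that collect(n = ) returns the first n rows of the query result rather than a random sample, so a small value may not be representative of the full table.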

Value

No return value. This function only generates a report.

Details

Generates a general-purpose EDA report automatically. The output can be a pdf or an html file. This is more useful for a data set with a large number of variables than for one with only a few. For pdf output on a Korean operating system, a Korean Gothic font must be installed.
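
The format is chosen through output_format, whose default in the usage above is the vector c("pdf", "html"). In R, a vector default like this is conventionally resolved with match.arg(), which takes the first element when the argument is omitted. The sketch below uses a hypothetical helper, pick_format(), which is not part of the package, to illustrate the idiom. It assumes, without confirming, that eda_report() resolves the argument the same way.

```r
# Hypothetical helper, not part of the package: shows how a vector default
# such as c("pdf", "html") is usually resolved with match.arg().
pick_format <- function(output_format = c("pdf", "html")) {
  match.arg(output_format)
}

fmt_default <- pick_format()        # argument omitted: first choice is used
fmt_html    <- pick_format("html")  # explicit choice, validated against the options

# An unsupported value raises an error instead of being silently accepted
fmt_bad <- tryCatch(pick_format("docx"), error = function(e) "error")
```

This is consistent with the examples, where a call without output_format produces a pdf file.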

Reported information

The EDA process will report the following information:

  • Introduction

    • Information of Dataset

    • Information of Variables

    • About EDA Report

  • Univariate Analysis

    • Descriptive Statistics

    • Normality Test of Numerical Variables

      • Statistics and Visualization of (Sample) Data

  • Relationship Between Variables

    • Correlation Coefficient

      • Correlation Coefficient by Variable Combination

      • Correlation Plot of Numerical Variables

  • Target based Analysis

    • Grouped Descriptive Statistics

      • Grouped Numerical Variables

      • Grouped Categorical Variables

    • Grouped Relationship Between Variables

      • Grouped Correlation Coefficient

      • Grouped Correlation Plot of Numerical Variables

See vignette("EDA") for an introduction to these concepts.

Examples

# If you have the 'DBI' and 'RSQLite' packages installed, run the code block:
if (FALSE) {
library(dplyr)
library(dlookr)  # provides eda_report() and the heartfailure data

# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 20), "platelets"] <- NA
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "smoking"] <- NA

# connect DBMS
con_sqlite <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# copy heartfailure2 to the DBMS with a table named TB_HEARTFAILURE
copy_to(con_sqlite, heartfailure2, name = "TB_HEARTFAILURE", overwrite = TRUE)

## target variable is categorical variable
# reporting the EDA information
# create pdf file. file name is EDA_Report.pdf
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report(death_event)

# create pdf file. file name is EDA_TB_HEARTFAILURE.pdf
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report("death_event", output_file = "EDA_TB_HEARTFAILURE.pdf")

# create html file. file name is EDA_Report.html
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report("death_event", output_format = "html")

# create html file. file name is EDA_TB_HEARTFAILURE.html
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report(death_event, output_format = "html", output_file = "EDA_TB_HEARTFAILURE.html")

## target variable is numerical variable
# reporting the EDA information, and collect size is 250
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report(sodium, collect_size = 250)

# create pdf file. file name is EDA2.pdf
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report("sodium", output_file = "EDA2.pdf")

# create html file. file name is EDA_Report.html
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report("sodium", output_format = "html")

# create html file. file name is EDA2.html
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report(sodium, output_format = "html", output_file = "EDA2.html")

## target variable is null
# reporting the EDA information
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report()

# create pdf file. file name is EDA2.pdf
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report(output_file = "EDA2.pdf")

# create html file. file name is EDA_Report.html
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report(output_format = "html")

# create html file. file name is EDA2.html
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  eda_report(output_format = "html", output_file = "EDA2.html")
  
# Disconnect DBMS   
DBI::dbDisconnect(con_sqlite)
}