diagnose_report() reports information for diagnosing the quality of a DBMS table accessed through tbl_dbi.

# S3 method for tbl_dbi
diagnose_report(
  .data,
  output_format = c("pdf", "html"),
  output_file = NULL,
  output_dir = tempdir(),
  font_family = NULL,
  in_database = FALSE,
  collect_size = Inf,
  ...
)

Arguments

.data

a tbl_dbi.

output_format

report output type. Choose either "pdf" or "html". "pdf" creates a PDF file with knitr::knit(); "html" creates an HTML file with rmarkdown::render().

output_file

name of the generated file. Default is NULL, in which case a default file name is used.

output_dir

name of the directory in which the report file is generated. Default is tempdir().

font_family

character. Font family name used for figures in the PDF report.

in_database

Specifies whether to perform in-database operations. If TRUE, most operations are performed in the DBMS. If FALSE, table data is collected into R and processed in memory. in_database = TRUE is not yet supported.

collect_size

an integer. The number of data samples to collect from the DBMS into R. Applies only if in_database = FALSE.

...

arguments to be passed to methods.
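To illustrate what collect_size controls when in_database = FALSE, the sketch below pulls a limited number of rows from a DBMS table into R. It assumes, without confirmation from this page, that the method fetches rows the way dplyr::collect(n = ...) does; the mtcars data and the table name TB_MTCARS are illustrative only.

```r
# Assumption: collect_size behaves like the n argument of dplyr::collect(),
# i.e. it caps how many rows are pulled from the DBMS into R.
library(dplyr)

# In-memory SQLite database with an illustrative table (hypothetical name)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, name = "TB_MTCARS", overwrite = TRUE)

# Lazy reference to the table; no rows are transferred yet
tb <- tbl(con, "TB_MTCARS")

# Pull at most 10 rows into an in-memory tibble
local_df <- collect(tb, n = 10)
print(nrow(local_df))

DBI::dbDisconnect(con)
```

Under the same assumption, the default collect_size = Inf corresponds to collect(n = Inf), which transfers every row of the table.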

Value

No return value. This function only generates a report.

Details

Generates generalized data diagnostic reports automatically. The report can be written to a PDF or an HTML file. It is more useful for diagnosing a data frame with a large number of variables than one with only a few variables. For PDF output on a Korean-language operating system, a Korean Gothic font must be installed.

Reported information

The data diagnosis report contains the following information:

  • Diagnose Data

    • Overview of Diagnosis

      • List of all variables quality

      • Diagnosis of missing data

      • Diagnosis of unique data (Text and Category)

      • Diagnosis of unique data (Numerical)

    • Detailed data diagnosis

      • Diagnosis of categorical variables

      • Diagnosis of numerical variables

      • List of numerical diagnosis (zero)

      • List of numerical diagnosis (minus)

  • Diagnose Outliers

    • Overview of Diagnosis

      • Diagnosis of numerical variable outliers

      • Detailed outliers diagnosis

See vignette("diagonosis") for an introduction to these concepts.

Examples

# If you have the 'DBI' and 'RSQLite' packages installed, run the following code block:
if (FALSE) {
library(dplyr)
library(dlookr)

# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 20), "platelets"] <- NA
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "smoking"] <- NA

# connect DBMS
con_sqlite <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# copy heartfailure2 to the DBMS with a table named TB_HEARTFAILURE
copy_to(con_sqlite, heartfailure2, name = "TB_HEARTFAILURE", overwrite = TRUE)

# reporting the diagnosis information -------------------------
# create pdf file. file name is DataDiagnosis_Report.pdf
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  diagnose_report()
  
# create pdf file. file name is Diagn.pdf, and collect size is 350
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  diagnose_report(collect_size = 350, output_file = "Diagn.pdf")

# create html file. file name is Diagnosis_Report.html
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  diagnose_report(output_format = "html")

# create html file. file name is Diagn.html
con_sqlite %>% 
  tbl("TB_HEARTFAILURE") %>% 
  diagnose_report(output_format = "html", output_file = "Diagn.html")
  
# Disconnect DBMS   
DBI::dbDisconnect(con_sqlite)
}