diagnose_report() reports the information needed to diagnose the quality of a DBMS table accessed through a tbl_dbi.

# S3 method for tbl_dbi
diagnose_report(
  .data,
  output_format = c("pdf", "html"),
  output_file = NULL,
  output_dir = tempdir(),
  font_family = NULL,
  in_database = FALSE,
  collect_size = Inf,
  ...
)

Arguments

.data

a tbl_dbi.

output_format

report output type. Choose either "pdf" or "html". "pdf" creates a pdf file with knitr::knit(). "html" creates an html file with rmarkdown::render().
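
The default `c("pdf", "html")` follows a common R idiom in which the first element of the vector acts as the default. The sketch below illustrates that idiom with base R's match.arg(). The helper `resolve_format()` is hypothetical, and whether diagnose_report() resolves the argument in exactly this way is an assumption.

```r
# Hypothetical helper, not part of dlookr: mimics a function whose
# `output_format` argument defaults to the vector c("pdf", "html").
resolve_format <- function(output_format = c("pdf", "html")) {
  # match.arg() returns the first choice when the argument is left at its
  # default, and raises an error for values outside the allowed set.
  match.arg(output_format)
}

default_format <- resolve_format()         # argument omitted
html_format    <- resolve_format("html")   # explicit choice

# An unsupported value is rejected; the error is caught here for illustration.
invalid <- tryCatch(resolve_format("docx"), error = function(e) "error")
```

Under this idiom, calling the function without `output_format` selects the pdf report, which agrees with the first call in the Examples section.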

output_file

name of generated file. default is NULL.

output_dir

name of the directory in which the report file is generated. default is tempdir().
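
Because the default directory is a session-specific temporary folder, it can help to build the expected path explicitly. This is a minimal sketch that assumes the report is written to `file.path(output_dir, output_file)`; the file name "Diagn.pdf" is taken from the Examples section.

```r
# Assumption: the report is written to file.path(output_dir, output_file).
output_dir  <- tempdir()      # default value of output_dir
output_file <- "Diagn.pdf"    # file name used in the Examples section
report_path <- file.path(output_dir, output_file)

# TRUE only after diagnose_report() has been run with these values
report_exists <- file.exists(report_path)
```

Files in tempdir() are removed when the R session ends, so pass a persistent directory to `output_dir` if the report should be kept.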

font_family

character. font family name for the figures in the pdf.

in_database

Specifies whether to perform in-database operations. If TRUE, most operations are performed in the DBMS. If FALSE, the table data is collected into R and processed in memory. in_database = TRUE is not yet supported.

collect_size

an integer. The number of rows to collect from the DBMS into R. Applies only if in_database = FALSE.
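
The effect of `collect_size` can be pictured with dplyr's collect(), which accepts an `n` argument for lazy database tables. The sketch below assumes the dplyr, dbplyr, DBI and RSQLite packages are installed and uses the built-in `mtcars` data with a hypothetical table name `TB_MTCARS`. That diagnose_report() limits rows in this exact way is an assumption.

```r
library(dplyr)

# In-memory SQLite database standing in for a real DBMS
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, name = "TB_MTCARS", overwrite = TRUE)

# Assumption: with in_database = FALSE, at most `collect_size` rows are
# pulled into R, comparable to dplyr::collect(n = collect_size).
collect_size <- 10
local_data <- con %>%
  tbl("TB_MTCARS") %>%
  collect(n = collect_size)

n_local <- nrow(local_data)   # number of rows now held in R memory

DBI::dbDisconnect(con)
```

With the default `collect_size = Inf` the whole table is transferred, so a finite value is worth setting for tables that do not fit comfortably in memory.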

...

arguments to be passed to methods.

Details

Generates a generalized data diagnostic report automatically. You can choose pdf or html output. The report is more useful for diagnosing a data frame with a large number of variables than one with a small number of variables. For pdf output on a Korean operating system, a Korean Gothic font must be installed.

Reported information

The information reported from the data diagnosis is as follows.

  • Diagnose Data

    • Overview of Diagnosis

      • List of all variables quality

      • Diagnosis of missing data

      • Diagnosis of unique data (Text and Category)

      • Diagnosis of unique data (Numerical)

    • Detailed data diagnosis

      • Diagnosis of categorical variables

      • Diagnosis of numerical variables

      • List of numerical diagnosis (zero)

      • List of numerical diagnosis (minus)

  • Diagnose Outliers

    • Overview of Diagnosis

      • Diagnosis of numerical variable outliers

      • Detailed outliers diagnosis

See vignette("diagonosis") for an introduction to these concepts.
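
To make the report items above more concrete, the following base R sketch computes comparable per-variable figures: missing counts, unique counts, zero and negative counts, and boxplot-rule outliers. The function `diagnose_sketch()` and the `toy` data are hypothetical and this is not the dlookr implementation. The outlier rule (grDevices::boxplot.stats() with its default coefficient of 1.5) is an assumption chosen for illustration.

```r
# Hypothetical helper, not the dlookr implementation: per-variable
# quality summaries similar in spirit to the report items listed above.
diagnose_sketch <- function(df) {
  n <- nrow(df)

  # Apply `f` to numeric columns only; return NA for other column types
  count_numeric <- function(x, f) {
    if (is.numeric(x)) as.integer(f(x)) else NA_integer_
  }

  missing_count <- vapply(df, function(x) sum(is.na(x)), integer(1))

  data.frame(
    variable        = names(df),
    type            = vapply(df, function(x) class(x)[1], character(1)),
    missing_count   = missing_count,
    missing_percent = missing_count / n * 100,
    # NA is counted as one distinct value here
    unique_count    = vapply(df, function(x) length(unique(x)), integer(1)),
    zero_count      = vapply(df, count_numeric, integer(1),
                             f = function(x) sum(x == 0, na.rm = TRUE)),
    minus_count     = vapply(df, count_numeric, integer(1),
                             f = function(x) sum(x < 0, na.rm = TRUE)),
    # Values beyond 1.5 times the hinge spread from the box
    outlier_count   = vapply(df, count_numeric, integer(1),
                             f = function(x) length(grDevices::boxplot.stats(x)$out)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Small illustrative data set with missing, zero, negative and extreme values
toy <- data.frame(
  id    = 1:6,
  score = c(0, -2, 3, NA, 4, 100),
  group = c("a", "b", "a", "a", NA, "b"),
  stringsAsFactors = FALSE
)

result <- diagnose_sketch(toy)
```

Each row of `result` describes one variable, and the numeric-only columns are NA for the character variable `group`.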

Examples

if (FALSE) {
library(dplyr)

# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 20), "platelets"] <- NA
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "smoking"] <- NA

# connect DBMS
con_sqlite <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# copy heartfailure2 to the DBMS with a table named TB_HEARTFAILURE
copy_to(con_sqlite, heartfailure2, name = "TB_HEARTFAILURE", overwrite = TRUE)

# reporting the diagnosis information -------------------------
# create pdf file. file name is DataDiagnosis_Report.pdf
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  diagnose_report()

# create pdf file. file name is Diagn.pdf, and collect size is 350
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  diagnose_report(collect_size = 350, output_file = "Diagn.pdf")

# create html file. file name is Diagnosis_Report.html
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  diagnose_report(output_format = "html")

# create html file. file name is Diagn.html
con_sqlite %>%
  tbl("TB_HEARTFAILURE") %>%
  diagnose_report(output_format = "html", output_file = "Diagn.html")

# Disconnect DBMS
DBI::dbDisconnect(con_sqlite)
}