Tools for Data Diagnosis, Exploration, Transformation

dlookr provides data diagnosis, data exploration and transformation of variables during data analysis.

dlookr dlookr-package

dlookr: Tools for Data Diagnosis, Exploration, Transformation

Data quality diagnosis

overview()

Describe an overview of the data

plot(<overview>)

Visualize Information for an "overview" Object

summary(<overview>)

Summarizing overview information

diagnose()

Diagnose data quality of variables

diagnose(<tbl_dbi>)

Diagnose data quality of variables in the DBMS

diagnose_category()

Diagnose data quality of categorical variables

diagnose_category(<tbl_dbi>)

Diagnose data quality of categorical variables in the DBMS

diagnose_numeric()

Diagnose data quality of numerical variables

diagnose_numeric(<tbl_dbi>)

Diagnose data quality of numerical variables in the DBMS

diagnose_outlier()

Diagnose outliers of numerical variables

diagnose_outlier(<tbl_dbi>)

Diagnose outliers of numerical variables in the DBMS

diagnose_paged_report()

Reporting the information of data diagnosis

diagnose_paged_report(<tbl_dbi>)

Reporting the information of data diagnosis for a table in the DBMS

diagnose_report()

Reporting the information of data diagnosis

diagnose_report(<tbl_dbi>)

Reporting the information of data diagnosis for a table in the DBMS

diagnose_sparese()

Diagnosis of level combinations of categorical variables

diagnose_web_report()

Reporting the information of data diagnosis with HTML

diagnose_web_report(<tbl_dbi>)

Reporting the information of data diagnosis for a table in the DBMS with HTML

plot_na_hclust()

Combination chart for missing values

plot_na_intersect()

Plot the combinations of variables that include missing values

plot_na_pareto()

Pareto chart for missing values

plot_outlier()

Plot outlier information of numerical data diagnosis

plot_outlier(<target_df>)

Plot outlier information of target_df

plot_outlier(<tbl_dbi>)

Plot outlier information of numerical data diagnosis in the DBMS
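As a rough illustration of how the diagnosis functions listed above are typically called, the following sketch runs three of them on a small data frame. The data frame `df` and its columns are hypothetical and exist only for this example; the calls assume the data frame is passed as the first argument.

```r
library(dlookr)

# Hypothetical toy data: one missing score, one extreme score, one missing group
df <- data.frame(
  id    = 1:8,
  score = c(10, 12, 11, NA, 13, 12, 95, 11),
  group = factor(c("a", "b", "a", "b", NA, "a", "b", "a"))
)

diag_all <- diagnose(df)          # one row per variable (types, missing and unique counts)
diag_num <- diagnose_numeric(df)  # numeric variables only
diag_out <- diagnose_outlier(df)  # outlier summary for numeric variables

print(diag_all)
```

Each call returns a table with one row per diagnosed variable, so the results can be filtered or sorted like any other data frame.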

Exploratory Data Analysis

describe()

Compute descriptive statistics

describe(<tbl_dbi>)

Compute descriptive statistics

normality()

Performs the Shapiro-Wilk test of normality

normality(<tbl_dbi>)

Performs the Shapiro-Wilk test of normality

plot_bar_category()

Plot bar chart of categorical variables

plot_qq_numeric()

Plot Q-Q plot of numerical variables

plot_box_numeric()

Plot Box-Plot of numerical variables

plot_hist_numeric()

Plot histogram of numerical variables

plot_normality()

Plot distribution information of numerical data

plot_normality(<tbl_dbi>)

Plot distribution information of numerical data

correlate()

Compute the correlation coefficient between two variables

plot_correlate()

Deprecated functions in package ‘dlookr’

plot_correlate(<tbl_dbi>)

Visualize correlation plot of numerical data

plot(<correlate>)

Visualize Information for a "correlate" Object

summary(<correlate>)

Summarizing Correlation Coefficient

target_by()

Target by one variable

target_by(<tbl_dbi>)

Target by one column in the DBMS

relate()

Relationship between target variable and variable of interest

print(<relate>)

Summarizing relate information

plot(<relate>)

Visualize Information for a "relate" Object

compare_category()

Compare categorical variables

compare_numeric()

Compare numerical variables

summary(<compare_category>) print(<compare_category>)

Summarizing compare_category information

summary(<compare_numeric>) print(<compare_numeric>)

Summarizing compare_numeric information

plot(<compare_category>)

Visualize Information for a "compare_category" Object

plot(<compare_numeric>)

Visualize Information for a "compare_numeric" Object

univar_category()

Statistics of univariate categorical variables

univar_numeric()

Statistics of univariate numerical variables

summary(<univar_category>) print(<univar_category>)

Summarizing univar_category information

summary(<univar_numeric>) print(<univar_numeric>)

Summarizing univar_numeric information

plot(<univar_category>)

Visualize Information for a "univar_category" Object

plot(<univar_numeric>)

Visualize Information for a "univar_numeric" Object

eda_report()

Reporting the information of EDA

eda_report(<tbl_dbi>)

Reporting the information of EDA for a table in the DBMS

eda_web_report()

Reporting the information of EDA with HTML

eda_web_report(<tbl_dbi>)

Reporting the information of EDA for a table in the DBMS with HTML

eda_paged_report()

Reporting the information of EDA

eda_paged_report(<tbl_dbi>)

Reporting the information of EDA for a table in the DBMS

pps()

Compute Predictive Power Score

summary(<pps>)

Summarizing Predictive Power Score

plot(<pps>)

Visualize Information for a "pps" Object
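The exploratory functions listed above follow the same calling pattern. The sketch below applies `describe()`, `normality()` and `correlate()` to simulated data; the data frame `df` is hypothetical, and no particular output layout is assumed beyond one row per numeric variable for the first two calls.

```r
library(dlookr)

set.seed(1)  # make the simulated data reproducible

# Hypothetical data: one roughly normal and one right-skewed variable
df <- data.frame(
  x = rnorm(100),
  y = rexp(100)
)

desc <- describe(df)   # descriptive statistics per numeric variable
norm <- normality(df)  # Shapiro-Wilk test per numeric variable
corr <- correlate(df)  # pairwise correlation coefficients

print(norm)
```

With data like this, the Shapiro-Wilk p-value for `y` would usually be far smaller than the one for `x`, which is the kind of contrast `plot_normality()` is meant to show graphically.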

Data Transformation

find_na()

Finding variables including missing values

find_outliers()

Finding variables including outliers

find_skewness()

Finding skewed variables

imputate_na()

Impute Missing Values

imputate_outlier()

Impute Outliers

summary(<imputation>)

Summarizing imputation information

plot(<imputation>)

Visualize Information for an "imputation" Object

transform()

Data Transformations

summary(<transform>) print(<transform>)

Summarizing transformation information

plot(<transform>)

Visualize Information for a "transform" Object

binning()

Binning the Numeric Data

binning_by()

Optimal Binning for Scoring Modeling

binning_rgr()

Binning by recursive information gain ratio maximization

summary(<bins>) print(<bins>)

Summarizing Binned Variable

plot(<bins>)

Visualize Distribution for a "bins" object

plot(<optimal_bins>)

Visualize Distribution for an "optimal_bins" Object

summary(<optimal_bins>)

Summarizing Performance for Optimal Bins

plot(<infogain_bins>)

Visualize Distribution for an "infogain_bins" Object

extract()

Extract bins from "bins"

performance_bin()

Diagnose Performance of a Binned Variable

summary(<performance_bin>)

Summarizing Performance for Binned Variable

plot(<performance_bin>)

Visualize Performance for a "performance_bin" Object

transformation_report()

Reporting the information of transformation

transformation_paged_report()

Reporting the information of transformation

transformation_web_report()

Reporting the information of transformation with HTML
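A minimal sketch of two of the transformation functions listed above, assuming that `imputate_na()` takes the data frame followed by an unquoted column name and a `method` argument, and that `binning()` takes a numeric vector with `nbins` and `type` arguments. The data frame `df` is hypothetical.

```r
library(dlookr)

# Hypothetical data with two missing values in `score`
df <- data.frame(
  score = c(10, 12, NA, 11, 13, NA, 15, 14, 9, 16)
)

# Replace missing values with the mean of the observed values
# (assumes no target variable is needed for method = "mean")
score_filled <- imputate_na(df, score, method = "mean")

# Cut the imputed values into four equal-width bins
score_bins <- binning(as.numeric(score_filled), nbins = 4, type = "equal")

summary(score_filled)  # compare the distribution before and after imputation
summary(score_bins)    # frequency of each bin
```

The returned objects carry the "imputation" and "bins" classes, which is what lets the `summary()` and `plot()` methods listed above dispatch on them.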

Miscellaneous

entropy()

Calculate the entropy

skewness()

Skewness of the data

kurtosis()

Kurtosis of the data

kld()

Kullback-Leibler Divergence

jsd()

Jensen-Shannon Divergence

cramer()

Cramer's V statistic

theil()

Theil's U statistic

find_class()

Extract variable names or indices of a specific class

get_class()

Extracting a class of variables

get_column_info()

Describe column of table in the DBMS

get_os()

Finding the User's Machine OS

get_percentile()

Finding percentile

get_transform()

Transform a numeric vector

import_google_font()

Import Google Fonts

dlookr_orange_paged() dlookr_blue_paged()

Generate paged HTML document

dlookr_templ_html()

dlookr HTML template
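The statistical helpers listed above operate on plain numeric vectors. The sketch below is a hedged illustration using a hypothetical right-skewed sample; exact values are not shown because they depend on the estimator the package uses.

```r
library(dlookr)

# Hypothetical right-skewed sample: most values are small, one is large
x <- c(1, 1, 2, 2, 3, 10)

sk <- skewness(x)  # positive for a right-skewed sample
ku <- kurtosis(x)  # tail weight of the sample
en <- entropy(x)   # entropy of the vector

print(c(skewness = sk, kurtosis = ku, entropy = en))
```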

Datasets

Carseats

Sales of Child Car Seats

flights

Flights data

heartfailure

Heart Failure Data

jobchange

Job Change of Data Scientists