Import data with readr

Tidyverse readr import data

Learn how to import data into R with the readr package.

유충현 https://choonghyunryu.github.io (한국알사용자회)
2022-02-23

Introduction

From now on, use readr for large data files. Why? It is fast, so it cuts down on unnecessary waiting time.
readr can also guess a file's encoding, which is especially useful when importing Korean-language files.


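Preparation

Preparing the data files

Download the data files used in this post (아파트매매_실거래가_200001.csv and FoodFacts.tsv). Use your browser's "Save link as" feature to download them.

Create a directory named "data" and copy the downloaded data files into it.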
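
The directory step can also be done from the R console. The following is a minimal sketch, assuming the working directory is the project folder and that the two file names are the ones used in the code later in this post:

```r
# Create the "data" directory if it does not exist yet.
if (!dir.exists("data")) {
  dir.create("data")
}

# File names taken from the code later in this post.
files <- file.path("data", c("아파트매매_실거래가_200001.csv", "FoodFacts.tsv"))

# TRUE for each file that is already in place, FALSE otherwise.
file.exists(files)
```

If any element is FALSE, the corresponding file still needs to be copied into the directory.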

Hands-on practice

Reading a CSV file

The traditional approach

read.csv() is the base R function for reading CSV files.

apt <- read.csv(file = "data/아파트매매_실거래가_200001.csv", 
                header = TRUE, 
                fileEncoding = "cp949",
                stringsAsFactors = FALSE)

str(apt)
'data.frame':   58805 obs. of  13 variables:
 $ 시군구        : chr  "강원도 강릉시 견소동" "강원도 강릉시 견소동" "강원도 강릉시 견소동" "강원도 강릉시 견소동" ...
 $ 번지          : chr  "202" "202" "202" "202" ...
 $ 본번          : chr  "0202" "0202" "0202" "0202" ...
 $ 부번          : chr  "0000" "0000" "0000" "0000" ...
 $ 단지명        : chr  "송정한신" "송정한신" "송정한신" "송정한신" ...
 $ 전용면적...   : num  43.4 59.8 59.8 84.9 116.2 ...
 $ 계약년월      : int  202001 202001 202001 202001 202001 202001 202001 202001 202001 202001 ...
 $ 계약일        : int  3 15 18 18 21 23 6 20 27 4 ...
 $ 거래금액.만원.: chr  "12,000" "10,000" "10,500" "13,500" ...
 $ 층            : int  12 3 12 11 5 8 10 15 8 14 ...
 $ 건축년도      : int  1997 1997 1997 1997 1997 1997 2005 2009 2009 1999 ...
 $ 도로명        : chr  "경강로2539번길 8" "경강로2539번길 8" "경강로2539번길 8" "경강로2539번길 8" ...
 $ 해제사유발생일: logi  NA NA NA NA NA NA ...
dim(apt)
[1] 58805    13

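readr

read_csv() is the readr function for reading CSV files.

However, read_csv() can raise an error when it reads a file that contains multibyte (non-English) characters. In that case, guess_encoding() can infer the file's encoding. Note that the encoding name it reports may differ from the name used with read.csv(): here the "cp949" file is guessed as "EUC-KR". For this file, passing "EUC-KR" to read_csv() gives the same result.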

library(readr)
library(dplyr)

apt2 <- read_csv(file = "data/아파트매매_실거래가_200001.csv",
                 locale = locale(encoding = "cp949"))

str(apt2)
spec_tbl_df [58,805 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ 시군구        : chr [1:58805] "강원도 강릉시 견소동" "강원도 강릉시 견소동" "강원도 강릉시 견소동" "강원도 강릉시 견소동" ...
 $ 번지          : chr [1:58805] "202" "202" "202" "202" ...
 $ 본번          : chr [1:58805] "0202" "0202" "0202" "0202" ...
 $ 부번          : chr [1:58805] "0000" "0000" "0000" "0000" ...
 $ 단지명        : chr [1:58805] "송정한신" "송정한신" "송정한신" "송정한신" ...
 $ 전용면적(㎡)  : num [1:58805] 43.4 59.8 59.8 84.9 116.2 ...
 $ 계약년월      : num [1:58805] 202001 202001 202001 202001 202001 ...
 $ 계약일        : num [1:58805] 3 15 18 18 21 23 6 20 27 4 ...
 $ 거래금액(만원): num [1:58805] 12000 10000 10500 13500 19000 ...
 $ 층            : num [1:58805] 12 3 12 11 5 8 10 15 8 14 ...
 $ 건축년도      : num [1:58805] 1997 1997 1997 1997 1997 ...
 $ 도로명        : chr [1:58805] "경강로2539번길 8" "경강로2539번길 8" "경강로2539번길 8" "경강로2539번길 8" ...
 $ 해제사유발생일: logi [1:58805] NA NA NA NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   시군구 = col_character(),
  ..   번지 = col_character(),
  ..   본번 = col_character(),
  ..   부번 = col_character(),
  ..   단지명 = col_character(),
  ..   `전용면적(㎡)` = col_double(),
  ..   계약년월 = col_double(),
  ..   계약일 = col_double(),
  ..   `거래금액(만원)` = col_number(),
  ..   층 = col_double(),
  ..   건축년도 = col_double(),
  ..   도로명 = col_character(),
  ..   해제사유발생일 = col_logical()
  .. )
 - attr(*, "problems")=<externalptr> 
dim(apt2)
[1] 58805    13
# Read the first 10 lines of the file and guess the encoding.
read_lines("data/아파트매매_실거래가_200001.csv", n_max = 10) %>% 
  guess_encoding()
# A tibble: 3 x 2
  encoding confidence
  <chr>         <dbl>
1 EUC-KR         1   
2 GB18030        0.58
3 Big5           0.34
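
guess_encoding() also accepts a file path directly, so the read_lines() step is optional. The sketch below uses a hypothetical two-line sample written to a temporary file in EUC-KR, not the real apartment data, and assumes the platform's iconv supports EUC-KR. guess_encoding() relies on the stringi package, so the call is guarded. On such a short sample the guess may be unreliable, so the encoding passed to read_csv() is stated explicitly rather than taken from the guess.

```r
library(readr)

# Hypothetical sample: a tiny CSV stored as EUC-KR bytes in a temporary file.
path <- tempfile(fileext = ".csv")
txt  <- "시군구,단지명\n강원도 강릉시 견소동,송정한신\n"
writeBin(iconv(txt, from = "UTF-8", to = "EUC-KR", toRaw = TRUE)[[1]], path)

# Inspect the candidate encodings (requires the stringi package).
if (requireNamespace("stringi", quietly = TRUE)) {
  print(guess_encoding(path))
}

# Read the file with the encoding stated explicitly.
sample_df <- read_csv(path,
                      locale = locale(encoding = "EUC-KR"),
                      col_types = cols(.default = col_character()))
sample_df
```

Treat the guess as a hint: check the confidence column and confirm that the text displays correctly after reading.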

read.csv() returns a data.frame, whereas the functions in the readr package return a tibble.
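
The difference in return type, and in how the price column is parsed in the two str() outputs above, can be reproduced on a small sample. This is a sketch with a hypothetical two-row file; the column names name and price are made up for illustration, and col_number() is specified explicitly instead of relying on type guessing.

```r
library(readr)

# Hypothetical sample with a thousands separator in a quoted field.
path <- tempfile(fileext = ".csv")
writeLines(c("name,price", "A,\"12,000\"", "B,\"10,500\""), path)

base_df  <- read.csv(path, stringsAsFactors = FALSE)
readr_df <- read_csv(path, col_types = cols(name  = col_character(),
                                            price = col_number()))

class(base_df)   # a plain data.frame
class(readr_df)  # a tibble (class includes "tbl_df")

# Base R keeps the price as text; col_number() drops the grouping mark.
base_df$price
readr_df$price

# parse_number() applies the same conversion to an existing character vector.
parse_number(base_df$price)
```

parse_number() is handy when a data frame has already been read with base R and only one column needs cleaning.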

Reading a large TSV file

At about 92 MB, FoodFacts.tsv is hardly a large file, but for the purposes of this exercise we treat it as one.

file.size("data/FoodFacts.tsv") / 1024^2
[1] 91.65345

The traditional approach

read.csv() can also read TSV files when sep = "\t" is specified.

elapse_old <- system.time(
  foodfact <- read.csv(file = "data/FoodFacts.tsv",
                       sep = "\t",
                       header = TRUE, 
                       stringsAsFactors = FALSE)
)

elapse_old
   user  system elapsed 
  2.568   0.086   2.663 
dim(foodfact)
[1] 23179   163
str(foodfact)
'data.frame':   23179 obs. of  163 variables:
 $ code                                      : num  3087 4530 4559 16087 16094 ...
 $ url                                       : chr  "http://world-en.openfoodfacts.org/product/0000000003087/farine-de-ble-noir-ferme-t-y-r-nao" "http://world-en.openfoodfacts.org/product/0000000004530/banana-chips-sweetened-whole" "http://world-en.openfoodfacts.org/product/0000000004559/peanuts-torn-glasser" "http://world-en.openfoodfacts.org/product/0000000016087/organic-salted-nut-mix-grizzlies" ...
 $ creator                                   : chr  "openfoodfacts-contributors" "usda-ndb-import" "usda-ndb-import" "usda-ndb-import" ...
 $ created_t                                 : int  1474103866 1489069957 1489069957 1489055731 1489055653 1489055651 1489055730 1489055711 1489055651 1489055654 ...
 $ created_datetime                          : chr  "2016-09-17T09:17:46Z" "2017-03-09T14:32:37Z" "2017-03-09T14:32:37Z" "2017-03-09T10:35:31Z" ...
 $ last_modified_t                           : int  1474103893 1489069957 1489069957 1489055731 1489055653 1489055651 1489055730 1489055712 1489055651 1489055654 ...
 $ last_modified_datetime                    : chr  "2016-09-17T09:18:13Z" "2017-03-09T14:32:37Z" "2017-03-09T14:32:37Z" "2017-03-09T10:35:31Z" ...
 $ product_name                              : chr  "Farine de blé noir" "Banana Chips Sweetened (Whole)" "Peanuts" "Organic Salted Nut Mix" ...
 $ generic_name                              : chr  "" "" "" "" ...
 $ quantity                                  : chr  "1kg" "" "" "" ...
 $ packaging                                 : chr  "" "" "" "" ...
 $ packaging_tags                            : chr  "" "" "" "" ...
 $ brands                                    : chr  "Ferme t'y R'nao" "" "Torn & Glasser" "Grizzlies" ...
 $ brands_tags                               : chr  "ferme-t-y-r-nao" "" "torn-glasser" "grizzlies" ...
 $ categories                                : chr  "" "" "" "" ...
 $ categories_tags                           : chr  "" "" "" "" ...
 $ categories_en                             : chr  "" "" "" "" ...
 $ origins                                   : chr  "" "" "" "" ...
 $ origins_tags                              : chr  "" "" "" "" ...
 $ manufacturing_places                      : chr  "" "" "" "" ...
 $ manufacturing_places_tags                 : chr  "" "" "" "" ...
 $ labels                                    : chr  "" "" "" "" ...
 $ labels_tags                               : chr  "" "" "" "" ...
 $ labels_en                                 : chr  "" "" "" "" ...
 $ emb_codes                                 : chr  "" "" "" "" ...
 $ emb_codes_tags                            : chr  "" "" "" "" ...
 $ first_packaging_code_geo                  : chr  "" "" "" "" ...
 $ cities                                    : logi  NA NA NA NA NA NA ...
 $ cities_tags                               : chr  "" "" "" "" ...
 $ purchase_places                           : chr  "" "" "" "" ...
 $ stores                                    : chr  "" "" "" "" ...
 $ countries                                 : chr  "en:FR" "US" "US" "US" ...
 $ countries_tags                            : chr  "en:france" "en:united-states" "en:united-states" "en:united-states" ...
 $ countries_en                              : chr  "France" "United States" "United States" "United States" ...
 $ ingredients_text                          : chr  "" "Bananas, vegetable oil (coconut oil, corn oil and/or palm oil) sugar, natural banana flavor." "Peanuts, wheat flour, sugar, rice flour, tapioca starch, salt, leavening (ammonium bicarbonate, baking soda), s"| __truncated__ "Organic hazelnuts, organic cashews, organic walnuts almonds, organic sunflower oil, sea salt." ...
 $ allergens                                 : chr  "" "" "" "" ...
 $ allergens_en                              : logi  NA NA NA NA NA NA ...
 $ traces                                    : chr  "" "" "" "" ...
 $ traces_tags                               : chr  "" "" "" "" ...
 $ traces_en                                 : chr  "" "" "" "" ...
 $ serving_size                              : chr  "" "28 g (1 ONZ)" "28 g (0.25 cup)" "28 g (0.25 cup)" ...
 $ no_nutriments                             : logi  NA NA NA NA NA NA ...
 $ additives_n                               : int  NA 0 0 0 0 0 0 1 0 0 ...
 $ additives                                 : chr  "" " [ bananas -> en:bananas  ]  [ vegetable-oil -> en:vegetable-oil  ]  [ oil -> en:oil  ]  [ coconut-oil -> en:co"| __truncated__ " [ peanuts -> en:peanuts  ]  [ wheat-flour -> en:wheat-flour  ]  [ flour -> en:flour  ]  [ sugar -> en:sugar  ]"| __truncated__ " [ organic-hazelnuts -> en:organic-hazelnuts  ]  [ hazelnuts -> en:hazelnuts  ]  [ organic-cashews -> en:organi"| __truncated__ ...
 $ additives_tags                            : chr  "" "" "" "" ...
 $ additives_en                              : chr  "" "" "" "" ...
 $ ingredients_from_palm_oil_n               : chr  "" "0" "0" "0" ...
 $ ingredients_from_palm_oil                 : logi  NA NA NA NA NA NA ...
 $ ingredients_from_palm_oil_tags            : chr  "" "" "" "" ...
 $ ingredients_that_may_be_from_palm_oil_n   : chr  "" "0" "0" "0" ...
 $ ingredients_that_may_be_from_palm_oil     : chr  "" "" "" "" ...
 $ ingredients_that_may_be_from_palm_oil_tags: chr  "" "" "" "" ...
 $ nutrition_grade_uk                        : int  NA NA NA NA NA NA NA NA NA NA ...
 $ nutrition_grade_fr                        : chr  "" "d" "b" "d" ...
 $ pnns_groups_1                             : chr  "" "" "" "" ...
 $ pnns_groups_2                             : chr  "" "" "" "" ...
 $ states                                    : chr  "en:to-be-completed, en:nutrition-facts-to-be-completed, en:ingredients-to-be-completed, en:expiration-date-to-b"| __truncated__ "en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed,"| __truncated__ "en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed,"| __truncated__ "en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed,"| __truncated__ ...
 $ states_tags                               : chr  "en:to-be-completed,en:nutrition-facts-to-be-completed,en:ingredients-to-be-completed,en:expiration-date-to-be-c"| __truncated__ "en:to-be-completed,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:"| __truncated__ "en:to-be-completed,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:"| __truncated__ "en:to-be-completed,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:"| __truncated__ ...
 $ states_en                                 : chr  "To be completed,Nutrition facts to be completed,Ingredients to be completed,Expiration date to be completed,Cha"| __truncated__ "To be completed,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Packaging-code-"| __truncated__ "To be completed,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Packaging-code-"| __truncated__ "To be completed,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Packaging-code-"| __truncated__ ...
 $ main_category                             : chr  "" "" "" "" ...
 $ main_category_en                          : chr  "" "" "" "" ...
 $ image_url                                 : chr  "" "" "" "" ...
 $ image_small_url                           : chr  "" "" "" "" ...
 $ energy_100g                               : chr  "" "2243" "1941" "2540" ...
 $ energy.from.fat_100g                      : chr  "" "" "" "" ...
 $ fat_100g                                  : chr  "" "28.57" "17.86" "57.14" ...
 $ saturated.fat_100g                        : chr  "" "28.57" "0" "5.36" ...
 $ X.butyric.acid_100g                       : chr  "" "" "" "" ...
 $ X.caproic.acid_100g                       : num  NA NA NA NA NA NA NA NA NA NA ...
 $ X.caprylic.acid_100g                      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ X.capric.acid_100g                        : logi  NA NA NA NA NA NA ...
 $ X.lauric.acid_100g                        : num  NA NA NA NA NA NA NA NA NA NA ...
 $ X.myristic.acid_100g                      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ X.palmitic.acid_100g                      : chr  "" "" "" "" ...
 $ X.stearic.acid_100g                       : logi  NA NA NA NA NA NA ...
 $ X.arachidic.acid_100g                     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ X.behenic.acid_100g                       : chr  "" "" "" "" ...
 $ X.lignoceric.acid_100g                    : chr  "" "" "" "" ...
 $ X.cerotic.acid_100g                       : chr  "" "" "" "" ...
 $ X.montanic.acid_100g                      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ X.melissic.acid_100g                      : logi  NA NA NA NA NA NA ...
 $ monounsaturated.fat_100g                  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ polyunsaturated.fat_100g                  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ omega.3.fat_100g                          : num  NA NA NA NA NA NA NA NA NA NA ...
 $ X.alpha.linolenic.acid_100g               : num  NA NA NA NA NA NA NA NA NA NA ...
 $ X.eicosapentaenoic.acid_100g              : logi  NA NA NA NA NA NA ...
 $ X.docosahexaenoic.acid_100g               : chr  "" "" "" "" ...
 $ omega.6.fat_100g                          : logi  NA NA NA NA NA NA ...
 $ X.linoleic.acid_100g                      : logi  NA NA NA NA NA NA ...
 $ X.arachidonic.acid_100g                   : chr  "" "" "" "" ...
 $ X.gamma.linolenic.acid_100g               : chr  "" "" "" "" ...
 $ X.dihomo.gamma.linolenic.acid_100g        : chr  "" "" "" "" ...
 $ omega.9.fat_100g                          : logi  NA NA NA NA NA NA ...
 $ X.oleic.acid_100g                         : logi  NA NA NA NA NA NA ...
 $ X.elaidic.acid_100g                       : logi  NA NA NA NA NA NA ...
 $ X.gondoic.acid_100g                       : logi  NA NA NA NA NA NA ...
 $ X.mead.acid_100g                          : int  NA NA NA NA NA NA NA NA NA NA ...
 $ X.erucic.acid_100g                        : logi  NA NA NA NA NA NA ...
 $ X.nervonic.acid_100g                      : int  NA NA NA NA NA NA NA NA NA NA ...
  [list output truncated]

readr

read_tsv() is the readr function for reading TSV files.

elapse_readr <- system.time(
  foodfact2 <- read_tsv(file = "data/FoodFacts.tsv")
)

elapse_readr
   user  system elapsed 
  4.846   0.267   4.186 
dim(foodfact2)
[1] 35000   163
str(foodfact2)
spec_tbl_df [35,000 × 163] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ code                                      : chr [1:35000] "0000000003087" "0000000004530" "0000000004559" "0000000016087" ...
 $ url                                       : chr [1:35000] "http://world-en.openfoodfacts.org/product/0000000003087/farine-de-ble-noir-ferme-t-y-r-nao" "http://world-en.openfoodfacts.org/product/0000000004530/banana-chips-sweetened-whole" "http://world-en.openfoodfacts.org/product/0000000004559/peanuts-torn-glasser" "http://world-en.openfoodfacts.org/product/0000000016087/organic-salted-nut-mix-grizzlies" ...
 $ creator                                   : chr [1:35000] "openfoodfacts-contributors" "usda-ndb-import" "usda-ndb-import" "usda-ndb-import" ...
 $ created_t                                 : num [1:35000] 1.47e+09 1.49e+09 1.49e+09 1.49e+09 1.49e+09 ...
 $ created_datetime                          : POSIXct[1:35000], format: "2016-09-17 09:17:46" ...
 $ last_modified_t                           : num [1:35000] 1.47e+09 1.49e+09 1.49e+09 1.49e+09 1.49e+09 ...
 $ last_modified_datetime                    : POSIXct[1:35000], format: "2016-09-17 09:18:13" ...
 $ product_name                              : chr [1:35000] "Farine de blé noir" "Banana Chips Sweetened (Whole)" "Peanuts" "Organic Salted Nut Mix" ...
 $ generic_name                              : chr [1:35000] NA NA NA NA ...
 $ quantity                                  : chr [1:35000] "1kg" NA NA NA ...
 $ packaging                                 : chr [1:35000] NA NA NA NA ...
 $ packaging_tags                            : chr [1:35000] NA NA NA NA ...
 $ brands                                    : chr [1:35000] "Ferme t'y R'nao" NA "Torn & Glasser" "Grizzlies" ...
 $ brands_tags                               : chr [1:35000] "ferme-t-y-r-nao" NA "torn-glasser" "grizzlies" ...
 $ categories                                : chr [1:35000] NA NA NA NA ...
 $ categories_tags                           : chr [1:35000] NA NA NA NA ...
 $ categories_en                             : chr [1:35000] NA NA NA NA ...
 $ origins                                   : chr [1:35000] NA NA NA NA ...
 $ origins_tags                              : chr [1:35000] NA NA NA NA ...
 $ manufacturing_places                      : chr [1:35000] NA NA NA NA ...
 $ manufacturing_places_tags                 : chr [1:35000] NA NA NA NA ...
 $ labels                                    : chr [1:35000] NA NA NA NA ...
 $ labels_tags                               : chr [1:35000] NA NA NA NA ...
 $ labels_en                                 : chr [1:35000] NA NA NA NA ...
 $ emb_codes                                 : chr [1:35000] NA NA NA NA ...
 $ emb_codes_tags                            : chr [1:35000] NA NA NA NA ...
 $ first_packaging_code_geo                  : logi [1:35000] NA NA NA NA NA NA ...
 $ cities                                    : logi [1:35000] NA NA NA NA NA NA ...
 $ cities_tags                               : logi [1:35000] NA NA NA NA NA NA ...
 $ purchase_places                           : chr [1:35000] NA NA NA NA ...
 $ stores                                    : chr [1:35000] NA NA NA NA ...
 $ countries                                 : chr [1:35000] "en:FR" "US" "US" "US" ...
 $ countries_tags                            : chr [1:35000] "en:france" "en:united-states" "en:united-states" "en:united-states" ...
 $ countries_en                              : chr [1:35000] "France" "United States" "United States" "United States" ...
 $ ingredients_text                          : chr [1:35000] NA "Bananas, vegetable oil (coconut oil, corn oil and/or palm oil) sugar, natural banana flavor." "Peanuts, wheat flour, sugar, rice flour, tapioca starch, salt, leavening (ammonium bicarbonate, baking soda), s"| __truncated__ "Organic hazelnuts, organic cashews, organic walnuts almonds, organic sunflower oil, sea salt." ...
 $ allergens                                 : chr [1:35000] NA NA NA NA ...
 $ allergens_en                              : logi [1:35000] NA NA NA NA NA NA ...
 $ traces                                    : chr [1:35000] NA NA NA NA ...
 $ traces_tags                               : chr [1:35000] NA NA NA NA ...
 $ traces_en                                 : chr [1:35000] NA NA NA NA ...
 $ serving_size                              : chr [1:35000] NA "28 g (1 ONZ)" "28 g (0.25 cup)" "28 g (0.25 cup)" ...
 $ no_nutriments                             : logi [1:35000] NA NA NA NA NA NA ...
 $ additives_n                               : num [1:35000] NA 0 0 0 0 0 0 1 0 0 ...
 $ additives                                 : chr [1:35000] NA "[ bananas -> en:bananas  ]  [ vegetable-oil -> en:vegetable-oil  ]  [ oil -> en:oil  ]  [ coconut-oil -> en:coc"| __truncated__ "[ peanuts -> en:peanuts  ]  [ wheat-flour -> en:wheat-flour  ]  [ flour -> en:flour  ]  [ sugar -> en:sugar  ] "| __truncated__ "[ organic-hazelnuts -> en:organic-hazelnuts  ]  [ hazelnuts -> en:hazelnuts  ]  [ organic-cashews -> en:organic"| __truncated__ ...
 $ additives_tags                            : chr [1:35000] NA NA NA NA ...
 $ additives_en                              : chr [1:35000] NA NA NA NA ...
 $ ingredients_from_palm_oil_n               : num [1:35000] NA 0 0 0 0 0 0 0 0 0 ...
 $ ingredients_from_palm_oil                 : logi [1:35000] NA NA NA NA NA NA ...
 $ ingredients_from_palm_oil_tags            : chr [1:35000] NA NA NA NA ...
 $ ingredients_that_may_be_from_palm_oil_n   : num [1:35000] NA 0 0 0 0 0 0 0 0 0 ...
 $ ingredients_that_may_be_from_palm_oil     : logi [1:35000] NA NA NA NA NA NA ...
 $ ingredients_that_may_be_from_palm_oil_tags: chr [1:35000] NA NA NA NA ...
 $ nutrition_grade_uk                        : logi [1:35000] NA NA NA NA NA NA ...
 $ nutrition_grade_fr                        : chr [1:35000] NA "d" "b" "d" ...
 $ pnns_groups_1                             : chr [1:35000] NA NA NA NA ...
 $ pnns_groups_2                             : chr [1:35000] NA NA NA NA ...
 $ states                                    : chr [1:35000] "en:to-be-completed, en:nutrition-facts-to-be-completed, en:ingredients-to-be-completed, en:expiration-date-to-b"| __truncated__ "en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed,"| __truncated__ "en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed,"| __truncated__ "en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed,"| __truncated__ ...
 $ states_tags                               : chr [1:35000] "en:to-be-completed,en:nutrition-facts-to-be-completed,en:ingredients-to-be-completed,en:expiration-date-to-be-c"| __truncated__ "en:to-be-completed,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:"| __truncated__ "en:to-be-completed,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:"| __truncated__ "en:to-be-completed,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:"| __truncated__ ...
 $ states_en                                 : chr [1:35000] "To be completed,Nutrition facts to be completed,Ingredients to be completed,Expiration date to be completed,Cha"| __truncated__ "To be completed,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Packaging-code-"| __truncated__ "To be completed,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Packaging-code-"| __truncated__ "To be completed,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Packaging-code-"| __truncated__ ...
 $ main_category                             : chr [1:35000] NA NA NA NA ...
 $ main_category_en                          : chr [1:35000] NA NA NA NA ...
 $ image_url                                 : chr [1:35000] NA NA NA NA ...
 $ image_small_url                           : chr [1:35000] NA NA NA NA ...
 $ energy_100g                               : num [1:35000] NA 2243 1941 2540 1552 ...
 $ energy-from-fat_100g                      : num [1:35000] NA NA NA NA NA NA NA NA NA NA ...
 $ fat_100g                                  : num [1:35000] NA 28.57 17.86 57.14 1.43 ...
 $ saturated-fat_100g                        : num [1:35000] NA 28.57 0 5.36 NA ...
 $ -butyric-acid_100g                        : logi [1:35000] NA NA NA NA NA NA ...
 $ -caproic-acid_100g                        : logi [1:35000] NA NA NA NA NA NA ...
 $ -caprylic-acid_100g                       : logi [1:35000] NA NA NA NA NA NA ...
 $ -capric-acid_100g                         : logi [1:35000] NA NA NA NA NA NA ...
 $ -lauric-acid_100g                         : logi [1:35000] NA NA NA NA NA NA ...
 $ -myristic-acid_100g                       : logi [1:35000] NA NA NA NA NA NA ...
 $ -palmitic-acid_100g                       : logi [1:35000] NA NA NA NA NA NA ...
 $ -stearic-acid_100g                        : logi [1:35000] NA NA NA NA NA NA ...
 $ -arachidic-acid_100g                      : logi [1:35000] NA NA NA NA NA NA ...
 $ -behenic-acid_100g                        : logi [1:35000] NA NA NA NA NA NA ...
 $ -lignoceric-acid_100g                     : logi [1:35000] NA NA NA NA NA NA ...
 $ -cerotic-acid_100g                        : logi [1:35000] NA NA NA NA NA NA ...
 $ -montanic-acid_100g                       : logi [1:35000] NA NA NA NA NA NA ...
 $ -melissic-acid_100g                       : logi [1:35000] NA NA NA NA NA NA ...
 $ monounsaturated-fat_100g                  : num [1:35000] NA NA NA NA NA NA NA NA NA NA ...
 $ polyunsaturated-fat_100g                  : num [1:35000] NA NA NA NA NA NA NA NA NA NA ...
 $ omega-3-fat_100g                          : logi [1:35000] NA NA NA NA NA NA ...
 $ -alpha-linolenic-acid_100g                : logi [1:35000] NA NA NA NA NA NA ...
 $ -eicosapentaenoic-acid_100g               : logi [1:35000] NA NA NA NA NA NA ...
 $ -docosahexaenoic-acid_100g                : logi [1:35000] NA NA NA NA NA NA ...
 $ omega-6-fat_100g                          : logi [1:35000] NA NA NA NA NA NA ...
 $ -linoleic-acid_100g                       : logi [1:35000] NA NA NA NA NA NA ...
 $ -arachidonic-acid_100g                    : logi [1:35000] NA NA NA NA NA NA ...
 $ -gamma-linolenic-acid_100g                : logi [1:35000] NA NA NA NA NA NA ...
 $ -dihomo-gamma-linolenic-acid_100g         : logi [1:35000] NA NA NA NA NA NA ...
 $ omega-9-fat_100g                          : logi [1:35000] NA NA NA NA NA NA ...
 $ -oleic-acid_100g                          : logi [1:35000] NA NA NA NA NA NA ...
 $ -elaidic-acid_100g                        : logi [1:35000] NA NA NA NA NA NA ...
 $ -gondoic-acid_100g                        : logi [1:35000] NA NA NA NA NA NA ...
 $ -mead-acid_100g                           : logi [1:35000] NA NA NA NA NA NA ...
 $ -erucic-acid_100g                         : logi [1:35000] NA NA NA NA NA NA ...
 $ -nervonic-acid_100g                       : logi [1:35000] NA NA NA NA NA NA ...
  [list output truncated]
 - attr(*, "spec")=
  .. cols(
  ..   code = col_character(),
  ..   url = col_character(),
  ..   creator = col_character(),
  ..   created_t = col_double(),
  ..   created_datetime = col_datetime(format = ""),
  ..   last_modified_t = col_double(),
  ..   last_modified_datetime = col_datetime(format = ""),
  ..   product_name = col_character(),
  ..   generic_name = col_character(),
  ..   quantity = col_character(),
  ..   packaging = col_character(),
  ..   packaging_tags = col_character(),
  ..   brands = col_character(),
  ..   brands_tags = col_character(),
  ..   categories = col_character(),
  ..   categories_tags = col_character(),
  ..   categories_en = col_character(),
  ..   origins = col_character(),
  ..   origins_tags = col_character(),
  ..   manufacturing_places = col_character(),
  ..   manufacturing_places_tags = col_character(),
  ..   labels = col_character(),
  ..   labels_tags = col_character(),
  ..   labels_en = col_character(),
  ..   emb_codes = col_character(),
  ..   emb_codes_tags = col_character(),
  ..   first_packaging_code_geo = col_logical(),
  ..   cities = col_logical(),
  ..   cities_tags = col_logical(),
  ..   purchase_places = col_character(),
  ..   stores = col_character(),
  ..   countries = col_character(),
  ..   countries_tags = col_character(),
  ..   countries_en = col_character(),
  ..   ingredients_text = col_character(),
  ..   allergens = col_character(),
  ..   allergens_en = col_logical(),
  ..   traces = col_character(),
  ..   traces_tags = col_character(),
  ..   traces_en = col_character(),
  ..   serving_size = col_character(),
  ..   no_nutriments = col_logical(),
  ..   additives_n = col_double(),
  ..   additives = col_character(),
  ..   additives_tags = col_character(),
  ..   additives_en = col_character(),
  ..   ingredients_from_palm_oil_n = col_double(),
  ..   ingredients_from_palm_oil = col_logical(),
  ..   ingredients_from_palm_oil_tags = col_character(),
  ..   ingredients_that_may_be_from_palm_oil_n = col_double(),
  ..   ingredients_that_may_be_from_palm_oil = col_logical(),
  ..   ingredients_that_may_be_from_palm_oil_tags = col_character(),
  ..   nutrition_grade_uk = col_logical(),
  ..   nutrition_grade_fr = col_character(),
  ..   pnns_groups_1 = col_character(),
  ..   pnns_groups_2 = col_character(),
  ..   states = col_character(),
  ..   states_tags = col_character(),
  ..   states_en = col_character(),
  ..   main_category = col_character(),
  ..   main_category_en = col_character(),
  ..   image_url = col_character(),
  ..   image_small_url = col_character(),
  ..   energy_100g = col_double(),
  ..   `energy-from-fat_100g` = col_double(),
  ..   fat_100g = col_double(),
  ..   `saturated-fat_100g` = col_double(),
  ..   `-butyric-acid_100g` = col_logical(),
  ..   `-caproic-acid_100g` = col_logical(),
  ..   `-caprylic-acid_100g` = col_logical(),
  ..   `-capric-acid_100g` = col_logical(),
  ..   `-lauric-acid_100g` = col_logical(),
  ..   `-myristic-acid_100g` = col_logical(),
  ..   `-palmitic-acid_100g` = col_logical(),
  ..   `-stearic-acid_100g` = col_logical(),
  ..   `-arachidic-acid_100g` = col_logical(),
  ..   `-behenic-acid_100g` = col_logical(),
  ..   `-lignoceric-acid_100g` = col_logical(),
  ..   `-cerotic-acid_100g` = col_logical(),
  ..   `-montanic-acid_100g` = col_logical(),
  ..   `-melissic-acid_100g` = col_logical(),
  ..   `monounsaturated-fat_100g` = col_double(),
  ..   `polyunsaturated-fat_100g` = col_double(),
  ..   `omega-3-fat_100g` = col_logical(),
  ..   `-alpha-linolenic-acid_100g` = col_logical(),
  ..   `-eicosapentaenoic-acid_100g` = col_logical(),
  ..   `-docosahexaenoic-acid_100g` = col_logical(),
  ..   `omega-6-fat_100g` = col_logical(),
  ..   `-linoleic-acid_100g` = col_logical(),
  ..   `-arachidonic-acid_100g` = col_logical(),
  ..   `-gamma-linolenic-acid_100g` = col_logical(),
  ..   `-dihomo-gamma-linolenic-acid_100g` = col_logical(),
  ..   `omega-9-fat_100g` = col_logical(),
  ..   `-oleic-acid_100g` = col_logical(),
  ..   `-elaidic-acid_100g` = col_logical(),
  ..   `-gondoic-acid_100g` = col_logical(),
  ..   `-mead-acid_100g` = col_logical(),
  ..   `-erucic-acid_100g` = col_logical(),
  ..   `-nervonic-acid_100g` = col_logical(),
  ..   `trans-fat_100g` = col_double(),
  ..   cholesterol_100g = col_double(),
  ..   carbohydrates_100g = col_double(),
  ..   sugars_100g = col_double(),
  ..   `-sucrose_100g` = col_logical(),
  ..   `-glucose_100g` = col_logical(),
  ..   `-fructose_100g` = col_logical(),
  ..   `-lactose_100g` = col_double(),
  ..   `-maltose_100g` = col_logical(),
  ..   `-maltodextrins_100g` = col_logical(),
  ..   starch_100g = col_double(),
  ..   polyols_100g = col_logical(),
  ..   fiber_100g = col_double(),
  ..   proteins_100g = col_double(),
  ..   casein_100g = col_logical(),
  ..   `serum-proteins_100g` = col_logical(),
  ..   nucleotides_100g = col_logical(),
  ..   salt_100g = col_double(),
  ..   sodium_100g = col_double(),
  ..   alcohol_100g = col_logical(),
  ..   `vitamin-a_100g` = col_double(),
  ..   `beta-carotene_100g` = col_logical(),
  ..   `vitamin-d_100g` = col_double(),
  ..   `vitamin-e_100g` = col_double(),
  ..   `vitamin-k_100g` = col_double(),
  ..   `vitamin-c_100g` = col_double(),
  ..   `vitamin-b1_100g` = col_double(),
  ..   `vitamin-b2_100g` = col_double(),
  ..   `vitamin-pp_100g` = col_double(),
  ..   `vitamin-b6_100g` = col_double(),
  ..   `vitamin-b9_100g` = col_double(),
  ..   folates_100g = col_double(),
  ..   `vitamin-b12_100g` = col_double(),
  ..   biotin_100g = col_logical(),
  ..   `pantothenic-acid_100g` = col_double(),
  ..   silica_100g = col_logical(),
  ..   bicarbonate_100g = col_logical(),
  ..   potassium_100g = col_double(),
  ..   chloride_100g = col_logical(),
  ..   calcium_100g = col_double(),
  ..   phosphorus_100g = col_double(),
  ..   iron_100g = col_double(),
  ..   magnesium_100g = col_double(),
  ..   zinc_100g = col_double(),
  ..   copper_100g = col_double(),
  ..   manganese_100g = col_double(),
  ..   fluoride_100g = col_logical(),
  ..   selenium_100g = col_double(),
  ..   chromium_100g = col_logical(),
  ..   molybdenum_100g = col_logical(),
  ..   iodine_100g = col_logical(),
  ..   caffeine_100g = col_logical(),
  ..   taurine_100g = col_logical(),
  ..   ph_100g = col_logical(),
  ..   `fruits-vegetables-nuts_100g` = col_logical(),
  ..   `fruits-vegetables-nuts-estimate_100g` = col_double(),
  ..   `collagen-meat-protein-ratio_100g` = col_logical(),
  ..   cocoa_100g = col_logical(),
  ..   chlorophyl_100g = col_logical(),
  ..   `carbon-footprint_100g` = col_logical(),
  ..   `nutrition-score-fr_100g` = col_double(),
  ..   `nutrition-score-uk_100g` = col_double(),
  ..   `glycemic-index_100g` = col_logical(),
  ..   `water-hardness_100g` = col_logical()
  .. )
 - attr(*, "problems")=<externalptr> 
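
The two str() outputs differ in how the code column was typed: read.csv() guessed a number, while read_tsv() kept the zero-padded text. Rather than depending on guessing, the column types can be declared with cols(). The sketch below uses a hypothetical two-row file that borrows three column names from FoodFacts.tsv; it is an illustration, not the author's code.

```r
library(readr)

# Hypothetical sample with a zero-padded identifier column.
path <- tempfile(fileext = ".tsv")
writeLines(c("code\tproduct_name\tfat_100g",
             "0000000004530\tBanana Chips Sweetened (Whole)\t28.57",
             "0000000004559\tPeanuts\t17.86"), path)

# Base R guesses `code` as a number, which drops the leading zeros.
base_df <- read.csv(path, sep = "\t", stringsAsFactors = FALSE)

# With readr, declare the types up front.
readr_df <- read_tsv(path, col_types = cols(
  code         = col_character(),
  product_name = col_character(),
  fat_100g     = col_double()
))

base_df$code   # numeric, zeros lost
readr_df$code  # character, zeros preserved
```

Declaring types also skips the guessing step, which can matter when early rows of a column are empty and the guess would otherwise fall back to logical.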

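The traditional approach took 2.663 seconds (elapsed) and the readr package took 4.186 seconds. Because the file is not very large, the traditional approach can come out somewhat faster here, but for large data readr is faster.

Moreover, of the 35,000 records, the traditional approach read only 23,179, so the two timings do not cover the same amount of work.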
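
One common reason for base R returning fewer rows than a file has lines is quote handling: a stray quote character inside a field can make the reader treat several lines as a single quoted value. Whether this is what happened with FoodFacts.tsv has not been verified here; the sketch below only demonstrates the mechanism on a hypothetical five-line file and shows how quote = "" turns quote handling off.

```r
# Hypothetical TSV in which two fields contain a stray double quote.
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tname",
             "1\t12\" pizza",
             "2\tbagel",
             "3\t6\" sub",
             "4\ttoast"), path)

# Default quoting: lines between the two stray quotes can be merged.
default_df <- read.csv(path, sep = "\t", stringsAsFactors = FALSE)

# quote = "" disables quote handling, so every line becomes its own row.
noquote_df <- read.csv(path, sep = "\t", quote = "", stringsAsFactors = FALSE)

nrow(default_df)
nrow(noquote_df)

# Counting raw lines (minus the header) is a quick sanity check for any reader.
length(readLines(path)) - 1
```

If the row count under the default settings is smaller than the line count, quote handling is a likely suspect and worth checking before comparing timings.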

Solution

When a file is large, reading it with the readr package is a useful way to save time.

Citation

For attribution, please cite this work as

유충현 (2022, Feb. 23). Dataholic: Import data with readr. Retrieved from https://choonghyunryu.github.io/2022-02-23-readr

BibTeX citation

@misc{유충현2022import,
  author = {유충현, },
  title = {Dataholic: Import data with readr},
  url = {https://choonghyunryu.github.io/2022-02-23-readr},
  year = {2022}
}