Reading tabular data with the readr package


Recently, Hadley Wickham introduced readr, a new package for reading tabular data (such as CSV), lines, and entire files.

Advantages include:
1. More helpful defaults than base R's read.csv: character columns are never automatically converted to factors, and row names are never set.
2. Faster reads.
3. When reading large files, a progress bar is displayed.
4. A very useful problems() function that lets you zoom in on the rows readr had trouble parsing, e.g. a column where an integer was expected but the actual data is text.
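
To make the last point concrete, here is a minimal sketch of problems() in action. The file contents and the column names (id, count) are made up for illustration, and the column types are requested explicitly so that the text value is recorded as a parsing problem rather than changing the guessed type. The exact columns of the result may differ between readr versions.

```r
library(readr)

# Hypothetical CSV written to a temporary file: the second data row
# holds text ("abc") in a column that should contain integers.
path <- tempfile(fileext = ".csv")
writeLines(c("id,count", "1,10", "2,abc", "3,30"), path)

# Ask for integer columns up front; readr warns about the value it cannot parse
# and stores NA in its place.
df <- read_csv(path, col_types = cols(id = col_integer(), count = col_integer()))

# One row per parsing failure, including what was expected and what was found.
problems(df)
```

The unparseable value comes through as NA in df, so problems() is the place to look when a column has unexpected missing values after a read.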

Here’s a quick look at readr’s read_csv function next to the base R function read.csv.

Base R:

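# Download the CSV once, then read it from the local copy
url <- "http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv"
download.file(url, "iris.csv", quiet = TRUE)
# Keep text columns as character instead of converting them to factors
iris <- read.csv("iris.csv", header = TRUE, stringsAsFactors = FALSE)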
head(iris)
##   X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 1          5.1         3.5          1.4         0.2  setosa
## 2 2          4.9         3.0          1.4         0.2  setosa
## 3 3          4.7         3.2          1.3         0.2  setosa
## 4 4          4.6         3.1          1.5         0.2  setosa
## 5 5          5.0         3.6          1.4         0.2  setosa
## 6 6          5.4         3.9          1.7         0.4  setosa

With readr:

library(readr)
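# Download the CSV once, then read it from the local copy
url <- "http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv"
download.file(url, "iris.csv", quiet = TRUE)
# No extra arguments needed: text columns stay character by default
iris <- read_csv("iris.csv")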
head(iris)
##   [EMPTY] Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1       1          5.1         3.5          1.4         0.2  setosa
## 2       2          4.9         3.0          1.4         0.2  setosa
## 3       3          4.7         3.2          1.3         0.2  setosa
## 4       4          4.6         3.1          1.5         0.2  setosa
## 5       5          5.0         3.6          1.4         0.2  setosa
## 6       6          5.4         3.9          1.7         0.4  setosa
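
The two printouts look almost identical, so the difference in defaults is easier to see by checking column classes. The sketch below uses a small made-up CSV (the columns length and species are hypothetical) rather than the iris download. It sets stringsAsFactors = TRUE explicitly on the base R side, since that default changed in R 4.0.0.

```r
library(readr)

# Hypothetical three-row CSV written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c("length,species", "5.1,setosa", "4.9,setosa", "6.3,virginica"), path)

# Base R with factor conversion switched on (the default before R 4.0.0).
base_df <- read.csv(path, stringsAsFactors = TRUE)

# readr: text columns stay character unless col_types says otherwise.
readr_df <- read_csv(path)

class(base_df$species)   # "factor"
class(readr_df$species)  # "character"
```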