Recently, Hadley Wickham introduced readr, a new package for reading tabular data (such as CSV), lines, and entire files.
Advantages include:
1. Helpful defaults compared with base R's read.csv: characters are never automatically converted to factors, and row names are never set.
2. Faster reads.
3. When reading large files, a progress bar is displayed.
4. A very useful problems() function that lets you zoom in on the rows readr had trouble parsing, e.g. a column where an integer was expected but the actual data is text.
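To make point 4 concrete, here is a minimal sketch of problems() in action. The file contents and the column names (id, count) are made up for illustration, and the exact columns and row numbering of the problems table can differ between readr versions.

```r
library(readr)

# Hypothetical sample: the second data row has text where an integer is expected
path <- tempfile(fileext = ".csv")
writeLines(c("id,count", "1,10", "2,ten", "3,30"), path)

# The compact spec "ii" asks readr to parse both columns as integers
df <- read_csv(path, col_types = "ii")

# problems() lists each value that could not be parsed as requested
probs <- problems(df)
print(probs)

# The value that failed to parse is stored as NA
df$count
```

readr warns about the parsing failure instead of stopping, so the rest of the file still loads and the bad rows can be inspected afterwards.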
Here’s a quick look at the base R function read.csv and readr’s read_csv function:
Base R:
url <- "http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv"
download.file(url, "iris.csv", quiet=TRUE)
iris <- read.csv("iris.csv", header=TRUE, stringsAsFactors=FALSE)
head(iris)
## X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 1 5.1 3.5 1.4 0.2 setosa
## 2 2 4.9 3.0 1.4 0.2 setosa
## 3 3 4.7 3.2 1.3 0.2 setosa
## 4 4 4.6 3.1 1.5 0.2 setosa
## 5 5 5.0 3.6 1.4 0.2 setosa
## 6 6 5.4 3.9 1.7 0.4 setosa
With readr:
library(readr)
url <- "http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv"
download.file(url, "iris.csv", quiet=TRUE)
iris <- read_csv("iris.csv")
head(iris)
## [EMPTY] Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 1 5.1 3.5 1.4 0.2 setosa
## 2 2 4.9 3.0 1.4 0.2 setosa
## 3 3 4.7 3.2 1.3 0.2 setosa
## 4 4 4.6 3.1 1.5 0.2 setosa
## 5 5 5.0 3.6 1.4 0.2 setosa
## 6 6 5.4 3.9 1.7 0.4 setosa
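The difference in defaults is easier to see from the column classes than from the printed rows. The sketch below uses a small made-up sample written to a temporary file, so it runs without a download. stringsAsFactors = TRUE is passed explicitly as an assumption, to reproduce what read.csv did by default before R 4.0.0.

```r
library(readr)

# Hypothetical three-row sample, written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("Sepal.Length,Species",
             "5.1,setosa",
             "4.9,setosa",
             "7.0,versicolor"), path)

# Base R, with factor conversion switched on explicitly
base_df <- read.csv(path, stringsAsFactors = TRUE)

# readr, with its defaults
readr_df <- read_csv(path)

class(base_df$Species)   # "factor"
class(readr_df$Species)  # "character"
```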