3 min. read RStudio recently released the sparklyr package, which lets users connect to Apache Spark instances from R. The package also offers dplyr integration, so you can work with Spark data using familiar dplyr functions such as filter and select, which is very convenient. It will also assist you in downloading and installing Apache Spark if […]
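As a rough illustration of the dplyr integration described above, here is a minimal sketch. It assumes that sparklyr and dplyr are installed and that a local Spark installation is available (for example via a one-time call to sparklyr::spark_install()). The helper name spark_demo and the table name mtcars_spark are hypothetical, chosen only for this example.

```r
# Minimal sketch, not a definitive workflow.
# Assumptions: sparklyr and dplyr are installed, and a local Spark install
# exists (e.g. after running sparklyr::spark_install() once).
# spark_demo and "mtcars_spark" are hypothetical names for illustration.
spark_demo <- function() {
  # Connect to a Spark instance running on the local machine
  sc <- sparklyr::spark_connect(master = "local")
  on.exit(sparklyr::spark_disconnect(sc), add = TRUE)

  # Copy a small built-in data frame into Spark
  cars_tbl <- dplyr::copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

  # Ordinary dplyr verbs; the work is carried out by Spark
  result <- dplyr::select(dplyr::filter(cars_tbl, cyl == 4), mpg, cyl)

  # Bring the (small) result back into R as a local data frame
  dplyr::collect(result)
}
```

Calling spark_demo() on a machine that meets the assumptions above should return the four-cylinder rows of mtcars as a local data frame.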
< 1 min. read On this resource page, RStudio provides cheatsheets for various packages that are very useful for doing data science in the R environment, including data wrangling, visualization and package development.
3 min. read With election fever ongoing in Singapore, let’s take a snapshot of the popular tweets with the hashtag #ge2015 at this point in time.
library(twitteR)
consumerKey <- readLines("twitterkey.txt")
consumerSecret <- readLines("twittersecret.txt")
accessToken <- readLines("twitteraccesstoken.txt")
accessTokenSecret <- readLines("twitteraccesstokensecret.txt")
setup_twitter_oauth(consumerKey, consumerSecret, accessToken, accessTokenSecret)
## [1] "Using direct authentication"
tweets <- searchTwitter("#ge2015", resultType="popular", n=100)
tweetsdf <- twListToDF(tweets)
library(dplyr)
tweetsdf <- tbl_df(tweetsdf) […]
< 1 min. read JJ Allaire from RStudio wrote a step-by-step guide on how to configure secure downloads of R packages. Downloading R packages securely (over an HTTPS connection) helps ensure that you get your packages from legitimate, trusted sources.
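The guide's exact steps are not reproduced here. As a minimal sketch of the general idea, the snippet below points the CRAN repository option at an HTTPS mirror and checks the result. The mirror URL is only an example and is an assumption, not necessarily the one the guide recommends.

```r
# Minimal sketch: use an HTTPS CRAN mirror for package downloads.
# Assumption: this mirror URL is an example; substitute your preferred one.
options(repos = c(CRAN = "https://cran.rstudio.com/"))

# Check that every configured repository URL uses HTTPS
repos <- getOption("repos")
uses_https <- grepl("^https://", repos)
print(uses_https)
```

Placing the options() call in your .Rprofile would apply the setting to every session.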
< 1 min. read If you have used R’s read.csv function, you will inevitably have used the stringsAsFactors=FALSE option. Roger Peng offers a brief history of why the default for the stringsAsFactors option is TRUE. Alternatively, you can use the read_csv function from the readr package by Hadley Wickham, which does not automatically convert characters to factors.
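To see what the option changes, here is a small base R sketch that reads the same made-up CSV text twice. stringsAsFactors is set explicitly in both calls so the result does not depend on the R version (the default changed to FALSE in R 4.0.0). The column names and values are invented for the example.

```r
# Made-up CSV content for illustration
csv_text <- "name,score
alice,90
bob,85"

# Same data, read with and without automatic factor conversion
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(class(as_factor$name))  # "factor"
print(class(as_char$name))    # "character"
```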
< 1 min. read When troubleshooting R bugs or asking for assistance on mailing lists and sites like StackOverflow, it is good to review or present information about your system and the packages loaded. I much prefer the session_info() function from the devtools package over the default sessionInfo() function, as its output is not only more readable, it also provides […]
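A minimal sketch of both calls follows. The base R function is always available. The devtools variant is guarded because it assumes devtools is installed.

```r
# Base R: always available
info <- sessionInfo()
print(info$R.version$version.string)  # R version in use
print(names(info$otherPkgs))          # attached non-base packages (NULL if none)

# devtools alternative; assumes the devtools package is installed
if (requireNamespace("devtools", quietly = TRUE)) {
  print(devtools::session_info())
}
```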
< 1 min. read Note to self: After upgrading R (or Revolution R Open) on Windows, run the following command to update all packages in one go: update.packages(checkBuilt = TRUE, ask = FALSE)
2 min. read In this post, let us explore the R package rgdal for map shape plotting. We shall attempt to plot the map of Singapore and display major road networks in Singapore. Pre-requisites: To get the data you need, you can go to a GIS provider. In this post, we shall be using diva-gis. Steps: 1. Go […]
< 1 min. read Recently, Hadley Wickham introduced a new package, readr, to read tabular data (such as CSV), lines and entire files. Advantages include: 1. More helpful defaults than base R's read.csv: characters are never automatically converted to factors, and row names are never set. 2. Faster reads. 3. When reading large files, a progress bar is displayed. […]
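The first advantage can be checked with a small sketch. It assumes the readr package is installed (the readr part is skipped otherwise) and uses a made-up two-row CSV written to a temporary file.

```r
# Write a small made-up CSV to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "alice,90", "bob,85"), path)

# Assumption: readr is installed; skipped quietly if it is not
if (requireNamespace("readr", quietly = TRUE)) {
  scores <- readr::read_csv(path)
  print(class(scores$name))  # character column, not a factor
  print(scores)
}
```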
< 1 min. read Recently, Hadley Wickham introduced a new package to read Excel files (XLS, XLSX). The main advantage is that readxl requires no external dependencies. (The xlsx package requires a Java runtime to be installed.) With xlsx:
library(xlsx)
library(httr)
url <- "https://rawgit.com/yoke2/dsxref/master/iris.xlsx"
GET(url, write_disk("iris.xlsx", overwrite=TRUE))
iris <- read.xlsx("iris.xlsx", sheetIndex=1, header=TRUE)
head(iris, 3)
## NA. Sepal.Length Sepal.Width Petal.Length […]
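For comparison, here is a hedged sketch of the readxl side. It assumes readxl is installed and reads one of the example workbooks that ship with the package rather than the iris.xlsx file from the post, so it needs neither a download nor Java.

```r
# Assumption: readxl is installed; skipped quietly if it is not.
# Uses a workbook bundled with readxl, not the post's iris.xlsx.
if (requireNamespace("readxl", quietly = TRUE)) {
  xlsx_path <- readxl::readxl_example("datasets.xlsx")
  first_sheet <- readxl::read_excel(xlsx_path, sheet = 1)
  print(head(first_sheet, 3))
}
```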