< 1 min. read Watch the show Perspectives – From Big Data to Smart Data on Channel NewsAsia Online for insights from industry leaders and academia on Big Data, its use, and its effects on individuals and companies.
Updating packages after R upgrade
< 1 min. read Note to self: After upgrading R (or Revolution R Open) on Windows, run the following command to update all packages in one go:
update.packages(checkBuilt = TRUE, ask = FALSE)
Drawing map shapes with rgdal
2 min. read In this post, let us explore the R package rgdal for map shape plotting. We shall attempt to plot the map of Singapore and display major road networks in Singapore. Pre-requisites To get the data you need, you can go to a GIS provider. In this post, we shall be using diva-gis. Steps: 1. Go […]
Reading tabular data with readr package
< 1 min. read Recently, Hadley Wickham introduced a new package to read tabular data (such as CSV), lines and entire files. Advantages include: 1. Helpful defaults over base R read.csv such as: Characters are never automatically converted to factors and row names are never set. 2. Faster reads. 3. When reading large files, a progress bar is displayed. […]
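The first advantage can be illustrated with a small sketch. It assumes a throwaway CSV written to a temporary file (the column names and values are made up for illustration), and it runs the readr part only if the package is installed. Note that base R's read.csv converted strings to factors by default before R 4.0.0, so the argument is set explicitly here to reproduce that behaviour.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The columns and values are illustrative only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "alice,1.5", "bob,2.0"), csv_path)

# Base R: stringsAsFactors = TRUE was the default before R 4.0.0,
# so character columns silently became factors.
base_df <- read.csv(csv_path, stringsAsFactors = TRUE)
class(base_df$name)  # "factor"

# readr: character columns stay character and no row names are set.
# Guarded so the script still runs where readr is not installed.
if (requireNamespace("readr", quietly = TRUE)) {
  readr_df <- readr::read_csv(csv_path)
  class(readr_df$name)
}
```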
Reading excel files with readxl R package
< 1 min. read Recently, Hadley Wickham introduced a new package to read Excel files (XLS, XLSX). The main advantage is that readxl requires no external dependencies (the xlsx package requires a Java Runtime to be installed). With xlsx:
library(xlsx)
library(httr)
url <- "https://rawgit.com/yoke2/dsxref/master/iris.xlsx"
GET(url, write_disk("iris.xlsx", overwrite=TRUE))
iris <- read.xlsx("iris.xlsx", sheetIndex=1, header=TRUE)
head(iris, 3)
## NA. Sepal.Length Sepal.Width Petal.Length […]
Video Tutorial on dplyr part 2
< 1 min. read Kevin Markham has released the second part to his excellent dplyr tutorial, covering new functionality. Enjoy! [Via dataschool.io]
Data Science Cross Reference update 1 – Getting Data
< 1 min. read Following up on my previous post, I’ve added Getting Data to the collection. It covers reading CSV files, Excel (XLSX) files, JSON and tweets.
Download files over HTTPS in R with httr
< 1 min. read To download a file over an HTTP connection, we normally use the download.file command in R, for example:
url = "http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv"
download.file(url, "iris.csv", quiet=TRUE)
For HTTPS connections, download.file may give you some issues. In situations like this, you can consider using the httr package to download files:
library(httr)
url <- "https://rawgit.com/yoke2/dsxref/master/iris.xlsx"
GET(url, write_disk("iris.xlsx", overwrite=TRUE))
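A slightly more defensive variant of the httr approach might look like the sketch below. The helper name download_https is hypothetical (it is not part of httr), and the function is only defined here, not called, since running it needs network access and an installed httr. It wraps the same GET and write_disk calls and adds httr's stop_for_status so that an HTTP error does not pass silently.

```r
# Hypothetical helper (not part of httr): download a file over HTTPS
# and fail loudly if the server returns an HTTP error status.
download_https <- function(url, dest, overwrite = TRUE) {
  # write_disk() streams the response body straight to `dest`.
  resp <- httr::GET(url, httr::write_disk(dest, overwrite = overwrite))
  # stop_for_status() turns 4xx/5xx responses into R errors.
  httr::stop_for_status(resp)
  invisible(dest)
}

# Usage (requires network access and the httr package):
# download_https("https://rawgit.com/yoke2/dsxref/master/iris.xlsx", "iris.xlsx")
```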
#SG50ShadesOfGrey – An #rstats Analysis
3 min. read It seems like the 50 Shades of Grey movie has spawned humor on Twitter in Singapore, as well as making the rounds internationally. In the spirit of #rstats, let’s look at some trends of #SG50ShadesOfGrey. We shall use the twitteR and foreach packages to get a data frame of the popular tweets for #sg50shadesofgrey library(twitteR) consumerKey <- […]
Data Science Learning – A Cross Reference
< 1 min. read While learning data science, I’ve discovered that it is very useful to think of the data science process as a “pipeline”, i.e. a series of actions in a process. Along this pipeline, you will be tackling lots of “How do I…” questions like “How do I remove NA values?” and “How do I create N-grams?” […]
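As a rough illustration of the two example questions, here is a minimal base R sketch. The data values and the helper make_ngrams are made up for this example and are not taken from the cross reference itself; dedicated text-mining packages offer more complete n-gram tooling.

```r
# "How do I remove NA values?"
# For a vector, keep only the elements that are not NA.
scores <- c(3.2, NA, 4.8, NA, 5.1)
clean_scores <- scores[!is.na(scores)]

# For a data frame, complete.cases() flags rows with no NA in any column.
df <- data.frame(id = 1:3, value = c(10, NA, 30))
clean_df <- df[complete.cases(df), ]

# "How do I create N-grams?"
# Hypothetical helper: split on whitespace, then paste together
# every run of n consecutive words.
make_ngrams <- function(text, n = 2) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

make_ngrams("the quick brown fox")
# "the quick" "quick brown" "brown fox"
```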