2 min. read
In this post, we explore the R package rgdal for plotting map shapes. We will plot a map of Singapore and display its major road networks. Prerequisites: To get the data you need, you can go to a GIS provider. In this post, we use diva-gis. Steps: 1. Go […]
Blog
Reading tabular data with readr package
< 1 min. read
Recently, Hadley Wickham introduced readr, a new package for reading tabular data (such as CSV), lines and entire files. Advantages include: 1. More helpful defaults than base R's read.csv: characters are never automatically converted to factors, and row names are never set. 2. Faster reads. 3. A progress bar is displayed when reading large files. […]
Handling line endings on Windows with git
< 1 min. read
Note to self: When collaborating across platforms, one of the most common issues is line endings: LF on Mac/Linux and CRLF on Windows. With git, you can address this in the following ways: 1) Configure Global Settings 1) Windows users: git config --global core.autocrlf true 2) Mac/Linux users: git config --global core.autocrlf […]
Reading excel files with readxl R package
< 1 min. read
Recently, Hadley Wickham introduced readxl, a new package for reading Excel files (XLS, XLSX). Its main advantage is that no external dependencies are required (the xlsx package requires the Java Runtime to be installed). With xlsx: library(xlsx) library(httr) url <- "https://rawgit.com/yoke2/dsxref/master/iris.xlsx" GET(url, write_disk("iris.xlsx", overwrite=TRUE)) iris <- read.xlsx("iris.xlsx", sheetIndex=1, header=TRUE) head(iris, 3) ## NA. Sepal.Length Sepal.Width Petal.Length […]
Video Tutorial on dplyr part 2
< 1 min. read
Kevin Markham has released the second part of his excellent dplyr tutorial, covering new functionality. Enjoy! [Via dataschool.io]
Data Science Cross Reference update 1 – Getting Data
< 1 min. read
Following up on my previous post, I’ve added Getting Data to the collection. It covers reading CSVs, Excel files (XLSX), JSON and Tweets.
Download files over HTTPS in R with httr
< 1 min. read
To download a file over an HTTP connection, we normally use the download.file command in R, for example: url = "http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv" download.file(url, "iris.csv", quiet=TRUE) For HTTPS connections, download.file may give you some issues. In such situations, consider using the httr package to download files: library(httr) url <- "https://rawgit.com/yoke2/dsxref/master/iris.xlsx" GET(url, write_disk("iris.xlsx", overwrite=TRUE))
#SG50ShadesOfGrey – An #rstats Analysis
3 min. read
It seems the 50 Shades of Grey movie has spawned humor on Twitter in Singapore, and it is making the rounds internationally as well. In the spirit of #rstats, let’s look at some trends of #SG50ShadesOfGrey. We shall use the twitteR and foreach packages to get a data frame of the popular tweets for #sg50shadesofgrey library(twitteR) consumerKey <- […]
Data Science Learning – A Cross Reference
< 1 min. read
While learning data science, I’ve discovered that it is very useful to think of the data science process as a “pipeline”, i.e. a series of actions in a process. Along this pipeline, you will be tackling lots of “How do I…” questions like “How do I remove NA values?” and “How do I create N-grams?” […]
Bay Area UseR Group Meetup Video
< 1 min. read
This is an opportunity to hear from UseR Groups in the U.S., with great speakers like Ryan Hafen, Hadley Wickham and Nick Elprin, no less! The video is hosted through Air Mozilla. Ryan Hafen provided an overview of the Tessera project, an open source environment for deep analysis and visualization of complex data sets. […]