< 1 min. read Kevin Markham has released the second part to his excellent dplyr tutorial, covering new functionality. Enjoy! [Via dataschool.io]
Data Science Cross Reference update 1 – Getting Data
< 1 min. read Following up on my previous post, I’ve added Getting Data to the collection. It covers reading CSV files, Excel (XLSX) files, JSON and Tweets.
Download files over HTTPS in R with httr
< 1 min. read To download a file over an HTTP connection, we normally use the download.file function in R, for example:
    url = "http://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv"
    download.file(url, "iris.csv", quiet=TRUE)
For HTTPS connections, download.file may give you some issues. In such situations, consider using the httr package to download files:
    library(httr)
    url <- "https://rawgit.com/yoke2/dsxref/master/iris.xlsx"
    GET(url, write_disk("iris.xlsx", overwrite=TRUE))
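The two approaches above can be combined into one small wrapper. This is only a sketch: is_https and fetch_file are hypothetical helper names, not part of httr or base R, and the choice to route by URL scheme is an assumption made for illustration. The httr calls used (GET, write_disk, stop_for_status) are the package's own functions.

```r
# Hypothetical helper: TRUE when the URL uses the https scheme.
is_https <- function(url) {
  grepl("^https://", url, ignore.case = TRUE)
}

# Hypothetical wrapper: send HTTPS downloads through httr and
# everything else through base R's download.file.
fetch_file <- function(url, dest) {
  if (is_https(url)) {
    resp <- httr::GET(url, httr::write_disk(dest, overwrite = TRUE))
    # Turn 4xx/5xx responses into an R error instead of failing silently
    httr::stop_for_status(resp)
  } else {
    # mode = "wb" keeps binary files intact on Windows
    download.file(url, dest, quiet = TRUE, mode = "wb")
  }
  invisible(dest)
}
```

Calling fetch_file("https://rawgit.com/yoke2/dsxref/master/iris.xlsx", "iris.xlsx") would then take the httr path, while the plain HTTP iris.csv URL would take the download.file path.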
#SG50ShadesOfGrey – An #rstats Analysis
3 min. read It seems the 50 Shades of Grey movie has spawned humor on Twitter in Singapore, and it is making the rounds internationally as well. In the spirit of #rstats, let’s look at some trends of #SG50ShadesOfGrey. We shall use the twitteR and foreach packages to get a data frame of the popular tweets for #sg50shadesofgrey: library(twitteR) consumerKey <- […]
Data Science Learning – A Cross Reference
< 1 min. read While learning data science, I’ve discovered that it is very useful to think of data science processing as a “pipeline”, i.e. a series of actions in a process. Along this pipeline, you will be tackling lots of “How do I…” questions like “How do I remove NA values?” and “How do I create N-grams?” […]
Wordcloud for National Day Rally 2014 speech
< 1 min. read Inspired by a wordcloud from Obama’s State of the Union address, let’s look at creating a wordcloud for Singapore’s National Day Rally 2014 speech using R.
    library(rvest)
    library(RCurl)
    ## Loading required package: bitops
    url <- "http://www.pmo.gov.sg/mediacentre/prime-minister-lee-hsien-loongs-national-day-rally-2014-speech-english"
    # scrapes the speech from the URL above
    curlSpeech <- getURL(url)
    speech <- curlSpeech %>% html() %>% html_nodes(".view-mode-full") %>% html_text()
[…]
A virtual environment for data science
< 1 min. read I wanted to conveniently use data science tools without the hassle of installing the required languages and packages, while benefiting from the strengths of the Linux command line tools. There is a pre-packaged VM called the Data Science Toolbox that fills this need. It comes with R and Python installed, along with the respective popular […]
Speeding up R
< 1 min. read When processing large datasets, some R operations that rely on linear algebra libraries may run very slowly. In that case you might want to try Revolution R Open, an enhanced R distribution that uses multi-threaded libraries. You can see some benchmarks here and here.
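To get a rough feel for whether linear algebra is the slow part on your own machine, you can time a dense matrix multiplication in base R. This is a minimal sketch, not one of the benchmarks mentioned above: the matrix size of 500 is an arbitrary assumption, and the timing will vary with hardware and with the linear algebra library your R build links against.

```r
# Rough timing sketch for a dense matrix multiply.
# n = 500 is an arbitrary size chosen for illustration.
set.seed(42)
n <- 500
a <- matrix(rnorm(n * n), nrow = n)
b <- matrix(rnorm(n * n), nrow = n)

# system.time() reports user, system and elapsed time in seconds
timing <- system.time(product <- a %*% b)
elapsed <- timing[["elapsed"]]

cat(sprintf("%d x %d matrix multiply took %.3f seconds\n", n, n, elapsed))
```

Running the same script under two R distributions gives a simple like-for-like comparison of their linear algebra performance.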
Exploring Focal Length with Exiftool and R

5 min. read Picture Credits: Pixabay A good way to understand your shooting style and guide your future camera equipment purchases is to discover your most frequently used focal lengths. Focal lengths can be extracted from photos that have EXIF data, which in short refers to data on how these photos were taken. You can find out […]