Inspired by a wordcloud from Obama’s State of the Union address, let’s look at creating a wordcloud for Singapore’s National Day Rally 2014 speech using R.

library(rvest)
library(RCurl)
## Loading required package: bitops
url <- "http://www.pmo.gov.sg/mediacentre/prime-minister-lee-hsien-loongs-national-day-rally-2014-speech-english"
# scrapes the speech from the URL above
curlSpeech <- getURL(url)
speech <- curlSpeech %>% html() %>% html_nodes(".view-mode-full") %>% html_text()
[…]
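The excerpt is cut off before the word counting, so the following is only a minimal sketch of how word frequencies for a wordcloud might be computed in base R. It is not the code from the original post: `sample_text` is made-up filler standing in for the scraped speech, the `stop_words` list is an arbitrary assumption, and the plot is drawn only if the `wordcloud` package happens to be installed.

```r
# Made-up sample text standing in for the scraped speech (assumption).
sample_text <- "Our people are our strength. Our people build our home, and our home is our future."

# Lower-case the text and split on anything that is not a letter.
tokens <- unlist(strsplit(tolower(sample_text), "[^a-z]+"))
tokens <- tokens[nzchar(tokens)]

# Drop a few common words; this short list is illustrative only.
stop_words <- c("are", "is", "and", "the", "a", "of", "to")
tokens <- tokens[!tokens %in% stop_words]

# Count occurrences and sort from most to least frequent.
freq <- sort(table(tokens), decreasing = TRUE)
print(freq)

# Draw the cloud only when the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(freq), freq = as.integer(freq), min.freq = 1)
}
```

With a real speech you would likely want a fuller stop-word list, since very common words otherwise dominate the cloud.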
Blog
Completed Data Science Specialization
“Data Science, a 10-course specialization by Johns Hopkins University on Coursera. Specialization Certificate earned on December 22, 2014.” Finally received the certification!
TGIF: Nobody tells this to beginners
This is based on advice Ira Glass gave to creatives during an interview. Hope it gives your mojo a boost! [Via DesignTaxi]
A virtual environment for data science
I wanted to conveniently use data science tools without the hassle of installing the required languages and packages, while benefiting from the strengths of the Linux command line tools. There is a pre-packaged VM called the Data Science Toolbox that fills this need. It comes with R and Python installed, along with the respective popular […]
HiDPI Windows 8 setting for Brackets
While test-driving the Brackets editor (brackets.io, version 1.1) on Windows 8, I discovered that text in the application is not crisp at a HiDPI resolution of 2880 x 1620; it looks rather pixelated. The workaround is to edit the properties of the application and check "Disable display scaling on high DPI settings". […]
Data Science Ontology
This site, created by Sean McClure, data scientist at ThoughtWorks, shows an overview of data science concepts. I find this structured approach very useful as a gauge for discovering areas of improvement. It also provides more information via Wikipedia links at the terminal nodes.
TGIF: Wikigalaxy – A Visualization
Sharing a cool visualization of Wikipedia articles. [via Sploid]
Speeding up R
When processing large datasets, some R operations that rely on linear algebra libraries may run very slowly. In that case, you might want to try Revolution R Open, an enhanced R distribution that uses multi-threaded libraries. You can see some benchmarks here and here.
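To get a rough feel for how your own R installation handles this kind of work, you can time a couple of linear algebra operations yourself. The snippet below is a minimal sketch, not one of the benchmarks linked above: the matrix size `n` and the choice of operations are arbitrary assumptions, and the timings will vary with the machine and the linear algebra library R is linked against.

```r
# Build a random square matrix; n is an arbitrary size chosen for illustration.
set.seed(1)
n <- 300
a <- matrix(rnorm(n * n), nrow = n)

# Time a matrix multiplication and a matrix inversion together.
timing <- system.time({
  product <- a %*% a
  inverse <- solve(a)
})

# Elapsed wall-clock time in seconds.
print(timing[["elapsed"]])
```

Running the same script under two R distributions gives a simple like-for-like comparison; increase `n` if the timing is too small to measure reliably.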
Calculus Cheat Sheets
On the journey of Data Science, you will probably come across Calculus. For students and working professionals alike, the availability of cheat sheets makes life easier. I’m happy and grateful to find that the Calculus cheat sheets I referred to during my university days are still available on the net, thanks to Paul Dawkins […]
Video tutorial on dplyr
For data manipulation in R, you have probably heard of the plyr package. dplyr is plyr’s next iteration by Hadley Wickham. It is faster and easier to use than plyr, and I find it more intuitive. The above video by Kevin Markham is an excellent introduction to dplyr. Enjoy! (Via Revolution […]
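As a small taste of the dplyr style, here is a minimal sketch that chains a few of its verbs on R's built-in `mtcars` dataset. It is not taken from the video; the column choices and the name `summary_tbl` are purely illustrative, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Keep 4- and 6-cylinder cars, then summarise fuel economy per cylinder count.
summary_tbl <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), cars = n()) %>%
  arrange(desc(mean_mpg))

print(summary_tbl)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps read top to bottom as a pipeline.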