Sometimes you need to view and search very large text files, such as logs or SQL dumps. On Windows and Linux, you can give glogg a try. Note that glogg is a read-only viewer: it cannot edit files.

# Blog

## Running Apache Spark with sparklyr and R on Windows

RStudio recently released the sparklyr package, which lets you connect to Apache Spark instances from R. The package also integrates with dplyr, so you can work with Spark data using familiar dplyr verbs such as filter and select, which is very convenient. It can also help you download and install Apache Spark if […]
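Here is a minimal sketch of that workflow, assuming a Java runtime is already installed (Spark needs one) and that `spark_install()` can download a default Spark version; Windows may need extra setup that is not shown here. The built-in `mtcars` data set and the table name `"mtcars"` are illustrative choices, not taken from the original post.

```r
library(sparklyr)
library(dplyr)

# Download and install a local copy of Spark (only needed once;
# assumes a Java runtime is already installed)
spark_install()

# Connect to a local Spark instance
sc <- spark_connect(master = "local")

# Copy the built-in mtcars data frame into Spark as a table named "mtcars"
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# The dplyr verbs are translated to Spark SQL and run inside Spark;
# collect() brings the result back into R as a local data frame
four_cyl <- mtcars_tbl %>%
  filter(cyl == 4) %>%
  select(mpg, cyl, wt) %>%
  collect()

print(four_cyl)

# Close the connection when done
spark_disconnect(sc)
```

Because only the final `collect()` moves data into R, the filtering and column selection happen on the Spark side, which is what makes the dplyr integration useful for larger data.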

## Probability Cheatsheet

Need a quick reference when studying probability? Here’s a cheatsheet by William Chen and Joe Blitzstein.

## RStudio Cheat Sheets

On this resource page, RStudio provides cheatsheets for various packages that are very useful for doing data science in R, covering data wrangling, visualization and package development.

## SG elections – a Twitter snapshot

With election fever ongoing in Singapore, let’s take a snapshot of the popular tweets with the hashtag #ge2015 at this point in time.

```r
library(twitteR)

# Read the Twitter API credentials from local text files
consumerKey <- readLines("twitterkey.txt")
consumerSecret <- readLines("twittersecret.txt")
accessToken <- readLines("twitteraccesstoken.txt")
accessTokenSecret <- readLines("twitteraccesstokensecret.txt")
setup_twitter_oauth(consumerKey, consumerSecret, accessToken, accessTokenSecret)
## [1] "Using direct authentication"

# Fetch up to 100 popular tweets with the hashtag and convert them to a data frame
tweets <- searchTwitter("#ge2015", resultType = "popular", n = 100)
tweetsdf <- twListToDF(tweets)

library(dplyr)
tweetsdf <- tbl_df(tweetsdf)
```

[…]

## TGIF: The most frequently used words in the English Language

Here’s a visualization of various facts about English language usage. Enjoy! [Via DesignTaxi]

## Downloading R packages securely

JJ Allaire from RStudio wrote a step-by-step guide on configuring secure downloads of R packages. Downloading packages over an HTTPS connection helps ensure that they come from legitimate, trusted sources.
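As a rough illustration (not necessarily the exact steps in the guide), one common setup is to point R at an HTTPS CRAN mirror and use the libcurl download method, for example in your `.Rprofile` so it applies to every session. The mirror URL below is just one choice of HTTPS mirror, and the `capabilities("libcurl")` check assumes R 3.2.0 or later.

```r
# Point the CRAN repository at an HTTPS mirror
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Use the libcurl download method, which supports HTTPS,
# but only if this R build was compiled with libcurl support
if (capabilities("libcurl")) {
  options(download.file.method = "libcurl")
}

# Subsequent installs are then fetched over HTTPS, e.g.:
# install.packages("dplyr")
```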

## TGIF: Miniature Origami Robot

Watch this video of a mini origami robot that can self-fold, walk, swim and dissolve in acetone. [Via DesignTaxi]

## stringsAsFactors – a brief history

If you have used R’s read.csv function, you have probably reached for the stringsAsFactors = FALSE option. Roger Peng offers a brief history of why the default for the stringsAsFactors option is TRUE (R 4.0.0 later changed the default to FALSE). Alternatively, you can use the read_csv function from Hadley Wickham’s readr package, which does not automatically convert character columns to factors.
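A small self-contained comparison of the three behaviours, assuming the readr package is installed; the temporary file and the `name`/`score` columns are made up for the example.

```r
library(readr)

# Write a tiny CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "alice,90", "bob,85"), path)

# Base R, asking explicitly for character columns
df_base <- read.csv(path, stringsAsFactors = FALSE)
class(df_base$name)    # "character"

# Base R with the old default behaviour: strings become factors
df_factor <- read.csv(path, stringsAsFactors = TRUE)
class(df_factor$name)  # "factor"

# readr never converts strings to factors;
# col_types = cols() guesses every column type quietly
df_readr <- read_csv(path, col_types = cols())
class(df_readr$name)   # "character"
```

Passing stringsAsFactors explicitly keeps the code's behaviour the same regardless of which R version's default is in effect.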

## Definition of Statistical Significance

Note to self: here is statistical significance as defined by Andrew Gelman in his blog post. Statistical Significance Definition: A mathematical technique to measure the strength of evidence from a single study. Statistical significance is conventionally declared when the p-value is less than 0.05. The p-value is the probability of seeing a result as strong as observed […]
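To make the 0.05 convention concrete, here is a hypothetical coin-flip example (the numbers are invented, not from Gelman’s post) using base R’s exact binomial test, `binom.test`. The null hypothesis is that the coin is fair, and the two-sided p-value measures how likely a result at least this lopsided would be under that hypothesis.

```r
# Hypothetical data: 60 heads in 100 flips; null hypothesis: p = 0.5
p60 <- binom.test(60, 100, p = 0.5)$p.value
p60          # about 0.057, just above the 0.05 cutoff

# One more head pushes the result across the conventional threshold
p61 <- binom.test(61, 100, p = 0.5)$p.value
p61 < 0.05   # TRUE
```

Two nearly identical outcomes land on opposite sides of the cutoff, which is worth keeping in mind when reading "significant" versus "not significant" as a hard dividing line.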