AI, ML & Data Science – Page 2

Probability Cheatsheet

2015-09-29 / 2015-09-28

< 1 min. read

Need a quick reference when studying probability? Here’s a cheatsheet by William Chen and Joe Blitzstein.

< 1 min. read

On this resource page, RStudio provides cheatsheets for various packages that are very useful for doing data science in the R environment, including data wrangling, visualization and package development.

SG elections – a Twitter snapshot

2015-09-07 / 2015-09-06

3 min. read

With election fever ongoing in Singapore, let’s take a snapshot of the popular tweets with the hashtag #ge2015 at this point in time.

    library(twitteR)
    consumerKey <- readLines("twitterkey.txt")
    consumerSecret <- readLines("twittersecret.txt")
    accessToken <- readLines("twitteraccesstoken.txt")
    accessTokenSecret <- readLines("twitteraccesstokensecret.txt")
    setup_twitter_oauth(consumerKey,consumerSecret,accessToken,accessTokenSecret)
    ## [1] "Using direct authentication"
    tweets <- searchTwitter("#ge2015", resultType="popular", n=100)
    tweetsdf <- twListToDF(tweets)
    library(dplyr)
    tweetsdf <- tbl_df(tweetsdf)
    […]

Downloading R packages securely

2015-08-25

< 1 min. read

JJ Allaire from RStudio wrote a step-by-step guide on how to configure secure downloads of R packages. Downloading R packages securely (over an HTTPS connection) helps ensure that you get your packages from legitimate, trusted sources.

stringsAsFactors – a brief history

2015-08-18 / 2015-08-17

< 1 min. read

If you have used R’s read.csv function, you will inevitably have used the stringsAsFactors=FALSE option. Roger Peng offers a brief history of why the default for the stringsAsFactors option is TRUE. Alternatively, you can use the read_csv function from the readr package by Hadley Wickham, which does not automatically convert characters to factors.

Definition of Statistical Significance

2015-08-04

< 1 min. read

Note to self: Statistical Significance as defined by Andrew Gelman in his blog post.

Statistical Significance Definition: A mathematical technique to measure the strength of evidence from a single study. Statistical significance is conventionally declared when the p-value is less than 0.05. The p-value is the probability of seeing a result as strong as observed […]
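To make the 0.05 convention concrete, here is a minimal sketch in Python. The excerpt above is cut off, so the code assumes the usual completion of the definition: the probability of a result at least as strong as the one observed, if the null hypothesis were true. The coin-flip scenario and the function name `binomial_tail_p_value` are illustrative assumptions, not taken from the quoted post.

```python
from math import comb


def binomial_tail_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical study: 9 heads in 10 flips, null hypothesis "the coin is fair".
p = binomial_tail_p_value(9, 10)
print(f"p-value = {p:.4f}")  # 11/1024, about 0.0107
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

With 8 heads instead of 9 the same calculation gives 56/1024, about 0.055, which falls just short of the conventional threshold.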

SG General Election 2015 Visualization

2015-07-28 / 2015-07-27

< 1 min. read

Wondering which GRC or SMC you are in? Here’s a visualization that has been circulating, for your reference.

Launching the MATLAB command line in Windows

2015-07-21 / 2015-07-18

< 1 min. read

Sometimes you only want to launch MATLAB’s command-line window instead of the full IDE. To do that in Windows, type the following command in the command prompt:

    matlab -nodesktop

Getting session information in Python

2015-07-07 / 2015-06-29

< 1 min. read

We’ve gone through how to get session information in R previously, so how do we do the same for Python? There seems to be no single convenient function available, so here’s one approach. To get the system information, you can use the commonly used IPython package:

    import IPython
    IPython.sys_info()

To find out packages that […]
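Since the excerpt is cut off before the package-listing step, the following is only a sketch of one possible approach, not the method from the original post. It assumes Python 3.8 or later, where the standard library provides `importlib.metadata`; the helper name `session_info` and its `packages` parameter are hypothetical.

```python
import platform
import sys
from importlib import metadata


def session_info(packages=None):
    """Collect interpreter, platform, and package versions into a dict.

    packages: optional list of distribution names to look up; when None,
    every installed distribution that reports a name is included.
    """
    info = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }
    versions = {}
    if packages is None:
        for dist in metadata.distributions():
            name = dist.metadata["Name"]
            if name:  # skip distributions with broken metadata
                versions[name] = dist.version
    else:
        for name in packages:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                versions[name] = None  # not installed
    info["packages"] = dict(sorted(versions.items(), key=lambda kv: kv[0].lower()))
    return info


if __name__ == "__main__":
    details = session_info()
    for key in ("python", "implementation", "platform", "executable"):
        print(f"{key}: {details[key]}")
    for name, version in details["packages"].items():
        print(f"{name}=={version}")
```

Passing a short list such as `session_info(["numpy", "pandas"])` keeps the output focused on the packages relevant to a bug report; packages that are not installed are reported as `None`.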

Getting session information in R

2015-06-30 / 2015-06-29

< 1 min. read

When troubleshooting R bugs or asking for assistance on mailing lists and sites like StackOverflow, it is good to review or present information about your system and the packages loaded. I much prefer the session_info() function from the devtools package over the default sessionInfo() function, as its output is not only more readable, it also provides […]