It seems the 50 Shades of Grey movie has spawned a wave of humor on Twitter in Singapore, and the hashtag has been making the rounds internationally as well. In the spirit of #rstats, let’s look at some trends of #SG50ShadesOfGrey.
We shall use the twitteR and foreach packages to build a data frame of the popular tweets for #sg50shadesofgrey.
library(twitteR)
consumerKey <- readLines("twitterkey.txt")
consumerSecret <- readLines("twittersecret.txt")
accessToken <- readLines("twitteraccesstoken.txt")
accessTokenSecret <- readLines("twitteraccesstokensecret.txt")
setup_twitter_oauth(consumerKey,consumerSecret,accessToken,accessTokenSecret)
## [1] "Using direct authentication"
tweets <- searchTwitter("#sg50shadesofgrey", resultType="popular", n=100)
# Each item in the list can be converted into a data frame with attributes
# as columns and one row of data. We will then convert these data frames
# to rows in a single data frame.
library(foreach)
tweetsdf <- foreach(i=seq_along(tweets), .combine=rbind) %do% as.data.frame(tweets[[i]])
library(dplyr)
tweetsdf <- tbl_df(tweetsdf)
nrow(tweetsdf)
## [1] 30
names(tweetsdf)
## [1] "text" "favorited" "favoriteCount" "replyToSN"
## [5] "created" "truncated" "replyToSID" "id"
## [9] "replyToUID" "statusSource" "screenName" "retweetCount"
## [13] "isRetweet" "retweeted" "longitude" "latitude"
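The row-by-row combine above can also be written in base R without foreach. The sketch below is only an illustration: it uses a toy list of one-row data frames with made-up screen names and counts in place of real status objects, and binds them in a single call. twitteR also ships a `twListToDF()` helper intended for this conversion, which may be the simpler option on real search results.

```r
# Toy stand-in for the list of tweets: each element is a one-row data frame.
# The screen names and counts are hypothetical, not taken from the search.
rows <- list(
  data.frame(screenName = "user_a", retweetCount = 10, stringsAsFactors = FALSE),
  data.frame(screenName = "user_b", retweetCount = 25, stringsAsFactors = FALSE),
  data.frame(screenName = "user_a", retweetCount = 5, stringsAsFactors = FALSE)
)

# Bind all one-row data frames into a single data frame in one call
combined <- do.call(rbind, rows)
print(combined)
```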
Let’s first look at the top contributors of these popular tweets.
# There are users with several popular tweets
tweetsdf %>% select(screenName) %>%
group_by(screenName) %>%
summarise(count=n()) %>%
arrange(desc(count))
## Source: local data frame [18 x 2]
##
## screenName count
## 1 alfpang 8
## 2 adibjalal 4
## 3 BBCtrending 2
## 4 asonofapeach 2
## 5 DanialRon 1
## 6 InsideScoot 1
## 7 MIIKOLICIOUS 1
## 8 STcom 1
## 9 SoSingaporean 1
## 10 SyakirahNasri 1
## 11 ahbengpls 1
## 12 ahbengsiao 1
## 13 benjaminkheng 1
## 14 juicyjuleswei 1
## 15 omgitsjy 1
## 16 sammmydee 1
## 17 smrtsg 1
## 18 spinorbinmusic 1
From the counts, we can see that the user ‘alfpang’ contributes the most popular tweets, with 8 of the 30.
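For readers without dplyr, the same kind of tally can be sketched in base R. The vector of screen names below is hypothetical and only stands in for the screenName column; `table()` does the counting and `sort()` puts the largest count first.

```r
# Hypothetical screen names, one entry per tweet
screen_names <- c("user_a", "user_b", "user_a", "user_c", "user_a", "user_b")

# table() counts occurrences; sort() orders by count, highest first
counts <- sort(table(screen_names), decreasing = TRUE)
print(counts)

# Share of all tweets that come from the top contributor
top_share <- counts[[1]] / length(screen_names)
print(top_share)
```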
Looking at the variables, retweetCount and favoriteCount look interesting. However, they are probably highly correlated. We can check with a plot and confirm with a correlation test.
library(ggplot2)
# Correlation of retweetCount with favoriteCount
tweetsdf %>% select(favoriteCount,retweetCount) %>%
ggplot(., aes(x=favoriteCount,y=retweetCount)) + geom_point() + geom_smooth() +
labs(title = "Favorite Count vs Retweet Count", x = "favorite counts", y = "retweet counts")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
# Correlation Test
with(tweetsdf, cor.test(retweetCount, favoriteCount))
##
## Pearson's product-moment correlation
##
## data: retweetCount and favoriteCount
## t = 27.12, df = 28, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9610520 0.9912528
## sample estimates:
## cor
## 0.981492
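As a sanity check on the output above, the t statistic of a Pearson test can be recomputed from the correlation and the degrees of freedom using t = r * sqrt(df / (1 - r^2)). The sketch below plugs in the two reported values; it should reproduce the reported t of about 27.12 up to rounding.

```r
r  <- 0.981492  # sample correlation reported above
df <- 28        # degrees of freedom reported above (n - 2, with n = 30 tweets)

# t statistic for testing that the true correlation is zero
t_stat <- r * sqrt(df / (1 - r^2))
print(round(t_stat, 2))

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)
print(p_value)
```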
Since retweetCount and favoriteCount are highly correlated, we shall focus on retweetCount. Let’s now find the top 5 users by the total number of retweets across their popular tweets.
# Top 5 users by total number of retweets
tweetsdf %>% select(screenName, retweetCount) %>%
group_by(screenName) %>%
summarise_each(funs(sum)) %>%
arrange(desc(retweetCount)) %>%
top_n(5)
## Selecting by retweetCount
## Source: local data frame [5 x 2]
##
## screenName retweetCount
## 1 alfpang 3625
## 2 SyakirahNasri 3183
## 3 asonofapeach 2724
## 4 juicyjuleswei 2342
## 5 adibjalal 1641
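The per-user totals above come from summing retweetCount within each screenName. A base R sketch of the same idea is shown below, using a small made-up data frame rather than the real tweets. One thing to keep in mind with the dplyr version is that `top_n()` keeps ties, so it can return more rows than requested, whereas `head()` on a sorted data frame always returns exactly that many.

```r
# Hypothetical tweets: several users, some with more than one tweet
tweets_toy <- data.frame(
  screenName   = c("user_a", "user_b", "user_a", "user_c", "user_b"),
  retweetCount = c(100, 40, 50, 120, 30),
  stringsAsFactors = FALSE
)

# Sum retweets per user, then sort from highest to lowest total
totals <- aggregate(retweetCount ~ screenName, data = tweets_toy, FUN = sum)
totals <- totals[order(-totals$retweetCount), ]

# Keep the top 2 users by total retweets
print(head(totals, 2))
```

Note that user_c has the single most retweeted tweet in this toy data, yet user_a ranks first on the total, which is the same distinction as between the per-user and per-tweet rankings in this post.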
Let’s now look at the top 5 tweets based on retweet count.
# Top 5 tweets by highest number of retweets
top5Tweets <- tweetsdf %>% select(screenName,id,retweetCount) %>%
arrange(desc(retweetCount)) %>%
top_n(5)
## Selecting by retweetCount
top5Tweets
## Source: local data frame [5 x 3]
##
## screenName id retweetCount
## 1 SyakirahNasri 566537040909434881 3183
## 2 juicyjuleswei 566513157405806592 2342
## 3 asonofapeach 566467904720236545 1795
## 4 ahbengpls 566805714094411776 1613
## 5 alfpang 565752652286267392 1274
# Direct Links to the top 5 tweets
paste("http://twitter.com/",top5Tweets$screenName,"/status/", top5Tweets$id, sep="")
## [1] "http://twitter.com/SyakirahNasri/status/566537040909434881"
## [2] "http://twitter.com/juicyjuleswei/status/566513157405806592"
## [3] "http://twitter.com/asonofapeach/status/566467904720236545"
## [4] "http://twitter.com/ahbengpls/status/566805714094411776"
## [5] "http://twitter.com/alfpang/status/565752652286267392"
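Building the links this way relies on the tweet id being handled as text. Tweet ids are larger than 2^53, so they cannot be stored exactly as R doubles. The sketch below uses the first screen name and id from the output above, builds the same URL with `paste0()`, and shows that a round trip through `as.numeric()` changes the id. It assumes the id column is kept as a character string, which is what the `paste()` call above depends on.

```r
screen_name <- "SyakirahNasri"
id_chr <- "566537040909434881"  # tweet id kept as a character string

# paste0() is shorthand for paste(..., sep = "")
url <- paste0("http://twitter.com/", screen_name, "/status/", id_chr)
print(url)

# Converting the id to a double and back loses the last digits
id_roundtrip <- sprintf("%.0f", as.numeric(id_chr))
print(identical(id_roundtrip, id_chr))
```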
Here are the top 5 tweets:
It was a tight fit. It was hard to go in. Finally when he was in, he gasped in relief.
A day in the MRT @ peak hour. #SG50ShadesOfGrey
— k. (@kxrah) February 14, 2015
she held e balls in her mouth til she felt tt familiar explosion of sweet, warm fluid. "this ondeh ondeh damn shiok man" #SG50ShadesOfGrey
— jules wei (@juicyjuleswei) February 14, 2015
Her hands quivered inside her pants. Her heart is beating fast; it was approaching. Finally, she found her ezlink card.#SG50ShadesOfGrey
— Mr. H (@asonofapeach) February 14, 2015
She opened her mouth, gasping as the thick white substance filled her mouth and throat immediately.
"Stupid haze."
— ah beng (@ahbengpls) February 15, 2015
"Take it off," he demanded imperiously. It was way past August, but the national flag was still hanging from her balcony.#SG50ShadesOfGrey
— Alvin Pang (@alfpang) February 12, 2015