I woke up this morning and looked at some of the Twitter trends. One that caught my attention was the hashtag SundayFunday. So I thought to myself: what could people be tweeting about today that could make Sunday a fun day?
I decided to find the answer by doing some text analysis in R. I downloaded the tweets using the twitteR library and converted them into a corpus using the tm library. Using the tm_map function, I cleaned the tweets by removing retweets, URLs, punctuation, stopwords and extra whitespace, and then stemmed the remaining words.
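To make the cleaning steps concrete, here is a minimal sketch in base R. It is not the code from the analysis (that used tm_map from the tm library); the sample tweets, the tiny stopword list and the clean_tweets function are all made up for illustration.

```r
# Hypothetical sample tweets, not data from the original analysis.
tweets <- c(
  "RT @someone: Beach day!! #SundayFunday https://example.com/abc",
  "Heading to   the lake with my best friends #SundayFunday"
)

# A tiny illustrative stopword list (an assumption; real lists are much longer).
stop_words <- c("the", "to", "with", "my", "and")

clean_tweets <- function(x, stop_words) {
  x <- x[!grepl("^RT\\b", x, perl = TRUE)]        # drop retweets
  x <- gsub("http\\S+", "", x, perl = TRUE)       # remove URLs
  x <- tolower(x)                                 # normalise case
  x <- gsub("[[:punct:]]", "", x, perl = TRUE)    # remove punctuation
  stop_pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(stop_pattern, "", x, perl = TRUE)     # remove stopwords
  x <- gsub("\\s+", " ", x, perl = TRUE)          # collapse extra whitespace
  trimws(x)
}

cleaned <- clean_tweets(tweets, stop_words)
print(cleaned)
```

The order matters a little: URLs are removed before punctuation, otherwise the stripped punctuation would leave fragments of the link behind as ordinary words.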
If you are not familiar with stemming, it is a process by which a word is reduced to its root. For example, the word acted would be transformed to act by removing the suffix -ed. This was done so that variants of the same word are counted together instead of showing up as separate, redundant entries.
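The idea can be sketched with a deliberately naive suffix stripper in base R. This is a toy for illustration only, not the stemmer used in the analysis; real stemmers such as the Porter stemmer handle many more rules and exceptions. The function name naive_stem and the three suffixes are assumptions.

```r
# Toy stemmer: strips at most one of the suffixes -ing, -ed or -s,
# and only when at least three characters remain in front of it.
naive_stem <- function(words) {
  sub("(?<=\\w{3})(ing|ed|s)$", "", words, perl = TRUE)
}

stems <- naive_stem(c("acted", "acting", "acts", "day"))
print(stems)
```

After stemming, acted, acting and acts all collapse to the same token, so they count as one word in any later frequency table.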
After cleaning the tweets, I converted them into unigrams and bigrams using the R interface to Weka. Unigrams and bigrams are both kinds of n-gram. An n-gram is a contiguous sequence of n words. An example of a unigram would be day. An example of a bigram would be cool day.
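For anyone who wants to see what n-gram extraction does under the hood, here is a small base R sketch. The analysis itself used the Weka interface, so the make_ngrams function below is only a hypothetical stand-in that splits on whitespace.

```r
# Build all n-grams (contiguous runs of n words) from a single string.
make_ngrams <- function(text, n) {
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]                  # drop empty tokens
  if (length(words) < n) return(character(0))    # text too short for this n
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

unigrams <- make_ngrams("cool beach day", 1)
bigrams  <- make_ngrams("cool beach day", 2)
print(unigrams)
print(bigrams)
```

A text with k words yields k unigrams and k - 1 bigrams, which is why the bigram list is always one entry shorter.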
Then, using the unigrams and bigrams, I generated some wordclouds. Looking at the unigram wordcloud, it seems that people are talking about inspiring others, about best friends, about their dreams and about going to Lake Tahoe. The bigram wordcloud is a bit different because it shows the most commonly used two-word phrases. On the bigram wordcloud, users are discussing beach calilife, the new Sega console and beach day. All things that would make any Sunday into a fun day.
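A wordcloud is essentially a picture of a frequency table: the more often a term appears, the larger it is drawn. The base R sketch below shows that counting step only, with a made-up vector of bigrams rather than the real tweet data.

```r
# Hypothetical bigrams, standing in for the ones extracted from the tweets.
terms <- c("beach day", "best friends", "beach day",
           "lake tahoe", "beach day", "best friends")

# Count each term and put the most frequent first.
freq <- sort(table(terms), decreasing = TRUE)
top_term <- names(freq)[1]
print(freq)
```

The names and counts in freq are what a wordcloud plotting function would take as its words and their sizes.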
The source code for the data analysis can be found here: GitHub.