Social media websites such as Facebook, Twitter, and Instagram are some of the most popular platforms people use to share their opinions and content online. With millions of posts and tweets worldwide, there is an incredible amount of information out there. Twitter, for instance, is used by individuals and companies to post status updates and advertise their products. By text mining these messages, it is possible to analyze users' behavior and sentiments to predict trends, events, or even stock markets.


In this post, I used R to do a simple sentiment analysis on the Twitter #UCSB hashtag. I downloaded the Twitter feed using the twitteR library and used the list of positive words from the data set provided by Hu and Liu, KDD-2004 to analyze the positive words used in each message. After some data cleaning, in which I removed punctuation, stray letters, extra spaces, and links, I created a histogram of the most commonly used positive words in the messages.
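
The cleaning and matching steps described above can be sketched roughly as follows. This is a minimal illustration in base R, not the code used for the post: the sample tweets, the four-word `positive_words` vector (a tiny stand-in for the Hu and Liu list), and the `clean_tweet` helper are all assumptions made for the example.

```r
# Hypothetical sample tweets; the post worked on tweets downloaded for #UCSB.
tweets <- c(
  "UCSB is the BEST! http://example.com/abc",
  "Good luck on finals, #UCSB :)",
  "So good to be back at UCSB"
)

# Tiny stand-in for the Hu and Liu positive word list (assumption).
positive_words <- c("good", "best", "great", "proud")

# Hypothetical helper: lowercase, then strip links, punctuation and extra spaces.
clean_tweet <- function(text) {
  text <- tolower(text)
  text <- gsub("http\\S+", " ", text, perl = TRUE)  # drop links
  text <- gsub("[^a-z ]", " ", text)                # drop punctuation, digits, symbols
  text <- gsub(" +", " ", text)                     # collapse repeated spaces
  trimws(text)
}

# Split the cleaned tweets into words and keep only those on the positive list.
words <- unlist(strsplit(clean_tweet(tweets), " ", fixed = TRUE))
positive_counts <- sort(table(words[words %in% positive_words]), decreasing = TRUE)
print(positive_counts)
```

A table like `positive_counts` is the kind of input one could pass to `barplot()` to draw a frequency chart similar to the one below.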

#UCSB Twitter

The graph shows that the most frequently occurring positive word is good, followed by best with a frequency of 363. The histogram only uses words that are listed in the Hu and Liu, KDD-2004 positive sentiment list, so it doesn't tell us much about the general trend; it is just a list of positive words that people use to describe UCSB. To look at other words that might not be on that list, I generated a word cloud.



According to the word cloud, the most frequently used word is UCSB, which is to be expected since we used the UCSB hashtag. Some of the other words that appear frequently are gauchos, ever, highest, bestcolleges, ranked, just, usnews, and ucsantabarbara. This is most likely because U.S. News & World Report ranked the University of California, Santa Barbara number 8 among the top public universities.
UCSB students are proud of their school achieving such a high ranking and are tweeting about how it is one of the best public universities there is. One word of particular interest to me is mamasaiah, because I had never heard it before. After some googling, I found out that it is a Twitter username. It appeared so frequently because that user tweeted about UCSB being a top 3 UC university and the tweet was retweeted 519 times.
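
To see how retweets can push a single username to the top of a word cloud, here is a small base R sketch. The four cleaned messages and the `word_freq` helper are hypothetical and only illustrate the effect; they are not the actual #UCSB data.

```r
# Hypothetical cleaned messages: three identical retweets plus one other tweet.
cleaned <- c(
  "rt mamasaiah ucsb ranked top uc",
  "rt mamasaiah ucsb ranked top uc",
  "rt mamasaiah ucsb ranked top uc",
  "go gauchos ucsb"
)

# Hypothetical helper: word frequencies across a set of messages.
word_freq <- function(texts) {
  words <- unlist(strsplit(texts, " ", fixed = TRUE))
  sort(table(words), decreasing = TRUE)
}

freq_all    <- word_freq(cleaned)          # every retweet counted
freq_unique <- word_freq(unique(cleaned))  # each distinct message counted once

print(freq_all["mamasaiah"])
print(freq_unique["mamasaiah"])
```

Whether to count retweets depends on the question: keeping them measures how far a message spread, while dropping duplicates measures how many distinct messages used a word.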


Going forward, you could do sentiment analysis over a period of time and see if there is a trend. For instance, the way that people feel at the start of the school year might be different from the way that they feel during finals. In addition, you could build a Naive Bayes classifier to predict the different emotions, such as happy, sad, anxious, or excited, that the user might feel.
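
As a rough sketch of the over-time idea, the snippet below scores each tweet by the number of positive words it contains and averages the scores by month. Everything here is assumed for illustration: the dates, the tweet texts, the short `positive_words` vector, and the `count_positive` helper. It uses base R only.

```r
# Hypothetical tweets with made-up dates (start of term vs. finals).
tweets <- data.frame(
  created = as.Date(c("2014-09-29", "2014-09-30", "2014-12-08", "2014-12-09")),
  text = c("so good to be back", "best campus ever",
           "finals week again", "good luck everyone"),
  stringsAsFactors = FALSE
)

# Tiny stand-in for a positive word list (assumption).
positive_words <- c("good", "best", "great")

# Hypothetical helper: number of positive words in one tweet.
count_positive <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(words %in% positive_words)
}

tweets$score <- vapply(tweets$text, count_positive, integer(1), USE.NAMES = FALSE)
tweets$month <- format(tweets$created, "%Y-%m")

# Average positive-word count per month.
trend <- aggregate(score ~ month, data = tweets, FUN = mean)
print(trend)
```

A simple count like this ignores negative words and negation, so a real analysis would likely want a fuller scoring scheme or a trained classifier such as the Naive Bayes model mentioned above.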


The code I used to generate this wordcloud can be found here: