Parminder Jeet Kaur edited this page Apr 20, 2015 · 3 revisions

### Welcome to the TweetSenseApplication wiki!

## Abstract

Twitter and other microblogging services have become so popular that millions of posts are shared every day, expressing individuals' opinions, life events and experiences. This data is a good source for sentiment analysis, which can be useful in a number of ways. This project presents an application that collects tweets related to a given keyword and summarizes the public mood by country.

## Goal

The goal of the project was to develop an application that can gauge the mood of writers and bloggers on a given subject of interest. We chose Twitter as the source of information because of its popularity. The aim is to help users perform market analysis, sales prediction and election prediction, and to study public responses to current issues.

## Approach


Figure 4. High-level diagram of the steps followed to build the application

To achieve the project goals, we took the following steps (as shown in Figure 1):

**Gather tweets.** We use tweets from Twitter as our source of information, so the first step is to gather filtered tweets. We filter tweets by a keyword, which can be any topic of interest, and save the streaming tweets to text files.
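The filtering step can be sketched as follows. This is a minimal sketch, not the project's code: it assumes tweets arrive as Python dicts with a `text` field (as in Twitter's JSON format) from a stream that the sketch does not implement, and it appends one JSON record per line to a text file. The function names `matches_keyword` and `save_filtered_tweets` are hypothetical.

```python
import json
import os
import tempfile
from typing import Iterable


def matches_keyword(tweet: dict, keyword: str) -> bool:
    """Case-insensitive check: does the tweet text mention the keyword?"""
    return keyword.lower() in tweet.get("text", "").lower()


def save_filtered_tweets(tweets: Iterable[dict], keyword: str, path: str) -> int:
    """Append matching tweets to a text file, one JSON object per line.

    Returns the number of tweets written by this call.
    """
    written = 0
    with open(path, "a", encoding="utf-8") as out:
        for tweet in tweets:
            if matches_keyword(tweet, keyword):
                out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
                written += 1
    return written


if __name__ == "__main__":
    # Hypothetical sample records standing in for a live tweet stream.
    sample = [
        {"text": "Loving the new Azure features!", "place": {"country": "India"}},
        {"text": "Weather is nice today", "place": {"country": "Canada"}},
    ]
    path = os.path.join(tempfile.mkdtemp(), "tweets_azure.txt")
    print(save_filtered_tweets(sample, "azure", path))  # prints 1
```

Appending rather than overwriting lets the function be called repeatedly as new batches of streamed tweets arrive.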

**Load files for analysis.** To perform the analysis, we upload the files saved in the previous step to Microsoft Azure Blob storage, which HDInsight can access. Because HDInsight reads its data from Azure Blob storage, the data persists even after a cluster is shut down and can be made available to multiple clusters. Azure Blob storage can also be geo-replicated for redundancy.
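HDInsight addresses files in Azure Blob storage with `wasbs://` (or unencrypted `wasb://`) URIs of the form `wasbs://<container>@<account>.blob.core.windows.net/<path>`. The sketch below only computes where each saved tweet file would live once uploaded, so that the same paths can be passed to the Hadoop job. The account name, container name, `tweets` folder prefix and helper names are hypothetical, local paths are assumed to be POSIX-style, and the upload itself (for example through the Azure portal, AzCopy or the `azure-storage-blob` SDK) is not shown.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable


def blob_name_for(local_file: str, prefix: str = "tweets") -> str:
    """Blob name under a virtual folder, keeping only the file's base name."""
    return str(PurePosixPath(prefix) / PurePosixPath(local_file).name)


def wasbs_uri(account: str, container: str, blob_name: str) -> str:
    """URI that HDInsight uses to address a blob over TLS."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{blob_name}"


def plan_uploads(local_files: Iterable[str], account: str, container: str,
                 prefix: str = "tweets") -> Dict[str, str]:
    """Map each local tweet file to the URI the MapReduce job will read it from."""
    return {f: wasbs_uri(account, container, blob_name_for(f, prefix))
            for f in local_files}


if __name__ == "__main__":
    # Hypothetical storage account and container names.
    for local, uri in plan_uploads(["/data/tweets_azure.txt"], "mystorageacct", "tweetsense").items():
        print(local, "->", uri)
```

For the example above, the file would be addressed as `wasbs://tweetsense@mystorageacct.blob.core.windows.net/tweets/tweets_azure.txt`.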

**Process each tweet for sentiment analysis.** We run a MapReduce job that processes each tweet for sentiment analysis. The job runs on Microsoft HDInsight, the Azure service that deploys and provisions Apache Hadoop clusters. The service is managed through the Azure management portal, which we accessed with an Azure student subscription; this subscription restricts which Azure services can be used.
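The page does not say how tweets are scored or how the job is written, so the following is only an illustrative sketch: it assumes a simple word-list (lexicon) score, tweets stored as JSON lines carrying Twitter's optional `place.country` field, and a mapper/reducer pair of the kind one might run with Hadoop Streaming. The lexicon and all function names are hypothetical. `run_local` simulates the map, shuffle/sort and reduce phases in a single process so the logic can be checked without a cluster.

```python
import json
from itertools import groupby
from typing import Iterable, Iterator, Tuple

# Hypothetical toy lexicon; a real job would load a full sentiment word list.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def score(text: str) -> int:
    """Number of positive words minus number of negative words."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: one JSON tweet per line -> (country, sentiment score)."""
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records
        if not isinstance(tweet, dict):
            continue  # skip valid JSON that is not a tweet object
        # Tweets without location data are grouped under "Unknown".
        country = (tweet.get("place") or {}).get("country") or "Unknown"
        yield country, score(tweet.get("text", ""))


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, float]]:
    """Reduce step: average score per country.

    Expects pairs sorted by key, as Hadoop guarantees after the shuffle.
    """
    for country, group in groupby(pairs, key=lambda kv: kv[0]):
        scores = [s for _, s in group]
        yield country, sum(scores) / len(scores)


def run_local(lines: Iterable[str]) -> dict:
    """Simulate map -> shuffle/sort -> reduce in one process."""
    shuffled = sorted(mapper(lines), key=lambda kv: kv[0])
    return dict(reducer(shuffled))
```

Under Hadoop Streaming, the mapper and reducer would instead read lines from standard input and write tab-separated `country<TAB>value` lines to standard output; Hadoop sorts the mapper output by key before the reducer sees it, which is why `reducer` can use `groupby` over consecutive keys.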

**Export output for data visualisation.** We export the output of the job and use data visualisation to represent it graphically for analysis.
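As an illustration of the export step, the sketch below turns per-country averages into CSV that a charting or mapping tool could import. It assumes the reducer wrote `country<TAB>average` lines (tab is the default key/value separator of Hadoop's text output); the column names, the function names and the positive/neutral/negative thresholds are hypothetical choices.

```python
import csv
import io
from typing import Dict, Iterable


def parse_reducer_output(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'country<TAB>average' lines, skipping blank lines."""
    scores = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        country, value = line.split("\t", 1)
        scores[country] = float(value)
    return scores


def mood_label(avg: float) -> str:
    """Hypothetical thresholds mapping an average score to a mood label."""
    if avg > 0:
        return "positive"
    if avg < 0:
        return "negative"
    return "neutral"


def to_csv(country_scores: Dict[str, float]) -> str:
    """Render per-country averages as CSV, one row per country, sorted by name."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["country", "avg_sentiment", "mood"])
    for country in sorted(country_scores):
        avg = country_scores[country]
        writer.writerow([country, f"{avg:.3f}", mood_label(avg)])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_reducer_output(["India\t0.5", "Canada\t-0.25"])), end="")
```

For the two sample lines, this prints a header row followed by `Canada,-0.250,negative` and `India,0.500,positive`.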