Sentiment Analysis: Machine Learning Approach

Abstract: Twitter is one of the most popular social networking sites, where people freely express their views, opinions and emotions. In this study, tweets are recorded and analysed to mine people's emotions about a terrorist attack (the Uri attack). The study retrieves tweets about the Uri attack and determines their emotions and polarity using text mining techniques. Approximately 5000 tweets were recorded and pre-processed to create a dataset of frequently appearing words. R is used for mining emotions and polarity. The experimental results show that about 94.3% of tweets expressed fear, anger, sadness or disgust about the Uri attack.

A domain-specific, feature-based model for movie reviews was developed in [4]. It uses an aspect-based technique that analyzes the text of movie reviews and assigns a sentiment label to each aspect. Each aspect is then aggregated across multiple reviews to obtain the sentiment score of a specific movie. The authors use a SentiWordNet-based technique for feature extraction and for computing document-level sentiment. The results of the algorithm are compared with those of the Alchemy API, and the comparison shows that the feature-based model performs better. In short, aspect-wise sentiment results are better than document-wise results [4].
A corpus of about 300,000 tweets for sentiment analysis and opinion mining was collected in [5]. A sentiment classifier was built to label tweets as positive, negative or neutral. The corpus was divided into three sets: positive emotions (happiness, amusement or joy), negative emotions (sadness, anger or disappointment) and neutral (text that contains no emotion). TreeTagger is used for POS tagging in order to study the distribution of emotions.
Consumer marketing data has been used to collect sentiments about products, and the collected data is used for future prediction. Because consumer review data is very large, the authors use a Hadoop environment for sentiment analysis. The experimental work created Hadoop clusters for analyzing the data, and tweets were categorized as positive, negative or neutral [6].
Hadoop's Flume and Hive tools have also been used for the analysis of Twitter data. Flume extracts the data and stores it in HDFS. Hive is used to extract and analyze the data stored in HDFS, and it supports the analysis of different topics by changing keywords. The authors identify the sentiments and polarity of tweets from election voting data [7].
Twitter data has also been automatically classified as positive, negative or neutral according to the query term used in consumer review tweets. The authors use a part-of-speech (POS) polarity technique and a tree kernel technique. The work uses two types of resources: a hand-built dictionary of emotions and a dictionary collected from the web. The authors used different classification and feature extraction algorithms [8].

III. SENTIMENT ANALYSIS
Six different sentiments can be analyzed using the sentiment package: anger, disgust, fear, joy, sadness and surprise. Frequently occurring words were identified using a word cloud, and a sentiment was assigned to each of them. These new words and their sentiments were added to the sentiment file used for sentiment analysis. The present work uses the Bayes algorithm. The sentiment analysis algorithm compares each word with the words in the sentiment file and increments a count for the matching sentiment. Finally, it displays the count for each sentiment.
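The word-matching step described above can be illustrated with a minimal Python sketch. It is not the R sentiment package used in the study and does not implement its Bayes scoring; it only shows the idea of comparing each word with a sentiment file and counting hits per emotion. The lexicon entries and the function names `count_emotions` and `dominant_emotion` are hypothetical.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping words to one of the six emotions.
# The study's actual sentiment file is not reproduced here.
EMOTION_LEXICON = {
    "attack": "fear",
    "terror": "fear",
    "angry": "anger",
    "shame": "disgust",
    "martyr": "sadness",
    "proud": "joy",
    "shocked": "surprise",
}

EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")


def count_emotions(tokens):
    """Count lexicon hits per emotion for one tokenized tweet."""
    counts = Counter({emotion: 0 for emotion in EMOTIONS})
    for token in tokens:
        emotion = EMOTION_LEXICON.get(token)
        if emotion is not None:
            counts[emotion] += 1
    return counts


def dominant_emotion(tokens):
    """Return the emotion with the most hits, or None if no word matched."""
    counts = count_emotions(tokens)
    emotion, hits = counts.most_common(1)[0]
    return emotion if hits > 0 else None
```

For example, `dominant_emotion(["terror", "attack", "angry"])` gives `"fear"` under this toy lexicon, because two of the three words map to fear.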
The present work also finds the polarity of the text, which can be positive, negative or neutral. In this experiment, new words were identified using the word cloud and a polarity was assigned to each. As in sentiment analysis, each word is compared with the polarity word file and the polarity counts for the text are accumulated. Finally, the count for each polarity is displayed.
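As a rough illustration of the two ideas in this paragraph, the Python sketch below first ranks words that are missing from a polarity word file by frequency (the information a word cloud conveys visually) and then labels each tweet by comparing its positive and negative word counts. The lexicon contents, the tie-goes-to-neutral rule and all function names are assumptions made for the example, not details taken from the study.

```python
from collections import Counter

# Hypothetical polarity word file; the entries are illustrative only.
POLARITY_LEXICON = {
    "brave": "positive",
    "salute": "positive",
    "cowardly": "negative",
    "condemn": "negative",
}


def frequent_unknown_words(tweets, top_n=3):
    """Rank words that are missing from the polarity file by frequency."""
    freq = Counter(word for tweet in tweets for word in tweet.split())
    unknown = [(w, c) for w, c in freq.most_common() if w not in POLARITY_LEXICON]
    return unknown[:top_n]


def tweet_polarity(tweet):
    """Label a tweet by comparing positive and negative word counts."""
    labels = [POLARITY_LEXICON[w] for w in tweet.split() if w in POLARITY_LEXICON]
    pos, neg = labels.count("positive"), labels.count("negative")
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # assumption: ties and tweets with no matches are neutral


def polarity_counts(tweets):
    """Count how many tweets fall into each polarity class."""
    return Counter(tweet_polarity(t) for t in tweets)
```

Words returned by `frequent_unknown_words` are the candidates to which a polarity would then be assigned by hand before re-running the count.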

A. Dataset Creation
The dataset was created by retrieving tweets about the Uri attack. To retrieve tweets, a Twitter application was created to obtain the ConsumerKey, Consumer_Secret, Access_Token and Access_Token_Secret. These keys are used to connect RStudio to the Twitter application. Once the connection was established, a dataset of 5000 tweets was created using the search term "Uri Attack". This dataset was pre-processed to eliminate duplicate tweets, so the final dataset contained 1788 tweets.

B. Data Pre-processing
The final dataset consisted of raw tweets, which needed pre-processing to give good results. Tweets were processed to remove stop words and frequently used words such as conjunctions, numbers, prepositions, names, base verbs, etc. These types of words do not play an important role in sentiment analysis. The pre-processing steps are as follows.
• Filtering: tweets are cleaned by removing links, some special words, emoticons, user names, etc.
• Tokenization: tweets are separated into individual tokens.
• Stop-word removal: stop words, i.e. common words that have no analytic value, are removed from the tweets.
• stemDocument: common word endings such as "ing", "es" and "s" are removed.
• Remove white spaces: each text contains many white spaces, which are removed in this step.
• Convert to lower case: after all unnecessary terms have been removed, the text is converted to lower case.
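The steps listed above can be sketched in a few lines of Python. This is an illustrative stand-in, not the tm-based R pipeline used in the study: the stop-word list is a small hypothetical sample, the suffix stripping is a crude substitute for a real stemmer, and the function names are invented for the example. The sketch lowercases before stop-word removal so that matching is case-insensitive; white space is handled implicitly by `split()`.

```python
import re

# Small illustrative stop-word list; the study's actual list is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "and", "to", "by"}
# Crude suffix stripping as a stand-in for a real stemmer.
SUFFIXES = ("ing", "es", "s")


def filter_tweet(text):
    """Remove links, user names, and non-letter characters."""
    text = re.sub(r"https?://\S+", " ", text)   # links
    text = re.sub(r"@\w+", " ", text)           # user names
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # digits, symbols, emoticons
    return text


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Filter, lowercase, tokenize, drop stop words, and stem one tweet."""
    tokens = filter_tweet(text).lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]
```

With these assumptions, `preprocess("@user Soldiers attacked in Uri! http://t.co/x 2016")` yields the tokens `soldier`, `attacked` and `uri`; note that the toy suffix list does not handle endings such as "ed".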

C. Experimental Work
The experimental work was carried out on the Windows XP operating system. The working environment was an Intel i3 processor at 3.3 GHz with 2 GB of RAM. RStudio version 0.99.486 and R version 3 were used for the experiments.
R is an open-source statistical programming language and software environment. It is mainly used for data manipulation, data analysis, calculation and graphical visualization of results. Its important characteristics include a wide range of statistical, graphical, data mining and data analysis functionality, excellent community support, extensibility and extensive documentation. It is widely used for educational and research purposes and is freely available [9].
The present study uses R for sentiment analysis of the Uri attack. R has a rich set of packages, such as tm, sentiment and wordcloud [10], which are used here for sentiment analysis.
A sentiment analysis algorithm was applied to the pre-processed dataset of tweets, which gave counts for six different sentiments and the polarity of each tweet (positive, negative or neutral). Table 1 shows the sentiment score for each emotion, whereas Table 2 shows the polarity count.

IV. OBSERVATIONS
The present study classifies tweets into six emotions: anger, disgust, fear, joy, sadness and surprise. The tweets are also classified into three polarities: positive, negative and neutral. Table 1 shows the count for each sentiment and Table 2 shows the count for each polarity. Figures 1 and 2 show the sentiment analysis of the Uri attack by emotion and by polarity, respectively.
From Table 1, it is observed that most people felt fear about this event. Fear had the highest count among all emotions, at 55.59%. The table also shows that the shares of anger and disgust are similar, at 18.46% and 19.63% respectively. Tweets expressing sadness account for 0.62%. The table also shows that some people were surprised by the Uri attack. Disturbingly, 4.75% of people expressed joy about the attack; this group may consist of terrorists or their supporters. Overall, about 94.3% of people expressed fear, anger, sadness or disgust, and about 5.7% expressed surprise or joy about the Uri attack.
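The grouped totals quoted above can be checked with a few lines of Python. The sketch uses only the percentages stated in the text; the surprise share is not quoted separately, so it is inferred here as the remainder of the 5.7% attributed to surprise and joy together, which is an assumption of this example.

```python
# Emotion shares (percent) as quoted in the text above.
shares = {"fear": 55.59, "anger": 18.46, "disgust": 19.63,
          "sadness": 0.62, "joy": 4.75}

negative_group = ("fear", "anger", "sadness", "disgust")
negative_total = round(sum(shares[e] for e in negative_group), 2)

# Assumption: surprise is the remainder of the combined 5.7% for surprise and joy.
surprise = round(5.7 - shares["joy"], 2)

grand_total = round(negative_total + shares["joy"] + surprise, 2)

print(negative_total)  # 94.3
print(surprise)        # 0.95
print(grand_total)     # 100.0
```

The four negative emotions sum to 94.3%, and with the inferred surprise share the six emotions account for 100% of the classified tweets.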
Polarity analysis aims to identify the overall polarity expressed by the writer. Table 2 shows the polarity of the tweets: 67.17% of tweets express a negative stance, 14.93% a neutral stance and 17.90% a positive one.
According to human psychology, any attack or terrorist activity generates panic. The emotions identified in the present work for the Uri attack are consistent with this expectation.

V. CONCLUSION
The Uri attack was carried out by four terrorists on 18 September 2016 against an Indian military camp near the town of Uri in the state of Jammu and Kashmir. The attack was condemned around the world, and people all over the world tweeted about it. Tweets were extracted using the Twitter API. The text data was pre-processed with several techniques to extract features that help identify the emotions and polarity of tweets. The emotions identified in the present study agree with what human psychology would predict. The approach is useful for discovering the opinions and sentiments people express when they tweet, and it also helps in identifying the polarity of tweets. All of this was carried out using RStudio and its text mining packages.
Many netizens all over the world tweeted about this event, but only 5000 tweets were considered for analysis because of limits on storage space and processing capability; the remaining tweets, with their emotions and polarity, were not considered. In future, big data analysis techniques can be used to classify the emotions in a large volume of tweets.