Sentiment Identification in Code-Mixed Social Media Text
Sentiment analysis is the Natural Language Processing (NLP) task dealing with the detection and classification of sentiments in texts. While some tasks deal with identifying the presence of sentiment in text (subjectivity analysis), other tasks aim at determining the polarity of the text, categorizing it as positive, negative or neutral. Whenever sentiment is present in text, it has a source (a person, a group of people or any entity) and is directed towards some entity, object, event or person. Sentiment analysis tasks aim to determine the subject, the target and the polarity or valence of the sentiment. In our work, we try to automatically extract sentiment (positive or negative) from Facebook posts using a machine learning approach. While some work has been done on code-mixed social media data and on sentiment analysis separately, our work is the first attempt (as of now) at performing sentiment analysis of code-mixed social media text. We have used extensive pre-processing to remove noise from raw text. A Multilayer Perceptron model has been used to determine the polarity of the sentiment. We have also developed the corpus for this task by manually labelling Facebook posts with their associated sentiments.
Sentiment analysis - of social media in particular - has become a popular area of research in recent times. The massive proliferation of social media has been a catalyst in this regard. A culture shift can be noticed where users comfortably and candidly express their emotions, opinions or sentiments online. This has encouraged researchers to analyze and study the presence of sentiment in social media.
Extraction of sentiment from social media, such as Facebook posts or microposts on Twitter, can serve a myriad of purposes. These texts often express opinions about a variety of topics. They may convey the user's appraisal of certain products or incidents, the state of mind of the writer, or any intended emotional communication with potential readers. User reviews on e-commerce sites, opinions on web blogs, tweets (twitter.com) and Facebook (www.facebook.com) posts can be mined for assessing polarity of opinion. Businesses use the power of text analytics behind their data mining technology. Sentiment analysis helps businesses in advertising, marketing and making business decisions for better customer satisfaction. Organizations can determine public opinion about their products and services. Similarly, consumers can use sentiment analysis while researching products prior to purchase. It can also be used to investigate the web for forecasting electoral results (by evaluating voter sentiment) and to track political preferences. Recently, social media analysis has been used extensively to identify cyber-bullying prevalent in the web space.
Although we have come across various tasks conducted on multilingual texts, the task of sentiment analysis, in particular, has not been explored for multilingual code-mixed texts. This type of text differs significantly from traditional English texts and needs to be processed differently, since different forms of text require different methods for sentiment analysis. For example, sentiments in scientific papers are hedged and indirect, while sentiments in movie or product reviews are more direct. Traditional texts like reviews and newspapers are structured and follow a definite pattern, and the writing is more formal and composed. Social media texts, on the other hand, are concise and largely informal, with several linguistic differences.
In our work, we have used code-mixed social media data collected from Facebook posts. The text is informal and conversational, in accordance with social media characteristics. It is mostly bilingual, though the presence of three languages in a single post is not entirely uncommon in our data. Initially, we pre-process the text to normalize the irregular words. We also remove noise from the text prior to processing it and expand the abbreviations to regular words wherever applicable. We label the posts with their respective part-of-speech tags, since sentiment classifiers traditionally show improvements when using part-of-speech features. We make use of various word-level, dictionary-based and stylistic features relevant to social media text to classify the sentiment as subjective or objective. Subjective posts are further categorized as positive or negative in polarity. We use various machine learning algorithms for our final classification. An artificial neural network model performs best in our experiments.
The remainder of this paper is structured as follows: Section 2 gives an overview of the background and related work. In Section 3, we present the dataset. The working model for our system is described in Section 4. We describe in detail the pre-processing and feature selection used to build the classification models. In Section 5, we present the results obtained using different combinations of features. We evaluate the performance of various machine learning models that we used in our experimentation. Section 6 summarizes the main findings of this work and sketches the lines for future work.
2 Related Work
Research regarding emotion and mood analysis in text is becoming more common, in part due to the availability of new sources of subjective information on the web. One of the very first works in the area of sentiment classification focused on the actual taxonomy and isolation of terms with an emotional connotation.
Identifying the semantic polarity (positive vs. negative connotation) of words has been done using different approaches. Some of the (knowledge-based) works explicitly attempted to find features indicating that subjective language is being used: some made use of corpus statistics, some used linguistic tools such as WordNet, and others used a lexicon-based classifier. One work on the classification of reviews was based on an unsupervised learning technique, which computed the mutual information between document phrases and words like "excellent" and "poor" using statistics gathered by a search engine. In work on the automatic classification of sentiment in online domains, the performance of different classifiers was evaluated on movie reviews, demonstrating that standard machine learning techniques outperform human-produced baselines.
Typically, methods for sentiment analysis produce lists of words with polarity values assigned to each of them. This method has been successfully employed for applications such as product review analysis and opinion mining [6, 7, 13, 26, 23, 32, 11]. High accuracy has been reported in classifying emotions in online chat conversations by using the phonemes extracted from a voice-reconstruction of the conversations. Other work investigated discriminating terms for emotion detection in short text, while another system identified affect in short fiction stories using the statistical association level between words in the text and a set of keywords. In another work, distant supervision was used to build the corpus.
There has been some work by researchers in the area of phrase-level and sentence-level sentiment classification and on analyzing blog posts. One such approach first determined whether an expression is neutral or polar and then disambiguated the polarity of the polar expressions. With this approach, the system was able to automatically identify the contextual polarity for a large subset of sentiment expressions.
Sentiment analysis of social media text has received a lot of interest from the research community in recent years with the rise to prominence of Facebook and Twitter. Some works used context-dependent sentiment words, while others suggested combining learning-based and lexicon-based techniques using a centroid classifier. Positive and negative emoticons have been used to classify tweet polarity; that work showed that machine learning algorithms (Naive Bayes, Maximum Entropy, and SVM) have accuracy above 80% when trained with emoticon data. Another work showed how to automatically collect a corpus for sentiment analysis and opinion mining purposes. Its authors concluded that writers use syntactic structures to describe emotions or state facts and that some POS-tags may be strong indicators of emotional text. They obtained their best results using a Naive Bayes classifier with N-grams and POS-tags as features. Crowdsourcing techniques have been used to manually rate polarity in Twitter posts, and human affective states have been classified from posts shared on Twitter. Other work highlighted the suitability of Support Vector Machine or Naive Bayes for different domains. Our approach is similar to earlier work that presented the idea of a ternary classification system (positive, negative and neutral), using target words bearing sentiment and supervised learning for classification. We also use some techniques for noise reduction inspired by earlier work on Twitter sentiment analysis, which proposed building a sophisticated feature space to handle noisy and short messages.
A recent shared task on part-of-speech tagging of transliterated social media text was conducted at the Twelfth International Conference on Natural Language Processing (ICON-2015) (http://ltrc.iiit.ac.in/icon2015/contests.php). For that shared task, data was collected from Bengali-English Facebook chat groups. The Facebook posts are in mixed English-Bengali and English-Hindi, and have been obtained from the "JU Confession" Facebook group, which contains posts in English-Bengali with a few Hindi words in some cases. We have modified the ICON Shared Task corpora for our work on sentiment analysis. The dataset contains three languages: Bengali, Hindi and English. It contains 882 posts in total. The statistics for the dataset are presented in Table 1.
|Language Tags||Number Of Words Present||Percentage Of Corpus|
The purpose of the implementation is to automatically classify a post as positive or negative in sentiment. The classifier needs to be trained, and for that we needed a set of manually classified posts. We used two annotators to classify the posts into three categories: positive, negative or neutral.
We calculated the Kappa coefficient to measure inter-annotator agreement. The Kappa coefficient is a reliable and robust measure of the agreement between two annotators. It takes into account the agreement occurring by chance and hence is more useful than a simple percent agreement calculation.
|Annotator 1||Annotator 2|
For the above data, the observed agreement (po) is 0.641 and the expected chance agreement (pe) is 0.3642, giving a Kappa coefficient of (po - pe) / (1 - pe) = 0.4354. Because the Kappa value is low, we retained only the instances on which both annotators agree about the sentiment polarity. There are a total of 565 such instances. We have used these posts for our sentiment polarity classification.
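The Kappa value can be checked with a few lines of Python. This is a minimal sketch: the function names are ours, and the two short label lists are hypothetical toy data, not the annotations from this corpus. Only the values 0.641 and 0.3642 come from the text.

```python
from collections import Counter


def agreement_stats(labels_a, labels_b):
    """Return (p_o, p_e) for two equally long lists of labels."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    labels = set(labels_a) | set(labels_b)
    p_e = sum((count_a[l] / n) * (count_b[l] / n) for l in labels)
    return p_o, p_e


def cohen_kappa(p_o, p_e):
    """Cohen's kappa from observed and chance agreement."""
    return (p_o - p_e) / (1.0 - p_e)


# Values reported in the text.
print(round(cohen_kappa(0.641, 0.3642), 4))  # 0.4354

# Hypothetical toy annotations, for illustration only.
toy_a = ["pos", "neg", "neu", "neu"]
toy_b = ["pos", "neu", "neu", "neu"]
toy_po, toy_pe = agreement_stats(toy_a, toy_b)
```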
4 System Description
The process of sentiment analysis can be divided into three major parts: pre-processing of raw posts, feature identification and extraction, and finally the classification of sentiment as positive, neutral or negative. The steps are discussed in sequential order.
4.1 Pre-processing of the Facebook posts
The following steps were performed to pre-process the raw posts prior to feature extraction.
Expansion of Abbreviations
As social media text is often non-traditional and informal in nature, the posts had to be pre-processed initially to remove noise. We have used an abbreviation list to normalize all the words that were abbreviated. For example, btw was replaced by "by the way", clg by "college", hw by "how" and so on.
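A dictionary lookup is one simple way to realize this step. The sketch below assumes whitespace tokenization and a case-insensitive match; the table holds only the three examples given above, and the function name is ours.

```python
# Only the three abbreviations mentioned in the text; a real list is larger.
ABBREVIATIONS = {
    "btw": "by the way",
    "clg": "college",
    "hw": "how",
}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace each whitespace-separated token found in the table."""
    return " ".join(table.get(token.lower(), token) for token in text.split())


print(expand_abbreviations("btw hw is clg"))  # by the way how is college
```

Tokens with attached punctuation (such as "btw,") are not matched by this sketch, so the order of the pre-processing steps matters.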
Removal of Punctuations
Before processing the post any further, we remove all punctuation from the text. Social media texts mostly contain a lot of punctuation, and its usage is often arbitrary, not adhering to grammatical norms. To compound the problem further, punctuation marks like full stops, question marks and exclamation marks are often used multiple times in succession. By removing all punctuation, we try to make our text as noiseless as possible. We keep a record of the number of different punctuation marks in the text, which is used as a feature for classification.
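The following sketch removes punctuation while keeping the per-character counts. It assumes ASCII punctuation as defined by Python's `string.punctuation` and replaces each mark with a space so that adjacent words do not merge; both choices are ours, not details given in the text.

```python
import string
from collections import Counter


def strip_punctuation(text):
    """Return (cleaned_text, counts) where counts maps each mark to its frequency."""
    counts = Counter(ch for ch in text if ch in string.punctuation)
    # Replace marks with spaces, then collapse runs of whitespace.
    spaced = "".join(" " if ch in string.punctuation else ch for ch in text)
    return " ".join(spaced.split()), counts


cleaned, counts = strip_punctuation("What happened??? Great!!")
print(cleaned)      # What happened Great
print(counts["?"])  # 3
print(counts["!"])  # 2
```

Because smileys are built from punctuation characters, any smiley matching would have to run before this step.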
Removal of Multiple Character Repetitions
It is often found in social media text that certain characters are repeated more than once. These non-standard spellings are very hard to deal with, as they cannot be successfully matched to any dictionary. For example, lol (abbreviated form of laughing out loud) can be written as loool, looool or loooooool. We use pre-processing to reduce all these occurrences to lool. Any character which occurs more than two times in a row is replaced by two occurrences of the same character. Some other examples are ahhhh (reduced to ahh) and uhhhh (reduced to uhh). However, we maintain a record of the number of repetitions, as they could be used by the author in specific situations to reflect sentiment.
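The reduction rule maps naturally onto a regular expression with a backreference. In this sketch the recorded "number of repetitions" is taken to be the number of character runs that were shortened, which is our assumption; the text does not say exactly what is counted.

```python
import re

# One character followed by two or more copies of itself.
_REPEATED = re.compile(r"(.)\1{2,}")


def squeeze_repeats(word):
    """Return (normalized_word, number_of_runs_shortened)."""
    runs = _REPEATED.findall(word)
    normalized = _REPEATED.sub(r"\1\1", word)
    return normalized, len(runs)


for w in ["loooool", "ahhhh", "uhhhh", "lol"]:
    print(w, squeeze_repeats(w))
```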
4.2 Feature Extraction
In our work, we used the following features to train our machine learning model.
Number Of Word Matches With SentiWordNet (SWN): We have used SentiWordNet (http://sentiwordnet.isti.cnr.it/) as one of the sentiment resources. SWN is a lexical resource for sentiment analysis. It assigns three sentiment scores (positivity, negativity and objectivity) to each synset of WordNet. So, a given word can have a positive or negative score or both. We have extracted all the positive and negative words from SWN. The final list contains 17027 positive words and 17992 negative words. For a given data instance or sentence, we check whether the normalized words match any words in these two lists. We count the number of words in the sentence which match the positive word list and the number which match the negative word list, and then assign the difference between the positive and negative word counts as a feature.
Number Of Word Matches With Opinion Lexicon (OL): Similar to SentiWordNet, Opinion Lexicon (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html) is another lexical resource for sentiment analysis. It contains a list of positive and negative opinion words or sentiment words for English. There is a total of 2006 positive words and 4783 negative words. We find the number of matches to both the lists and the difference is taken as our second feature.
Number Of Word Matches With English Sentiment Words (ESW): We have collected a list of positive and negative words from the internet for sentiment classification. We hand-labeled a few words in the training data as root words which depicted emotion. Using bootstrapping, we expanded this list of words. It contains 3075 positive words and 4003 negative words. This list concentrates more on the words which appear in social media context. Similar to the previous two features, we find the number of matches to the positive and negative lists and the difference between the two is considered as our third feature.
Number Of Word Matches With Bengali Sentiment Words (BSW): This list was developed to tackle the presence of sentiment in Bengali words. As we are dealing with multilingual text, it was essential to develop this list for Bengali. Das and colleagues [3, 4, 5] developed SentiWordNet for Indian Languages. However, this list contained words in Bengali (or Brahmic) scripts. As we are dealing with transliterated text, this wordlist required transliteration to English. Finally, we developed a positive and a negative wordlist for transliterated Bengali words. The number of words in the positive wordlist is 1778 while the negative wordlist contains 3713 words. The difference in number of matches to both the lists is considered as our next feature.
Number Of Colloquial Bengali Sentiment Words (CBW): We have created this list for Bengali words which often appear in social media text. It must be noted that the Bengali Sentiment Words list developed previously is more formal in nature and therefore not sufficient for identifying colloquial words which appear in Facebook posts or Twitter texts. For example, words like jata (hopeless), hebby (excellent) and phot (get lost) are not captured by Bengali Sentiment Words. We created two lists, a positive and a negative wordlist, which try to incorporate all such words that may indicate the presence of sentiment in the text. The number of matches to both the lists is determined and the difference is assigned as a feature.
Density Of Curse Or Bad Words (CW): We have used a list of curse words (words which are used as bad words in the majority of instances) developed in earlier work on cyberbullying. In that work, the authors collected 713 curse words (e.g. ‘asshole’, ‘bitch’ etc.) and hieroglyphs (such as ‘5hit’, ‘@ss’ etc.) based on online resources. We have used this list to find all the words which have been used with a negative sentiment.
Part-Of-Speech Tags (POS): All the posts were tagged manually for part-of-speech information. It has been noted that words belonging to certain part-of-speech tags (like JJ, RB and JJ-RB) are usually used to express sentiment. These part-of-speech tags can be considered as features to detect the presence of sentiment in commonly occurring unigrams and bigrams in the training data.
Number Of All Uppercase Words (UW): Based on earlier findings, capital letters can represent shouting or strong opinion in online chats and posts. We have identified the number of words in a post which are written in all capital letters. This is used as a feature to detect the presence of emotion or sentiment in online settings.
Density Of Exclamation Points (E): Like uppercase letters, exclamation points often signal emotional comments. To identify strong emotions in the social media context, we chose the number of exclamation points as a feature for our model. This number is normalized by the number of words present in the text.
Density Of Question Marks (Q): Similar to the last feature, multiple question marks in the text can denote surprise, excitement or agitation of the user. We chose the number of question marks as our next feature. The number of question marks is normalized by the number of words present in the text.
Number Of Character Repetitions In A Word (R): It is often observed that users tend to repeat a number of characters, vowels or consonants, to stress their opinion in social media conversations. Words like loool, lolzzzz, ufffff, ahaaa and greaaat are quite common in social media texts. While we reduce all such words during our pre-processing step, we have also maintained a record of all such occurrences. These repetitions are often indicative of sentiment and we use them as one of our features.
Frequency Of Code Switches (CS): As we are dealing with multilingual texts, we have considered the frequency of code switching as one of our features. It is often observed that the writer shifts language to clarify an opinion. We have tried to exploit the social and communicative needs behind this language shifting to determine the presence of sentiment. This frequency (number of language switching points) is normalized by the number of words in a particular post.
Number Of Smiley Matches (S1 And S2): Smileys are quite prevalent in social media text and often form a primary way of expressing emotion. We have created two resources for identifying smileys in text. The first one contains 269 positive smileys and 170 negative smileys. The second list contains 243 smileys. We found the number of matches to both the lists and used these counts as features.
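Several of the features above share the same arithmetic: a difference between positive and negative lexicon matches, or a count normalized by the number of words. The sketch below illustrates these computations under stated assumptions. The function names, the tiny word sets and the language tags ("en", "bn") are hypothetical placeholders, not the resources or tag set used in this work, and single-letter words are excluded from the uppercase count by our own choice.

```python
def lexicon_difference(tokens, positive, negative):
    """Positive matches minus negative matches (SWN, OL, ESW, BSW, CBW style)."""
    tokens = list(tokens)
    pos_hits = sum(1 for t in tokens if t in positive)
    neg_hits = sum(1 for t in tokens if t in negative)
    return pos_hits - neg_hits


def density(count, n_words):
    """Count normalized by the number of words; 0.0 for an empty post."""
    return count / n_words if n_words else 0.0


def uppercase_word_count(tokens):
    """Number of words written entirely in capital letters (UW)."""
    return sum(1 for t in tokens if len(t) > 1 and t.isupper())


def code_switch_frequency(lang_tags):
    """Language switching points divided by the number of words (CS)."""
    switches = sum(1 for prev, cur in zip(lang_tags, lang_tags[1:]) if prev != cur)
    return density(switches, len(lang_tags))


# Hypothetical toy post with word-level language tags.
tokens = ["movie", "ta", "darun", "but", "ending", "was", "BAD"]
tags = ["en", "bn", "bn", "en", "en", "en", "en"]
toy_positive, toy_negative = {"darun", "good"}, {"bad"}

lowered = [t.lower() for t in tokens]
print(lexicon_difference(lowered, toy_positive, toy_negative))  # 0
print(uppercase_word_count(tokens))                             # 1
print(code_switch_frequency(tags))                              # 2 switches over 7 words
print(density("so good!!!".count("!"), 2))                      # 1.5
```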
4.3 Classification of Sentiment Polarity
We obtain results for the 565 posts for which both annotators agreed on the polarity. We use roughly 70% of the dataset for training and 30% for testing: 400 posts are used for training and 165 posts for testing.
We use the machine learning software WEKA (http://www.cs.waikato.ac.nz/ml/weka/downloading.html). We combine the above features to form a feature set and employ a number of machine learning algorithms for classification. The best results were produced by the Multilayer Perceptron model. This classifier uses back propagation to classify instances into three categories: positive, negative and neutral. The nodes in this network are all sigmoid. The learning rate and momentum for the back propagation algorithm were kept at 0.3 and 0.2 respectively. The number of epochs was set to 500 and the random number generator was seeded with the value 0.
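To make the roles of the learning rate and momentum concrete, the sketch below trains a single sigmoid unit with a momentum-based gradient update. It is an illustration of the general technique only, not WEKA's implementation and not the network used in our experiments. The toy data and function names are hypothetical; only the values 0.3, 0.2 and 500 come from the text.

```python
import math

LEARNING_RATE = 0.3  # value reported in the text
MOMENTUM = 0.2       # value reported in the text
EPOCHS = 500         # value reported in the text


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def momentum_step(weight, gradient, previous_delta,
                  lr=LEARNING_RATE, momentum=MOMENTUM):
    """One update: new delta = -lr * gradient + momentum * previous delta."""
    delta = -lr * gradient + momentum * previous_delta
    return weight + delta, delta


def train_unit(samples, epochs=EPOCHS):
    """Train one sigmoid unit on (x, target) pairs with squared error."""
    w = b = 0.0    # fixed start, so no random seed is needed here
    dw = db = 0.0  # previous weight changes
    for _ in range(epochs):
        for x, target in samples:
            y = sigmoid(w * x + b)
            # Derivative of 0.5 * (y - target)^2 with respect to the pre-activation.
            err = (y - target) * y * (1.0 - y)
            w, dw = momentum_step(w, err * x, dw)
            b, db = momentum_step(b, err, db)
    return w, b


# Hypothetical one-dimensional toy data: negative inputs -> 0, positive -> 1.
TOY = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train_unit(TOY)
print(sigmoid(w * 2.0 + b) > 0.5, sigmoid(w * -2.0 + b) < 0.5)
```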
Individually, none of the features was able to detect positive or negative instances in the posts. This is due to the bias of the system. We perform feature analysis by removing one feature at a time to determine whether any feature is more important than the others. We also check by adding one feature group at a time. The classification confidence score from WEKA and the number of matches to our sentiment lexicons are used to develop a post-processing algorithm.
5 Results and Observations
For feature analysis, we have grouped the different kinds of features and measured the impact of each group on classification. We have grouped the word-based (or dictionary-based) features into Group 1 (G1), the syntactic features into Group 2 (G2) and the style-based features into Group 3 (G3).
G1: SWN + OL + ESW + BSW + CBW + CW + S
G2: POS
G3: UW + E + Q + R + CS
|Feature added||Correct classifications||Incorrect classifications||Accuracy|
|G1 + G2||113||52||0.685|
|G1 + G2 + G3||101||64||0.612|
From Table 3 it is evident that word based features (Group 1) and syntactic features (Group 2) produce the best results collectively. The accuracy decreases when we include the style based features for classification.
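The accuracy column of Table 3 follows directly from the counts of correct and incorrect classifications on the 165 test posts. A short check, using only the two rows shown in the table (the function name is ours):

```python
def accuracy(correct, incorrect):
    """Fraction of test instances classified correctly."""
    return correct / (correct + incorrect)


# (correct, incorrect) counts as reported in Table 3.
table3 = {
    "G1 + G2": (113, 52),
    "G1 + G2 + G3": (101, 64),
}

for name, (correct, incorrect) in table3.items():
    print(f"{name}: {accuracy(correct, incorrect):.3f}")
# G1 + G2: 0.685
# G1 + G2 + G3: 0.612
```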
Table 4 serves to highlight the impact of individual features on classification. At each turn, we eliminate one of the features while keeping all the other features. The accuracy suffers the most on elimination of the POS (JJ, RB and RB_JJ) features and the polar smiley list. Elimination of all the style-based features (UW, E, Q, R and CS) shows an improvement in accuracy. This is in accordance with our findings in Table 3. Elimination of SWN also improves accuracy. Removing BSW, which comprises conformational (or traditional) Bengali words, does not affect accuracy, which supports the view that social media text requires tailor-made resources.
|Feature Eliminated||Correct classifications||Incorrect classifications||Accuracy|
Table 5 shows the confusion matrix for the polarity classification (using word based and semantic features). The precision, recall and f-measure of the supervised and baseline systems are compared in Table 6.
If we consider a baseline model that labels all the instances as neutral, we achieve an accuracy of 55.2%. Our best performing system shows an accuracy of 68.5%, an improvement over the baseline model. However, the learning algorithm was slightly biased towards the neutral class, which is evident from the confusion matrix. Most of the errors are due to positive and negative posts being identified as neutral.
In future work, we will need to fine-tune our classification features so that the system can identify positive and negative posts more efficiently. Using a larger dataset to train the system may also reduce the bias towards neutral classification of polarity.
6 Conclusion and Future Work
To the best of our knowledge, there exists no sentiment classifier for code-mixed social media text. We have performed a machine learning based sentiment classification of Facebook posts. The polarity of each post has been classified as positive, negative or neutral. As there has not been any similar work before, we had to create a dataset of our own. Two human annotators classified the polarity of each post. Due to the inherent complexity of social media text, the use of arbitrary emoticons and the presence of sarcasm, the agreement between the human annotators was quite low, with a Kappa coefficient of 0.4354. Although the entire dataset consists of 882 posts, we have used only the 565 posts where the annotators were unanimous about the polarity of the underlying sentiment. We used word-based, semantic and style-based features for classification. The best result was obtained using a combination of word-based and semantic features, with an accuracy of 68.5%.
As our dataset is relatively small, we would like to create a larger dataset in future. Sentiment annotation can also be done using distant supervision based on the presence of emoticons. However, such an approach can lead to a noisy dataset. Creating a gold standard for all future tasks is a priority for us. In this work, we have not focused on the detection of sarcasm in text. Also, we have not handled negation in the data. We would like to concentrate on dealing with these issues in our next work. Apart from that, sentiment classification can be further improved by better handling of comparisons and by detecting sentiment targeted towards a particular entity. Handling of context switches is also important. Developing an accurate real-time sentiment classifier is the ultimate goal which we strive to achieve in future.
[1] Balahur, A.: Sentiment analysis in social media texts. In: 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 120–128 (2013)
[2] Dadvar, M., Trieschnigg, D., Ordelman, R., de Jong, F.: Improving cyberbullying detection with user context. In: Advances in Information Retrieval, pp. 693–696. Springer (2013)
[3] Das, A., Bandyopadhyay, S.: SentiWordNet for Indian languages. Asian Federation for Natural Language Processing, China, pp. 56–63 (2010)
[4] Das, A., Bandyopadhyay, S.: Dr Sentiment knows everything! In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations, pp. 50–55. Association for Computational Linguistics (2011)
[5] Das, A., Gambäck, B.: Sentimantics: conceptual spaces for lexical sentiment polarity representation with contextuality. In: Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, pp. 38–46. Association for Computational Linguistics (2012)
[6] Das, S.R., Chen, M.Y.: Yahoo! for Amazon: Sentiment parsing from small talk on the web. EFA (2001)
[7] Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th International Conference on World Wide Web, pp. 519–528. ACM (2003)
[8] De Choudhury, M., Gamon, M., Counts, S.: Happy, nervous or surprised? Classification of human affective states in social media. In: ICWSM (2012)
[9] Diakopoulos, N.A., Shamma, D.A.: Characterizing debate performance via aggregated Twitter sentiment. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1195–1198. ACM (2010)
[10] Ding, X., Liu, B., Yu, P.S.: A holistic lexicon-based approach to opinion mining. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, pp. 231–240. ACM (2008)
[11] Esuli, A., Sebastiani, F.: SentiWordNet: A publicly available lexical resource for opinion mining. In: Proceedings of LREC, vol. 6, pp. 417–422. Citeseer (2006)
[12] Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford 1, 12 (2009)
[13] Grefenstette, G., Qu, Y., Shanahan, J.G., Evans, D.A.: Coupling niche browsers and affect analysis for an opinion mining application. In: Coupling Approaches, Coupling Media and Coupling Languages for Information Retrieval, pp. 186–194. Le Centre de Hautes Etudes Internationales d’Informatique Documentaire (2004)
[14] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11(1), 10–18 (2009)
[15] Hatzivassiloglou, V., McKeown, K.R.: Predicting the semantic orientation of adjectives. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, pp. 174–181. Association for Computational Linguistics (1997)
[16] Holzman, L.E., Pottenger, W.M.: Classification of emotions in internet chat: An application of machine learning using speech phonemes. Retrieved November 27(2011), 50 (2003)
[17] Hu, X., Tang, L., Tang, J., Liu, H.: Exploiting social relations for sentiment analysis in microblogging. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pp. 537–546. ACM (2013)
[18] Huang, Q., Singh, V.K., Atrey, P.K.: Cyber bullying detection using social and textual analysis. In: Proceedings of the 3rd International Workshop on Socially-Aware Multimedia, pp. 3–6. ACM (2014)
[19] Kamps, J., Marx, M., Mokken, R.J., Rijke, M.d., et al.: Using WordNet to measure semantic orientations of adjectives (2004)
[20] Liu, H., Lieberman, H., Selker, T.: A model of textual affect sensing using real-world knowledge. In: Proceedings of the 8th International Conference on Intelligent User Interfaces, pp. 125–132. ACM (2003)
[21] Mishne, G., et al.: Experiments with mood classification in blog posts. In: Proceedings of ACM SIGIR 2005 Workshop on Stylistic Analysis of Text for Information Access, vol. 19, pp. 321–327. Citeseer (2005)
[22] Nahar, V., Unankard, S., Li, X., Pang, C.: Sentiment analysis for effective detection of cyber bullying. In: Asia-Pacific Web Conference, pp. 767–774. Springer (2012)
[23] Nasukawa, T., Yi, J.: Sentiment analysis: Capturing favorability using natural language processing. In: Proceedings of the 2nd International Conference on Knowledge Capture, pp. 70–77. ACM (2003)
[24] Ortony, A., Clore, G.L., Foss, M.A.: The referential structure of the affective lexicon. Cognitive Science 11(3), 341–364 (1987)
[25] Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: LREC, vol. 10, pp. 1320–1326 (2010)
[26] Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, pp. 79–86. Association for Computational Linguistics (2002)
[27] Read, J.: Recognising affect in text using pointwise-mutual information. Unpublished M.Sc. Dissertation, University of Sussex, UK (2004)
[28] Read, J.: Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In: Proceedings of the ACL Student Research Workshop, pp. 43–48. Association for Computational Linguistics (2005)
[29] Rubin, V.L., Stanton, J.M., Liddy, E.D.: Discerning emotions in texts. In: The AAAI Symposium on Exploring Attitude and Affect in Text (AAAI-EAAT) (2004)
[30] Tan, S., Wang, Y., Cheng, X.: Combining learn-based and lexicon-based techniques for sentiment detection without using labeled examples. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 743–744. ACM (2008)
[31] Turney, P., Littman, M.L.: Unsupervised learning of semantic orientation from a hundred-billion-word corpus (2002)
[32] Turney, P.D., Littman, M.L.: Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS) 21(4), 315–346 (2003)
[33] Wang, S., Manning, C.D.: Baselines and bigrams: Simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, Volume 2, pp. 90–94. Association for Computational Linguistics (2012)
[34] Wiebe, J.: Learning subjective adjectives from corpora. In: AAAI/IAAI, pp. 735–740 (2000)
[35] Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347–354. Association for Computational Linguistics (2005)
[36] Zhang, L., Ghosh, R., Dekhil, M., Hsu, M., Liu, B.: Combining lexicon-based and learning-based methods for Twitter sentiment analysis (2011)