Entity-Specific Sentiment Classification of Yahoo News Comments

Prakhar Biyani, Cornelia Caragea and Narayan Bhamidipati
Yahoo Labs, Sunnyvale, California, USA
Computer Science and Engineering, University of North Texas, Denton, Texas, USA
Email: pxb5080@yahoo-inc.com, ccaragea@unt.edu, narayanb@yahoo-inc.com

Sentiment classification is widely used for product reviews and in online social media such as forums, Twitter, and blogs. However, the problem of classifying the sentiment of user comments on news sites has not been addressed yet. News sites cover a wide range of domains including politics, sports, technology, and entertainment, in contrast to other online social sites such as forums and review sites, which are specific to a particular domain. A user associated with a news site is likely to post comments on diverse topics (e.g., politics, smartphones, and sports) or diverse entities (e.g., Obama, iPhone, or Google). Classifying the sentiment of users tied to various entities may help obtain a holistic view of their personality, which could be useful in applications such as online advertising, content personalization, and political campaign planning. In this paper, we formulate the problem of entity-specific sentiment classification of comments posted on news articles in Yahoo News and propose novel features that are specific to news comments. Experimental results show that our models outperform state-of-the-art baselines.

Introduction


Online news aggregator sites such as Yahoo News are a place for users to get in touch with developments across various domains. In addition to reading news articles, users post comments giving their opinions/sentiments about the topics or entities discussed in the news articles, while interacting (agreeing or disagreeing) with other users. This has resulted in vast amounts of User Generated Content in the form of user comments. An interesting characteristic of news sites is that they cover a wide range of domains such as politics, sports, technology, and entertainment, in contrast to other online social sites, including forums (e.g., UbuntuForums and TripAdvisor) and review sites (e.g., dpreview.com for digital cameras and notebookreview.com for laptops), which are specific to a particular domain. Hence, the activity of a user in terms of posting comments is potentially much more diverse in news sites as compared to other social platforms.

Although it is not uncommon for users to make general comments/statements on various topics or to comment on unrelated entities that they like or dislike, in many cases, comments on a news article contain the sentiments of users tied to specific entities in the article (e.g., Obama or Android). Classifying the sentiments of a particular user on diverse entities may help obtain a holistic view of their personality (in adherence to Yahoo’s privacy policy, all user activity is anonymized and the actual user’s identity is unknown to us). For example, the sentiments of a user’s comments on news articles tied to specific entities related to politics, smartphones and online retail may help infer her political orientation, preference for a particular mobile operating system (Android vs. iOS) and liking of a particular online retailer (Walmart vs. Target). User sentiments across articles on an entity (e.g., iPhone) can also be followed to determine how sentiments evolve or change over time, and what factors can cause the sentiment change. Analyzing the sentiment of these user comments can help understand the user better which, in turn, can be used to provide greater personalization and improve serving targeted ads to those users.

However, despite the evidence of strong value in analyzing the sentiment of users tied to specific entities, there have not been any reported works on this problem. The problem of identifying the sentiment polarity of these comments remains inherently difficult due to several main challenges, including irrelevant entities and implicit sentiment.

Irrelevant entities: Comments often have entities that are not important with respect to sentiment analysis. Let us consider the following example:

Example 1: Great! Foxnews poll: Obama +9; CNN poll: Obama +7; Reuters/Ipsos poll: Obama +9. I feel a landslide in the making. Gobama! Gobama! Gobama!

In this example, the commenter has a positive sentiment for Obama and no sentiment for entities Foxnews, CNN, Reuters and Ipsos, which are irrelevant for sentiment analysis. Unlike other domains such as product reviews where the sentiment is expressed towards a precisely defined target (i.e., a product or its features), known beforehand, in our domain, the set of entities is not known a priori and covers a wide range of entities, with many of them being irrelevant. In the example above, a traditional sentiment classifier would possibly identify the sentiment for Foxnews as positive due to its close proximity with the sentiment clue “Great!”, leading to inaccurate results.

Implicit sentiment: Users often express sentiments implicitly in their comments through irony, analogy, and rhetoric, making it hard to detect the sentiment towards entities [González-Ibáñez, Muresan, and Wacholder2011, Utsumi2000]. Let us consider the following examples:

Example 2: I’ve heard that Hillary Clinton modeled herself after Nurse Ratched.

Example 3: Who on earth would even buy Facebook stock?

Example 2 expresses a negative sentiment about Hillary Clinton through the analogy with “Nurse Ratched”, who is a negative fictional character. Example 3 is a rhetorical question expressing a negative sentiment about Facebook. Typical sentiment classification approaches would label these examples as neutral due to the lack of sentiment clues [Ding, Liu, and Yu2008, Qiu et al.2011, Zhang et al.2011, Meng et al.2012].

Against this background, one question that can be raised is: Can we design techniques to effectively identify and filter out irrelevant entities in news comments and further perform accurate sentiment classification of entities for which a sentiment is expressed? The research that we describe in this paper addresses specifically this question.

Contributions. We address the problem of entity-specific sentiment analysis. More precisely, we formulate the problem as a two-stage binary classification. First, we identify entities that are relevant with respect to sentiment analysis, while filtering out irrelevant entities. Second, we classify the sentiment expressed towards relevant entities as positive or negative. Although there are several works on analyzing sentiments of news articles, the current problem is significantly different (as detailed in Section Related Work). To the best of our knowledge, there are no reported works on this problem. The contributions of our work are as follows:

1. We propose an approach for context extraction of entities discussed in news comments and show that it substantially improves sentiment classification.

2. We design novel features for both classification tasks above. Specifically, we design: (1) non-lexical features for identifying relevant entities and show that these features are more informative than the lexicon-based features and the “bag-of-words” used in previous works on subjectivity analysis; (2) comment-specific features for sentiment classification of entities in comments.

3. We show experimentally that our sentiment classifiers trained using the proposed features extracted from the entity-specific contexts outperform several state-of-the-art approaches to sentiment classification.

Related Work

Sentiment analysis (SA) is widely researched due to its important applications in mining, analyzing and summarizing user opinions in online product reviews [Hu and Liu2004, Ly et al.2011, Ding, Liu, and Yu2008]. Here, we review some of the relevant sentiment analysis works.

Entity-independent SA (EISA): EISA deals with identifying the sentiment of a text without linking the sentiment to the entity for which it is expressed. EISA is mainly researched in the domain of product reviews, where a review is assumed to contain sentiments about a particular product and, hence, the linking is not required [Pang, Lee, and Vaithyanathan2002, Pang and Lee2004, McDonald et al.2007, Wan2009, Li et al.2012]. Pang et al. [Pang, Lee, and Vaithyanathan2002] used supervised machine learning algorithms trained on lexical and syntactic features such as unigrams, bigrams and POS tags, for sentiment analysis of movie reviews. In their later work, they improved the sentiment classification by considering only the subjective sentences and applying polarity classifiers (developed in their previous work) on those sentences [Pang and Lee2004]. Wan [Wan2009] used co-training for sentiment classification of Chinese product reviews, with the training data obtained by machine translation from labeled English reviews. For a Chinese review, its Chinese features and the translated English features represent the two independent views that are used in co-training.

Entity-dependent SA (EDSA): EDSA, on the other hand, links sentiment to its target entity [Ding, Liu, and Yu2008, Nasukawa and Yi2003, Engonopoulos et al.2011, Zhang et al.2011, Meng et al.2012]. Ding et al. [Ding, Liu, and Yu2008] performed EDSA on product reviews using a lexicon-based approach. For an entity, they calculated its sentiment score by adding the sentiment orientations of opinion words co-occurring with the entity in a sentence. Meng et al. [Meng et al.2012] used a similar approach for sentiment classification of tweets and determined sentiment orientation by aggregating sentiments of opinion words. In contrast, we use supervised learning models built using several newly designed features in addition to lexicon-based features. The lexicon-based approach is one of our baselines.

SA in News Sites: There are several works on sentiment classification of news articles [Godbole, Srinivasaiah, and Skiena2007, Devitt and Ahmad2007]. However, sentiment classification of news comments is a much more difficult task compared to that of news articles since, unlike news articles, news comments are short, noisy, incoherent, and written in very informal styles. We found a few works focusing on news comments for analyzing their quality of discourse [Diakopoulos and Naaman2011] and diversifying them for presenting a comprehensive view of news articles to the readers [Giannopoulos et al.2012]. However, these works are different from ours in nature.

Problem Characterization

Sentiment classification in online social sites faces many challenges such as dealing with unstructured text and noisy user input, and mapping sentiment to objects or entities [Liu2011]. Beyond these, sentiment classification of news comments brings additional challenges, i.e., a variety of domains (e.g., politics, sports, and entertainment), lack of use of important sentiment clues (e.g., no use of emoticons), and the use of rhetorical questions. These additional, less studied challenges give rise to the unique design of our model.

The main tasks of sentiment classification of news comments are: (1) extracting entities from news comments, and (2) identifying users’ sentiments about the extracted entities. Although both tasks have their own particular challenges, the second task is central to our study. To extract entities from news comments, we use the Stanford Named Entity Recognizer (SNER). SNER typically identifies three types of entities: person, place, and organization. More precisely, our problem can be formulated as follows.

Problem Formulation: Given a comment and an entity, classify the sentiment expressed in the comment about that entity as: positive, negative or neutral/irrelevant.

To address this problem, we decompose it into two parts. First, we link the target entity with its sentiment context. Specifically, when multiple entities are present in a comment, each entity must be linked to its own context, i.e., the words/phrases in the comment that are related to the entity. This is necessary since entities in a comment may have different sentiments or some entities may not have any sentiment at all associated with them (as illustrated below).

Example 5: In Ohio, voting for Romney who said he would let GM and Chrysler go bankrupt is like paying a guy to rebuild your house that he burned down.

Here, the sentiment is negative for Romney. However, GM and Chrysler do not have any sentiment.

Second, after entities are linked to their contexts, we identify the sentiment for an entity to be positive, negative or neutral, based on the sentiment of its context.

Extracting the Context of an Entity

The context of an entity contains the words, phrases or sentences that refer to the entity. We use several heuristics to extract the contexts. Following are the three main modules of our context extraction algorithm:

1. Preprocessing, where the number of entities in a comment is checked. For single entity comments, the entire comment is taken as the context for the entity. If a comment contains multiple entities, it is segmented into sentences and is given as input to the anaphora resolution module.

2. Anaphora Resolution: We use a rule based approach to anaphora resolution. We check the type of entity: PERSON (P) vs. NON-PERSON (NP) and assign sentences to the context of the entity if they have explicit mentions of that entity or compatible anaphoric references. For example, pronouns such as he, she, her, him can only be used to refer to a P entity, whereas they, their, them can be used to refer to both P and NP entities and it can only be used for NP entities. If a sentence does not have references to any entity, then it is added to the context of all the entities. Also, if a sentence has explicit mentions of multiple entities, then it is given as input to the local context extraction module.

3. Local Context Extraction: If entities occur in clauses that are connected with “but” (in the sentence), then the respective clauses are returned as local contexts for the entities. If the sentence contains a comparison between entities, then it is split at the comparative term (adjective or adverb), with the comparative term added to the left part, and the two parts are returned as local contexts for the respective entities. If none of the two conditions is satisfied, then a window of tokens around entities is taken as their local context.
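The first two modules can be illustrated with a minimal sketch. It assumes sentences are already segmented and that each entity comes with a PERSON flag; the function names, the regular-expression tokenizer, and the substring match for explicit mentions are illustrative assumptions and not the implementation used in our experiments. Sentences that explicitly mention several entities would additionally be passed to local context extraction, which is omitted here.

```python
import re

# Pronoun sets follow the compatibility rules described above.
PERSON_ONLY = {"he", "she", "her", "him"}
EITHER = {"they", "their", "them"}
NON_PERSON_ONLY = {"it"}


def refers_to(sentence, entity, is_person):
    """True if the sentence mentions the entity or uses a compatible pronoun."""
    lowered = sentence.lower()
    if entity.lower() in lowered:  # explicit mention (simple substring match)
        return True
    tokens = set(re.findall("[a-z']+", lowered))
    allowed = EITHER | (PERSON_ONLY if is_person else NON_PERSON_ONLY)
    return bool(tokens & allowed)


def extract_contexts(sentences, entities):
    """Map each entity name to the sentences that form its context.

    `entities` maps an entity name to True (PERSON) or False (NON-PERSON).
    """
    # Preprocessing: a single-entity comment is its own context.
    if len(entities) == 1:
        only = next(iter(entities))
        return {only: list(sentences)}

    contexts = {name: [] for name in entities}
    for sentence in sentences:
        matched = [n for n, p in entities.items() if refers_to(sentence, n, p)]
        # A sentence that refers to no entity is added to every context.
        for name in (matched or entities):
            contexts[name].append(sentence)
    return contexts
```

For a comment with the sentences "Obama visited Ohio.", "He spoke well." and "What a day!", the second sentence is assigned only to the person entity, while the third, which refers to no entity, is shared by both.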

Identifying the Sentiment of Contexts

After obtaining the contexts of entities, we classify their sentiment into positive, negative or neutral sentiment classes. We model the task of identifying sentiment as a two-step classification. In the first step, we classify the context of an entity into polar versus neutral sentiment classes. In the second step, we classify the polar entities into positive or negative sentiment classes. Below, we describe the features used in our classification models and our reasoning behind using them.

Neutral vs. Polar Classification

As already discussed, comments posted on news sites contain entities that are irrelevant with respect to sentiment analysis (see Example 1 in Section Introduction). These entities have no sentiment associated with them and are filtered out before conducting sentiment classification of comments. We address this problem by classifying entities as polar vs. neutral. Irrelevant entities are classified as neutral. Generally, content features and lexicon features form the basis of polar vs. neutral classification. However, in our data, we find some other interesting properties (specific to entities) that can be very helpful in identifying neutral and polar entities. For example, an entity that is a subject or direct object (of the subject) in a comment is more likely to be polar than an entity that is a prepositional object. Also, an entity of the type person is more likely to be polar than an entity that is of non-person type. Let us consider the following examples:

Example 9: Bush didn’t blame anyone for trashing the White House, the 2001 recession, or for the 3 major attacks on America.

Example 10: Obama stole 716 billion dollars we paid into medicare.

In Example 9, Bush is the subject, White House is the direct object and America is the prepositional object. In Example 10, Obama is the subject, Medicare is the prepositional object. As we see, Obama and Bush are polar, whereas America, White House and Medicare are neutral.

Based on this reasoning, we extract the following features for all entities in a comment:

IsPerson: If the entity is of person type (1 if yes, 0 otherwise). To compute this feature, we look at the entity type output by SNER.

IsSubjObj: Whether the entity is the subject, direct object, prepositional object or none of the three (3 if subject, 2 if direct object, 1 if prepositional object, 0 otherwise). To compute this feature, we check if the entity has the following dependencies in the dependency tree: nsubj and nsubjpass (nominal subject and passive nominal subject, resp.), dobj (direct object) and pobj (prepositional object).

HasClues: If there are any polarity clues in the context of the entity, as detailed in Section Positive vs. Negative Classification (1 if yes, 0 otherwise).

SentiPos: This feature is calculated from the positive sentiment score given by the SentiStrength algorithm [Thelwall, Buckley, and Paltoglou2012] (0 if the score is 1, 1 otherwise) (we explain the scores output by SentiStrength in the following section).

SentiNeg: This feature is calculated from the negative score given by the SentiStrength algorithm [Thelwall, Buckley, and Paltoglou2012] (0 if the score is -1, -1 otherwise).
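The encoding of these five features can be sketched as follows. The sketch assumes that the entity type, the dependency label of the entity, the presence of polarity clues, and the two SentiStrength scores have already been computed by upstream tools; the function name and the "PERSON" type label are illustrative assumptions.

```python
# Dependency labels mapped to the IsSubjObj codes defined above.
ROLE_CODES = {"nsubj": 3, "nsubjpass": 3, "dobj": 2, "pobj": 1}


def neutral_polar_features(entity_type, dep_label, has_clues, senti_pos, senti_neg):
    """Build the neutral-vs-polar feature vector for one entity context.

    senti_pos is in +1..+5 and senti_neg is in -1..-5 (SentiStrength scores).
    """
    return {
        "IsPerson": 1 if entity_type == "PERSON" else 0,
        "IsSubjObj": ROLE_CODES.get(dep_label, 0),  # 0 if none of the three roles
        "HasClues": 1 if has_clues else 0,
        "SentiPos": 0 if senti_pos == 1 else 1,     # 0 means no positive strength
        "SentiNeg": 0 if senti_neg == -1 else -1,   # 0 means no negative strength
    }
```

For instance, a person entity that is the subject of a context with no polarity clues and SentiStrength scores of +1 and -3 is encoded as IsPerson = 1, IsSubjObj = 3, HasClues = 0, SentiPos = 0, SentiNeg = -1.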

Positive vs. Negative Classification

After obtaining the polar entities, we classify the sentiment about those entities into positive or negative sentiment classes. We use the following features for the positive-negative classification.

(a) Polarity Clues: Polarity clues are the words, phrases, or symbols used to express polarity of opinions/emotions. They have been used extensively in sentiment analysis [Hu and Liu2004, Turney2002]. We use the subjectivity lexicon from MPQA corpus developed by Wiebe et al. [Stoyanov, Cardie, and Wiebe2005] to get the polarity clues. The lexicon contains positive clues, negative clues and neutral subjectivity clues. We extract three features NumPos, NumNeg, and PosVsNeg from the context of an entity. NumPos and NumNeg are the number of positive and negative polarity clues in the context, respectively. PosVsNeg is the number of positive divided by the number of negative polarity clues, i.e., (NumPos+1)/(NumNeg+1).

The following rules are used to count the polarity clues:

Rule 1: Negation: If a polarity clue is connected to a negation word (i.e., they co-occur in a window of tokens), we reverse its polarity. If a neutral subjectivity clue is connected to a negation word, then its polarity is taken as negative. For example, believe is a subjectivity clue with prior polarity neutral, but if used with a negation (e.g. I do not believe or I cannot believe) expresses negative sentiment. We use a list of negation words.

Rule 2: Quotes: Users often put polarity clues in quotes or in a quoted phrase to mean entirely opposite sentiment as compared to the sentiment expressed by the clue. If a polarity clue is in a quoted phrase then we reverse its polarity. Let us consider this example.

Example 11: The Republican party also faces a steep climb with the “sane people” demographic.

Here, the clue sane is in a quoted phrase sane people. The prior polarity of sane is positive. However, here, it is used to express negative sentiment about the Republican party.

Rule 3: “but” rule: Usually, sentiment expressed in clauses connected with “but” have opposite polarities. We take into account this property, while aggregating polarity clues for the entities. If clauses containing two entities are connected with “but” and there are explicit polarity clues in the context of only one of the entities, then we increase the count of the clue of opposite polarity for the other entity.

Example 12: Read how Bush tried to control the financial situation with new regulations, but democrats blocked him. Democrats are pathetic, greedy liars.

Here, Bush and Democrats occur in clauses connected with “but” and have opposite sentiment. For Democrats, there are explicit negative clues (pathetic, greedy) but we do not have explicit polarity clues for Bush. In this case, following the rule, we increment the count of positive clues (NumPos) for Bush.

Rule 4: Comparatives: If two entities are present in a comparative clause and one of the entities does not have an explicit polarity clue (in its context), then for that entity we increment the number of the opposite polarity clue. We identify two most common types of comparatives: adjectival comparatives and adverbial comparatives. We look for JJR and RBR part-of-speech tags between entities to identify comparative adjectives and comparative adverbs, respectively. Let us consider this example:

Example 13: The samsung galaxys’ are way better than all the mobile products apple puts out.

Here, Apple has a negative sentiment but does not have any explicit polarity clue in its context. Using the rule, we increment the count of negative clues (NumNeg) for Apple.
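The counting of polarity clues under Rules 1 and 2 can be sketched as follows. The tiny lexicon and negation list are illustrative stand-ins for the MPQA subjectivity lexicon and our list of negation words, and the window size of 3 tokens is an assumed value. Rules 3 and 4 operate on pairs of entities and are not covered by this sketch.

```python
import re

# Toy resources (assumptions for illustration only).
LEXICON = {"great": "positive", "sane": "positive", "pathetic": "negative",
           "greedy": "negative", "believe": "neutral"}
NEGATIONS = {"not", "no", "never", "cannot"}
FLIP = {"positive": "negative", "negative": "positive", "neutral": "negative"}


def count_clues(context, window=3):
    """Return (NumPos, NumNeg, PosVsNeg) for the context of an entity."""
    text = context.lower().replace("“", '"').replace("”", '"')
    tokens = re.findall('[a-z\']+|"', text)

    # Mark each word with whether it lies inside a quoted phrase.
    words, in_quote = [], False
    for tok in tokens:
        if tok == '"':
            in_quote = not in_quote
        else:
            words.append((tok, in_quote))

    num_pos = num_neg = 0
    for i, (word, quoted) in enumerate(words):
        polarity = LEXICON.get(word)
        if polarity is None:
            continue
        nearby = [w for w, _ in words[max(0, i - window):i + window + 1]]
        if any(w in NEGATIONS for w in nearby):  # Rule 1: negation
            polarity = FLIP[polarity]
        if quoted and polarity != "neutral":     # Rule 2: quotes
            polarity = FLIP[polarity]
        if polarity == "positive":
            num_pos += 1
        elif polarity == "negative":
            num_neg += 1
    return num_pos, num_neg, (num_pos + 1) / (num_neg + 1)
```

On Example 11, the quoted clue sane is counted as negative, which gives NumPos = 0, NumNeg = 1 and PosVsNeg = 0.5.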

(b) Punctuation Marks: It is a common practice in online social media to use punctuation marks to express sentiments. We look for the presence of two punctuation marks: question and exclamation marks in the context of an entity. We calculate two punctuation features for a context: IsQuestion (presence or absence of a question mark), IsExclam (presence or absence of an exclamation mark).

(c) Sentiment Strength: These features capture the strength of the sentiments expressed in comments. We used the SentiStrength algorithm [Thelwall, Buckley, and Paltoglou2012] to compute these features. The algorithm is specifically designed to calculate sentiment strength of short informal texts in online social media. For a piece of text, the algorithm computes two integral scores, one in the range of +1 (neutral) to +5 (highly positive) that is expressive of the positive sentiment strength of the text and another in the range -1 (neutral) to -5 (highly negative) for negative sentiment strength. A score of +1 and -1 for a text means that the text is neutral or has no sentiment. Using SentiStrength, we compute three features: PosStrength (positive sentiment score), NegStrength (negative sentiment score) and PosVsNegStrength (PosStrength divided by NegStrength).

(d) Comment-specific features: These features capture clues that are specific to news comments. Users often use rhetoric in their comments to express a negative sentiment about an entity. They begin their comments by writing rhetorical questions and/or asking rhetorical questions about entities. Rhetorical questions are those that are not asked for the purpose of obtaining answers or information, but rather to make a point effectively (see http://en.wikipedia.org/wiki/Rhetorical_question). Examples of rhetorical questions are: Where is my vote?, Can’t you do anything right? Let us consider the following examples:

Example 14: PLANS? What Plans? Obama has no plans for his second term.

Example 15: So now the Associated Press has to correct their own corrections?

These examples express implicit negative sentiment about Obama and Associated Press, without using explicit negative polarity clues. To capture such rhetoric, we design two binary features: IsFirstQues and IsEnQues. IsFirstQues checks whether the first sentence in the context of an entity is a question. IsEnQues checks whether an entity is present in a question sentence. To identify question sentences, we check for the presence of 5W1H words and question marks.
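The two comment-specific features can be sketched as follows. The sketch assumes that a sentence counts as a question if it ends with a question mark or starts with a 5W1H word; this particular combination of the two signals, the function names, and the substring match for the entity are illustrative assumptions.

```python
import re

FIVE_W_ONE_H = {"who", "what", "when", "where", "why", "how"}


def is_question(sentence):
    """Heuristic question test: trailing '?' or a leading 5W1H word."""
    words = re.findall("[a-z]+", sentence.lower())
    starts_with_wh = bool(words) and words[0] in FIVE_W_ONE_H
    return sentence.strip().endswith("?") or starts_with_wh


def comment_specific_features(context_sentences, entity):
    """Compute IsFirstQues and IsEnQues for the context of an entity."""
    first_is_question = bool(context_sentences) and is_question(context_sentences[0])
    entity_in_question = any(
        entity.lower() in s.lower() and is_question(s) for s in context_sentences
    )
    return {"IsFirstQues": int(first_is_question), "IsEnQues": int(entity_in_question)}
```

For Example 14, the context of Obama starts with a question but Obama itself appears in a declarative sentence, so IsFirstQues = 1 and IsEnQues = 0. For Example 15, both features are 1 for Associated Press.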


Data Description

Since there is no annotated dataset for sentiment classification of online news comments, we prepared our own dataset. We randomly sampled comments for annotation that satisfied certain constraints to ensure quality and diversity of the dataset. We first marked all the comments with the entities present in them and ranked the entities according to their comment frequencies. From the ranked list, we selected entities to consider. These entities covered areas such as politics (e.g., Obama, Romney), software (e.g., Google, Microsoft), online retail (e.g., Walmart, Ebay), hardware (e.g., Samsung, Apple), and insurance (e.g., Medicare, Obamacare), among others. The entities were selected based on their popularity as well as their relevance from the point of view of user targeting. Figure 1 shows a “word cloud” of the entities. The larger the entity, the more frequent it is in the news comments. As we see, Walmart has a much smaller comment frequency compared with other entities such as Barack Obama; however, it is important due to its commercial nature and relevance to ad targeting.

Figure 1: Word cloud of the entities used in sampling.

We sampled comments such that all the comments have at least one of the entities and all the entities have approximately equal number of sampled comments. We then marked the three most important entities in each comment, obtaining instances. For a comment, the number of instances is equal to the number of entities marked for that comment. Each instance was annotated by two annotators. For each instance, the annotators were asked to identify sentiments expressed in that instance (as negative, neutral or positive). The agreement between the annotators was . For the remaining , a third annotator was asked to select between the two annotations of the two original annotators. Given this annotation scheme, we obtained negative instances, positive instances and neutral instances. Also, comments are neutral, i.e., all the entities present in them have neutral sentiment, comments contain polar as well as neutral entities and comments have only polar entities. We call the comments that have both polar and neutral entities as pseudo-polar comments.

Experimental Setting

We conducted sentiment classification experiments using various supervised machine learning algorithms implemented in the Weka data mining toolkit [Hall et al.2009]. For neutral-polar classification, Logistic Regression gave the best performance, whereas for the positive-negative classification, Naive Bayes outperformed other supervised methods. To evaluate the performance of our classifiers, we report precision, recall and F-1 score, all macro averaged across folds in a cross validation setting.

For neutral-polar classification, we use neutral and pseudo-polar comments. After segmenting comments into contexts of entities, we obtain a total of instances ( neutral and polar) from comments. As explained in Section Problem Characterization, an instance for classification is a context of an entity present in a comment. Since a comment may have multiple entities and, hence, multiple contexts, we can obtain more instances than the total number of comments. For positive-negative classification, we use polar and pseudo-polar comments (a total of comments). Neutral entities from the pseudo-polar comments are not considered in positive-negative classification.

Baselines: We compare our sentiment classifiers with the following baselines:

1. Bag-of-words and POS tags [Jiang et al.2011, Pang, Lee, and Vaithyanathan2002, McDonald et al.2007]: We use the words in the context of an entity and the part-of-speech tags of those words as features for classification and experiment with two settings: 1) BoW, in which only word frequencies are used as features, 2) BoW+POS, in which both word frequencies and their POS tags are used as features. We use Multinomial Naïve Bayes for these models.

2. SentiStrength: SentiStrength is a state-of-the-art tool for sentiment analysis of short informal texts posted on online social media. We use the following two settings for turning SentiStrength into a sentiment classifier:

  1. SentiStrength scores as features: We use the two scores (positive and negative) output by SentiStrength as features for sentiment classification.

  2. SentiStrength scores as rules: We use the two scores directly as rules for making an inference about the sentiment of a context. For neutral-polar classification, a score of +1 and -1 implies that the text is neutral, and polar otherwise. For the positive-negative classification, a context is positive if its positive sentiment score is greater than its negative sentiment score and similarly for inferring a negative sentiment. For example, a score of +3 and -2 implies positive polarity and a score of +2 and -3 implies negative polarity. If both scores are equal for a context, we randomly assign the context to one class or the other.

3. LexiconRuleBased [Ding, Liu, and Yu2008, Meng et al.2012]: We compute the sentiment for an entity $e$ in a comment $c$ by calculating the following score:

$$\mathrm{score}(e, c) = \sum_{s \in c} \; \sum_{w \in s,\; w \in L} \frac{SO(w)}{dis(w, e)}$$

where $s$ is a sentence in $c$, $w$ is a polarity word in $s$, $L$ is the polarity lexicon, $SO(w)$ is the sentiment orientation of $w$ (1 if positive, -1 if negative) and $dis(w, e)$ is the distance between the polarity word $w$ and $e$ in $s$. The denominator down-weights the sentiment orientation of polarity words that are far from the entity. The sentiment is positive if the score is greater than zero, negative if the score is less than zero and neutral otherwise. For positive-negative classification, if we obtain a zero score, we assign the entity randomly to the positive or the negative class.

4. Naive context extraction: To evaluate our context extraction algorithm, we compare it against a simple method for extracting entity contexts. In this method, we add the entire sentence to the contexts of all the entities present in it. If a sentence does not contain any entity, then we add it to the context of all the entities in the comment. All other classification settings (features and classifiers) remain the same.
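The two rule-based baselines above (SentiStrength scores as rules and LexiconRuleBased) can be sketched as follows. The sketch assumes that SentiStrength scores are supplied as a pair of integers, that distance is measured in whitespace tokens, that the entity is a single token, and that sentences without the entity are skipped; all function names are illustrative and none of this is the exact code used in our experiments.

```python
import random

PUNCT = ".,!?;:\"'"


def senti_neutral_polar(pos, neg):
    """A context is neutral only when the scores are +1 and -1."""
    return "neutral" if (pos, neg) == (1, -1) else "polar"


def senti_pos_neg(pos, neg, rng=random):
    """Compare the magnitudes of the two scores; break ties randomly."""
    if pos > -neg:
        return "positive"
    if pos < -neg:
        return "negative"
    return rng.choice(["positive", "negative"])


def lexicon_score(sentences, entity, lexicon):
    """Sum orientation / distance over the polarity words of each sentence.

    `lexicon` maps a word to +1 (positive) or -1 (negative).
    """
    target, score = entity.lower(), 0.0
    for sentence in sentences:
        tokens = [t.strip(PUNCT) for t in sentence.lower().split()]
        positions = [i for i, t in enumerate(tokens) if t == target]
        if not positions:
            continue  # assumption: distance is undefined without the entity
        for i, tok in enumerate(tokens):
            orientation = lexicon.get(tok)
            if orientation is None:
                continue
            distance = min(abs(i - p) for p in positions)
            if distance > 0:
                score += orientation / distance
    return score


def lexicon_label(score):
    """Three-way decision from the sign of the score."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With a toy lexicon that maps great to +1, the sentence "Obama is great." yields a score of 0.5 for Obama, because the polarity word is two tokens away from the entity.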

Model Pr. Re. F-1
SentiStrength (rule)
SentiStrength (features)
Proposed model - IsPerson
Proposed model - IsSubjObj
Proposed model - HasClues
Proposed model - SentiStrength
Proposed model
Table 1: Neutral-polar classification results.

Classification Results

Neutral-polar Classification

Table 1 shows the results of neutral-polar classification. The first five rows show the results of the baseline models, whereas the subsequent four rows show the results of models built by removing only one feature at a time from the proposed model. The last row shows the result of the proposed model. As can be seen from the table, the proposed model outperforms all the baselines and using all the features gives the best performance with F-1 score of . We see that the LexiconRuleBased method and SentiStrength (rule) are the worst performing models with F-1 scores of and , respectively, followed by SentiStrength (features) with an F-1 score of . This can be attributed to the fact that SentiStrength is trained on online social media data, which is significantly different from news comment data. For example, one of the features used by SentiStrength for detecting sentiment is the presence of emoticons, which are generally not present in news comments. Similarly, we see that the BoW model performs the third worst with an F-1 score of . Adding part-of-speech tags to BoW improves the performance to an F-1 score of . Note that BoWs generally perform better in other sentiment classification tasks in domains such as Twitter [Jiang et al.2011] and product reviews [Pang, Lee, and Vaithyanathan2002, McDonald et al.2007] compared with BoWs in our domain. A possible reason could be the presence of implicit sentiment in the form of rhetorical questions, sarcasm, etc., where users do not use explicit sentiment words, and hence, there are fewer patterns of words and common POS tags that are generally used to express sentiment and subjectivity (e.g., adjectives, adverbs and common nouns) from which to learn the models. Also, we see that our method outperforms NaiveContextExtraction.

Next, we discuss the impact of removing different features from our model. Removing either the IsPerson feature or the IsSubjObj feature lowers the F-1 score. Removing HasClues or the SentiStrength features (sentiPos and sentiNeg) has a smaller effect than removing IsPerson or IsSubjObj, and results in the same F-1 score in both cases. Overall, removing IsPerson has the largest negative impact on performance, followed by IsSubjObj, and then HasClues and the SentiStrength features. This observation is consistent with the feature ranking using Information Gain (IG) [Yang and Pedersen1996] as output by Weka: IsPerson > IsSubjObj > SentiNeg > HasClues > SentiPos, where a feature to the left of > ranks higher than a feature to its right. The two proposed non-lexical features, IsPerson and IsSubjObj, are more informative than the HasClues and SentiStrength features, which are based on lexical properties of comments. This suggests that in comments, the entity type (person or non-person) and its grammatical role in the comment (subject, direct object, or prepositional object) are highly informative features for polarity.
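For readers unfamiliar with Information Gain, the following is a minimal sketch of how a ranking of this kind can be computed for discrete-valued features. It is not the Weka implementation used in the paper; the function names and the dictionary-of-columns input format are assumptions made for illustration, and numeric features would need to be discretized first.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(
    feature_values: Sequence[Hashable], labels: Sequence[Hashable]
) -> float:
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    total = len(labels)
    labels_by_value: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature_values, labels):
        labels_by_value.setdefault(value, []).append(label)
    conditional = sum(
        (len(subset) / total) * entropy(subset)
        for subset in labels_by_value.values()
    )
    return entropy(labels) - conditional


def rank_features(
    columns: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]
) -> List[Tuple[str, float]]:
    """Return (feature name, IG) pairs sorted from most to least informative."""
    scores = {
        name: information_gain(values, labels)
        for name, values in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A feature that splits the examples perfectly by class has an IG equal to the label entropy, while a feature that is independent of the class has an IG of zero.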

Model Pr. Re. F-1
LexiconRuleBased 0.227 0.432 0.298
Proposed model - CSF
Proposed model
Table 2: Positive-negative classification results.

Positive-negative Classification

Table 2 shows the results of positive-negative classification experiments. The first five rows present the results of the five baseline classification models, whereas the next two rows show the results of the proposed model without and with comment-specific features, denoted by Proposed model - CSF and Proposed model, respectively. As can be seen, the LexiconRuleBased method is the worst performing model in this setting with an F-1 score of 0.298, followed by SentiStrength (rule), BoW, and SentiStrength (features), in that order. POS tags improve the F-1 score of the BoW model. The proposed model outperforms all the baselines. To see the effect of comment-specific features on positive-negative classification, we experimented with the proposed model without the comment-specific features. We see that adding comment-specific features improves the F-1 score of the model.
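As a quick consistency check on the reported metrics, the F-1 score is the harmonic mean of precision and recall. The short sketch below recomputes it for the LexiconRuleBased row of Table 2 (precision 0.227, recall 0.432); the function name is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LexiconRuleBased row of Table 2: Pr. = 0.227, Re. = 0.432
print(round(f1_score(0.227, 0.432), 3))  # 0.298
```

The recomputed value matches the F-1 entry in that row.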

To analyze the importance of different features, we ranked them using Information Gain [Yang and Pedersen1996] and obtained the following ranking, from most to least informative: NumNeg > PosVsNeg > NegStrnth > IsQuesMark > IsEnQues > PosStrnth > IsQuesFirst > IsExclaim > NumPos > PosVsNegStrnth. Features related to positive sentiment (PosStrnth and NumPos) are ranked lower than the NumNeg and NegStrnth features. One potential reason is that users generally express negative sentiments more explicitly than positive ones, so there are significantly more negative patterns than positive patterns to learn from.

Conclusion and Future Work

In this paper, we studied the problem of identifying users’ sentiments towards individual entities referenced in comments on news articles. We identified several challenges to this problem and proposed solutions to address them. In particular, we designed an algorithm to extract the context of entities in comments, and proposed novel non-lexical features for neutral-polar classification and comment-specific features for polarity classification. Our methods outperformed strong baselines for sentiment classification. Interesting directions for future work include: (1) using priors on users based on their comments on (particular or all) entities, e.g., a user could be pessimistic or cynical towards all entities; and (2) training a mixture of specialized classifiers for the domains covered by a news site, e.g., politics, sports, technology, and entertainment. For example, we believe that people generally become more sarcastic when they discuss politics.

References


  • [Devitt and Ahmad2007] Devitt, A., and Ahmad, K. 2007. Sentiment polarity identification in financial news: A cohesion-based approach. In ACL.
  • [Diakopoulos and Naaman2011] Diakopoulos, N., and Naaman, M. 2011. Towards quality discourse in online news comments. In Proceedings of the ACM 2011 conference on Computer supported cooperative work, CSCW ’11, 133–142. New York, NY, USA: ACM.
  • [Ding, Liu, and Yu2008] Ding, X.; Liu, B.; and Yu, P. S. 2008. A holistic lexicon-based approach to opinion mining. In Proceedings of the international conference on Web search and web data mining, 231–240. ACM.
  • [Engonopoulos et al.2011] Engonopoulos, N.; Lazaridou, A.; Paliouras, G.; and Chandrinos, K. 2011. ELS: a word-level method for entity-level sentiment analysis. In Proceedings of the International Conference on Web Intelligence, Mining and Semantics, 12. ACM.
  • [Giannopoulos et al.2012] Giannopoulos, G.; Weber, I.; Jaimes, A.; and Sellis, T. 2012. Diversifying user comments on news articles. In Wang, X.; Cruz, I.; Delis, A.; and Huang, G., eds., Web Information Systems Engineering - WISE 2012, volume 7651 of Lecture Notes in Computer Science. Springer Berlin Heidelberg. 100–113.
  • [Godbole, Srinivasaiah, and Skiena2007] Godbole, N.; Srinivasaiah, M.; and Skiena, S. 2007. Large-scale sentiment analysis for news and blogs. ICWSM 7.
  • [González-Ibáñez, Muresan, and Wacholder2011] González-Ibáñez, R.; Muresan, S.; and Wacholder, N. 2011. Identifying sarcasm in Twitter: A closer look. In ACL (Short Papers), 581–586. Citeseer.
  • [Hall et al.2009] Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; and Witten, I. 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11(1):10–18.
  • [Hu and Liu2004] Hu, M., and Liu, B. 2004. Mining and summarizing customer reviews. In KDD ’04, 168–177.
  • [Jiang et al.2011] Jiang, L.; Yu, M.; Zhou, M.; Liu, X.; and Zhao, T. 2011. Target-dependent Twitter sentiment classification. In HLT ’11, volume 1, 151–160.
  • [Li et al.2012] Li, S.; Ju, S.; Zhou, G.; and Li, X. 2012. Active learning for imbalanced sentiment classification. In EMNLP-CoNLL ’12.
  • [Liu2011] Liu, B. 2011. Opinion mining and sentiment analysis. Web Data Mining 459–526.
  • [Ly et al.2011] Ly, D.; Sugiyama, K.; Lin, Z.; and Kan, M. 2011. Product review summarization from a deeper perspective. In JCDL, 311–314.
  • [McDonald et al.2007] McDonald, R.; Hannan, K.; Neylon, T.; Wells, M.; and Reynar, J. 2007. Structured models for fine-to-coarse sentiment analysis. In ACL ’07, volume 45, 432.
  • [Meng et al.2012] Meng, X.; Wei, F.; Liu, X.; Zhou, M.; Li, S.; and Wang, H. 2012. Entity-centric topic-oriented opinion summarization in Twitter. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 379–387. ACM.
  • [Nasukawa and Yi2003] Nasukawa, T., and Yi, J. 2003. Sentiment analysis: capturing favorability using natural language processing. In Proceedings of the 2nd international conference on Knowledge capture, K-CAP ’03, 70–77. New York, NY, USA: ACM.
  • [Pang and Lee2004] Pang, B., and Lee, L. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In ACL ’04, 271.
  • [Pang, Lee, and Vaithyanathan2002] Pang, B.; Lee, L.; and Vaithyanathan, S. 2002. Thumbs up?: sentiment classification using machine learning techniques. In ACL-02, EMNLP ’02, 79–86.
  • [Qiu et al.2011] Qiu, B.; Zhao, K.; Mitra, P.; Wu, D.; Caragea, C.; Yen, J.; Greer, G.; and Portier, K. 2011. Get online support, feel better – sentiment analysis and dynamics in an online cancer survivor community. In Socialcom, 274–281.
  • [Stoyanov, Cardie, and Wiebe2005] Stoyanov, V.; Cardie, C.; and Wiebe, J. 2005. Multi-perspective question answering using the opqa corpus. In HLT-EMNLP ’05, HLT ’05, 923–930. Stroudsburg, PA, USA: ACL.
  • [Thelwall, Buckley, and Paltoglou2012] Thelwall, M.; Buckley, K.; and Paltoglou, G. 2012. Sentiment strength detection for the social web. Journal of the American Society for Information Science and Technology.
  • [Turney2002] Turney, P. D. 2002. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In ACL ’02, 417–424.
  • [Utsumi2000] Utsumi, A. 2000. Verbal irony as implicit display of ironic environment: Distinguishing ironic utterances from nonirony. Journal of Pragmatics 32(12):1777–1806.
  • [Wan2009] Wan, X. 2009. Co-training for cross-lingual sentiment classification. In ACL ’09, 235–243.
  • [Yang and Pedersen1996] Yang, Y., and Pedersen, J. O. 1996. Feature selection in statistical learning of text categorization. Center for Machine Translation, Carnegie Mellon University.
  • [Zhang et al.2011] Zhang, L.; Ghosh, R.; Dekhil, M.; Hsu, M.; and Liu, B. 2011. Combining lexicon-based and learning-based methods for Twitter sentiment analysis. HP Laboratories, Technical Report HPL-2011 89.