Leveraging Multi-grained Sentiment Lexicon Information for Neural Sequence Models

Yan Zeng, Yangyang Lan, Yazhou Hao, Chen Li, Qinghua Zheng
MOEKLINNS Lab, Department of Computer Science and Technology
Xi’an Jiaotong University, China
zengyan_97@outlook.com, {lyy14011305, yazhouhao}@gmail.com
{cli, qhzheng}@xjtu.edu.cn
Abstract

Neural sequence models have achieved great success in sentence-level sentiment classification. However, some models are exceptionally complex or rely on expensive features, while others recognize the value of existing linguistic resources but use them insufficiently. This paper proposes a novel and general method to incorporate lexicon information, including sentiment lexicons (positive/negative), negation words and intensifiers. Words are annotated with fine-grained and coarse-grained labels. The proposed method first encodes the fine-grained labels into sentiment embeddings and concatenates them with word embeddings. Second, the coarse-grained labels are used to enhance the attention mechanism so that it assigns larger weights to sentiment-related words. Experimental results show that our method increases classification accuracy for neural sequence models on both the SST-5 and MR datasets. In particular, the enhanced Bi-LSTM model is even comparable with a Tree-LSTM that uses expensive phrase-level annotations. Further analysis shows that in most cases the lexicon resources offer the right annotations. Besides, the proposed method is capable of overcoming the effect of inevitable wrong annotations.

1 Introduction

Sentiment classification, a classic task in natural language processing, has received much attention in recent years. The task aims to classify text as positive or negative, or into more fine-grained classes such as very negative, negative, neutral, etc. A large body of work exists in this field, including traditional dictionary-based methods [Peter D. Turney, 2002], early machine-learning-based methods [Pang et al., 2002], and recent neural-network-based methods such as convolutional neural networks (CNN) ([Yoon Kim, 2014]; [Kalchbrenner et al., 2014]; [Lei et al., 2015]), recurrent neural networks (RNN) ([Tomas Mikolov, 2010]; [Chung et al., 2014]; [Tai et al., 2015]), lexicon-enhanced methods ([Mikolov et al., 2016]; [Qian et al., 2017]), attention-based methods ([Wang et al., 2016]; [Wu et al., 2018]) and others.

However, many models rely on expensive features, which makes them impractical for real-world applications. Besides, some models recognize the value of existing linguistic resources but use them insufficiently. A common approach is simply to encode sentiment labels, usually positive and negative, into an embedding and concatenate it with the word embedding. However, apart from positive and negative words, negation words (e.g., no, not) and intensifiers (e.g., very, too) are also helpful for sentiment classification.

Therefore, we propose a novel and general method to incorporate lexicon information, including sentiment lexicons (positive/negative), negation words and intensifiers. Words are annotated with fine-grained and coarse-grained labels. The proposed method first encodes the fine-grained labels into sentiment embeddings and concatenates them with the word embeddings. Second, the coarse-grained labels are used to enhance the attention mechanism so that it assigns larger weights to sentiment-related words.

Annotation     | Negation words | Positive words | Negative words | Intensifiers | Other words
fine-grained   | NGW            | POS            | NEG            | INT          | OTH
coarse-grained | in             | in             | in             | in           | not in
Table 1: The two-granularity word annotation utilized by the proposed method.

To summarize, the main contributions of our work are as follows:

  • We collected a sentiment lexicon which contains 2759 positive words, 5111 negative words, 35 negation words and 62 intensifiers. The resource is released on GitHub (https://github.com/zengyan-97/Sentiment-Lexicon) to promote further research.

  • We propose a novel and general method to incorporate lexicon information. Experimental results show that our method increases classification accuracy for neural sequence models on both the SST-5 and MR datasets. In particular, the enhanced Bi-LSTM model is even comparable with a Tree-LSTM that uses expensive phrase-level annotations.

2 Related Work

With the development of neural networks, many classical neural models have been applied to sentiment classification, including Recursive Neural Networks ([Socher et al., 2011]; [Socher et al., 2013]), Convolutional Neural Networks ([Yoon Kim, 2014]; [Kalchbrenner et al., 2014]; [Lei et al., 2015]), Recurrent Neural Networks ([Tomas Mikolov, 2010]; [Chung et al., 2014]; [Tai et al., 2015]; [Zhu et al., 2015]), attention-based methods ([Wang et al., 2016]; [Wu et al., 2018]) and so on. [Wang et al., 2016] introduces an attention-based method to embed aspect information for aspect-level sentiment classification, which inspired us to embed linguistic resources for sentiment classification. Besides, our model also draws on the attention model in [Wu et al., 2018].

A lot of work has attempted to utilize linguistic knowledge for sentiment classification; see, for example, ([Taboada et al., 2011]; [Mohammad et al., 2013]; [Zhu et al., 2014]; [Mikolov et al., 2016]). Applying a sentiment lexicon, negation words and intensifiers together in one model for sentiment classification can be seen in [Qian et al., 2017], which introduces three linguistic regularizers on intermediate outputs with KL divergence. Our work differs in that, whereas [Qian et al., 2017] applies linguistic regularizers, we propose two different granularities of word annotation based on existing linguistic resources and, on this basis, embed the prior knowledge into the model for sentiment analysis.

Figure 1: Model Structure.

3 Methodology

We propose two different granularities of word annotation based on existing linguistic resources. One is coarse-grained, which divides words into those "in" the linguistic resources and those "not in" them. The other is fine-grained, which divides words into five categories: "not in" the linguistic resources, positive, negative, negation words, and intensifiers. To the best of our knowledge, this is the first time two different granularities of word annotation have been introduced for sentiment classification in order to incorporate linguistic resources. We then adopt the following two methods to capture this supervised information.
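For illustration, the following minimal Python sketch shows how a tokenized sentence would receive the two granularities of labels from Table 1. This is not the released code, and the tiny word sets are stand-ins for the full lexicons described in Section 4.1.

```python
# Toy stand-ins for the full lexicons (Section 4.1).
NEGATION = {"not", "no", "never", "neither"}       # NGW
POSITIVE = {"good", "great", "charming"}           # POS
NEGATIVE = {"bad", "awful", "boring"}              # NEG
INTENSIFIER = {"very", "too", "really"}            # INT

def annotate(tokens):
    """Return (fine-grained, coarse-grained) labels for each token."""
    fine, coarse = [], []
    for w in tokens:
        if w in NEGATION:
            tag = "NGW"
        elif w in POSITIVE:
            tag = "POS"
        elif w in NEGATIVE:
            tag = "NEG"
        elif w in INTENSIFIER:
            tag = "INT"
        else:
            tag = "OTH"
        fine.append(tag)
        coarse.append("in" if tag != "OTH" else "not in")
    return fine, coarse

print(annotate("not a very good movie".split()))
# (['NGW', 'OTH', 'INT', 'POS', 'OTH'], ['in', 'not in', 'in', 'in', 'not in'])
```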

3.1 Sentiment Embedding

The pre-trained word vectors, each of which is trained from its context, contain only faint sentiment information. To address this, we propose to learn five kinds of hidden sentiment property embeddings using the fine-grained annotations. We then concatenate the sentiment property embedding with the pre-trained word vector to obtain a new word embedding for each word. The intuition is to add an explicit "sentiment property" to the word vectors.
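As a concrete illustration, a minimal PyTorch-style sketch of this concatenation is given below. It reflects our reading of the description rather than the authors' released code, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SentimentEmbedding(nn.Module):
    """Concatenate a frozen pre-trained word vector with a learned
    sentiment property embedding indexed by the fine-grained label."""
    def __init__(self, pretrained_vectors, sent_dim=50, num_labels=5):
        super().__init__()
        # Pre-trained GloVe vectors are kept static, as in Section 4.3.
        self.word_emb = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)
        # Five fine-grained labels: OTH, POS, NEG, NGW, INT (embeddings are learned).
        self.sent_emb = nn.Embedding(num_labels, sent_dim)

    def forward(self, word_ids, fine_label_ids):
        # Both inputs are (batch, seq_len) LongTensors of indices.
        return torch.cat([self.word_emb(word_ids),
                          self.sent_emb(fine_label_ids)], dim=-1)
```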

3.2 Enhanced Attention Mechanism

The motivation for introducing the attention mechanism is that we want the model to focus more on the hidden states whose inputs are sentiment words. To improve this ability, we incorporate the lexicon knowledge into a standard attention mechanism. In this part we adopt the coarse-grained annotations, which only classify a word as "in" or "not in" any lexicon.

Formally, the final hidden state $s$ is a weighted sum of all hidden states:

$$ s = \sum_{t=1}^{T} \alpha_t h_t \qquad (1) $$

where $h_t$ is the hidden state of the t-th word in a sentence, $\alpha_t$ is the attention weight of $h_t$ and measures the importance of the t-th word for sentiment classification, and $T$ is the length of the sentence. The attention weight for each hidden state is defined as:

$$ e_t = v^{\top} \tanh(W_h h_t + W_l l_t) \qquad (2) $$
$$ \alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)} \qquad (3) $$

where $v$ is a weight vector and $v^{\top}$ is its transpose, $W_h$ and $W_l$ are weight matrices, and $l_t$ is generated by the second (coarse-grained) set of annotations. Specifically, if a word is in a lexicon, its corresponding $l_t$ is a vector learned during training; if a word is not in any lexicon, we set its corresponding $l_t$ to zeros.
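A minimal PyTorch-style sketch of Eqs. (1)-(3) follows. The exact parameterization (layer names, dimensions, one shared lexicon vector) is our reconstruction under the assumptions above, not released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LexiconAttention(nn.Module):
    """Lexicon-enhanced attention over Bi-LSTM hidden states (Eqs. 1-3)."""
    def __init__(self, hidden_dim, lex_dim=30):
        super().__init__()
        self.W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_l = nn.Linear(lex_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)
        # One vector, learned during training, shared by all in-lexicon words.
        self.lex_vec = nn.Parameter(torch.randn(lex_dim))

    def forward(self, hidden, in_lexicon):
        # hidden: (batch, T, hidden_dim); in_lexicon: (batch, T) 0/1 mask.
        l = in_lexicon.float().unsqueeze(-1) * self.lex_vec       # zeros if not in any lexicon
        e = self.v(torch.tanh(self.W_h(hidden) + self.W_l(l))).squeeze(-1)  # Eq. (2)
        alpha = F.softmax(e, dim=-1)                               # Eq. (3)
        return (alpha.unsqueeze(-1) * hidden).sum(dim=1)           # Eq. (1)
```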

Figure 1 shows the architecture of a standard Bi-LSTM model that uses our method to encode the three kinds of lexicon information.

4 Experiment

4.1 Sentiment Lexicon

We collect negative and positive words from the Subjectivity Lexicon [Wilson et al., 2005] and the Opinion Lexicon [Hu and Liu, 2004], which together contain 5111 negative words and 2759 positive words. As for negation words and intensifiers, we collect them manually, finally obtaining 35 negation words and 62 intensifiers, some of which are shown in Table 2.

Negation words | nobody, nowhere, neither, not, seldom, scarcely, but, no, none, nothing, neither, barely, no one
Intensifiers   | real, absolutely, particularly, completely, most, unusually, totally, utterly, especially, too, very, quite, most, really
Table 2: Examples of negation words and intensifiers.
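A small sketch of merging the two published word lists into a single polarity lexicon is shown below; the file names and formats are assumptions for illustration, not the released resource.

```python
def load_wordlist(path):
    """Read one word per line, lower-cased; the file names below are hypothetical."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

# Opinion Lexicon [Hu and Liu, 2004] plus Subjectivity Lexicon [Wilson et al., 2005].
positive = load_wordlist("opinion_positive.txt") | load_wordlist("subjclue_positive.txt")
negative = load_wordlist("opinion_negative.txt") | load_wordlist("subjclue_negative.txt")
# Drop words that appear with conflicting polarity in the two sources.
positive, negative = positive - negative, negative - positive
```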

4.2 Dataset

Two datasets are used to evaluate the proposed method: the Stanford Sentiment Treebank (SST) [Socher et al., 2013], in which each sentence is annotated with five classes (very negative, negative, neutral, positive and very positive), and Movie Review (MR) [Pang and Lee, 2005], with two classes (negative and positive). Although SST provides phrase-level annotations, we do not use them, since one of our goals is to avoid expensive phrase-level annotation.

Dataset | N     | V     | Avg   | P
MR      | 10662 | 10279 | 3.789 | 22.87%
SST-2   | 11286 | 10695 | 3.385 | 23.27%
Table 3: Basic statistics of the datasets. N: number of samples in the dataset. V: number of words in the dataset. Avg: average number of sentiment words per sample. P: average proportion of sentiment words relative to sentence length.

4.3 Experiment settings

In our experiments, all word vectors are initialized with GloVe [Pennington et al., 2014]. The model is trained with a batch size of 25 using AdaGrad [Duchi et al., 2011] with a learning rate of 0.1. For regularization, we only employ dropout [Srivastava et al., 2014] on the penultimate layer, with a probability of 0.5. Parameters are initialized with Xavier initialization [Glorot et al., 2012]. Pre-trained word vectors are kept static during training.
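For reference, a PyTorch-style sketch of this training setup is given below. It is an illustration of the listed hyperparameters under our assumptions (e.g., a hypothetical model whose frozen embedding layer is named word_emb), not the authors' code.

```python
import torch
import torch.nn as nn

def configure_training(model):
    # Xavier initialization for weight matrices (frozen word embeddings excluded).
    for name, p in model.named_parameters():
        if p.requires_grad and p.dim() > 1 and "word_emb" not in name:
            nn.init.xavier_uniform_(p)
    # AdaGrad with learning rate 0.1; the batch size of 25 is set in the DataLoader.
    optimizer = torch.optim.Adagrad(
        (p for p in model.parameters() if p.requires_grad), lr=0.1)
    # Dropout with probability 0.5, applied only to the penultimate layer.
    dropout = nn.Dropout(p=0.5)
    return optimizer, dropout
```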

5 Results and Discussion

Results of our model compared with other methods are listed in Table 4. Our model beats the state of the art on both the SST-5 and MR datasets and is even comparable with a Tree-LSTM that uses expensive phrase-level annotation.

Model        | MR   | SST-5-S | SST-5-P
Bi-LSTM      | 79.3 | 46.5    | 49.1
Tree-LSTM    | 80.7 | 48.1    | 51.0
CNN-Static   | 81.0 | 45.5    | -
LR-Bi-LSTM   | 82.1 | 48.6    | 50.6
Bi-LSTM+ATTN | 81.0 | -       | 49.8
Bi-LSTM+2    | 80.2 | 47.7    | -
Bi-LSTM+5    | 81.7 | 48.8    | -
Bi-LSTM+2+5  | 82.8 | 50.4    | -
Table 4: Comparison with baselines. Bi-LSTM: [Cho et al., 2014]; Tree-LSTM: [Tai et al., 2015]; CNN-Static: [Yoon Kim, 2014]; LR-Bi-LSTM: [Qian et al., 2017]; Bi-LSTM+ATTN: [Zhou et al., 2016]. ATTN: attention; 2: coarse-grained annotations; 5: fine-grained annotations. Best performances are in bold.

Figure 2: Sample 1-3

To further evaluate the performance of our model, we present several samples to analyze the advantages and limitations of our model from different perspectives.

All Test Set | Our Model Failed | Percent
2125         | 76               | 3.58%
Table 5: Cases in which our model failed to capture the extra information, i.e., the lexicons gave appropriate annotations but the prediction was wrong.

Figure 3: Sample 4-7

Samples 1-3 show that the lexicons can provide correct annotations, which can be used to guide model learning. In fact, we found that most annotations are appropriate. On the other hand, using several rules we count the cases in which our model failed to capture the extra information, i.e., the lexicons gave appropriate annotations but the prediction was wrong; the model fails on 76 of the 2125 test samples (Table 5).

In cases such as Samples 4 and 5, the lexicons do not provide extra information because of their limited size, but our model still works well. This is because we incorporate the annotation constraint in a soft way: when the annotations are missing or wrong, the model can still utilize the semantic information from the pre-trained word embeddings.

However, in cases like Samples 6 and 7, the word embeddings gradually fail to compensate when the lexicons give useless or even seriously wrong annotations. This kind of error can be reduced simply by improving the lexicon resources in size and quality.

As for the attention part of our method, it did not work as we expected. In most cases, the attention scores of the hidden states differ only slightly. Moreover, common words tend to receive higher scores. Nevertheless, the Bi-LSTM+2+5 model did beat the Bi-LSTM+5 model in our experiments.

6 Conclusion

The analysis shows that the lexicon resources provide useful extra information. Since our model can capture this additional supervised information, it beats the state-of-the-art models. Additionally, we think collecting a high-quality set of lexicons is necessary: if the lexicons were larger and of higher quality, we believe the classification accuracy would improve considerably, and the lexicon resources could be reused.

Acknowledgments

This work has been supported by the National Natural Science Foundation of China (Grant No. 61772409); the National Key Research and Development Program of China (No. 2018YFC0910404); the Ministry of Education-Research Foundation of China Mobile Communication Corp. (MCM20160404); "The Fundamental Theory and Applications of Big Data with Knowledge Engineering" under the National Key Research and Development Program of China (Grant No. 2016YFB1000903); the Project of China Knowledge Centre for Engineering Science and Technology; the Innovation Team of the Ministry of Education (IRT_17R86); and the Innovative Research Group of the National Natural Science Foundation of China (61721002).

References

  • [Pang et al., 2002] Bo Pang, Lillian Lee, Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing.
  • [Peter D. Turney, 2002] Peter D. Turney. 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
  • [Wang et al., 2016] Yequan Wang, Minlie Huang, Li Zhao, Xiaoyan Zhu. 2016. Attention-based LSTM for Aspect-level Sentiment Classification. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
  • [Qian et al., 2017] Qiao Qian, Minlie Huang, Jinhao Lei, Xiaoyan Zhu. 2017. Linguistically Regularized LSTM for Sentiment Classification. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.
  • [Yoon Kim, 2014] Yoon Kim. 2014. Convolutional neural networks for sentence classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.
  • [Kalchbrenner et al., 2014] Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom. 2014. A convolutional neural network for modelling sentences. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.
  • [Lei et al., 2015] Tao Lei, Regina Barzilay, Tommi Jaakkola. 2015. Molding CNNs for text: non-linear, non-consecutive convolutions. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.
  • [Hu and Liu, 2004] M. Hu, B. Liu. 2004. Mining and Summarizing Customer Reviews. In Proceedings of ACM SIGKDD 2004.
  • [Chung et al., 2014] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  • [Tai et al., 2015] Kai Sheng Tai, Richard Socher, Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing.
  • [Zhu et al., 2014] Xiaodan Zhu, Hongyu Guo, Saif Mohammad, Svetlana Kiritchenko. 2014. An empirical study on the effect of negation words on sentiment. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.
  • [Zhu et al., 2015] Xiaodan Zhu, Parinaz Sobhani, Hongyu Guo. 2015. Long short-term memory over recursive structures. Proceedings of the 32nd International Conference on Machine Learning.
  • [Socher et al., 2011] Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.
  • [Socher et al., 2013] Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.
  • [Wang et al., 2016] Xingyou Wang, Weijie Jiang, Zhiyong Luo. 2016. Combination of Convolutional and Recurrent Neural Network for Sentiment Analysis of Short Texts. Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics.
  • [Mohammad et al., 2013] Saif Mohammad, Svetlana Kiritchenko, Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. Seventh International Workshop on Semantic Evaluation.
  • [Mikolov et al., 2016] Zhiyang Teng, Duy-Tin Vo, Yue Zhang. 2016. Context-sensitive lexicon features for neural sentiment analysis. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
  • [Tomas Mikolov, 2010] Tomas Mikolov. 2012. Statistical language models based on neural networks.
  • [Taboada et al., 2011] Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, Manfred Stede. 2011. Lexicon-based methods for sentiment analysis.
  • [Socher et al., 2013] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng, Christopher Potts. 2013. Parsing with Compositional Vector Grammars. EMNLP.
  • [Pang and Lee, 2005] B. Pang, L. Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of ACL 2005.
  • [Wilson et al., 2005] Theresa Wilson, Janyce Wiebe, Paul Hoffmann. 2005. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. EMNLP.
  • [Pennington et al., 2014] Jeffrey Pennington, Richard Socher, Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. EMNLP.
  • [Duchi et al., 2011] John Duchi, Elad Hazan, Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research.
  • [Srivastava et al., 2014] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research.
  • [Glorot et al., 2012] Xavier Glorot, Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks.
  • [Cho et al., 2014] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.
  • [Shin et al., 2017] Bonggun Shin, Timothy Lee, Jinho D. Choi. 2017. Lexicon Integrated CNN Models with Attention for Sentiment Analysis. Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis.
  • [Wu et al., 2018] Zhen Wu, XinYu Dai, Cunyan Yin, Shujian Huang, Jiajun Chen. 2018. Improving Review Representations with User Attention and Product Attention for Sentiment Classification. Association for the Advancement of Artificial Intelligence.
  • [Zhou et al., 2016] Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, Bo Xu. 2016. Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling. Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics.