Attentional Encoder Network for Targeted Sentiment Classification


Youwei Song, Jiahai Wang*, Tao Jiang, Zhiyue Liu, Yanghui Rao
School of Data and Computer Science
Sun Yat-sen University
Guangzhou, China
{songyw5,jiangt59,liuzhy93}@mail2.sysu.edu.cn
wangjiah,raoyangh@mail.sysu.edu.cn
*The corresponding author.
Abstract

Targeted sentiment classification aims at determining the sentimental tendency towards specific targets. Most previous approaches model context and target words with recurrent neural networks such as LSTM in conjunction with attention mechanisms. However, LSTM networks are difficult to parallelize because of their sequential nature. Moreover, since full backpropagation over the sequence requires large amounts of memory, essentially every implementation of backpropagation through time is the truncated version, which brings difficulty in remembering long-term patterns. To address these issues, this paper proposes an Attentional Encoder Network (AEN) for targeted sentiment classification. Contrary to previous LSTM-based works, AEN eschews complex recurrent neural networks and employs attention-based encoders for the modeling between context and target, which can excavate the rich introspective and interactive semantic information from the word embeddings without considering the distance between words. This paper also raises the label unreliability issue and introduces a label smoothing regularization term into the loss function to encourage the model to be less confident with the training labels. Experimental results on three benchmark datasets demonstrate that our model achieves comparable or superior performance with a lightweight model size. Codes and datasets are available at https://1drv.ms/u/s!AgbDC2VhgmvxaWGVvR9AnaVpizg.


1 Introduction

Targeted sentiment classification is a fine-grained sentiment analysis task, which aims at determining the sentiment polarities (e.g., negative, neutral, or positive) of a sentence over “opinion targets” that explicitly appear in the sentence. For example, given a sentence “I hated their service, but their food was great”, the sentiment polarities for the target “service” and “food” are negative and positive respectively. A target is usually an entity or an entity aspect. In this paper, we use aspect and target interchangeably.

In recent years, neural network models have been designed to automatically learn useful low-dimensional representations from targets and contexts, and have obtained promising results (Dong et al., 2014; Vo and Zhang, 2015; Tang et al., 2016a). However, these neural network models are still in their infancy when dealing with the fine-grained targeted sentiment classification task. The attention mechanism (Bahdanau et al., 2014) has also been incorporated to model the impact of target words, enforcing the model to pay more attention to context words with closer semantic relations to the target. Wang et al. (2016) concatenate target embeddings with word representations and let targets participate in computing attention weights. Ma et al. (2017) learn the attended representations of context and target words interactively. Chen et al. (2017) adopt a multiple-attention mechanism on a memory built with bidirectional LSTM and nonlinearly combine the attention results with gated recurrent units.

These attention based studies have realized the importance of targets and use the attention mechanism to accurately model context by generating target-specific representations. However, these dominant targeted sentiment classification studies depend on complex recurrent neural networks (RNNs) as sequence encoders to compute the hidden semantics of texts.

The first problem with previous work is that the modeling of text relies on RNNs. RNNs are very expressive, but they have some inherent shortcomings. Firstly, RNNs are memory-intensive models: since backpropagation through time (BPTT) requires large amounts of memory and computation, essentially every training algorithm for a recurrent model is the truncated BPTT (Werbos, 1990), which affects the model's ability to capture dependencies over longer time scales. Secondly, the theoretically unlimited memory advantage offered by RNNs is mostly absent in practice (Bai et al., 2018). Thirdly, RNNs are hard to parallelize during training because of their sequential nature. RNNs had been firmly established as state-of-the-art approaches in almost all natural language processing tasks, but this is changing: models like the autoregressive WaveNet (Van Den Oord et al., 2016) or the Transformer (Vaswani et al., 2017) are replacing RNNs on a diverse set of tasks. In previous targeted sentiment classification studies, the attention mechanism is a critical factor for their success, but it is only used in later stages to learn the importance of hidden states already computed by RNNs. Contrary to previous RNN-based works, this paper eschews recurrence and employs the attention mechanism as a competitive alternative to draw the hidden states and the semantic interactions between target and context words.

Another problem that previous studies ignore is the label unreliability issue: neutral sentiment is a fuzzy sentimental state, which brings difficulty to model learning. We add a label smoothing regularization term to the loss function, which is an effective strategy for encouraging the model to be less confident. As far as we know, we are the first to raise the label unreliability issue in the targeted sentiment classification task.

Experimental results on three benchmark datasets show that the proposed model achieves comparable or superior performance and is a lightweight alternative to the best RNN based models.

The main contributions of this work are presented as follows:

  1. We propose a novel approach for targeted sentiment classification which eschews complex recurrent neural networks and employs attention mechanism as an alternative to draw the hidden states and semantic interactions between target and context words.

  2. We raise the label unreliability issue and add an effective label smoothing regularization term to the loss function for encouraging the model to be less confident with the training labels.

  3. We evaluate the model sizes of the compared models and show that the proposed model is lightweight.

2 Proposed Methodology

This section describes the detailed implementation of the proposed Attentional Encoder Network (AEN) for targeted sentiment classification. Figure 1 illustrates the overall architecture of the AEN model.

Figure 1: Overall architecture of the proposed AEN.

2.1 Input Embedding Layer

The input to this model includes a context sequence $w^{c} = \{w^{c}_{1}, w^{c}_{2}, \dots, w^{c}_{n}\}$ and a target sequence $w^{t} = \{w^{t}_{1}, w^{t}_{2}, \dots, w^{t}_{m}\}$, where $w^{t}$ is a sub-sequence of $w^{c}$. Each word stands for its index in the vocabulary $V$, where $|V|$ is the vocabulary size. Let $M \in \mathbb{R}^{d_{emb} \times |V|}$ be the pre-trained word embedding lookup matrix, where $d_{emb}$ is the dimension of word vectors. The word vectors are pre-trained by GloVe (Pennington et al., 2014) on massive web datasets. We then map each word to its corresponding embedding representation, which is a column in the embedding matrix $M$, and obtain the context embedding $e^{c} = \{e^{c}_{1}, e^{c}_{2}, \dots, e^{c}_{n}\}$ and the target embedding $e^{t} = \{e^{t}_{1}, e^{t}_{2}, \dots, e^{t}_{m}\}$.
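As a concrete illustration, a frozen GloVe lookup of this kind could be wired up in PyTorch as sketched below; the variable names (glove_matrix, context_ids, etc.) and the toy indices are ours, not from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained GloVe matrix of shape (|V|, d_emb); in practice it would be
# loaded from the released GloVe vectors rather than filled with random values.
vocab_size, d_emb = 10000, 300
glove_matrix = torch.randn(vocab_size, d_emb)

# Frozen embedding layer: freeze=True keeps the vectors fixed during training,
# matching the setup of not updating word embeddings.
embedding = nn.Embedding.from_pretrained(glove_matrix, freeze=True)

# Toy context and target index sequences (the target is a sub-sequence of the context).
context_ids = torch.tensor([[12, 45, 7, 301, 9]])  # shape (batch, n)
target_ids = torch.tensor([[301, 9]])              # shape (batch, m)

e_c = embedding(context_ids)  # context embedding, shape (batch, n, d_emb)
e_t = embedding(target_ids)   # target embedding, shape (batch, m, d_emb)
```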

2.2 Attentional Encoder Layer

The attentional encoder layer is a parallelizable alternative to LSTM and is applied to compute the hidden states of the input embeddings: it transforms the context embedding $e^{c}$ into the introspective context representation $c^{intra}$ and the target embedding $e^{t}$ into the context-perceptive target representation $t^{inter}$.

This layer consists of two submodules: the Multi-Head Attention (MHA) and the Point-wise Convolution Transformation (PCT).

2.2.1 Multi-Head Attention

In AEN, we use an intra-attention for introspective context modeling and an inter-attention for context-perceptive target modeling. The implementation details of these two parts are the same, but the parameters are not shared.

An attention function can be described as mapping a key sequence $k = \{k_{1}, k_{2}, \dots, k_{n}\}$ and a query sequence $q = \{q_{1}, q_{2}, \dots, q_{m}\}$ to an output sequence $o$. Following Vaswani et al. (2017), the attention mechanism in AEN is Multi-Head Attention (MHA), which can learn $n_{head}$ different scores in parallel child spaces and is very powerful for alignments. The outputs are concatenated and projected to the specified hidden dimension $d_{hid}$, namely,

$\mathrm{MHA}(k, q) = [o^{1}; o^{2}; \dots; o^{n_{head}}] \cdot W^{mh}$ (1)
$o^{h} = \mathrm{Attention}^{h}(k, q)$ (2)

where "$;$" denotes vector concatenation, $W^{mh} \in \mathbb{R}^{d_{hid} \times d_{hid}}$, $o^{h} = \{o^{h}_{1}, o^{h}_{2}, \dots, o^{h}_{m}\}$ is the output of the $h$-th head attention, and $h \in [1, n_{head}]$.

The $j$-th output token $o_{j}$ in $o$ is the result of the $j$-th query token $q_{j}$, which is a weighted sum of the values from $k$. Formally,

$o_{j} = \sum_{i=1}^{n} \alpha_{ij} k_{i}$ (3)

where the weight $\alpha_{ij}$ assigned to each token $k_{i}$ is the normalized attention score:

$\alpha_{ij} = \mathrm{softmax}(s_{ij})$ (4)
$\mathrm{softmax}(s_{ij}) = \frac{\exp(s_{ij})}{\sum_{l=1}^{n} \exp(s_{lj})}$ (5)

The attention score $s_{ij}$ denotes the learned semantic relevance between $k_{i}$ and $q_{j}$, which is calculated as follows:

$s_{ij} = \tanh([k_{i}; q_{j}] \cdot W_{att})$ (6)

where $W_{att} \in \mathbb{R}^{2 d_{hid}}$ are learnable weights.
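A PyTorch sketch of a multi-head attention with this tanh alignment score is given below. The class and variable names are ours, and the per-head linear projection to d_hid // n_head used to realize the parallel child spaces is one common design choice, not necessarily the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TanhMultiHeadAttention(nn.Module):
    """Sketch of a multi-head attention with a tanh alignment score (Eqs. (1)-(6))."""
    def __init__(self, d_hid: int = 300, n_head: int = 8):
        super().__init__()
        assert d_hid % n_head == 0
        self.n_head = n_head
        self.d_head = d_hid // n_head
        self.proj_k = nn.Linear(d_hid, d_hid)                    # projects keys into head subspaces
        self.proj_q = nn.Linear(d_hid, d_hid)                    # projects queries into head subspaces
        self.w_att = nn.Linear(2 * self.d_head, 1, bias=False)   # alignment weights W_att, Eq. (6)
        self.w_mh = nn.Linear(d_hid, d_hid)                      # output projection W^mh, Eq. (1)

    def forward(self, k: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # k: (batch, n, d_hid), q: (batch, m, d_hid)
        b, n, _ = k.shape
        m = q.shape[1]
        kh = self.proj_k(k).view(b, n, self.n_head, self.d_head).permute(0, 2, 1, 3)  # (b, h, n, d)
        qh = self.proj_q(q).view(b, m, self.n_head, self.d_head).permute(0, 2, 1, 3)  # (b, h, m, d)
        # Build all (k_i, q_j) pairs: shape (b, h, m, n, 2 * d_head).
        k_exp = kh.unsqueeze(2).expand(b, self.n_head, m, n, self.d_head)
        q_exp = qh.unsqueeze(3).expand(b, self.n_head, m, n, self.d_head)
        pair = torch.cat([k_exp, q_exp], dim=-1)
        score = torch.tanh(self.w_att(pair)).squeeze(-1)   # Eq. (6): s_ij per head
        alpha = F.softmax(score, dim=-1)                   # Eqs. (4)-(5): normalize over the keys
        # Eq. (3): each output token is a weighted sum of the key vectors of its head.
        out = torch.einsum('bhmn,bhnd->bhmd', alpha, kh)
        out = out.permute(0, 2, 1, 3).reshape(b, m, -1)    # concatenate the heads
        return self.w_mh(out)                              # Eq. (1): project back to d_hid
```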

Intra-MHA, also known as multi-head self-attention, is the special case of the attention mechanism in which $k = q$. This particular form of attention can connect distant words via shorter network paths than RNNs, which improves the modeling of long-range dependencies and can capture the internal structure of the sentence.

We use intra-MHA for introspective context modeling. Given a context embedding $e^{c}$, we obtain the introspective context representation $c^{intra}$ by:

$c^{intra} = \mathrm{MHA}(e^{c}, e^{c})$ (7)

With respect to the context embedding itself, intra-MHA can learn the semantic connections between every two words in the sentence no matter how distant they are, so that the learned context representation $c^{intra} = \{c^{intra}_{1}, c^{intra}_{2}, \dots, c^{intra}_{n}\}$ is aware of long-term dependencies.

Inter-MHA is the generally used form of the attention mechanism in which $k$ is different from $q$. Given a context embedding $e^{c}$ and a target embedding $e^{t}$, we obtain the context-perceptive target representation $t^{inter}$ by:

$t^{inter} = \mathrm{MHA}(e^{c}, e^{t})$ (8)

After this interactive procedure, each given target word will have a composed representation selected from the context embeddings $e^{c}$. We thus obtain the context-perceptive target representation $t^{inter} = \{t^{inter}_{1}, t^{inter}_{2}, \dots, t^{inter}_{m}\}$.
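Reusing the hypothetical TanhMultiHeadAttention module and the embeddings e_c, e_t from the earlier sketches, the two modes differ only in what is passed as key and query, with separate parameter sets:

```python
# Separate instances, since intra- and inter-MHA do not share parameters.
intra_mha = TanhMultiHeadAttention(d_hid=300, n_head=8)
inter_mha = TanhMultiHeadAttention(d_hid=300, n_head=8)

c_intra = intra_mha(e_c, e_c)  # Eq. (7): introspective context representation
t_inter = inter_mha(e_c, e_t)  # Eq. (8): context-perceptive target representation
```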

2.2.2 Point-wise Convolution Transformation

A Point-wise Convolution Transformation (PCT) consists of two one-dimensional convolutions with a nonlinear activation in between; it can transform the contextual information gathered by the MHA and is applied to each position separately and identically.

Point-wise means that the kernel sizes are 1 and the same transformation is applied to every single token vector of the input sequence. Formally, given an input sequence $h$, PCT is defined as:

$\mathrm{PCT}(h) = \sigma(h \ast W^{1}_{pc} + b^{1}_{pc}) \ast W^{2}_{pc} + b^{2}_{pc}$ (9)

where $\sigma$ stands for the ELU activation, $\ast$ is the convolution operator, $W^{1}_{pc}$ and $W^{2}_{pc}$ are the learnable weights of the two convolutional kernels, and $b^{1}_{pc}$ and $b^{2}_{pc}$ are the biases of the two convolutional kernels.

With the introspective context representation $c^{intra}$ and the context-perceptive target representation $t^{inter}$, PCTs are applied to obtain the output hidden states of the attentional encoder layer, $h^{c} = \{h^{c}_{1}, h^{c}_{2}, \dots, h^{c}_{n}\}$ and $h^{t} = \{h^{t}_{1}, h^{t}_{2}, \dots, h^{t}_{m}\}$, by:

$h^{c} = \mathrm{PCT}(c^{intra})$ (10)
$h^{t} = \mathrm{PCT}(t^{inter})$ (11)

Note that the parameters of these two PCTs are not shared.
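A minimal PyTorch sketch of such a PCT, assuming kernel-size-1 Conv1d layers with an ELU in between as in Eq. (9); the class and variable names are ours:

```python
import torch
import torch.nn as nn

class PointwiseConvTransform(nn.Module):
    """Sketch of the PCT in Eq. (9): two kernel-size-1 convolutions with ELU in between."""
    def __init__(self, d_hid: int = 300):
        super().__init__()
        self.conv1 = nn.Conv1d(d_hid, d_hid, kernel_size=1)
        self.conv2 = nn.Conv1d(d_hid, d_hid, kernel_size=1)
        self.elu = nn.ELU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d_hid); Conv1d expects the channel dimension first.
        x = h.transpose(1, 2)
        x = self.conv2(self.elu(self.conv1(x)))
        return x.transpose(1, 2)

# Two PCTs with unshared parameters, as in Eqs. (10)-(11):
# pct_c, pct_t = PointwiseConvTransform(), PointwiseConvTransform()
# h_c, h_t = pct_c(c_intra), pct_t(t_inter)
```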

2.3 Target-specific Attention Layer

After we obtain the introspective context representation $h^{c}$ and the context-perceptive target representation $h^{t}$, we employ another MHA to obtain the target-specific context representation $h^{tsc} = \{h^{tsc}_{1}, h^{tsc}_{2}, \dots, h^{tsc}_{m}\}$ by:

$h^{tsc} = \mathrm{MHA}(h^{c}, h^{t})$ (12)

The multi-head attention function here also has its own independent parameters.

2.4 Output Layer

We obtain the final representations of the previous outputs by average pooling:

$\bar{h}^{c} = \frac{1}{n} \sum_{i=1}^{n} h^{c}_{i}$ (13)
$\bar{h}^{t} = \frac{1}{m} \sum_{i=1}^{m} h^{t}_{i}$ (14)
$\bar{h}^{tsc} = \frac{1}{m} \sum_{i=1}^{m} h^{tsc}_{i}$ (15)

We regard the concatenation of the above three representations as the final comprehensive representation $\tilde{o}$, and use a fully connected layer to project the concatenated vector into the space of the $C$ targeted classes:

$\tilde{o} = [\bar{h}^{c}; \bar{h}^{t}; \bar{h}^{tsc}]$ (16)
$x = W_{o} \cdot \tilde{o} + b_{o}$ (17)
$y = \mathrm{softmax}(x)$ (18)
$y_{k} = \frac{\exp(x_{k})}{\sum_{j=1}^{C} \exp(x_{j})}$ (19)

where $y \in \mathbb{R}^{C}$ is the predicted sentiment polarity distribution, and $W_{o}$ and $b_{o}$ are learnable parameters.
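Putting the target-specific attention layer and the output layer together, a sketch of the attention, pooling, and projection steps might look as follows; it reuses the hypothetical TanhMultiHeadAttention from the earlier sketch, and the function and variable names are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_hid, num_classes = 300, 3
target_mha = TanhMultiHeadAttention(d_hid)   # target-specific attention, Eq. (12)
fc = nn.Linear(3 * d_hid, num_classes)       # W_o and b_o of Eq. (17)

def predict(h_c: torch.Tensor, h_t: torch.Tensor) -> torch.Tensor:
    """h_c: (batch, n, d_hid), h_t: (batch, m, d_hid) -> class distribution (batch, C)."""
    h_tsc = target_mha(h_c, h_t)             # target-specific context representation
    # Eqs. (13)-(15): average pooling over the sequence dimension, then Eq. (16): concatenation.
    o = torch.cat([h_c.mean(dim=1), h_t.mean(dim=1), h_tsc.mean(dim=1)], dim=-1)
    return F.softmax(fc(o), dim=-1)          # Eqs. (17)-(19): projection and softmax
```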

2.5 Regularization and Model Training

Since neutral sentiment is a very fuzzy sentimental state, training samples labeled neutral are unreliable. We therefore employ a Label Smoothing Regularization (LSR) term in the loss function, a regularization mechanism that encourages the model to be less confident (Szegedy et al., 2016). LSR can reduce overfitting by preventing the network from assigning the full probability to a single class for each training example; it replaces the 0 and 1 targets of a classifier with smoothed values like 0.1 or 0.9.

For a training sample $x$ with the original ground-truth label distribution $q(k \mid x)$, we replace $q(k \mid x)$ with

$q'(k \mid x) = (1 - \epsilon)\, q(k \mid x) + \epsilon\, u(k)$ (20)

where $u(k)$ is the prior distribution over labels, and $\epsilon$ is the smoothing parameter. In this paper, we set the prior label distribution to be uniform, $u(k) = 1/C$.

LSR is equivalent to the KL divergence between the prior label distribution $u(k)$ and the network's predicted distribution $p_{\theta}$. Formally, the LSR term is defined as:

$\mathcal{L}_{lsr} = -D_{KL}(u(k) \,\|\, p_{\theta})$ (21)
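A minimal sketch of applying Eq. (20) during training is shown below. It realizes LSR in the common way, as a cross-entropy against the smoothed label distribution rather than as an explicit KL term; the function names and the way eps is passed are ours.

```python
import torch
import torch.nn.functional as F

def smoothed_targets(labels: torch.Tensor, num_classes: int = 3, eps: float = 0.2) -> torch.Tensor:
    """Eq. (20): mix the one-hot ground truth with a uniform prior u(k) = 1/C."""
    one_hot = F.one_hot(labels, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    return (1.0 - eps) * one_hot + eps * uniform

def smoothed_cross_entropy(logits: torch.Tensor, labels: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    """Cross-entropy against the smoothed distribution; the extra term relative to the
    plain one-hot loss plays the role of the LSR regularizer."""
    log_probs = F.log_softmax(logits, dim=-1)
    q_prime = smoothed_targets(labels, logits.size(-1), eps)
    return -(q_prime * log_probs).sum(dim=-1).mean()
```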

The objective function (loss function) to be optimized is the cross-entropy loss with the $\mathcal{L}_{lsr}$ term and $L_{2}$ regularization, which is defined as:

$\mathcal{L}(\theta) = -\sum_{i=1}^{C} \hat{y}^{i} \log y^{i} + \mathcal{L}_{lsr} + \lambda \sum_{\theta \in \Theta} \theta^{2}$ (22)

where $\hat{y}$ is the ground truth represented as a one-hot vector, $y$ is the predicted sentiment distribution vector given by the output layer, $\lambda$ is the coefficient of the $L_{2}$ regularization term, and $\Theta$ is the parameter set.

We take the derivative of the loss function through backpropagation to compute the gradients. The Adam optimizer (Kingma and Ba, 2014) is applied to update all the parameters; it works well in practice and compares favorably to other stochastic optimization methods.
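A hypothetical training step under these choices is sketched below; model is a stand-in placeholder for the full AEN, smoothed_cross_entropy is the sketch from the previous subsection, and the $L_{2}$ term of Eq. (22) is realized through Adam's weight_decay with a purely illustrative coefficient.

```python
import torch
import torch.nn as nn

model = nn.Linear(300, 3)  # placeholder module standing in for the full AEN
lambda_l2 = 1e-5           # illustrative L2 coefficient, not a value taken from the paper
optimizer = torch.optim.Adam(model.parameters(), weight_decay=lambda_l2)  # L2 via weight decay

def train_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    logits = model(features)
    loss = smoothed_cross_entropy(logits, labels, eps=0.2)  # cross-entropy + LSR, Eq. (22)
    loss.backward()   # gradients via backpropagation
    optimizer.step()  # Adam parameter update
    return loss.item()
```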

3 Experiments

3.1 Datasets and Experimental Settings

We conduct experiments on three datasets to validate the effectiveness of our proposed model: the SemEval 2014 Task 4 dataset (Pontiki et al., 2014), composed of Restaurant reviews and Laptop reviews (a detailed introduction of this task can be found at http://alt.qcri.org/semeval2014/task4), and the ACL 14 Twitter dataset gathered by Dong et al. (2014). These datasets are labeled with three sentiment polarities: positive, neutral and negative. Table 1 shows the number of training and test instances in each category.

Word embeddings are initialized by GloVe (pre-trained word vectors are available at https://github.com/stanfordnlp/GloVe) and are not updated during learning. The dimensions of the word embeddings and hidden states are set to 300. The weights of our model are initialized with Glorot initialization (Glorot and Bengio, 2010). During training, we set the label smoothing parameter $\epsilon$ to 0.2 (Szegedy et al., 2016), apply $L_{2}$ regularization to the loss, and use a dropout rate of 0.1. We adopt the Accuracy and Macro-F1 metrics to evaluate the performance of the model, which are widely used in previous works. We implement the AEN model and its variants with PyTorch (https://pytorch.org/) (Paszke et al., 2017) using the same input, embedding size, dropout rate, optimizer, etc.
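For instance, the Glorot initialization and dropout settings described here would typically be wired up in PyTorch as in the sketch below (our own illustration, not the authors' released code):

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Glorot (Xavier) uniform initialization for linear and convolutional weights."""
    if isinstance(module, (nn.Linear, nn.Conv1d)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# model.apply(init_weights)    # apply Glorot initialization to every submodule
# dropout = nn.Dropout(p=0.1)  # dropout rate used during training
```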

Dataset      Positive (Train / Test)   Neutral (Train / Test)   Negative (Train / Test)
Twitter      1561 / 173                3127 / 346               1560 / 173
Restaurant   2164 / 728                 637 / 196                807 / 196
Laptop        994 / 341                 464 / 169                870 / 128
Table 1: Statistics of the datasets.
Models                     Twitter               Restaurant            Laptop
                           Accuracy  Macro-F1    Accuracy  Macro-F1    Accuracy  Macro-F1
Baselines
  Majority                 0.5000    0.3333      0.5350    0.3333      0.6500    0.3333
  Feature-based SVM        0.6340    0.6330      0.8016    -           0.7049    -
  Rec-NN                   0.6630    0.6590      -         -           -         -
  TD-LSTM                  0.7080    0.6900      0.7563    -           0.6813    -
  ATAE-LSTM                -         -           0.7720    -           0.6870    -
  IAN                      -         -           0.7860    -           0.7210    -
  MemNet                   0.6850    0.6691      0.7816    0.6583      0.7033    0.6409
  RAM                      0.6936    0.6730      0.8023    0.7080      0.7449    0.7135
Ablated AEN
  AEN w/o PCT              0.7066    0.6907      0.8017    0.7050      0.7272    0.6750
  AEN w/o MHA              0.7124    0.6953      0.7919    0.7028      0.7178    0.6650
  AEN w/o LSR              0.7080    0.6920      0.8000    0.7108      0.7288    0.6869
RNN based variant
  AEN-BiLSTM               0.7210    0.7042      0.7973    0.7037      0.7312    0.6980
AEN                        0.7283    0.6981      0.8098    0.7214      0.7351    0.6904
Table 2: Main results. The results of baseline models are retrieved from published papers. Top 2 scores are in bold.

3.2 Baseline Models

In order to comprehensively evaluate the performance of AEN, we list some baseline approaches for comparison. The baselines are introduced as follows.

Majority is a basic baseline method, which assigns the majority sentiment polarity in the training set to each sample in the test set.

Feature-based SVM (Kiritchenko et al., 2014) is a traditional support vector machine based model with extensive feature engineering.

Rec-NN (Dong et al., 2014) first uses rules to transform the dependency tree and put the opinion target at the root, and then learns the sentence representation toward the target via semantic composition using recursive neural networks.

TD-LSTM (Tang et al., 2016a) extends LSTM by using two LSTM networks to model the left context with target and the right context with target respectively. The left and right target-dependent representations are concatenated for predicting the sentiment polarity of the target.

ATAE-LSTM (Wang et al., 2016) strengthens the effect of target embeddings: it appends the target embedding to each word embedding and uses LSTM with attention to get the final representation for classification.

IAN (Ma et al., 2017) interactively learns attentions in the contexts and targets, which generates the representations for targets and contexts separately.

MemNet (Tang et al., 2016b) applies multiple hops of attention layers over the context word embeddings for sentence representation, explicitly capturing the importance of each context word when inferring the sentiment polarity of a target.

RAM (Chen et al., 2017) strengthens MemNet by representing memory with bidirectional LSTM and using a gated recurrent unit network to combine the multiple attention outputs for sentence representation.

3.3 Main Results

As shown in Table 2, AEN achieves comparable or superior performance. The Majority method is the worst, which means that the majority sentiment polarity occupies 53.5% and 65.0% of all samples in the Restaurant and Laptop categories respectively. Feature-based SVM is still a competitive baseline, but it relies on manually designed features.

The rest of the baseline models are all neural network based and perform better than the Majority method, showing that deep learning has the potential to automatically generate useful representations and bring performance improvements for sentiment classification.

Rec-NN gets the worst performance among all neural network based baselines, as dependency parsing is not guaranteed to work well on ungrammatical short texts such as tweets and comments, which may still leave a long path between the opinion word and its target. TD-LSTM obtains a significant improvement over Rec-NN on the Twitter dataset since the target signals are taken into consideration by using LSTM networks to compute the left and right contexts with targets. LSTM based models rely on sequential information and perform well by capturing more useful context features.

With the introduction of the attention mechanism, the attention based models consistently outperform the TD-LSTM method on the Restaurant and Laptop datasets. ATAE-LSTM emphasizes the modeling of targets via the addition of the aspect embedding and uses the attention mechanism to get the final representation for targeted sentiment classification. IAN interactively learns attentions between the contexts and targets and obtains better results. RAM represents context with bidirectional LSTM and nonlinearly combines the multiple attention outputs with a gated recurrent unit network for sentence representation.

The attention mechanism is a critical factor for their success, as it can accurately model context by generating target-specific representations with more relational information. However, these attention based studies rely on complex recurrent neural networks as sequence encoders to compute the hidden semantics of texts, and the attention mechanisms are only used in later stages to learn the importance of the hidden states computed by RNNs.

Although RNNs are very expressive, they have the inherent shortcomings mentioned in the Introduction, which limit the model's ability to reliably capture long-term dependencies and make parallelization difficult.

Like AEN, MemNet also eschews sequential neural networks, and it achieves better results than the other baseline models on the Restaurant dataset since it explicitly captures the importance of each context word with multi-hop attention. However, MemNet does not model the hidden semantics of the embeddings, and the result of the last attention hop is essentially a linear combination of word embeddings.

The proposed AEN achieves comparable or superior performance as it takes a further step toward modeling the context with the attention mechanism.

3.4 Analysis of Ablated AEN

In order to investigate the effectiveness of each component of AEN, we design three ablated variants (i.e., the second group in Table 2).

AEN w/o PCT is the AEN without PCT module, AEN w/o MHA is the AEN without MHA module, and AEN w/o LSR is the AEN without label smoothing regularization.

We observe that the performances of these ablated AENs fall behind that of the full AEN in both accuracy and macro-F1. The results show that all of the discarded components are crucial for good performance.

Comparing the results of AEN and AEN w/o LSR, we observe that the accuracy of AEN w/o LSR drops significantly on all three datasets. We attribute this phenomenon to the unreliability of the training labels, since neutral sentiment is a very fuzzy sentimental state, and label smoothing regularization is an effective strategy for classification tasks in which some labels are unreliable.

3.5 Attention versus Recurrence

To compare the performance of the RNN encoder and the attentional encoder (i.e., the component described in Section 2.2), we design an RNN based variant AEN-BiLSTM, which replaces the attentional encoder layer with the bidirectional LSTM network. The remaining components are consistent with AEN.

As shown in Table 2, the overall performance of AEN and AEN-BiLSTM is relatively close, while AEN performs better on the Restaurant dataset.

To figure out whether the proposed AEN is a lightweight alternative of recurrent models, we study the model size of each model on the Restaurant dataset. Statistical results are reported in Table 3.

We implement all the compared models based on the same source code infrastructure, use the same hyperparameters, and run them on the same GPU (all experiments are conducted on the same NVIDIA GTX 1080 Ti). The GPU memory footprints are evaluated when the models and word embeddings are loaded onto a CUDA device.

Models other than AEN and MemNet are all based on LSTM. We observe that LSTM based models indeed have larger model sizes and require a larger memory footprint, mainly because each LSTM cell maintains four sets of input and recurrent weights (for the input, forget, and output gates plus the cell candidate).
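As a rough back-of-the-envelope check of this point (our own illustration, not a figure from the paper), the snippet below compares the parameter count of a single 300-dimensional LSTM layer with that of a kernel-size-1 convolution of the same width:

```python
import torch.nn as nn

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

lstm = nn.LSTM(input_size=300, hidden_size=300)  # PyTorch stacks the four gate/candidate blocks in its weight matrices
conv = nn.Conv1d(300, 300, kernel_size=1)        # a single point-wise projection

print(n_params(lstm))  # 4 * (300*300 + 300*300 + 300 + 300) = 722,400 parameters
print(n_params(conv))  # 300*300 + 300 = 90,300 parameters
```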

ATAE-LSTM, IAN, RAM, and AEN-BiLSTM are all attention based recurrent models, so their model sizes are larger than that of TD-LSTM. Since the encoded hidden states must be kept in memory simultaneously in order to perform the attention mechanisms, memory optimization for these models is more difficult.

MemNet has the smallest model size, since it only has one shared attention layer and two linear layers, and does not calculate hidden states of word embeddings.

AEN’s lightweight level ranks second, since it takes some more parameters in modeling hidden states of sequences, which brings a deeper understanding of the sentence. As a comparison, the model size of AEN-BiLSTM is more than twice that of AEN, but does not bring any performance improvements.

Models        Params (x10^6)   Memory (MB)
TD-LSTM       1.44             12.41
ATAE-LSTM     2.53             16.61
IAN           2.16             15.30
RAM           6.13             31.18
MemNet        0.36              7.82
AEN-BiLSTM    3.97             22.52
AEN           1.16             11.04
Table 3: Model sizes. Memory footprints are evaluated on the Restaurant dataset. Lowest 2 are in bold.

4 Related Work

Research approaches to the targeted sentiment classification task include traditional machine learning methods and neural network methods.

Traditional machine learning methods, including rule-based methods (Ding et al., 2008) and statistic-based methods (Jiang et al., 2011), mainly focus on extracting a set of features like sentiment lexicons features and bag-of-words features to train a sentiment classifier (Rao and Ravichandran, 2009; Kaji and Kitsuregawa, 2007; Perez-Rosas et al., 2012; Mohammad et al., 2013). The performance of these methods highly depends on the effectiveness of the feature engineering works, which are labor intensive.

In recent years, neural network methods have been attracting more and more attention as they do not need handcrafted features and can encode sentences with low-dimensional word vectors in which rich semantic information is retained.

The most important aspect of the targeted sentiment classification task is how to capture the contextual semantic connections between context words and target words. In order to incorporate target words into the model, Tang et al. (2016a) propose TD-LSTM, which extends LSTM by taking the target into consideration and uses two unidirectional LSTMs to model the left context and the right context of the target word respectively. Tang et al. (2016b) design a deep memory network consisting of a multi-hop attention mechanism with an external memory to capture the importance of each context word with respect to the given target; multiple attention hops over the memory represented by word embeddings build up higher-level semantic information. Chen et al. (2017) propose a recurrent attention network which adopts a multiple-attention mechanism on a memory built with bidirectional LSTM and nonlinearly combines the attention results with gated recurrent units (GRUs). Ma et al. (2017) propose an interactive attention network which learns the representations of the target and the context with two attention networks interactively.

Contrary to RNN based models, AEN eschews recurrence and employs the attention mechanism as a competitive alternative to draw the hidden states and the semantic interactions between target and context words.

5 Conclusion

In this work, we propose the attentional encoder network for the targeted sentiment classification task. Contrary to RNN based studies, AEN employs attention based encoders for the modeling between context and target, which can excavate the rich introspective and interactive semantic information from the word embeddings without considering the distance between words. In addition, we raise the label unreliability issue and introduce a label smoothing regularization term into the loss function to encourage the model to be less confident with the training labels. Experimental results and analysis demonstrate the effectiveness and the light weight of the proposed model.

References

  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Bai et al. (2018) Shaojie Bai, J Zico Kolter, and Vladlen Koltun. 2018. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271.
  • Chen et al. (2017) Peng Chen, Zhongqian Sun, Lidong Bing, and Wei Yang. 2017. Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 452–461.
  • Ding et al. (2008) Xiaowen Ding, Bing Liu, and Philip S Yu. 2008. A holistic lexicon-based approach to opinion mining. In Proceedings of the 2008 international conference on web search and data mining, pages 231–240. ACM.
  • Dong et al. (2014) Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. 2014. Adaptive recursive neural network for target-dependent twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), volume 2, pages 49–54.
  • Glorot and Bengio (2010) Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256.
  • Jiang et al. (2011) Long Jiang, Mo Yu, Ming Zhou, Xiaohua Liu, and Tiejun Zhao. 2011. Target-dependent twitter sentiment classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 151–160. Association for Computational Linguistics.
  • Kaji and Kitsuregawa (2007) Nobuhiro Kaji and Masaru Kitsuregawa. 2007. Building lexicon for sentiment analysis from massive collection of html documents. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 1075–1083.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Kiritchenko et al. (2014) Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, and Saif Mohammad. 2014. Nrc-canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 437–442.
  • Ma et al. (2017) Dehong Ma, Sujian Li, Xiaodong Zhang, and Houfeng Wang. 2017. Interactive attention networks for aspect-level sentiment classification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 4068–4074. AAAI Press.
  • Mohammad et al. (2013) Saif M Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. Nrc-canada: Building the state-of-the-art in sentiment analysis of tweets. arXiv preprint arXiv:1308.6242.
  • Paszke et al. (2017) Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in pytorch. In NIPS-W.
  • Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543.
  • Perez-Rosas et al. (2012) Veronica Perez-Rosas, Carmen Banea, and Rada Mihalcea. 2012. Learning sentiment lexicons in spanish. In LREC, volume 12, page 73.
  • Pontiki et al. (2014) Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. Semeval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27–35.
  • Rao and Ravichandran (2009) Delip Rao and Deepak Ravichandran. 2009. Semi-supervised polarity lexicon induction. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 675–682. Association for Computational Linguistics.
  • Szegedy et al. (2016) Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826.
  • Tang et al. (2016a) Duyu Tang, Bing Qin, Xiaocheng Feng, and Ting Liu. 2016a. Effective lstms for target-dependent sentiment classification. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3298–3307.
  • Tang et al. (2016b) Duyu Tang, Bing Qin, and Ting Liu. 2016b. Aspect level sentiment classification with deep memory network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 214–224.
  • Van Den Oord et al. (2016) Aäron Van Den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W Senior, and Koray Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. In SSW, page 125.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Vo and Zhang (2015) Duy Tin Vo and Yue Zhang. 2015. Target-dependent twitter sentiment classification with rich automatic features. In International Conference on Artificial Intelligence, pages 1347–1353.
  • Wang et al. (2016) Yequan Wang, Minlie Huang, Li Zhao, et al. 2016. Attention-based lstm for aspect-level sentiment classification. In Proceedings of the 2016 conference on empirical methods in natural language processing, pages 606–615.
  • Werbos (1990) Paul J Werbos. 1990. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560.