Method for aspect-based sentiment annotation using rhetorical analysis
This paper fills a gap in aspect-based sentiment analysis and aims to present a new method for preparing and analysing texts concerning opinion and generating user-friendly descriptive reports in natural language. We present a comprehensive set of techniques derived from Rhetorical Structure Theory and sentiment analysis to extract aspects from textual opinions and then build an abstractive summary of a set of opinions. Moreover, we propose aspect-aspect graphs to evaluate the importance of aspects and to filter out unimportant ones from the summary. Additionally, the paper presents a prototype solution of data flow with interesting and valuable results. The proposed method’s results proved the high accuracy of aspect detection when applied to the gold standard dataset.
Keywords: sentiment analysis, opinion mining, aspect-based sentiment analysis, rhetorical analysis, Rhetorical Structure Theory
Modern society is an information society bombarded from all sides by an increasing number of different pieces of information. The 21st century has brought us the rapid development of media, especially in the internet ecosystem. This change has caused the transfer of many areas of our lives to virtual reality. New forms of communication have been established. Their development has created the need for analysis of related data. Nowadays, unstructured information is available in digital form, but how can we analyse and summarise billions of newly created texts that appear daily on the internet? Natural language analysis techniques, statistics and machine learning have emerged as tools to help us. In recent years, particular attention has focused on sentiment analysis. This area is defined as the study of opinions expressed by people as well as attitudes and emotions about a particular topic, product, event, or person. Sentiment analysis determines the polarisation of the text. It answers the question as to whether a particular text is a positive, negative, or neutral one.
Our goal is to build a comprehensive set of techniques for preparing and analysing texts containing opinions and generating user-friendly descriptive reports in natural language (Figure 1). In this paper, we briefly describe the whole workflow and present a prototype implementation. Existing solutions for sentiment annotation mostly offer analysis at the level of entire documents; where they go deeper, to the level of individual product features, they are superficial and poorly prepared for the analysis of large volumes of data. This can be seen especially in scientific articles, where the analysis is carried out on a few hundred reviews only. The task is extremely problematic because of the huge diversity of languages and the difficulty of building a single solution that covers all of them. Natural language analysis often requires additional, language-specific pre-processing steps, especially at the stage of preparing the data for analysis. Large differences can be seen, for example, between the analysis of Polish (a highly inflected language) and English (a grammatically simpler one). We propose a solution intended to cover several languages; however, in this prototype implementation we focused on English texts only.
In this paper, we present an analysis and workflow inspired by the work of Joty, Carenini and Ng. We experimented with several methods in order to validate aspect-based sentiment analysis approaches, and as a next step we want to customise our implementation for the Polish language.
The paper is organised as follows. Section 1 introduces sentiment analysis and its importance in business. Section 2 presents related work from the areas of rhetorical and sentiment analysis. Section 3 describes our method. Section 4 describes the implementation and the dataset. Section 5 discusses the results. Section 6 presents conclusions and future work.
2 Related Work
2.1 Rhetorical Analysis
Rhetorical analysis seeks to uncover the coherence structure underneath the text, which has been shown to be beneficial for many Natural Language Processing (NLP) applications, including text summarization and compression, machine translation evaluation, sentiment analysis, and others. Different formal theories of discourse analysis have been proposed. Martin proposed discourse relations based on discourse connectives (e.g., because, but) expressed in the text. Danlos extended sentence grammar and formalized discourse structure. Rhetorical Structure Theory (RST), which we use in our experiments, was proposed by Mann and Thompson. It is perhaps the most influential theory of discourse in computational linguistics. It was initially intended for text generation tasks, but it became popular for parsing the structure of a text. Rhetorical Structure Theory represents a text as a labelled hierarchical structure, a tree called a Discourse Tree (DT). The Discourse Tree presented in Figure 2 represents the following text:
2.2 Sentiment Analysis
Sentiment analysis can be performed at the level of (1) the whole document, (2) individual sentences, or (3) individual fragments of text, which is currently seen as the most attractive approach. In document level analysis [1, 10], the task is to classify whether a full opinion expresses a positive, negative or neutral attitude. For example, given a product review, the model determines whether the text shows an overall positive, negative or neutral opinion about the product. The biggest disadvantage of document level analysis is the assumption that each document expresses views on a single entity. Thus, it is not applicable to documents which evaluate or compare multiple objects. In sentence level analysis, the task is to determine whether each sentence expresses a positive, negative, or neutral opinion. This level of analysis is closely related to subjectivity classification, which distinguishes sentences that express factual information (called objective sentences) from sentences that express subjective views and opinions (called subjective sentences). However, subjectivity is not equivalent to sentiment, as many objective sentences can imply opinions. Neither document level nor sentence level analysis discovers what exactly people liked and did not like. A finer-grained analysis can be performed at the aspect level, earlier called the feature level. Instead of looking at language constructs (documents, paragraphs, sentences, clauses or phrases), aspect level analysis looks directly at the opinion itself. It is based on the idea that an opinion consists of a sentiment (positive or negative) and a target (of the opinion). As a result, we can aggregate the opinions. For example, the phone display gathers positive feedback, but the battery is often rated negatively.
Aspect level analysis is much more complex, since it requires a more advanced knowledge representation than analysis at the level of entire documents. Moreover, documents often consist of multiple sentences, so saying that a document is positive provides only partial information. The literature contains some initial work related to aspects. Some solutions use SVM-based algorithms or conditional random field classifiers with manually engineered features. Others are based on deep neural networks, for example connecting sentiments with the corresponding aspects based on the constituency parse tree.
3 Method for aspect-based sentiment analysis
The proposed Rhetorical and Sentiment Analysis flow is divided into four main tasks:
Rhetorical analysis with sentiment detection.
Aspect detection in textual data.
Methods, techniques, and graph analytics of aspect inter-relations.
Abstractive summary generation in natural language (not included in prototype workflow yet).
The overall characteristics and flow organisation can be seen in Figure 3. Each of the mentioned steps of the proposed method is described in the following subsections.
3.1 Rhetorical Analysis
The goal of discourse analysis in our method is to segment the text into the basic units of discourse structure, Elementary Discourse Units (EDUs), and to connect them in order to determine the semantic relations between them. The analysis is performed separately for each source document, and its output is a Discourse Tree (DT) such as the one in Figure 2. Existing discourse parsers typically model the structure and the labels of a DT separately and do not take into account the sequential dependencies between the DT constituents. They also apply greedy and sub-optimal parsing algorithms to build a Discourse Tree. To cope with these limitations, the inferred (posterior) probabilities from CRF parsing models can be used in a probabilistic CKY-like bottom-up parsing algorithm, which is non-greedy and optimal. Finally, existing discourse parsers do not discriminate between intra-sentential parsing (i.e., building the DTs for individual sentences) and multi-sentential parsing (i.e., building a DT for the whole document). This part of the analysis therefore gives us information about the relationships between the different EDUs of the parsed texts. We then assign a sentiment orientation to each EDU.
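To make the idea of non-greedy, bottom-up parsing more concrete, the following minimal sketch runs a CKY-style dynamic program over EDU spans. It is an illustration only, not the parser used in this work: the table `span_prob` is a hypothetical stand-in for the posterior probabilities that a CRF parsing model would supply, and relation labels are omitted.

```python
from math import log


def cky_best_tree(n_edus, span_prob):
    """Return (log_score, tree) for the best binary tree over EDUs 0..n_edus-1.

    span_prob maps an inclusive span (i, j) with i < j to a hypothetical
    posterior probability that EDUs i..j form one constituent.
    Leaves are EDU indices; internal nodes are (left, right) tuples.
    """
    best = {}
    for i in range(n_edus):
        best[(i, i)] = (0.0, i)  # a single EDU is a leaf with log-score 0

    # Fill the chart bottom-up, from short spans to the full document.
    for length in range(2, n_edus + 1):
        for i in range(n_edus - length + 1):
            j = i + length - 1
            span_score = log(span_prob.get((i, j), 1e-9))  # unseen spans get a tiny probability
            candidates = []
            for k in range(i, j):  # try every split point, not just the locally best one
                left_score, left_tree = best[(i, k)]
                right_score, right_tree = best[(k + 1, j)]
                candidates.append(
                    (left_score + right_score + span_score, (left_tree, right_tree))
                )
            best[(i, j)] = max(candidates, key=lambda c: c[0])
    return best[(0, n_edus - 1)]


# Three EDUs: grouping EDUs 1 and 2 first is more probable than grouping 0 and 1.
score, tree = cky_best_tree(3, {(0, 1): 0.2, (1, 2): 0.9, (0, 2): 1.0})
print(tree)  # (0, (1, 2))
```

Because every split point of every span is scored before a decision is made, the result is the highest-scoring tree under the given probabilities, in contrast to a greedy parser that commits to local merges.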
3.2 Aspect detection in textual data
3.3 Analysis of aspect inter-relations
The third step consists of building an Aspect-Rhetorical Relation Graph (ARRG) and a content-structuring Aspect Hierarchical Tree (see Figure 3). The Discourse Trees of individual documents are processed (the order of EDUs is not changed) to form association rules. Then, an Aspect-Rhetorical Relation Graph is created from the set of these rules. Each node represents an aspect and each edge is one of the relations between the EDUs' aspects. A single graph is created for all documents used in the experiment. The graph can be represented with weighted edges (association rule confidence, the number of such relations in the whole graph, etc.), but different types of graph representation still need to be checked and compared. It is then possible to characterise the whole graph and each node (aspect) with graph metrics (PageRank, degree, betweenness or other metrics). These metrics are used to estimate the cut threshold, that is, to remove uninformative or redundant aspects. Hence, we end up with only the most important aspects derived from the analysed corpora. The graph is then transformed into an Aspect Hierarchical Tree, which represents the correlation between aspects and enables us to generate natural language descriptions.
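The graph construction and threshold-based filtering described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions: edges are weighted by the number of observed relations (one of the options mentioned above), PageRank is computed with plain power iteration, and the function names, the example triples and the threshold value are hypothetical.

```python
from collections import defaultdict


def build_arrg(relations):
    """Aggregate (source_aspect, relation, target_aspect) triples into weighted edges."""
    weights = defaultdict(float)
    for src, _relation, dst in relations:
        weights[(src, dst)] += 1.0  # assumed weight: number of such relations
    return dict(weights)


def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration over a dict of (src, dst) -> weight."""
    nodes = sorted({node for edge in weights for node in edge})
    out_weight = defaultdict(float)
    for (src, _dst), w in weights.items():
        out_weight[src] += w
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {node: base for node in nodes}
        for (src, dst), w in weights.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def important_aspects(rank, threshold):
    """Keep aspects whose score reaches the cut threshold, best first."""
    kept = [aspect for aspect, score in rank.items() if score >= threshold]
    return sorted(kept, key=lambda aspect: -rank[aspect])


# Illustrative triples; the relation labels are examples only.
triples = [
    ("screen", "Elaboration", "colors"),
    ("battery", "Contrast", "screen"),
    ("price", "Elaboration", "screen"),
    ("colors", "Elaboration", "screen"),
]
scores = pagerank(build_arrg(triples))
print(important_aspects(scores, threshold=0.1))  # ['screen', 'colors']
```

In this toy graph, "battery" and "price" receive no incoming relations and fall below the threshold, which mirrors the intended removal of uninformative aspects.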
3.4 Abstractive summary generation in natural language
The last step covers summary (abstract) generation in natural language. Natural language generation models use either parameterized templates (which are very limited and depend on the size of the rule-based system responsible for completing the text) or deep neural networks.
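As an illustration of the parameterized-template option, a minimal sketch might look as follows. Summary generation is not part of the prototype, so the function name, the template wording and the counts are all hypothetical.

```python
def summarise_aspect(aspect, positive, negative):
    """Fill a fixed sentence template from per-aspect opinion counts."""
    total = positive + negative
    if total == 0:
        return f"No opinions were found about the {aspect}."
    share = round(100 * positive / total)
    if positive > negative:
        tone = "mostly positive"
    elif negative > positive:
        tone = "mostly negative"
    else:
        tone = "mixed"
    return (
        f"Opinions about the {aspect} are {tone} "
        f"({share}% positive out of {total} mentions)."
    )


print(summarise_aspect("display", positive=8, negative=2))
# Opinions about the display are mostly positive (80% positive out of 10 mentions).
```

The sketch also shows the limitation noted above: every sentence pattern has to be written by hand, so coverage grows only as fast as the rule set.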
4 Experimental Scenario
For the rhetorical parsing part of our experiment, we used a special library implemented for this purpose. As a sentiment analysis model, we used the Bag of Words (BoW) vectorization method with a Logistic Regression classifier trained on 1.2 million Electronics reviews (1-, 3- and 5-star ratings only) from the SNAP Amazon Dataset. The BoW vectorization built a vocabulary that considers only the top 50,000 terms ordered by their frequency across the corpus, similarly to the supervised learning examples presented in our previous work. We extracted nouns and noun phrases using the part-of-speech tagger from the spaCy Python library (https://spacy.io). To create an Aspect-Rhetorical Relation Graph, we applied the breadth-first search (BFS) algorithm to each Discourse Tree.
We used Bing Liu's dataset for evaluation. It contains review datasets from three domains: computers, wireless routers, and speakers (Table 1). Aspects in these review datasets were annotated manually.
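The frequency-limited vocabulary can be illustrated with a small, dependency-free sketch. It is not the code used in the experiments: the tokenizer and the function names are assumptions, and the classifier itself is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents, max_terms=50_000):
    """Map the max_terms most frequent corpus terms to column indices."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    return {term: idx for idx, (term, _count) in enumerate(counts.most_common(max_terms))}


def vectorize(text, vocabulary):
    """Turn a text into a term-count vector; out-of-vocabulary terms are ignored."""
    vector = [0] * len(vocabulary)
    for tok in tokenize(text):
        idx = vocabulary.get(tok)
        if idx is not None:
            vector[idx] += 1
    return vector


corpus = ["Great great great screen, screen colors", "bad"]
vocab = build_vocabulary(corpus, max_terms=2)  # keeps only "great" and "screen"
print(vectorize("Great screen, screen is bad", vocab))  # [1, 2]
```

With scikit-learn, a comparable setup would typically combine `CountVectorizer(max_features=50000)` with `LogisticRegression`; whether the experiments used exactly these classes is an assumption.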
|Dataset||# of documents||# of distinct aspects|
4.2 Experimental Setup
We implemented our framework in Python. The first computational step was to load the dataset and parse it into individual documents. Next, each document was processed by the Discourse Parser and transformed into a Discourse Tree (DT). Then we extracted Elementary Discourse Units (EDUs) from the DT, and each EDU was processed by the Logistic Regression sentiment classifier. All neutral EDUs were excluded from further consideration to ensure that the discovered aspects are correlated with the authors' emotions. The remaining EDUs were processed by the part-of-speech tagger to extract nouns and noun phrases, which we decided to treat as potential aspects. The result of this step was a set of Aspect-based Discourse Trees (ADTs). Then, relations between aspects were extracted from each ADT using breadth-first search, and an Aspect-Rhetorical Relation Graph (ARRG) was created using aspects as nodes and relations as edges. Next, we evaluated the importance of aspects using the PageRank algorithm. Our approach resulted in a complete list of aspects sorted by PageRank score. We applied a user-selected importance threshold to filter out trivial aspects.
Table 2 presents some examples of the results of our approach compared with the annotated data from Bing Liu's dataset. In the first sentence, the results differ because we decided to treat only nouns or noun phrases as aspects, while the annotators also accepted verbs. In some cases, such as sentences 2 and 4, our approach generated more valuable aspects than the annotators found, but in others, such as sentence 5, we found fewer. This is possibly a result of our method of filtering valuable aspects: if an aspect was not frequent enough in the dataset, we treated it as void. Where the dataset contains neither an aspect nor a sentiment, as in sentence 6, we still report a sentiment, because sentiment detection is one of our analysis steps.
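The breadth-first extraction of aspect relations from an Aspect-based Discourse Tree can be sketched as follows. The tree encoding (nested dictionaries) and the pairing rule, which links every aspect under the left child with every aspect under the right child using the node's relation label, are assumptions made for illustration; the rule actually used in the prototype may differ.

```python
from collections import deque
from itertools import product


def subtree_aspects(node):
    """Collect aspects from all leaves below a node, in left-to-right order."""
    if "aspects" in node:  # leaf: an EDU with its detected aspects
        return list(node["aspects"])
    found = []
    for child in node["children"]:
        found.extend(subtree_aspects(child))
    return found


def extract_relations(root):
    """Visit the tree breadth-first and emit (aspect, relation, aspect) triples."""
    relations = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if "aspects" in node:
            continue  # leaves carry no relation of their own
        left, right = node["children"]
        for a, b in product(subtree_aspects(left), subtree_aspects(right)):
            relations.append((a, node["relation"], b))
        queue.extend(node["children"])
    return relations


# Hypothetical ADT; the relation labels are illustrative.
adt = {
    "relation": "Elaboration",
    "children": [
        {"aspects": ["monitor"]},
        {
            "relation": "Joint",
            "children": [{"aspects": ["screen"]}, {"aspects": ["colors"]}],
        },
    ],
}
print(extract_relations(adt))
# [('monitor', 'Elaboration', 'screen'), ('monitor', 'Elaboration', 'colors'), ('screen', 'Joint', 'colors')]
```

The resulting triples can be pooled over all documents to form the nodes and edges of the ARRG.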
| No. | Input content | Annotated aspect : sentiment | Detected aspect : sentiment |
|---|---|---|---|
| 1 | I have this connected to my late 2008 MacBook Pro, and it works flawlessly. | works : positive | macbook pro : positive |
| 2 | We are well pleased with the monitor and the company. | monitor : positive | monitor : positive; company : positive |
| 3 | The changing colors help to tell, with a quick glance. | colors : positive | colors : positive |
| 4 | The screen is a very pleasing matte, and the colors are great. | colors : positive | screen : negative; colors : positive |
| 5 | I would not recommend this or any Acer product to anyone except perhaps my ex. | Acer product : negative | - : negative |
| 6 | I purchased this as a Christmas gift | - : - | - : negative |
Figure 4 shows the agreement between the aspects we detected and those annotated in the dataset. We treated two aspects as equal when they were textually the same. We also experimented with text distance metrics, such as the Jaro-Winkler distance, but the results did not differ significantly from exact matching. We tuned the importance factor (on the X axis) to enrich the final set of aspects: a higher factor resulted in a larger set of aspects and a higher precision, with slowly decreasing recall. The first results (blue line on the charts) were not satisfactory, so we removed the sentiment filtering step from the analysis (orange line on the charts), which doubled the precision with nearly the same recall. Precision for the three datasets (computer, router, and speaker) was at a similar level most of the time. However, recall for the router set was significantly worse than for the speaker and computer sets.
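The exact-match comparison can be expressed in a few lines. This is a minimal sketch rather than the evaluation script used here: lower-casing and whitespace stripping before comparison are assumptions, and the example sets are illustrative.

```python
def precision_recall(detected, annotated):
    """Compare two collections of aspect strings by exact textual match."""
    detected = {aspect.strip().lower() for aspect in detected}
    annotated = {aspect.strip().lower() for aspect in annotated}
    matched = detected & annotated
    precision = len(matched) / len(detected) if detected else 0.0
    recall = len(matched) / len(annotated) if annotated else 0.0
    return precision, recall


detected = ["monitor", "company", "colors", "screen"]
annotated = ["monitor", "colors", "works", "Acer product"]
print(precision_recall(detected, annotated))  # (0.5, 0.5)
```

Replacing the set intersection with a similarity test (for example, a Jaro-Winkler score above a chosen cut-off) would give the fuzzy variant mentioned above.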
6 Conclusions and Future Work
We have proposed a comprehensive flow for analysing aspects and assigning sentiment orientation to them. The advantages of such an analysis are that it is a grammatically based and coherent solution, it shows the opinion distribution, it does not need any aspect ontology, it is not limited to a fixed number of aspects and, importantly, it does not need training data (it is an unsupervised method). The method shows considerable potential for generating summary overviews of aspect and sentiment distribution across the analysed documents. In our next steps, we want to improve the aspect extraction phase, probably using neural network approaches. Moreover, we want to extend the analysis to the Polish language.
The work was partially supported by the National Science Centre grants DEC-2016/21/N/ST6/02366 and DEC-2016/21/D/ST6/02948, and by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 691152 (RENOIR project).
-  Łukasz Augustyniak, Piotr Szymański, Tomasz Kajdanowicz, and Włodzimierz Tuligłowicz. Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis. Entropy, 18(1):4, 2015.
-  Laurence Danlos. D-STAG: A formalism for discourse analysis based on SDRT and using synchronous TAG. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5591 LNAI:64–84, 2011.
-  Orphée De Clercq, Marjan Van de Kauter, Els Lefever, and Véronique Hoste. LT3: Applying Hybrid Terminology Extraction to Aspect-Based Sentiment Analysis. Proceedings of the 9th International Workshop on Semantic Evaluation – SemEval 2015, (1997):719–724, 2015.
-  Vanessa Wei Feng and Graeme Hirst. Two-pass Discourse Segmentation with Pairing and Global Features. ArXiv e-prints, 1407.8215, jul 2014.
-  Francisco Guzmán, Shafiq Joty, Lluis Màrquez, and Preslav Nakov. Using Discourse Structure Improves Machine Translation Evaluation. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 687–698, 2014.
-  Shafiq Joty, Giuseppe Carenini, and Raymond T. Ng. A novel discriminative framework for sentence-level discourse analysis, 2012.
-  Shafiq Joty, Giuseppe Carenini, and Raymond T. Ng. CODRA: A Novel Discriminative Framework for Rhetorical Analysis. Computational Linguistics, 41(January):1–50, sep 2015.
-  Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2009.
-  Angeliki Lazaridou, Ivan Titov, and Caroline Sporleder. A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1630–1639, 2013.
-  Bing Liu. Sentiment analysis and subjectivity, 2010.
-  Qian Liu, Zhiqiang Gao, Bing Liu, and Yuanlin Zhang. Automated rule selection for aspect extraction in opinion mining. IJCAI International Joint Conference on Artificial Intelligence, 2015-January(Ijcai):1291–1297, 2015.
-  Annie Louis, Aravind K. Joshi, and Ani Nenkova. Discourse indicators for content selection in summarization. pages 147–156, 2010.
-  William C. Mann and Sandra A. Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization, 1988.
-  Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Suresh Manandhar, and Ion Androutsopoulos. SemEval-2015 Task 12: Aspect Based Sentiment Analysis. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, Colorado, pages 486–495, 2015.
-  J.R. Martin. English Text. John Benjamins Publishing Company, Amsterdam, nov 1992.
-  J McAuley and J Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. Proceedings of the 7th ACM conference on Recommender systems - RecSys ’13, pages 165–172, 2013.
-  Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. World Wide Web Internet And Web Information Systems, 54(1999-66):1–17, 1998.
-  Maite Taboada. Discourse markers as signals (or not) of rhetorical relations. Journal of Pragmatics, 38(4):567–592, 2006.
-  Joachim Wagner, Piyush Arora, Santiago Cortes, Utsab Barman, Dasha Bogdanova, Jennifer Foster, and Lamia Tounsi. DCU: Aspect-based Polarity Classification for SemEval Task 4. Proceedings of the 8th International Workshop on Semantic Evaluation – SemEval 2014, (SemEval):223–229, 2014.
-  Bo Wang and Min Liu. Deep Learning for Aspect-Based Sentiment Analysis. pages 1–9, 2015.
-  Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young. Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, (September):1711–1721, 2015.