Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification
Corporate mergers and acquisitions (M&A) account for billions of dollars of investment globally every year and offer an interesting and challenging domain for artificial intelligence. However, in such highly sensitive domains, it is crucial not only to have a highly robust and accurate model, but also to be able to generate useful explanations to garner a user’s trust in the automated system. Regrettably, eXplainable AI (XAI) for financial text classification has received little to no research attention, and many current methods for generating textual explanations produce highly implausible explanations, which damage a user’s trust in the system. To address these issues, this paper proposes a novel methodology for producing plausible counterfactual explanations, whilst exploring the regularization benefits of adversarial training on language models in the domain of FinTech. Exhaustive quantitative experiments demonstrate that this approach not only improves model accuracy compared to the current state-of-the-art and to human performance, but also generates counterfactual explanations which are significantly more plausible, based on human trials.
1 Introduction and Related Work
In recent years, large-scale, pre-trained transformer models have led to massive improvements on a wide range of natural language processing (NLP) tasks [6, 21], including financial technology applications [7, 35, 32, 34]. However, this impressive ability coincides with an inherent lack of robustness and transparency, which undermines human trust in the prediction outcome. In the highly sensitive (and financially lucrative) area of FinTech, explainable financial text classification remains an open, and highly alluring, question. To tackle this problem, this paper advances a novel approach which first applies robust transformer models (by leveraging adversarial training) to a real-world, up-to-date, self-collected mergers and acquisitions (M&A) dataset, and then generates plausible, post-hoc, counterfactual explanations. In the remainder of this section, we describe work relevant to both of these areas before detailing our contributions.
1.1 Artificial Intelligence in Mergers and Acquisitions
M&As have reshaped the global business landscape for generations, and are having an accelerating impact on the world’s economy as new technologies such as the internet, big data, and artificial intelligence disrupt many business sectors. To appreciate this, a recent economic study provided strong evidence that M&A deal rumours can influence the share price volatility of rumour target firms. In particular, it showed that, on average, M&A rumours have a positive short-term impact and a negative long-term impact on the cumulative abnormal returns of the potential acquirers and targets. The existing AI literature typically focuses on predicting likely M&A acquirers and targets, and on forecasting the likely success of M&A deals in order to develop high-risk/high-reward investment strategies based on M&A speculation. In this work, we address a distinct but related task: namely, whether an M&A rumour is likely to prove correct.
1.2 Visualization-based Explanations
To interpret a model’s prediction, prior efforts have focused on either incorporating pre-hoc analysis into the experimental design, or developing post-hoc analysis algorithms that select or modify particular instances of the dataset to explain the behavior of models [15, 16]. Recent research shows that transformer models cannot be perfectly explained from their intrinsic architecture, and further work provides strong evidence that self-attention distributions are not directly interpretable. For this reason, model-agnostic, post-hoc explanation methods have come to the fore for explaining text classification models, as they are easy to understand and do not require access to the data or the model.
Towards post-hoc explanation in NLP tasks, a popular method named contextual decomposition (CD) quantifies the importance of each individual word/phrase by computing the change to the model prediction when that word/phrase alone is removed. Its hierarchical extensions [27, 13] continue to refine explanation algorithms that calculate and further visualize the importance of individual phrases. However, although these visualization-based methods [25, 27, 13] have achieved good results on a popular sentiment-analysis dataset (namely the Stanford Sentiment Treebank-2 [SST-2] dataset, where humans create the ground truth with their subjective judgement), how to generate explanations in more complex scenarios, where human performance is worse than a model’s, has not been well studied. Moreover, these visualization-based works cannot provide humans with a clear boundary between positive and negative instances, whereas counterfactuals can provide “human-like” logic by showing a modification to the input that makes a difference to the output classification. Hence, post-hoc, example-based explanation methods have received more and more attention in recent years.
1.3 Counterfactual Text Explanations
Counterfactual explanations are renowned for their explanatory ability in AI systems; specifically, they offer the ability to explain models (such as transformers) without having to “open the black-box”, by conveying causal information about what contributed to a given classification. To understand counterfactuals in the context of text classification, consider a sentiment classification task where a black-box model may classify “John loved the film” with a positive sentiment, and explain the prediction counterfactually by presenting “John hated the film”. Glossed, this latter text is the AI explaining the prediction by saying “if the word love was replaced with the word hate, I would have thought it was a negative sentiment”. This allows us to understand the main reasoning process behind the classifier in question, thus explaining the prediction causally. To understand the issue of counterfactual plausibility, consider that the same method may also generate a counterfactual which reads “John not the film”. This text may “flip” the classification to the counterfactual class, but it is grammatically implausible and (arguably) very difficult to contextualize. This matters because humans avoid creating counterfactuals which are far from a “possible world”, and by extension wildly implausible [3, 17]. In response, our work attempts to guarantee more grammatically plausible explanations, and neither relies on attention weights nor is constrained to a specific text domain.
Contributions and Paper Outline
We present a novel dataset for the interesting and challenging problem of artificial intelligence in M&A prediction.
To the best of our knowledge, the present work is the first general approach to generate grammatically plausible counterfactual explanations for unstructured text classification.
The primary technical contribution of this work is to generate grammatically plausible counterfactuals by replacing the most important words with their antonyms (REP-SCD), based on pre-trained language models. Furthermore, two additional variants (removing/inserting words at the most important positions, namely RM-SCD and INS-SCD) are proposed to guarantee counterfactual generation, albeit with explanations that are less plausible.
The remainder of this paper is organized as follows. Section 2 details our novel dataset and the pre-processing steps involved. Section 3 describes our adversarial training approach, along with the sensitivity-based method for counterfactual explanation generation. Exhaustive experiments (both quantitative and human-based) show clear improvements of our method over the current state-of-the-art, both in classification accuracy and explanation quality (see Sections 4 and 5). Finally, the implications of this work for XAI and future research are discussed.
2 The Novel Mergers and Acquisitions Dataset
Table 1. Dataset statistics.

| #Processed deal news total (2007-2019) | 4,098 |
| #Validation (2015-2016)                |   478 |
| #Test (2017-Aug 2019)                  |   500 |
| #Unique companies and institutions     | 1,406 |
For this study we adopted a large-scale, up-to-date M&A dataset collected from Zephyr, a comprehensive database of real-world deal data.
In order to prepare the raw dataset for use in this study, a number of pre-processing steps were carried out:
In this work we chose to focus on a binary classification task and, as such, removed instances with outcome types of cancelled and pending, leaving only those instances that correspond to completed deals (the positive class) and rumours (the negative class).
We eliminated instances where both acquiring and target companies were non-US, due to a tendency towards low-quality data; in other words, all of the instances in our dataset include a US Listed Company as either the acquirer or the target or both.
Articles published within one day of, or after, the deal announcement date were also removed, because our interest is in developing a prediction model capable of generating accurate predictions at least one day in advance of any deal outcome.
Finally, the remaining instances are randomly over-sampled to ensure an even split between positive (completed) and negative (rumours) instances for each year.
The result is a dataset of 4,098 instances (news articles and meta-data) which we split into training, validation, and testing sets on a year-by-year basis (see Table 1).
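As an illustrative sketch, the filtering and per-year over-sampling steps above might look like the following. The field names (`outcome`, `us_listed`, `days_before_announcement`, `year`) are hypothetical, not the actual Zephyr schema:

```python
import random

# Minimal sketch of the pre-processing steps described above.
# Field names are illustrative, not the actual Zephyr schema.
def preprocess(instances, seed=42):
    rng = random.Random(seed)
    # 1. Keep only completed deals (positive) and rumours (negative).
    kept = [x for x in instances if x["outcome"] in ("completed", "rumour")]
    # 2. Require a US-listed company on at least one side of the deal.
    kept = [x for x in kept if x["us_listed"]]
    # 3. Drop articles published less than one day before the announcement.
    kept = [x for x in kept if x["days_before_announcement"] >= 1]
    # 4. Randomly over-sample the minority class within each year.
    by_year = {}
    for x in kept:
        by_year.setdefault(x["year"], {"completed": [], "rumour": []})
        by_year[x["year"]][x["outcome"]].append(x)
    balanced = []
    for year, groups in by_year.items():
        n = max(len(groups["completed"]), len(groups["rumour"]))
        for label, items in groups.items():
            if items:
                balanced += items + [rng.choice(items) for _ in range(n - len(items))]
    return balanced
```

The over-sampling duplicates randomly chosen minority-class instances until each year contains an even positive/negative split, as described above.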
3 Methodology

The pipeline of our method is shown in Fig. 1. First, as a prerequisite, a transformer variant is fine-tuned on the M&A prediction task alongside adversarial training (which, as we shall see, proves promising in this domain). Second, after prediction, important words in the test instances are identified using a sampled contextual decomposition technique. Third, a counterfactual explanation is generated by replacing these words with grammatically plausible substitutes. Although this method does not always guarantee that a plausible counterfactual will be found, we propose two alternative methods which do, albeit with a possible trade-off in plausibility. These steps are detailed next.
3.1 Step 1: Robust Transformer Classification Models
As alluded to earlier, M&A prediction is a highly sensitive domain, and despite adversarial training showing promise previously [9, 28], it has never been tested in this domain. Hence, to try to ensure a robust model which can simultaneously generate intelligible explanations, we explore its usage here compared to other popular approaches. Given a news article, we adopt the classical transformer architecture. The original multi-head self-attention is applied to the i-th document X_i, and is calculated as follows:

MultiHead(X_i) = Concat(head_1, ..., head_h) W^O,  head_j = Attention(X_i W_j^Q, X_i W_j^K, X_i W_j^V)

where W_j^Q, W_j^K, W_j^V are weight matrices, and the attention is computed as:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

for input query, key, and value matrices Q, K, V. The outputs from the attention calculations are concatenated and transformed using an output weight matrix W^O.
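For concreteness, this standard scaled dot-product and multi-head attention can be sketched in NumPy as follows; the shapes, head count, and dimensions are illustrative only:

```python
import numpy as np

# Sketch of standard multi-head self-attention over a document's token
# embeddings X (n_tokens x d_model). Shapes are illustrative.
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n, n) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V

def multi_head(X, Wq, Wk, Wv, Wo):
    # Wq/Wk/Wv: per-head projection matrices; Wo: output projection.
    heads = [attention(X @ wq, X @ wk, X @ wv) for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo
```

Each head projects the input into its own query/key/value spaces; the concatenated head outputs are mixed by the output matrix W^O, matching the equations above.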
Additionally, adversarial noise, treated as a form of regularization, is generated by the Fast Gradient Method (FGM) and Projected Gradient Descent (PGD). The idea of using adversarial perturbation derives from the use of adversarial attacks to evaluate the robustness of neural networks, while recent advances in adversarial training for NLP models inspire us to use it as a form of regularization. For each embedded word x in the i-th news article X_i, the FGM computes its perturbation as follows:

r_adv = ε · g / ||g||_2,  with g = ∇_x L(θ, x, y)

where r_adv is the perturbation of x, θ denotes the current values of the parameters of the classifier, and L denotes the loss function (cross entropy) associated with the classifier. The perturbation can be easily computed using back-propagation. Projected gradient descent, which can be considered a multi-step variant of the FGM, computes the perturbation of x iteratively:

x^{t+1} = Π_{x+S} ( x^t + α · g_t / ||g_t||_2 )

where S is the constraint space of the perturbation, Π_{x+S} denotes the projection of a vector onto the feasible set x + S, and α is the step size. We use the Adam optimizer with learning rate decay to train our model until convergence.
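A minimal sketch of the two perturbation schemes, with gradients L2-normalized as in the equations above; `grad_fn` stands in for the back-propagated gradient of the loss with respect to the embeddings, which in practice comes from the framework's autograd:

```python
import numpy as np

# FGM: a single epsilon-scaled step along the normalized loss gradient.
def fgm_perturbation(grad, epsilon=1.0):
    # r_adv = epsilon * g / ||g||_2
    norm = np.linalg.norm(grad)
    return epsilon * grad / (norm + 1e-12)

# PGD: the multi-step variant -- repeatedly step along the gradient and
# project the accumulated perturbation back onto the epsilon-ball (the
# feasible set S). `grad_fn` is a stand-in for autograd gradients.
def pgd_perturbation(x, grad_fn, epsilon=1.0, alpha=0.3, steps=3):
    r = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + r)
        r = r + alpha * g / (np.linalg.norm(g) + 1e-12)
        norm = np.linalg.norm(r)
        if norm > epsilon:               # projection onto ||r||_2 <= epsilon
            r = epsilon * r / norm
    return r
```

During adversarial training, the perturbation is added to the word embeddings and the loss on the perturbed input is back-propagated alongside the clean loss.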
3.2 Step 2: Context-Independent Word Importance
To calculate the context-independent importance of up to one word, we adopt the sensitivity-of-contextual-decomposition technique, which removes part of the input from the sequence text to evaluate a model’s sensitivity to it, thereby allowing for the identification of important features. Its hierarchical extension -- Sampling and Contextual Decomposition (SCD) -- masks out phrases from the input while setting the maximum sequence length to 40. However, the average input length in our data is much larger than 40. We therefore propose a phrase-level removal method that applies only when the phrase starts with a negative pronoun or limitation; otherwise, only a single word is removed. For example, in the sentence “the deal is not closing currently”, the attribution of “closing” should be positive while the attribution of “not closing” should be negative. In this situation, we remove the whole phrase “not closing” together to calculate the influence in terms of the logits change in the output layer of the transformer, and then assign the negative score to the word “closing”.
Given a phrase starting with a negative limitation in the i-th document, we sample documents which contain the same phrase, to alleviate chance effects when there are multiple shreds of evidence saturating the prediction. For example, in the source “JPMorgan is closing in on a deal, sources close to the situation are optimistic for deal completion”, if we only remove the word “closing”, the prediction would not change much. With this sampling, the proposed context-independent importance of a word or phrase is more robust to saturation. The formula for calculating the importance can be written as:
φ(p) = (1/|S|) Σ_{X' ∈ S} [ s(X') − s(mask(X', p)) ]

where mask(X', p) denotes the resulting document after masking out a single token, or a phrase starting with a negative pronoun surrounding the phrase p; s(·) represents the model prediction logits after replacing the masked-out context; and S is the set of documents containing p sampled from the testing set D.
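A simplified sketch of this sampled importance score: the average drop in the model's prediction logit when the word/phrase is masked out, averaged over sampled documents that contain it. Here `model_logit` is a hypothetical stand-in for the fine-tuned classifier's positive-class logit, and string replacement stands in for token-level masking:

```python
import numpy as np

# Sketch of the sampled, context-independent importance score described
# above. `model_logit` is a hypothetical stand-in for the classifier's
# positive-class logit; string replacement stands in for token masking.
def importance(phrase, documents, model_logit, n_samples=10, seed=0):
    rng = np.random.default_rng(seed)
    pool = [d for d in documents if phrase in d]
    if not pool:
        return 0.0
    idx = rng.integers(0, len(pool), size=min(n_samples, len(pool)))
    sample = [pool[i] for i in idx]
    # Average logit drop caused by masking the phrase out of each document.
    deltas = [model_logit(d) - model_logit(d.replace(phrase, "[MASK]"))
              for d in sample]
    return float(np.mean(deltas))
```

Sampling over several documents containing the phrase makes the score more robust to saturation, where removing a single piece of evidence barely moves the prediction.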
As an aside, the resulting top 15 most influential words are shown in Table 2. In total, there are 123 positive words and 155 negative words in the dictionaries. The average influence score of the positive words (0.637) is higher than that of the negative words (0.385). This may reveal that positive words usually carry more powerful clues for predicting an M&A deal. It would be interesting to study which kinds of words in the sources indicate that a deal is more likely to be completed, and which kinds are likely to kill the deal.
| Positive Words | Sensitivity | Negative Words | Sensitivity |
3.3 Step 3: Counterfactual Instance Generation
As shown in Algorithm 1, we summarize three different counterfactual generation methods: the primary technique, which generates grammatically plausible counterfactuals (REP-SCD), and two further variants which guarantee counterfactual generation (RM-SCD and INS-SCD). We combine these three methods to alleviate a major issue in counterfactual explanation, namely that there is no guarantee that a counterfactual instance will be found for a given example. Our main technique identifies the most important word(s) in a test instance using SCD and replaces them with the intersection of grammatically plausible substitutes [obtained using a masked language model (MLM)] and words in the reverse emotional dictionary. The raw document content itself is taken as input, and the MLM outputs substitutes for each masked position. After all masked positions are infilled, we obtain the reconstructed document.

We iteratively repeat this operation at the most important word positions, ranked by SCD, until the reconstructed document moves the model’s classification to the opposing class. Notably, there may be more than one counterfactual explanation corresponding to the original text instance.
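The REP-SCD loop can be sketched as follows. `mlm_candidates` and `antonym_dict` are hypothetical stand-ins for the masked-language-model infills and the reverse emotional dictionary, and `classify` for the fine-tuned classifier; this is a simplified sketch, not the authors' exact Algorithm 1:

```python
# Sketch of the REP-SCD generation loop: walk down the SCD-ranked important
# word positions, replacing each word with a substitute drawn from the
# intersection of MLM infills and the reverse emotional (antonym) dictionary,
# until the classifier's prediction flips.
def rep_scd(tokens, ranked_positions, classify, mlm_candidates, antonym_dict):
    original = classify(tokens)
    tokens = list(tokens)
    for pos in ranked_positions:
        # Intersection of grammatically plausible infills and antonyms.
        subs = [w for w in mlm_candidates(tokens, pos)
                if w in antonym_dict.get(tokens[pos], [])]
        if not subs:
            continue
        tokens[pos] = subs[0]             # apply the top plausible substitute
        if classify(tokens) != original:
            return tokens                 # prediction flipped: counterfactual found
    return None                           # fall back to RM-SCD / INS-SCD
```

Because the MLM proposes infills conditioned on the surrounding context, the substitutes tend to be grammatical; the antonym intersection pushes the edit towards the opposing class.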
4 Experiment 1: Financial Text Classification with Robust Transformers
In this section we describe the results of a comprehensive evaluation of classification accuracy, comparing a variety of different classification baselines (including a human baseline) to our adversarial transformer approach.
4.1 Methods Used
The baselines used can be grouped into several distinct categories: human evaluations; traditional machine learning approaches (SVM); classical deep learning approaches (CNN, BiGRU, and HAN); and various transformer approaches with/without pruning strategies. These transformer-based models are generally considered to provide the current state-of-the-art in text classification. We reproduce these baselines using the Transformers library.
Acquiring a human baseline
As a baseline, we asked 26 participants who were experts in economics and finance to predict M&A events by completing 50 M&A evaluation questionnaires. The participants consisted of Ph.D. students and academics from the fields of economics/finance. All participants were either native English speakers or had a high degree of English competence. Each questionnaire provided information on ten M&A cases/instances, sampled randomly without replacement from the test set. In addition, the news articles in the dataset that were published before the deal announcement were also provided. The questionnaire asked the participant to predict the outcome of the deal (complete or rumour), and to state their confidence in this prediction.
4.2 Classification Results
In line with best practice, model hyper-parameters are tuned using the validation set. In particular, the maximum sequence length is set to 256, and all transformers use their large configuration. All experiments use the conventional Matthews Correlation Coefficient (MCC), accuracy, and F1 metrics. The classification results are summarized in Table 3, with Random Guess used to provide a lower baseline based on chance. While the human evaluators performed better than chance, their ability to predict deal outcomes is limited compared to the more sophisticated machine models that follow. These results are particularly compelling given that the human evaluators had considerable domain expertise.
Each of the machine learning approaches offers substantial improvements over the human evaluators, and a clear separation can be seen between traditional machine learning (MCC scores in the low 0.7 range / F1 scores in the low 0.8 range), classical deep learners (MCC scores in the range 0.73-0.74 / F1 scores in the range 0.84-0.85), and recent transformer-based models (MCC ≥ 0.75 / F1 ≥ 0.87).
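For reference, the MCC metric reported above can be computed directly from the binary confusion-matrix counts:

```python
import math

# Matthews Correlation Coefficient from binary confusion-matrix counts.
# Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect).
def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC accounts for all four confusion-matrix cells, which makes it a balanced summary even when the class split is uneven.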
We further evaluate the relative influence of adversarial perturbation to test the robustness of the models. We find that all transformer variants [19, 26] benefit from adversarial perturbation during training in terms of their prediction results. To explore why the optimal transformer classifier can outperform the human baseline by such a wide margin (39%), we take the best-performing model -- RoBERTa with adversarial training -- as our optimal classifier in the following experiments for generating plausible counterfactual explanations.
5 Experiment 2: Generating Plausible Counterfactual Explanations
Interpretability is an increasingly important property for many deep learning techniques, including computer vision and natural language processing, especially in critical tasks such as financial text classification; high-value investment decisions demand a reasonable level of interpretability if investors are to trust the predictions that come from a system such as the one described in this work. In this section, we describe a qualitative analysis for each of our methods, and subsequently present user studies evaluating our approach against existing example-based explanation methods.
5.1 Qualitative Analysis for the Resulting Counterfactual Instances
In a qualitative analysis, we identified five typical patterns among the generated counterfactual instances, as shown in Table 4, where we highlight the changed parts. For all 500 test examples, at least one counterfactual instance corresponding to the original input is guaranteed. By comparing the original context to the revised context which flips the classifier’s prediction, we gain insight into which aspects are causally relevant.
Table 4. Types of algorithms, with original (Ori) and revised (Rev) texts.

REP-SCD (replacing with the certainty word)
  Ori: Professional vacation services provider ILG is considering a merger with Diamond Resorts International...
  Rev: Professional vacation services provider ILG is announcing a merger with Diamond Resorts International...

REP-SCD (changing the deal value)
  Ori: Vivendi is in early discussions to sell a 10.0 per cent stake in Universal Music Group (UMG) to Tencent for roughly EUR 3.00 billion...
  Rev: Vivendi is in early discussions to sell a 10.0 per cent stake in Universal Music Group (UMG) to Tencent for roughly EUR 3.00 million...

INS-SCD (inserting the uncertainty word)
  Ori: Stryker is buying US-based spinal implant technology company K2M Group Holdings for USD 1.40 billion in cash...
  Rev: Stryker is potentially buying US-based spinal implant technology company K2M Group Holdings for USD 1.40 billion in cash...

INS-SCD (inserting the negative word)
  Ori: WPP has confirmed the recent speculation that it has entered into exclusive negotiations with private equity firm Bain Capital...
  Rev: WPP has not confirmed the recent speculation that it has entered into exclusive negotiations with private equity firm Bain Capital...

RM-SCD (removing the negative limitation(s))
  Ori: This suitor is the Namdar and Washington Prime consortium, the insiders noted, adding that there can be no certainty a deal will complete...
  Rev: This suitor is the Namdar and Washington Prime consortium, the insiders noted, adding that there can be certainty a deal will complete...
5.2 Human Evaluation for the Explanation
We run interpretation experiments on the optimal fine-tuned transformer classifier. While explainable models trained with supervised learning are a common way to interpret the results of text classification, self-supervised explainable frameworks remain scarce. Related work considers similar types of edits to generate counterfactually-revised data; however, all of its instances are generated by humans, which greatly limits the scalability of the method. To comprehensively evaluate the performance of our method, we compare against a state-of-the-art example-based explanation framework, namely HotFlip, which uses gradients to identify important words and then flips them with the adversarial words that cause the maximum change in the gradients.
For the user evaluation, we ask domain experts in finance to rate our explanations on two aspects: (1) how plausible each is (mainly in terms of grammar and comprehension), and (2) how reasonable each is (i.e., does the explanation make sense). We compare our method to HotFlip, the state-of-the-art framework for counterfactual explanation at the time of writing. Each score is measured on a scale of 1-5, where 5 is the best and 1 is the worst. We randomly sample 100 examples from the testing set for 5 participants to answer (20 examples per person). By combining REP-SCD, RM-SCD, and INS-SCD, our method achieves significantly higher scores than HotFlip: an improvement of 2.35 (4.35 vs. 2.00) in plausibility and of 0.85 (4.00 vs. 3.15) in reasonableness, with p-values of less than 0.001 and 0.05, respectively. Hence, there is compelling evidence that our method can generate counterfactual explanations which are more plausible and reasonable.
6 Conclusion and Future Work
In this work, we pursued the new research problem of M&A prediction. Our transformer-based classifier leveraged the regularization benefits of adversarial training to enhance model robustness. More importantly, we built upon previous techniques to quantify the importance of words and to help guarantee the generation of plausible counterfactual explanations with a masked language model in financial text classification. The results demonstrate superior accuracy and explanatory performance compared to state-of-the-art techniques. An obvious extension would be to include cancelled deals in the classifier, or to predict novel M&A events based on market descriptions of companies (e.g., scale, finances, and target markets). Moreover, additional financial events (e.g., misstatement detection and earnings call analysis) are related tasks to be considered in further research.
We would like to thank Tianhao Fu, Yimeng Li, Yang Xu and Prof. Mark Keane for their helpful advice and discussion during this work. Also, we would like to thank the anonymous reviewers for their insightful comments and suggestions to help improve the paper. This research was supported by Science Foundation Ireland (SFI) under Grant Number _.
- (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- (2020) On identifiability in transformers. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2019) Counterfactuals in explainable artificial intelligence (XAI): evidence from human reasoning. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 6276–6282.
- (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57.
- (2016) Abnormal returns from takeover prediction modelling: challenges and suggested investment strategies. Journal of Business Finance & Accounting 43 (1-2), pp. 66–97.
- (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- (2018) Learning target-specific representations of financial news documents for cumulative abnormal return prediction. In Proceedings of the 27th International Conference on Computational Linguistics (COLING-18), pp. 2823–2833.
- (2017) HotFlip: white-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751.
- (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
- (2018) Interpretable credit application predictions with counterfactual explanations. arXiv preprint arXiv:1811.05245.
- (2020) Why attention is not explanation: surgical intervention and causal reasoning about neural models. In Proceedings of the 12th Language Resources and Evaluation Conference, pp. 1780–1790.
- (2009) The shrinking merger arbitrage spread: reasons and implications. Financial Analysts Journal 66.
- (2020) Towards hierarchical importance attribution: explaining compositional semantics for neural sequence models. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2020) Learning the difference that makes a difference with counterfactually-augmented data. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2020) Good counterfactuals and where to find them: a case-based technique for generating counterfactuals for explainable AI (XAI). In International Conference on Case-Based Reasoning (ICCBR).
- (2019) Twin-systems to explain artificial neural networks using case-based reasoning: comparative tests of feature-weighting methods in ANN-CBR twins for XAI. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 2708–2715.
- (2020) On generating plausible counterfactual and semi-factual explanations for deep learning. arXiv preprint arXiv:2009.06399.
- (2014) Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751.
- (2019) ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.
- (2020) Adversarial training for large neural language models. arXiv preprint arXiv:2004.08994.
- (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- (2016) Investor reaction to merger and acquisition rumors. SSRN 2813401.
- (2018) Towards deep learning models resistant to adversarial attacks. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2017) Adversarial training methods for semi-supervised text classification. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2018) Beyond word importance: contextual decomposition to extract interactions from LSTMs. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
- (2019) Hierarchical interpretations for neural network predictions. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2018) Robustness may be at odds with accuracy. In International Conference on Learning Representations (ICLR).
- (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
- (2017) Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv. JL & Tech. 31, pp. 841.
- (2019) AllenNLP Interpret: a framework for explaining predictions of NLP models. arXiv preprint arXiv:1909.09251.
- (2019) Sentiment-aware volatility forecasting. Knowledge-Based Systems 176, pp. 68–76.
- (2016) Modeling contagious merger and acquisition via point processes with a profile regression prior. In International Joint Conference on Artificial Intelligence, IJCAI-16, pp. 2690–2696.
- (2020) HTML: hierarchical transformer-based multi-task learning for volatility prediction. In Proceedings of The Web Conference 2020, WWW ’20, pp. 441–451.
- (2018) Explainable text-driven neural network for stock prediction. In 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), pp. 441–445.
- (2016) Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-16, pp. 1480–1489.