Generating summaries tailored to target characteristics

Abstract

Recently, research efforts have gained pace to cater to varied user preferences while generating text summaries. While there have been attempts to incorporate a few handpicked characteristics such as length or entities, a holistic view around these preferences is missing, and crucial insights on why certain characteristics should be incorporated in a specific manner are absent. With this objective, we provide a categorization of these characteristics relevant to the task of text summarization: the first focusing on what content needs to be generated and the second on the stylistic aspects of the output summaries. We use our insights to provide guidelines on appropriate methods to incorporate various classes of characteristics in a sequence-to-sequence summarization framework. Our experiments with incorporating topics, readability and simplicity indicate the viability of the proposed prescriptions.

1 Introduction

Automatic text summarization [21] is the task of generating a summary of an input document while retaining the key aspects. Such a summary helps in presenting the important content from a long input text in a succinct form for quick information consumption. Traditional methods for summarization [21] extract key sentences from the source text to construct an ‘extractive’ summary. Recent efforts towards ‘abstractive’ summarization have geared towards generating human-like, paraphrased summaries from the input article [19, 30].

While these algorithms allow for the generation of a single summary, it is often desirable to generate summary variants tailored towards specific characteristics. For instance, readers may be interested in summaries of different lengths or might want to focus on specific entities/topics from the input text [8, 18, 37]. Depending on their age group, readers might prefer formal or informal variants of the summary [22]. Irrespective of the application context, it has been shown that incorporating these characteristics at the time of generation can yield more contextual summaries [17].

Recent works have proposed different ways to incorporate target characteristics at the time of summary generation: introducing modifications to the dataset [8, 18], architecture [22], learning objectives [26] or the decoder probabilities [17]. However, all these attempts handpick a few characteristics and propose ways to incorporate them. In the absence of appropriate insights, it is unclear as to why these methodologies work in tuning the summaries towards the chosen characteristics and why the same cannot be extended for other characteristics. In this work, our objective is to gain a holistic understanding around these additional constraints, centering on the task of text summarization.

Taking a step in this direction, we propose a categorization of these characteristics into 1) content-specific characteristics, which primarily focus on what content is presented, pivoting on the semantics or information in the output summary, and 2) style-specific characteristics, which focus on stylistic expression, pivoting on the linguistic presentation of the output summary. Through a comparative evaluation of various existing and proposed methods, we further prescribe guidelines to help choose the right framework for tailoring these categories of characteristics. Our primary contribution is providing a categorization of target characteristics as content- and style-specific, towards a holistic understanding of tailored summary generation. Additionally, we propose an attention-boosting approach to improve tailoring of content-specific characteristics and a policy gradient based algorithm to incorporate stylistic characteristics in summaries.

2 Related Work

Traditional methods for summarization [21] extract key sentences from the source text to construct an ‘extractive’ summary. Features like descriptiveness of words and word frequencies have been explored to choose the summary sentences. However, humans summarize by understanding the content and paraphrasing the understood content into a summary. Extractive summarization is hence unable to produce ‘human-like’ summaries. This has led to efforts towards ‘abstractive’ summarization, which paraphrases the input article content into a summary.

Early attempts at abstractive summarization created summary sentences either based on templates [38, 12] or used ILP-based sentence compression techniques [10, 4, 3]. With the advent of deep sequence to sequence models [33], attention-based neural models have been proposed for summarizing long sentences [29, 6]. These approaches were further improved by incorporating Abstract Meaning Representations [34] and using hierarchical encoding networks [20]. More recent approaches [19, 30] have focused on large scale datasets for summarization such as CNN/DailyMail corpus [15, 20]. Gulcehre et al.  [13] introduced the ability to copy out-of-vocabulary words from the article to incorporate rarely seen words like names in the generated text. Tu et al. [36] included the concept of coverage, to prevent the models from repeating the same phrases while generating a sentence. See et al. [30] proposed a pointer-generator framework which incorporates these improvements, and also learns to switch between generating new words and copying words from the source article.

Although research has primarily focused on unconditioned abstractive text summarization, there have been some recent efforts to incorporate a variety of additional constraints into the generation algorithm. Fan et al. [8] use explicit indicators to control the length of the output summary, imposing a constraint on how detailed the output needs to be. They extend the same technique to also control the entities which must be focused on while generating the summary. Krishna et al. [18] generate topic-oriented summaries by using an indicator topic-vector along with the input representation. Each of these approaches aims to control the information presented in the generated summary and therefore modifies the attention distribution to focus on appropriate parts of the text as dictated by the target characteristics. We group these approaches as content-specific characteristic tuning.

Another direction of effort has attempted to incorporate aspects like sentiment or tense using generative models like variational auto-encoders [16] or adversarial training [32]. Focusing on politeness, Sennrich et al. [31] propose modifications to the neural machine translation setup to generate polite variants. Ficler et al. [9] propose the use of a conditional language model to generate text with variations such as descriptiveness, personal tone and sentiment simultaneously. More recently, generating text with varying levels of formality was studied in machine translation [22, 23]. Oraby et al. [24] attempt to control personality dimensions in generation, namely agreeable, disagreeable, conscientious, unconscientious and extravert, by using indicator tokens and stylistic encodings. Krishna et al. [17] modify the decoding algorithm to produce readable and simple summaries. Each of these explorations focuses primarily on tuning the linguistic presentation of the generated text, and we group them as style-specific characteristic tuning.

Policy learning based approaches [26] have shown promise in controlling several qualitative characteristics explicitly and can potentially be used for both content- and style-specific characteristics. While they have been successfully deployed for metrics like ROUGE [26], we show their applicability to style-specific characteristics by proposing a policy gradient framework for readability and simplicity.

Content vs Style: The notion of style and the associated nomenclature is quite convoluted in the literature [35]. Approaches in style transfer try to obtain independent latent representations for the style and semantics of the content [1, 14, 32, 39, 27, 40]. Using this interpretation, we see the output of a text generation system as a combination of the semantics of the information being presented and the style associated with that content. However, unlike these approaches that learn the notion of style implicitly from the available corpora, we fragment style across a set of dimensions such as readability, simplicity, etc. These are referred to as various aspects of style, aligning with the ad-hoc approaches towards style transfer described in [35].

3 Pointer-Generator Framework

We base all our explorations on the pointer-generator network [30]. However, all our findings and insights are generic and can be extended to other frameworks without loss of generality. We describe the pointer-generator framework here for the sake of completeness; please refer to [30] for more details. The pointer-generator network [30] consists of an encoder and a decoder, both based on the LSTM architecture. Given an input article, the encoder takes the embedding vectors of each word $w_i$ in the source text and computes the encoder hidden states $h_i$. The final hidden state is passed to a decoder, which computes a hidden state $s_t$ at each decoding step $t$ and calculates an attention distribution $a^t = \mathrm{softmax}(e^t)$ over all source words, where,

$e^{t}_{i} = v^{T} \tanh(W_h h_i + W_s s_t + b_{attn})$        (1)

where $v$, $W_h$, $W_s$ and $b_{attn}$ are model parameters to be trained. The attention $a^t$ is a probability distribution over words in the source text, which aids the decoder in generating the next word of the summary using source words with higher attention. The context vector $h^{*}_{t} = \sum_i a^{t}_{i} h_i$ is a weighted sum of the encoder hidden states (weighted by the attention on each input word at step $t$) and is used to determine the next word to be generated. The attention distribution allows the network to focus on specific parts of the input as the output summary is generated. To tailor a summary to various content-specific characteristics, it is important to modify this attention distribution to focus on the appropriate parts of the input text as required by the characteristics being tuned.
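For concreteness, the following is a minimal sketch (in PyTorch, with illustrative tensor shapes and names that are not taken from the released code of [30]) of how the additive attention of Eq. 1 and the resulting context vector can be computed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention over encoder states, as in Eq. 1 (illustrative sketch)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects encoder states h_i
        self.W_s = nn.Linear(hidden_dim, hidden_dim, bias=True)   # projects decoder state s_t (bias plays the role of b_attn)
        self.v = nn.Linear(hidden_dim, 1, bias=False)             # scoring vector v

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, hidden); dec_state: (batch, hidden)
        e = self.v(torch.tanh(self.W_h(enc_states) + self.W_s(dec_state).unsqueeze(1)))  # (batch, src_len, 1)
        attn = F.softmax(e.squeeze(-1), dim=-1)                          # attention distribution a^t
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)    # context vector: sum_i a^t_i h_i
        return attn, context
```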

At each decoding step, the decoder also receives the last word in the summary generated so far and computes a scalar $p_{gen}$ denoting the probability of generating a new word from the vocabulary, $p_{gen} = \sigma(w_{h^*}^{T} h^{*}_{t} + w_{s}^{T} s_t + w_{x}^{T} x_t + b_{ptr})$, where $w_{h^*}$, $w_{s}$, $w_{x}$ and $b_{ptr}$ are trained vectors. The network probabilistically decides, based on $p_{gen}$, whether to generate a new word from the vocabulary or copy a word from the source text using the attention distribution. For each word in the vocabulary, the model calculates $P_{vocab}(w)$, the probability of the word being generated next. For each word in the input article, the total attention it receives yields its probability of being copied. Since some words occur both in the vocabulary and in the input article, they have non-zero probabilities of being newly generated as well as being copied. Hence, the total probability of a word $w$ being the next word generated in the summary (denoted by $P(w)$) is given by,

$P(w) = p_{gen} P_{vocab}(w) + (1 - p_{gen}) \sum_{i:\, w_i = w} a^{t}_{i}$        (2)

The second term allows the framework to choose a word to copy from the input text using the attention distribution. The pointer-generator network further employs a coverage mechanism to encourage diversity in the attention distributions across time steps. The training loss is the average negative log-likelihood of the ground-truth summaries. The model is trained using back-propagation and the Adagrad gradient descent algorithm [7]. Since stylistic characteristics deal with specific expressions of the output text, they can be tailored by modifying $P(w)$ to incorporate the corresponding stylistic preferences. For more complex characteristics, as we show, it is possible to define a reinforcement learning based loss appended to the training loss to tailor the specific characteristics.
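A minimal sketch of how the final distribution of Eq. 2 can be assembled from the generation and copy components is shown below; the function and variable names are illustrative, and the extended-vocabulary handling of out-of-vocabulary source words is omitted:

```python
import torch

def final_distribution(p_gen, vocab_dist, attn_dist, src_ids):
    """Combine generation and copy probabilities as in Eq. 2 (illustrative sketch).

    p_gen:      (batch, 1)          probability of generating from the vocabulary
    vocab_dist: (batch, vocab_size) softmax over the output vocabulary, P_vocab(w)
    attn_dist:  (batch, src_len)    attention distribution a^t over source positions
    src_ids:    (batch, src_len)    vocabulary ids of the source words
    """
    weighted_vocab = p_gen * vocab_dist            # first term of Eq. 2
    weighted_copy = (1.0 - p_gen) * attn_dist      # copy mass spread over source positions
    # Words appearing both in the vocabulary and in the article accumulate both terms,
    # as described in the text above.
    return weighted_vocab.scatter_add(1, src_ids, weighted_copy)
```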

4 Content-specific Characteristics

Content-specific characteristics primarily govern what content needs to be presented in the output summary. For the rest of the paper, we illustrate the needs and modeling for content based characteristics with topical tailoring. However, the proposed approach can be extended to other content characteristics like entity-centric tailoring, etc.

Often, the whole content of an article may not be relevant to readers, who may prefer specific elements of the input to be summarized. For instance, a sports enthusiast may only be interested in content concerning that domain, or a surgeon may only be interested in health-related content. This calls for a need to generate multiple summary variants taking this information into account. Table 1 shows a particular instance from our dataset which talks about both Politics and Military. If a reader is interested only in Politics, the baseline summary generated by the pointer-generator (PGen) model does not refer to politics and hence fails to meet their needs.

Article: bernie sanders , my vermont senator and , indeed , a friend of many years , is now running for president . he noted at his announcement -lrb- with a familiar note of wise irony -rrb- : “ people should not underestimate me . ” to most americans , of course , sen. bernie sanders is only a name , if that . he is barely known to the general public , which makes him a very long shot indeed to win election to the highest office in the nation . -lrb- cnn -rrb- he was impressively polite and bright in the eyes of his boyhood teachers , an encourager of his college friends . he was a docile captured killer in the care of paramedics tending to his gunshot wounds . dzhokhar “ jahar ” tsarnaev ’s defense team is seeking to spare him from a death sentence for his part two years ago in the boston marathon bombings and murder of an mit police officer …
PGen: dzhokhar “ jahar ” tsarnaev ’s defense team is seeking to spare him from a death sentence . he was a docile captured killer in the care of paramedics tending to his gunshot wounds . tsarnaev was convicted april 8 on all 30 counts , including 17 that carry a possible death penalty .
Token-based (mixed) [18] + Attention Boosting; Politics: sen. bernie sanders is running for president . he is barely known to the general public , which makes him a very long shot indeed . he is barely known to the general public , which makes him a very long shot indeed .
Token-based (mixed) [18] + Attention Boosting; Military: dzhokhar “ jahar ” tsarnaev ’s defense team is seeking to spare him from a death sentence . he was convicted april 8 on all 30 counts , including 17 that carry a possible death penalty . a paramedic testified wednesday that it was common for patients in shock to become agitated.
Table 1: Sample output topic-tailored summaries generated by Token-Based approach trained on CNN-DM-mixed dataset. We show just the top few sentences in the input article, in the interest of space.

Sequence-to-sequence learning models have been shown to learn where to look in the input through attention mechanisms, which then drive output generation. Tailoring content-specific characteristics requires this attention to be tuned to focus on the relevant parts of the input (e.g., the parts of the input discussing a topic of interest) to generate the desired output. This requires the model to be taught (either explicitly or implicitly) where to attend in the input article to tailor the summary appropriately.

One possibility is to maintain explicit indicators for each category of the characteristic (e.g., for each topic), allowing the model to learn where to pay more attention directly from the data. Fan et al. [8] propose to use such indicator tokens to tune characteristics such as length or desired entities. When training on an (article, summary) pair belonging to a particular category, a token indicating the characteristic (the topic in this case) represented by the summary is added to the beginning of the input article. While decoding an unseen article, the framework can generate multiple summary variants based on which token is prepended to the input word sequence. Internally, the model uses the token to learn a conditioned space of parameters, ensuring appropriate attention tuning to generate the summary with the corresponding tailoring. We refer to this approach as Token-Based in our experiments.
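As an illustration, token-based conditioning amounts to a simple preprocessing step; the marker format below is hypothetical and not the exact token vocabulary used in [8]:

```python
def prepend_topic_token(article_tokens, topic):
    """Token-Based conditioning (sketch): prepend an indicator token for the target
    characteristic (here, a topic) to the article word sequence."""
    return ["<" + topic.lower() + ">"] + article_tokens

# Training: the topic of the ground-truth summary is prepended to its article.
# Decoding: any desired topic token can be prepended to steer the generated summary.
politics_input = prepend_topic_token("bernie sanders is running for president".split(), "Politics")
```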

A key requirement for the model to learn these intricacies is that the training data should contain sufficient samples under each category. However, there could be a skew in the dataset, which calls for an alternate approach to tackle these characteristics. To deal with this problem, Krishna et al. [18] create a separate dataset where the model sees multiple summary variants for the same input article by mixing multi-topic articles. Given such a dataset, the vocabulary tokens can be used to guide the learning process towards topic-specific attention even with a skewed dataset. This method, trained on such an interspersed dataset, is referred to as Token-Based (mixed).

While [8] and [18] use token-based approaches to implicitly teach the networks by taking advantage of the diversity in the training data, we propose an alternative that "explicitly" boosts the attention distributions (referred to as Attn-Boost), guiding the model to focus on some parts of the input more than others. More formally, we modify Eq. 1 from the pointer generator as,

$e^{t}_{i} = v^{T} \tanh(W_h h_i + W_s s_t + b_{attn}) + \beta_i$        (3)

where $v$, $W_h$, $W_s$ and $b_{attn}$ are trainable model parameters as before, and this boosted score is used to compute the attention $a^t$. $\beta_i$ is a word-specific attention boosting parameter. This explicitly teaches the model to pay more attention to specific words than others. We leverage the topic-specific word lists curated by [18] and select the top-$k$ sentences from the input article that are most related to the target topic. We explicitly boost the $\beta_i$ values of all the words in these sentences using the topic confidence measures used by [18].
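A sketch of the boosting step, assuming the boosts are added to the pre-softmax attention scores of Eq. 1 as in Eq. 3 (one plausible placement; names are illustrative):

```python
import torch.nn.functional as F

def boosted_attention(scores, boost):
    """Attn-Boost sketch: add word-specific boosts (beta_i in Eq. 3) to the attention
    scores of Eq. 1 before the softmax. `boost` is zero everywhere except for words in
    the sentences most related to the target topic, where it is set from the topic
    confidence measures of [18].

    scores, boost: (batch, src_len) tensors."""
    return F.softmax(scores + boost, dim=-1)
```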

To draw more insights on each of these approaches, we evaluate them on the task of topic-based summarization on the CNN/DailyMail (CNN-DM) dataset [15, 20]. The dataset provides separate training, validation and test splits; the articles are long news stories paired with multi-sentence summaries. We use the vanilla pointer generator (PGen) as the baseline for all our experiments, retaining the hyper-parameters of See et al. [30]. We train the Token-Based model [8] by categorizing the ground-truth summaries into nine topics: business, education, entertainment, health, military, politics, social, sports and technology (extending the setup of [18]) and prepending the topic token to the input article while training.

Following [18], we also have a setup where we intersperse articles from different topics in CNN-DM, resulting in multiple topic-specific ground-truth summaries for the same article. Using the topics defined above, the model now sees multiple summaries for the same article, yielding article-summary-topic tuples for the training, validation and test splits.
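A sketch of how such an interspersed training instance could be constructed, following the description above (the exact mixing procedure of [18] may differ):

```python
import random

def make_mixed_instance(articles_by_topic):
    """Construct one interspersed instance in the spirit of [18]: two articles from
    different topics are combined into a single input, and each original summary is
    kept as a topic-specific reference for that mixed input.

    articles_by_topic: dict mapping topic -> list of (article, summary) pairs."""
    topic_a, topic_b = random.sample(list(articles_by_topic), 2)
    article_a, summary_a = random.choice(articles_by_topic[topic_a])
    article_b, summary_b = random.choice(articles_by_topic[topic_b])
    mixed_article = article_a + " " + article_b
    # Two (article, summary, topic) tuples share the same mixed article.
    return [(mixed_article, summary_a, topic_a), (mixed_article, summary_b, topic_b)]
```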

To evaluate the generation quality of all the approaches, we compare them on ROUGE-1, ROUGE-2 and ROUGE-L F1-scores. Note that generating summaries for all the topics may not make sense for the same input article, particularly when the article does not talk about the target topic. Hence, for each article in the test set, we generate the summary corresponding to the target topic defined by the ground-truth summary. Then, we use the topic-specific word lists [18] to get the top-1 and top-3 topics in the decoded summaries. The fraction of times the target topic lies in the top-1 and top-3 topics of the decoded summaries defines the Top-1 and Top-3 accuracies of the various setups.
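A sketch of the Top-k topical accuracy computation described above, assuming a simple word-overlap score against the topic word lists (the exact scoring used with the lists from [18] may differ):

```python
def topic_accuracy(decoded_summaries, target_topics, topic_wordlists, k=3):
    """Top-k topical accuracy (sketch): fraction of summaries whose target topic appears
    among the k topics scoring highest on a word-overlap count with the topic word lists."""
    hits = 0
    for summary, target in zip(decoded_summaries, target_topics):
        tokens = summary.lower().split()
        scores = {topic: sum(tok in words for tok in tokens)
                  for topic, words in topic_wordlists.items()}
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += int(target in top_k)
    return hits / len(decoded_summaries)
```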

Method | ROUGE F-score (1 / 2 / L) | Topical accuracy (Top-1 / Top-3, %)
PGen [30]
Token-Based [8]
Attn-Boost (Proposed)
Token-Based [8] + Attn-Boost
Token-Based (mixed) [18] + Attn-Boost
Table 2: Performance of proposed methodologies for generating topic-tuned summaries on the CNN-DM-mixed test dataset.

Table 2 summarizes the results of our methods for generating topic-oriented summaries. We observe that boosting attention values explicitly improves topical percentage accuracies but suffers a decline in quality based on ROUGE scores. This is expected, since the explicit topic attention makes the model attend to parts of the document that differ from the ground-truth summary. On the other hand, the token-based approach improves on ROUGE with lower topical accuracy. A combined framework of token-based and attention boosting yields the best performance across both ROUGE and topical accuracy metrics. This is perhaps because, in the combined setup, the model learns more intricacies implicit in the data along with explicit attention to the topics, thus getting the best of both frameworks. When we train the same setup with the mixed dataset of [18], both ROUGE and the topical accuracies improve, suggesting the importance of diversity in the data for the network to implicitly learn attention patterns.

We show generated summaries from the token-based approach in Table 1 for a particular instance from the test dataset. The article was created by combining two articles from the Politics and Military domains. The proposed approach, combined with the token-based framework on the CNN-DM-mixed dataset, is able to generate topic-specific variants, while the PGen approach fails to meet the requirement for either topic. Figure 1 shows the average attention on the most attended parts of the input for the same instance, from the token-based model trained on CNN-DM-mixed. When generating a ‘Politics’ oriented summary, the attention is on words like ‘president’, ‘bernie sanders’ and ‘general public’, showing the bias towards political phrases. On the other hand, when the target topic is ‘Military’, the focus shifts to ‘death sentence’ and ‘defense team’.

Figure 1: Attention distribution over the source article for different target topics: (a) Politics, (b) Military.

Our evaluations show that tuning content-based characteristics can be achieved by modifying the attention of the network, either implicitly or explicitly. Implicit attention modification uses a token-based approach [8], but relies on the diversity of the characteristics in the data. Where such diversity is not available, feeding off of an interspersed dataset to artificially infuse diversity [18] is beneficial. By combining an interspersed dataset with an explicit attention-boosting framework, the model is able to tune the characteristics better by learning where to attend and where not to attend while tuning the content-based characteristics.

5 Stylistic Characteristics

Next, we focus on incorporating stylistic characteristics in the generated summary. Style-specific preferences ensure that the content is served appropriately to the target audience. In this direction, prior work has focused on incorporating dimensions such as sentiment [16], descriptiveness [9], formality [22] and many more across various tasks in text generation. We describe our methodology below to incorporate such characteristics into abstractive text summarization.

The way these stylistic aspects are incorporated into the sequence-to-sequence summarization framework depends on how the aspects are defined. For example, the simplicity of text can be defined at a lexical level based on word frequency in a simple corpus, as in [25]. Krishna et al. [17] extend this towards generating simple summaries by modifying the decoder probability in Eq. 2. They incorporate simplicity by modifying the beam search decoder to choose contextual replacements with words that are simpler, defining simplicity as,

$\mathrm{simplicity}(S) = \frac{1}{|S|} \sum_{w \in S} \log f(w)$        (4)

where $f(w)$ is the frequency of the word $w$ in the SUBTLEX [5] corpus. Krishna et al. [17] then use word-to-word affinity/replacement probabilities to achieve the tailoring.
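A hedged sketch of such a lexical simplicity score, assuming an average log-frequency formulation over SUBTLEX counts (the exact normalization behind Eq. 4 in [17] may differ):

```python
import math

def lexical_simplicity(summary_tokens, subtlex_freq):
    """Average log-frequency of summary words in the SUBTLEX corpus [5]; higher values
    indicate more common (simpler) word choices. `subtlex_freq` maps word -> corpus count;
    unseen words default to a count of 1 (log 1 = 0). Illustrative sketch only."""
    if not summary_tokens:
        return 0.0
    return sum(math.log(subtlex_freq.get(w.lower(), 1)) for w in summary_tokens) / len(summary_tokens)
```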

However, not all aspects can be defined at a lexical level, and hence it is not always straightforward to modify the decoder probability. For example, readability can be quantified via the Flesch reading-ease score [11], given by,

$\mathrm{FRE} = 206.835 - 1.015 \left( \frac{\#\text{words}}{\#\text{sentences}} \right) - 84.6 \left( \frac{\#\text{syllables}}{\#\text{words}} \right)$        (5)

The Flesch reading-ease score quantifies the difficulty of understanding a passage written in English; higher scores indicate passages that are easier to read. It posits that readability is inversely related to the average number of words in a sentence and to the average number of syllables in a word. Using a partial form of this definition, [17] propose to use shorter words (with fewer syllables) as a surrogate for reading ease and use it to modify the decoder probabilities. However, ignoring the first component of the reading-ease score makes this an incomplete tailoring.
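The reading-ease computation of Eq. 5 itself is straightforward; a small sketch with a worked example:

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch reading-ease score (Eq. 5); higher scores indicate easier-to-read text."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Example: a 100-word passage in 5 sentences containing 150 syllables scores
# 206.835 - 1.015 * 20 - 84.6 * 1.5 = 59.6 (in the "fairly difficult" range).
```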

Accounting for the first component would require the whole sentence to be generated before providing any feedback to the model on its readability. We propose to use a reinforcement learning based framework to incorporate such complex objectives, which require generating the complete (or partial) output before feedback can be computed. Recently, reinforcement learning frameworks have been successfully used to optimize text generation for content-based metrics, e.g., ROUGE scores for summarization [26] and CIDEr scores for image captioning [28]. We extend these to propose a reinforcement loss for stylistic elements as an additional term alongside cross-entropy, using the Self Critical Sequence Training (SCST) [28] algorithm.

Given an input article word sequence $x$ and a corresponding ground-truth summary $y^{*} = \{y^{*}_{1}, \dots, y^{*}_{m}\}$, the pointer-generator framework optimizes the negative log-likelihood objective given by,

$L_{ml} = -\sum_{t=1}^{m} \log P\big(y^{*}_{t} \mid y^{*}_{1}, \dots, y^{*}_{t-1}, x\big)$        (6)

For providing explicit feedback on the stylistic characteristics, two output sequences are generated at training time: a sampled sequence $y^{s}$ and a baseline sequence $\hat{y}$. We generate $y^{s}$ by sampling from the $P(w)$ distribution at each time step, and $\hat{y}$ by greedily choosing the word with maximum probability from the output distribution at each time step. The SCST algorithm defines a loss term with a reward $r(\cdot)$ for the target style characteristic,

$L_{rl} = \big(r(\hat{y}) - r(y^{s})\big) \sum_{t=1}^{m} \log P\big(y^{s}_{t} \mid y^{s}_{1}, \dots, y^{s}_{t-1}, x\big)$        (7)

where $r(\cdot)$ is the reward function based on the target style characteristic to be optimized. Optimizing $L_{rl}$ improves the expected reward of the generated output. The final loss is a linear combination of $L_{rl}$ and $L_{ml}$, given by,

$L = \gamma L_{rl} + (1 - \gamma) L_{ml}$        (8)

where $\gamma$ governs the strength of the RL-based loss term. Reinforcement learning allows the loss function to include any non-differentiable metric in the form of rewards, which can be leveraged to optimize our complex stylistic aspects directly. The Self Critical Sequence Training approach also helps in dealing with exposure bias [26], a limitation of the teacher forcing algorithm for training recurrent neural networks. By being exposed to sequences sampled from its own distribution, the model learns to generate in accordance with such global meta properties.
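A minimal sketch of the mixed objective of Eqs. 6-8, assuming PyTorch tensors and using the style scores (readability or simplicity) as rewards; the variable names are illustrative:

```python
def scst_mixed_loss(log_probs_sampled, reward_sampled, reward_greedy, ml_loss, gamma):
    """Sketch of the mixed training objective of Eqs. 6-8 (PyTorch tensors assumed).

    log_probs_sampled: (batch, dec_len) log P(y^s_t | y^s_<t, x) for the sampled summary y^s
    reward_sampled:    (batch,) style reward r(y^s), e.g. readability of the sampled summary
    reward_greedy:     (batch,) style reward r(y_hat) of the greedily decoded baseline
    ml_loss:           scalar negative log-likelihood of the ground-truth summary (Eq. 6)
    gamma:             weight of the RL term in the final loss (Eq. 8)
    """
    # Eq. 7: sampled summaries that beat the greedy baseline get their likelihood increased,
    # those that fall below it get suppressed. Rewards are detached (no gradient through r).
    advantage = (reward_sampled - reward_greedy).detach()                      # (batch,)
    rl_loss = -(advantage.unsqueeze(1) * log_probs_sampled).sum(dim=1).mean()
    # Eq. 8: linear combination of the RL and maximum-likelihood terms.
    return gamma * rl_loss + (1.0 - gamma) * ml_loss
```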

To evaluate this methodology for incorporating such characteristics, we use our setup to improve the readability and simplicity of the generated summaries, incorporating the corresponding metrics directly into the learning algorithm as reward functions in the reinforcement learning based loss $L_{rl}$. To gain more insights on appropriate methods of tailoring stylistic characteristics, we compare our RL-based approach against the pointer-generator method from [30], the use of vocabulary tokens adapted from [8], and the modification of word-to-word affinity probabilities adapted from [17].

To adapt the token-based approach [8] for readability, we define two tokens, “Not Readable” and “Readable”, based on whether the readability of the ground-truth summary is below or above the median readability value in the training dataset. We also evaluate against the lexical-level modifications suggested by [17] and use their VoTing method to modify the generation probabilities by promoting the generation of shorter words over their longer synonyms. Finally, for our RL-based approach, we observe that training with the reinforcement loss is extremely slow, owing to the computation of sampled and greedy sequences along with the teacher-forced outputs. Hence, we use the PGen model pre-trained on the CNN-DM dataset as the initialization point, i.e., training with $\gamma = 0$. We then continue training for additional iterations with a fixed non-zero $\gamma$.
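A sketch of the median-based binning used to derive the readability indicator tokens (the token strings below are illustrative, not the exact vocabulary used in training):

```python
import statistics

def readability_token(summary_fre, training_fres):
    """Assign the indicator token by comparing a summary's Flesch reading-ease score
    against the median score over the training dataset (illustrative sketch)."""
    median_fre = statistics.median(training_fres)
    return "<Readable>" if summary_fre >= median_fre else "<NotReadable>"
```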

Similarly, for simplicity, we leverage the work by [17] to measure simplicity based on Eq. 4 and establish baselines similar to those for readability above. Our RL-based method directly uses this simplicity score as the reward function. For the token-based approach, we divide the ground-truth summaries into two classes, “Not Simple” and “Simple”, by thresholding at the median simplicity value observed in the training dataset.

Similar to the content-specific characteristics, we use ROUGE-1, ROUGE-2 and ROUGE-L F-scores to evaluate the overlap between the generated and ground-truth summaries. To evaluate whether the models are able to capture our readability definition, we report the average Flesch reading-ease score of the generated summaries. For simplicity, we report the corresponding average simplicity score. Note that our objective is to understand whether the methods are able to capture a given definition of a style-specific characteristic. Therefore, we evaluate the tailoring based on the defined target metrics themselves.

Table 3 summarizes our experiments for readability and simplicity. We observe a trade-off between the use of simpler/more readable words from the vocabulary and the generation quality as captured by the ROUGE metric, primarily because of the deviation towards simpler or more readable words than the ones in the reference summaries. The proposed RL-based approach is better able to capture readability, achieving higher average scores than all other approaches. However, it is not the best model for simplicity, where the lexical modifications at the decoder beat the RL method. This suggests that where the entire sequence needs to be generated to measure the stylistic aspect (as for readability), it is useful to resort to RL-based frameworks. However, when the stylistic aspect can be measured lexically, decoder modifications perform better. Works in the reinforcement learning literature have explored Actor-Critic methods [2] to provide intermediate feedback to the model, via partial rewards, even before the complete output sequence is generated. Exploring such techniques to tackle simple to complex definitions of style-specific constraints is a topic for future work.

Also note that token-based frameworks show mixed results, since they heavily rely on the diversity of the training data without any explicit signal; hence they might not be suited for stylistic aspects unless the training data contains sufficient diversity. It is possible to train a joint model using feedback on both the ground-truth summaries in the data and the sequences sampled from the output distributions within a token-based framework, which is a subject of further research.

Method | Readability setup: ROUGE F-score (1 / 2 / L), Readability | Simplicity setup: ROUGE F-score (1 / 2 / L), Simplicity
PGen [30]
Token-based [8]
VoTing [17]
RL-based
Table 3: Performance of the proposed approach in improving the readability and simplicity of generated summaries.
Article(47.18): the killing of an employee at wayne community college in goldsboro , north carolina , may have been a hate crime , authorities said tuesday . investigators are looking into the possibility , said goldsboro police sgt. jeremy sutton . he did not explain what may have made it a hate crime . the victim – ron lane , whom officials said was a longtime employee and the school ’s print shop operator – was white , as is the suspect . lane ’s relatives said he was gay , cnn affiliate wncn reported . the suspect , kenneth morgan stancil iii , worked with lane as part of a work-study program , but was let go from the program in early march due to poor attendance , college president kay albertson said tuesday . on monday , stancil walked into the print shop on the third floor of a campus building , aimed a pistol-grip shotgun and fired once , killing lane , according to sutton . stancil has tattoos on his face …
Reference(48.5): relatives of wayne community college shooting victim say he was gay , local media report . the suspect had worked for the victim but was let go , college president says . the suspect , kenneth morgan stancil iii , was found sleeping on a florida beach and arrested .
PGen(8.23): wayne community college , north carolina , may have been a hate crime , authorities say . investigators are looking into the possibility , said goldsboro police sgt. jeremy sutton . investigators are looking into the possibility , said goldsboro police sgt. jeremy sutton .
RL-based(50.12): the killing of an employee at wayne community college may have been a hate crime . the suspect , kenneth morgan stancil iii , worked with lane as part of a work-study program . he has no previous criminal record , authorities say .
Table 4: Sample output summary generated by incorporating readability as a reward function, along with baseline and reference summaries on an instance from CNN-DM-mixed dataset. The numbers in brackets refer to the corresponding readability scores. We show just the top few sentences in the input article, in the interest of space.
Article: hong kong -lrb- cnn -rrb- six people were hurt after an explosion at a controversial chemical plant in china ’s southeastern fujian province sparked a huge fire , provincial authorities told state media . the plant , located in zhangzhou city , produces paraxylene -lrb- px -rrb- , a reportedly carcinogenic chemical used in the production of polyester films and fabrics . the blast occurred at an oil storage facility monday night after an oil leak , though local media has not reported any toxic chemical spill …
Summary: …five out of six people were hurt(injured) by broken glass and have been sent to the hospital for treatment .
Article: -lrb- cnn -rrb- debates on climate change can break down fairly fast . there are those who believe that mankind ’s activities are changing the planet ’s climate , and those who do n’t . but a new way to talk about climate change is emerging , which shifts focus from impersonal discussions about greenhouse gas emissions and power plants to a very personal one : your health …
Summary: …it ’s easy to brush aside debates involving big(major) international corporations , but who would n’t stop to think and perhaps do something about their own health
Table 5: Sample simplified summaries generated by the proposed approach. Words in bold show the simpler words used by our approach, while the words in italics are those picked by the baseline model.

Table 4 shows the generated output summaries for the RL-based approach and the PGen baseline model on an instance from the CNN-DM dataset, where our RL-based method achieves a better readability score using shorter sentence constructs. Such a framework can be used to teach the model the sentence-level characteristics required to achieve a target style. Similarly, the summaries generated by the VoTing-based approach are shown in Table 5; the generated summaries use simpler words, such as ‘big’ in place of ‘major’ and ‘hurt’ in place of ‘injured’. In a similar manner, careful modifications of the generation probabilities while decoding can be used to incorporate various other stylistic aspects defined at a lexical level.

6 Conclusions

In this work, we study a variety of constraints which may be imposed while generating abstractive summaries of a given input article, categorizing these constraints as either content-specific, which govern what content needs to be generated, or style-specific, which govern various stylistic expressions in the outputs. Our experiments indicate that content-based characteristics can be tailored in the summary by explicitly or implicitly tuning the attention to focus on relevant parts of the input. The approach to tailor stylistic constraints depends on the nature of their definition: characteristics defined at the lexical level can be tuned better by modifying decoder probabilities during beam search, while more complicated metrics can be tuned by using reinforced rewards in the loss function.

References

  1. M. Artetxe, G. Labaka, E. Agirre and K. Cho (2017) Unsupervised neural machine translation. arXiv preprint arXiv:1710.11041.
  2. D. Bahdanau, P. Brakel, K. Xu, A. Goyal, R. Lowe, J. Pineau, A. Courville and Y. Bengio (2016) An actor-critic algorithm for sequence prediction. arXiv preprint arXiv:1607.07086.
  3. S. Banerjee, P. Mitra and K. Sugiyama (2015) Multi-document abstractive summarization using ILP based multi-sentence compression. In IJCAI, pp. 1208–1214.
  4. T. Berg-Kirkpatrick, D. Gillick and D. Klein (2011) Jointly learning to extract and compress. In Annual Meeting of the Association for Computational Linguistics (ACL).
  5. M. Brysbaert and B. New (2009) Moving beyond Kučera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41 (4), pp. 977–990.
  6. S. Chopra, M. Auli, A. M. Rush and S. Harvard (2016) Abstractive sentence summarization with attentive recurrent neural networks. In HLT-NAACL, pp. 93–98.
  7. J. Duchi, E. Hazan and Y. Singer (2011) Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research.
  8. A. Fan, D. Grangier and M. Auli (2017) Controllable abstractive summarization. arXiv preprint arXiv:1711.05217.
  9. J. Ficler and Y. Goldberg (2017) Controlling linguistic style aspects in neural language generation. In Proceedings of the Workshop on Stylistic Variation, pp. 94–104.
  10. K. Filippova (2010) Multi-sentence compression: finding shortest paths in word graphs. In International Conference on Computational Linguistics (COLING).
  11. R. F. Flesch (1979) How to write plain English: a book for lawyers and consumers. HarperCollins.
  12. P. Genest and G. Lapalme (2011) Framework for abstractive summarization using text-to-text generation. In Workshop on Monolingual Text-To-Text Generation.
  13. C. Gulcehre, S. Ahn, R. Nallapati, B. Zhou and Y. Bengio (2016) Pointing the unknown words. In Annual Meeting of the Association for Computational Linguistics (ACL).
  14. M. Han, O. Wu and Z. Niu (2017) Unsupervised automatic text style transfer using LSTM. In National CCF Conference on Natural Language Processing and Chinese Computing, pp. 281–292.
  15. K. M. Hermann, T. Kocisky, E. Grefenstette, L. Espeholt, W. Kay, M. Suleyman and P. Blunsom (2015) Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pp. 1693–1701.
  16. Z. Hu, Z. Yang, X. Liang, R. Salakhutdinov and E. P. Xing (2017) Toward controlled generation of text. In International Conference on Machine Learning, pp. 1587–1596.
  17. K. Krishna, A. Murhekar, S. Sharma and B. V. Srinivasan (2018) Vocabulary tailored summary generation. In International Conference on Computational Linguistics (COLING).
  18. K. Krishna and B. V. Srinivasan (2018) Generating topic-oriented summaries using neural attention. In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
  19. R. Nallapati, F. Zhai and B. Zhou (2017) SummaRuNNer: a recurrent neural network based sequence model for extractive summarization of documents. In AAAI, pp. 3075–3081.
  20. R. Nallapati, B. Zhou, C. dos Santos, C. Gulcehre and B. Xiang (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. In SIGNLL Conference on Computational Natural Language Learning.
  21. A. Nenkova and K. McKeown (2011) Automatic summarization. Foundations and Trends in Information Retrieval.
  22. X. Niu, M. Martindale and M. Carpuat (2017) A study of style in machine translation: controlling the formality of machine translation output. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2814–2819.
  23. X. Niu, S. Rao and M. Carpuat (2018) Multi-task neural models for translating between styles within and across languages. arXiv preprint arXiv:1806.04357.
  24. S. Oraby, L. Reed, S. Tandon, T. Sharath, S. Lukin and M. Walker (2018) Controlling personality-based stylistic variation with neural natural language generators. arXiv preprint arXiv:1805.08352.
  25. G. Paetzold and L. Specia (2015) LEXenstein: a framework for lexical simplification. In Proceedings of ACL-IJCNLP 2015 System Demonstrations, pp. 85–90.
  26. R. Paulus, C. Xiong and R. Socher (2018) A deep reinforced model for abstractive summarization. arXiv.
  27. S. Prabhumoye, Y. Tsvetkov, R. Salakhutdinov and A. W. Black (2018) Style transfer through back-translation. arXiv preprint arXiv:1804.09000.
  28. S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross and V. Goel (2017) Self-critical sequence training for image captioning. In CVPR.
  29. A. M. Rush, S. Chopra and J. Weston (2015) A neural attention model for abstractive sentence summarization. In Conference on Empirical Methods in Natural Language Processing.
  30. A. See, P. J. Liu and C. D. Manning (2017) Get to the point: summarization with pointer-generator networks. In Annual Meeting of the Association for Computational Linguistics (ACL).
  31. R. Sennrich, B. Haddow and A. Birch (2016) Controlling politeness in neural machine translation via side constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 35–40.
  32. T. Shen, T. Lei, R. Barzilay and T. Jaakkola (2017) Style transfer from non-parallel text by cross-alignment. In Advances in Neural Information Processing Systems, pp. 6833–6844.
  33. I. Sutskever, O. Vinyals and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104–3112.
  34. S. Takase, J. Suzuki, N. Okazaki, T. Hirao and M. Nagata (2016) Neural headline generation on abstract meaning representation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1054–1059.
  35. A. Tikhonov and I. P. Yamshchikov (2018) What is wrong with style transfer for texts? arXiv preprint arXiv:1808.04365.
  36. Z. Tu, Z. Lu, Y. Liu, X. Liu and H. Li (2016) Modeling coverage for neural machine translation. In 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  37. L. Wang, J. Yao, Y. Tao, L. Zhong, W. Liu and Q. Du (2018) A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization. arXiv preprint arXiv:1805.03616.
  38. L. Wang and C. Cardie (2013) Domain-independent abstract generation for focused meeting summarization. In ACL (1), pp. 1395–1405.
  39. J. Xu, X. Sun, Q. Zeng, X. Ren, X. Zhang, H. Wang and W. Li (2018) Unpaired sentiment-to-sentiment translation: a cycled reinforcement learning approach. arXiv preprint arXiv:1805.05181.
  40. Y. Zhang, N. Ding and R. Soricut (2018) SHAPED: shared-private encoder-decoder for text style adaptation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Vol. 1, pp. 1528–1538.