Sentence Compression in Spanish driven by Discourse Segmentation and Language Models
Previous work has shown that Automatic Text Summarization (ATS) by sentence extraction may be improved using sentence compression. In this work we present a sentence compression approach guided by sentence-level discourse segmentation and probabilistic language models (LM). The results presented here show that the proposed solution is able to generate coherent summaries with grammatical compressed sentences. The approach is simple enough to be transposed into other languages.
Alejandro Molina, Juan-Manuel Torres-Moreno, Iria da Cunha, Eric SanJuan and Gerardo Sierra
Laboratoire Informatique d’Avignon,
BP 91228 84911, Avignon, Cedex 09, France
École Polytechnique de Montréal,
CP. 6128 succursale Centre-ville, Montréal, Québec, Canada
Universitat Pompeu Fabra, Barcelona, Spain.
Instituto de Ingeniería, UNAM, Mexico, DF.
Automatic Text Summarization (ATS) is indispensable for coping with ever-increasing volumes of valuable information. An abstract is by far the most concrete and most recognized kind of text condensation [ANS:79]. Sentence extraction generates summaries by selecting sentences from the source text [luhn:58, edmundson:69, torres:11].
Sentence compression can be used to improve extractive summarization [knight:00, molina:linguamatica:10]. Previous work suggests that sentence segmentation could be helpful for generating sentence compressions [sporleder:05].
In this work we present a new approach to automatic sentence compression. First, sentences are segmented using a discourse segmenter, and then compression candidates are generated. Finally, the best candidate, i.e., the most grammatical one, is selected based on its probability as a sequence in a Language Model.
The rest of the paper is organized as follows. In §2 we recall the main concepts of sentence compression. We then present our approach to generating compression candidates in §3. The evaluation of compression candidates is introduced in §4. Experimental results are shown in §5. Finally, §6 presents conclusions and future work.
2 Sentence compression
Sentence compression can be considered as summarization at the sentence level. The sentence compression task is defined as follows:
“Consider an input sentence as a sequence of words $W = (w_1, w_2, \ldots, w_n)$. An algorithm may drop any subset of these words. The words that remain (order unchanged) form a compression” [knight:02].
Several algorithms decide which individual words to remove from a sentence, but humans also tend to delete long phrases when writing an abstract [pitler:10].
Recent studies have found good results by concentrating on clauses, instead of isolated words. In [steinberger:06] an algorithm first divides sentences into clauses prior to any elimination and then, compression candidates are scored based on Latent Semantic Analysis proposed in [deerwester:90]. However, no component to mitigate grammaticality issues is included in this algorithm [steinberger:06]. Although the results of this last work are in general good, in some cases the main subject of the sentence is removed. The authors attempted to solve this issue by including features in a machine learning approach [steinberger:07].
As an alternative to clauses, some studies explore discourse structures to tackle the sentence compression task. Discourse chunking [sporleder:05] is an alternative to discourse parsing that has a direct application to sentence compression. The authors of that work plausibly argued that, while discourse parsing at the document level still poses a significant challenge, sentence-level discourse chunking could be an alternative for languages with limited full discourse parsing tools. In addition, some sentence-level discourse models have shown accuracies comparable to human performance [soricut:03].
3 Compression Candidates Generation
In this work, we use a sentence-level discourse segmentation approach. Formally, “Discourse segmentation is the process of decomposing discourse into Elementary Discourse Units (EDUs), which may be simple sentences or clauses in a complex sentence, and from which discourse trees are constructed” [tofiloski:09]. Discourse segmentation is only the first stage of discourse parsing (the others are the detection of rhetorical relations and the building of discourse trees). However, we can consider segmentation at the sentence level in order to identify segments to be eliminated in the sentence compression task. This decomposition of a sentence into EDUs using only local information is called shallow discourse segmentation. In [molina:micai:11], the authors use a discourse segmenter to segment sentences in Spanish. The discourse segmenter is described in [dacunha:10] and is based on the Rhetorical Structure Theory framework [mann:87].
We propose that compression candidates be generated by deleting some discourse segments from the original sentence. Let $S$ be a sentence and $(s_1, s_2, \ldots, s_k)$ the sequence of its $k$ discourse segments: $S = (s_1, s_2, \ldots, s_k)$. A candidate, $C$, is a subsequence of $S$ that preserves the original order of the segments. The original sentence always forms a candidate, i.e., $C = S$. This is convenient because sometimes there is no shorter grammatical version of the sentence, especially for short sentences that consist of a single EDU. Since we do not consider the empty subsequence as a candidate, there are $2^k - 1$ candidates. Furthermore, since we rarely have more than 5 discourse segments in a sentence, we usually create between 1 and 31 candidates. This dramatically reduces the solution space, given that $2^k \ll 2^n$ for a sentence of $n$ words. The compression candidates are constructed using a binary counter. In Example 1 we show all the candidates associated with a sentence extracted from our corpus.
Example 1:
$C_1$: [Además ella participó ese mismo año en el concierto en tributo a Freddie Mercury,][hablando acerca de la prevención necesaria][para combatir el SIDA.] (English translation: [Also she participated that year in the concert in tribute to Freddie Mercury,][talking about prevention needed][to fight AIDS.])
$C_2$: [hablando acerca de la prevención necesaria][para combatir el SIDA.]
$C_3$: [Además ella participó ese mismo año en el concierto en tributo a Freddie Mercury,][para combatir el SIDA.]
$C_4$: [para combatir el SIDA.]
$C_5$: [Además ella participó ese mismo año en el concierto en tributo a Freddie Mercury,][hablando acerca de la prevención necesaria]
$C_6$: [hablando acerca de la prevención necesaria]
$C_7$: [Además ella participó ese mismo año en el concierto en tributo a Freddie Mercury,]
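The binary-counter construction can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `generate_candidates` and the bit convention (bit j of the counter keeps segment j, counting down from all segments kept) are assumptions, chosen because they reproduce the order of the candidates listed in Example 1.

```python
def generate_candidates(segments):
    """Return every non-empty, order-preserving subsequence of `segments`.

    Hypothetical helper: a counter runs from 2^k - 1 (all segments kept)
    down to 1, and bit j of the counter decides whether segment j is kept.
    The value 0 (empty subsequence) is never generated.
    """
    k = len(segments)
    candidates = []
    for mask in range(2 ** k - 1, 0, -1):
        kept = [seg for j, seg in enumerate(segments) if (mask >> j) & 1]
        candidates.append(kept)
    return candidates


if __name__ == "__main__":
    example = [
        "[Además ella participó ese mismo año en el concierto en tributo a Freddie Mercury,]",
        "[hablando acerca de la prevención necesaria]",
        "[para combatir el SIDA.]",
    ]
    for candidate in generate_candidates(example):
        print("".join(candidate))
```

With three segments the sketch yields the seven candidates of Example 1, starting with the original sentence; with five segments it yields 31.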
4 Compression Candidates Scoring with Language Model
A Language Model (LM) estimates the probability distribution of natural language. Statistical language modeling [chen:99, manning:99] is a technique widely used to assign a probability to a sequence of words. We assume that good compression candidates must have a high probability as sequences in an LM. In general, for a sentence $W = (w_1, w_2, \ldots, w_n)$, the probability of $W$ is:
$$P(W) = \prod_{i=1}^{n} P(w_i \mid w_1^{i-1})$$
where $w_i^j = (w_i, \ldots, w_j)$. The probabilities in an LM are estimated by counting sequences in a corpus. Even though we will never be able to get enough data to compute the statistics for all possible sentences, we can base our estimates on big corpora and interpolation methods. In our experiments we use a big corpus with 1T words (LDC Catalog No.: LDC2009T25, ISBN: 1-58563-525-1) to get the sequence counts, and an LM interpolation based on Jelinek-Mercer smoothing [chen:99], which for n-grams of order $N$ is:
$$P_{interp}(w_i \mid w_{i-N+1}^{i-1}) = \lambda_{w_{i-N+1}^{i-1}} \, P_{ML}(w_i \mid w_{i-N+1}^{i-1}) + (1 - \lambda_{w_{i-N+1}^{i-1}}) \, P_{interp}(w_i \mid w_{i-N+2}^{i-1}) \qquad (4)$$
In equation (4) the maximum likelihood estimate of a sequence is interpolated with the smoothed lower-order distribution.
For a given candidate, we assign an LM-based score based on its probability as in equation (4). For our experiments we use the SRILM Language Modeling Toolkit (available at http://www-speech.sri.com/projects/srilm/download.html) [stolcke:02].
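To make the scoring step concrete, the following toy sketch scores word sequences with an interpolated bigram model. It is not the SRILM setup used in the experiments: the bigram order, the single fixed weight `lam`, the add-one smoothed unigram used as the lower-order distribution, and all function names are assumptions made only for illustration.

```python
import math
from collections import Counter


def train_counts(sentences):
    """Count unigrams and bigrams over tokenized sentences (lists of words)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def interpolated_prob(word, prev, unigrams, bigrams, lam=0.7):
    """Jelinek-Mercer style mix of the bigram ML estimate and a unigram model.

    Assumption: one fixed weight `lam` instead of context-dependent weights.
    """
    total = sum(unigrams.values())
    vocab = len(unigrams)
    # Lower-order distribution: add-one smoothed unigram, with one extra
    # slot reserved for unseen words so the probability is never zero.
    p_lower = (unigrams[word] + 1) / (total + vocab + 1)
    # Maximum likelihood bigram estimate (zero if the context was never seen).
    p_ml = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_ml + (1 - lam) * p_lower


def log_score(words, unigrams, bigrams, lam=0.7):
    """Log-probability of a word sequence under the interpolated model."""
    tokens = ["<s>"] + list(words)
    return sum(
        math.log(interpolated_prob(w, p, unigrams, bigrams, lam))
        for p, w in zip(tokens, tokens[1:])
    )


if __name__ == "__main__":
    corpus = [["el", "gato", "duerme"], ["el", "perro", "duerme"]]
    uni, bi = train_counts(corpus)
    print(log_score(["el", "gato", "duerme"], uni, bi))
    print(log_score(["duerme", "gato", "el"], uni, bi))
```

On this tiny corpus the well-ordered sequence receives a higher log-probability than its scrambled version, which is the behavior the candidate scoring relies on.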
5 Experimental Results
For the experiments, two annotators were asked to compress each sentence following the instructions in [molina:micai:11]. The corpus contains four sub-corpora: Wikipedia sections, brief news, scientific abstracts and short stories. Each sub-corpus has 20 texts of no more than 50 sentences each (1939 tokens). We randomly selected eight documents for evaluation, two from each sub-corpus.
We have generated abstract summaries selecting the best compression candidate of each sentence considering two different approaches:
All system: Selecting the best scored candidate for each sentence.
First system: Selecting the best candidate from those that include the first segment.
For comparison we have created Random system: a baseline system which applies a random compression. Random system eliminates words from a given sentence at the same rate as the human annotators.
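The three systems can be sketched as follows. This is a minimal illustration under stated assumptions: candidates are lists of segments, `score` is any function returning a higher value for a better candidate, and the way Random system picks the words to drop (uniform sampling of positions) is not specified in the text and is assumed here. All names are hypothetical.

```python
import random


def select_all(candidates, score):
    """All system: pick the best-scored candidate among all candidates."""
    return max(candidates, key=score)


def select_first(candidates, first_segment, score):
    """First system: pick the best candidate among those keeping the first segment.

    Candidates preserve segment order, so a candidate contains the first
    segment exactly when it starts with it.
    """
    eligible = [c for c in candidates if c and c[0] == first_segment]
    return max(eligible, key=score)


def random_baseline(words, rate, seed=None):
    """Random system: drop a fraction `rate` of the words, keeping word order.

    Assumption: positions to drop are sampled uniformly at random.
    """
    rng = random.Random(seed)
    n_drop = round(rate * len(words))
    dropped = set(rng.sample(range(len(words)), n_drop))
    return [w for i, w in enumerate(words) if i not in dropped]
```

Because the original sentence is always a candidate, `select_first` always has at least one eligible candidate to choose from.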
After compression, three judges (different from the annotators) read the eight summaries. The judges do not know the source of the final summary. They mark each sentence in the final summary that they find grammatically incorrect in context. In addition, they evaluate the global coherence of the summaries after compression. Coherence is scored with a categorical variable: a value of -1 is assigned to incoherent summaries, 0 to partially coherent summaries and +1 to coherent ones. The Compression Rate (CR) is defined as the proportion of content eliminated from the original document.
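As a worked illustration of the Compression Rate, the sketch below computes the proportion of eliminated content. The unit of "content" is not stated in the text; whitespace-separated tokens are assumed here, and the function name is hypothetical.

```python
def compression_rate(original, compressed):
    """Proportion of tokens eliminated from `original` (assumes whitespace tokens)."""
    n_original = len(original.split())
    if n_original == 0:
        raise ValueError("original text is empty")
    n_compressed = len(compressed.split())
    return 1.0 - n_compressed / n_original


if __name__ == "__main__":
    source = "Además ella participó ese mismo año en el concierto"
    summary = "ella participó en el concierto"
    print(compression_rate(source, summary))
```

Under this definition a CR of 0 means nothing was removed, and values closer to 1 mean more aggressive compression.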
In Table 1, we compare our systems and the two human annotators. The results confirm that scoring compressions before producing a summary improves the text quality of summaries with compressed sentences. It is very surprising that, for Human, the number of compressions judged grammatically incorrect is greater than that of our systems. Human may have misunderstood the compression instructions. However, taking the results of Human and of the Random baseline system as limits, we consider that the proportion of ill-formed sentences is very low. We confirmed our initial intuition that preserving the first segment tends to keep the main subject in most cases. The introduction of this simple heuristic in the First system improves the grammatical quality of the output.
Table 2 shows the result of comparing our systems using the two human summaries as references. We wanted to evaluate the content quality of summaries with the ROUGE package [lin:2003]. ROUGE is used to evaluate summaries because some results show that it correlates well with human judgements [lin:2004rpa]. The results in Table 2 are the opposite of what was expected. We assumed that Random system would have worse results than First system. Given the human judgments of coherence and grammar in Table 1, we expected the same ranking of the systems. However, in Table 2 we see that the best ROUGE value, using human-made summaries with compressed sentences as references, is obtained by the Random system. Other than that, First system again outperforms All system.
As an alternative, we compare the divergence of texts with respect to the original uncompressed text using the FRESA package (http://lia.univ-avignon.fr/fileadmin/axes/TALNE) [torres:10c, saggion:10, torres:10poli]. The FRESA score F assesses the quality of the summaries. Lower values of F mean a significant difference with respect to the original text (i.e. more radical compressions). The results of the divergence tests on the summaries are shown in Table 3. Considering F values, we see that First system is found more related to Human than Human by FRESA and Human is closer to Random system performance. These values are congruent with the coherence and grammar judgments shown in Table 1. The F value of All system suggests that it is the most aggressive approach.
For all tables we use the following notation about the sources: All system=all compression candidates, First system=candidates including the first segment, Random system=random compression (baseline), Human=human compressions.
6 Conclusions and future work
In this work we have introduced the concept of Sentence Compression driven by Discourse Segmentation and Language Models. We have found that probabilistic Language Models can be helpful for the evaluation of compression candidates. The results in Spanish presented in this paper are very encouraging. We believe that this approach is sufficiently language-independent to be transposed into other languages such as English or French. In future work we aim to improve the score (4) by adding content restrictions.
Evaluation of compressed sentences and summaries with compressions is still a challenge in languages other than English that do not have reference corpora. We think that more studies are necessary in order to evaluate if ROUGE or FRESA are good methods for compressed text evaluations.
This work was partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT) México, grant number 211963.