SAM: Semantic Attribute Modulation for Language Modeling and Style Variation
This paper presents Semantic Attribute Modulation (SAM) for language modeling and style variation. The semantic attribute modulation includes various document attributes, such as titles, authors, and document categories. We consider two types of attributes (title attributes and category attributes) and a flexible attribute selection scheme that automatically scores them via an attribute attention mechanism. The semantic attributes are embedded into the hidden semantic space as the generation inputs. With the attributes properly harnessed, our proposed SAM can generate interpretable texts with regard to the input attributes. Qualitative analysis, including word semantic analysis and attention values, shows the interpretability of SAM. On several typical text datasets, we empirically demonstrate the superiority of the Semantic Attribute Modulated language model with different combinations of document attributes. Moreover, we present a style variation for lyric generation using SAM, which shows a strong connection between the style variation and the semantic attributes.
Language generation is considered a key task in the field of artificial intelligence. The language modeling task aims to model the word distributions of text sequences and can be viewed as a degenerate text generation task that generates only one word at each step. Traditional language generation approaches use phrase templates and related generation rules. For the language modeling task, the count-based n-gram method is broadly used. These methods are conceptually simple but generalize poorly compared with humans.
Later on, Bengio et al. developed a feed-forward neural network language model and Mikolov et al. used the recurrent neural network (RNN) to train a language model. Benefiting from large-scale corpora and modified gating functions, such as the long short-term memory (LSTM) or the gated recurrent unit (GRU), the RNN has demonstrated a good capability in modeling word probabilities and is now the most widely used method for language modeling and language generation. Nevertheless, the RNN is often criticized for being incapable of capturing long-term dependencies, which results in the loss of important contextual information. It has been shown that RNN language models (RNNLMs) can be enhanced with specific long-term contextual information, including document topics, bag-of-words contexts, a neural cache, etc. Several specific text structures have also been considered in RNNLMs, such as hierarchical sentence sequences, tree-structured texts and dialog contexts.
In the aforementioned models, only the main text sequences were modeled, while the widely accessible attributes of documents were ignored. Interestingly, document attributes implicitly convey global contextual information about the word distributions and are available before the main text is read, in both daily reading and speaking. Document titles are compact abstracts carefully chosen by authors and keynote speakers. Labels and tags are specific categories assigned by experienced editors. Authorships reflect writing styles. With these widely accessible attributes, one can predict word distributions better (see a concrete example in Figure ?).
Moreover, from the generation perspective, several previous works generate the designed outputs from scratch or from a single semantic attribute. However, only a few semantic attributes were incorporated at the same time, which is insufficient for the huge complexity of the text generation task. In this paper, we consider a diversity of semantic attributes and use the attention mechanism to combine them into a joint embedding. Hence, the semantic attribute modulation brings a flexible way to generate texts because we can choose different combinations of these attributes. Due to the strong semantic information conveyed by the attributes, the text generations are interpretable with regard to the different combinations of the input attributes. With this flexibility, we can obtain a text style variation by replacing semantic attributes. An interesting example is the request “Please let Jason Mraz rewrite the lyric ‘Last Kiss’.”
However, most of these works consider a document as a sequence of sentences, where each sentence is in turn a sequence of words or characters.
In this paper, we present SAM, the Semantic Attribute Modulation for language modeling and style variation. We consider the widely accessible semantic attributes of documents and extract their embeddings. Specifically, we adopt two types of semantic attributes: the title attribute and the category attribute. For the title attribute, we use an RNN encoder to get the title embedding. For the category attribute, our model learns a shared embedding from the documents in the specific category. Then, we generate the outputs with an attention mechanism over a diversity of attribute embeddings.
The semantic attribute modulated (SAM) language model obtains better per-word prediction results than the vanilla RNNLM without SAM. The improved word predictions are highly related to the semantic attributes and are therefore interpretable to humans. Moreover, we present the lyric generation task with lyric variation derived from semantic attributes. The text generation conditioned on the semantic attributes allows a flexible attribute selection. By replacing an input attribute with another learned attribute, we can obtain outputs with a style variation. Interesting examples of lyric style variation further demonstrate the flexibility of SAM.
In summary, our contributions are as follows:
- We present SAM, a Semantic Attribute Modulation, which incorporates a diversity of semantic document attributes as a flexible modulation input for language generation.
- By incorporating the Semantic Attribute Modulation, our language model obtains better word prediction results on several text datasets. The better word predictions are highly related to the semantic attributes and hence are interpretable to humans.
- Based on our model, we present stylistic variations of lyric generation with a fake author attribute, which further demonstrates the flexibility of SAM.
In this section, we first give a concrete example of semantic attributes and then list the related language generation models.
2.1 A Concrete Example of Semantic Attributes
We take an AlphaGo news article from the New York Times as a concrete example (Figure ?). Given the title ‘Google’s Computer Program Beats Lee Se-dol in Go Tournament’, the main text words ‘Google’, ‘program’ and ‘Go’ could be predicted more easily. Given the author attribute ‘CHOE SANG-HUN’, who is a Pulitzer Prize-winning South Korean journalist, we can better predict the words ‘South-Korean’ and ‘Go’. That is to say, the semantic attributes are indicative of different aspects of the text generation, which motivates us to modularize the semantic attributes in the text generation models.
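Given a sequence of words $x_1, \dots, x_T$, language modeling aims at computing its probability by
$$p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_{<t}),$$
where $x_{<t}$ are the words ahead of $x_t$. We can use the recurrent neural network to build the word probabilities $p(x_t \mid x_{<t})$. At each time step $t$, the transition gating function $f$ reads one word $x_t$ and updates the hidden state as $h_t = f(h_{t-1}, E x_t)$, where $E x_t$ is the continuous vector representation of the one-hot input vector $x_t$ and $E$ is the embedding matrix. The probability of the next possible word in the vocabulary $V$ is computed by
$$p(x_{t+1} = k \mid x_{\le t}) = \frac{\exp(W_k^\top h_t + b_k)}{\sum_{j=1}^{|V|} \exp(W_j^\top h_t + b_j)},$$
where $W \in \mathbb{R}^{d \times |V|}$ and $b \in \mathbb{R}^{|V|}$ are the affine weights and biases respectively and $d$ is the dimension of the hidden state $h_t$. Here the subscript $k$ specifies the $k$-th column of $W$ and the $k$-th entry of $b$.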
RNN models have often been criticized for their limited capacity to model long-term sequential dependencies, which results in unsatisfactory modeling of contextual information. Several previous works tried to capture the contextual information using the previous contexts. Let $c$ be the contextual representation extracted from the contexts; the generation process of the RNNLM with $c$ is
$$p(x_1, \dots, x_T \mid c) = \prod_{t=1}^{T} p(x_t \mid x_{<t}, c).$$
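The factorization of a sequence probability into next-word probabilities can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a plain tanh recurrence instead of a gated unit, toy random parameters, and hypothetical names (`sequence_log_prob`, `params`, `VOCAB`, `DIM`). The output weights are stored one row per word, which is the transpose of a column-per-word layout.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def sequence_log_prob(word_ids, params):
    """Return the sum over t of log p(x_t | x_<t) for a sequence of word ids."""
    E, W_in, W_rec, W_out, b_out = params
    h = [0.0] * len(W_rec)  # initial hidden state, before any word is read
    total = 0.0
    for word in word_ids:
        # Next-word distribution: softmax over an affine map of the hidden state.
        logits = [s + b for s, b in zip(matvec(W_out, h), b_out)]
        total += math.log(softmax(logits)[word])
        # Read the word (embedding lookup) and update the hidden state.
        pre = [a + b for a, b in zip(matvec(W_in, E[word]), matvec(W_rec, h))]
        h = [math.tanh(p) for p in pre]
    return total

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
VOCAB, DIM = 5, 4  # toy sizes, chosen only for illustration
params = (
    random_matrix(VOCAB, DIM, rng),  # E: one embedding row per word
    random_matrix(DIM, DIM, rng),    # W_in: input-to-hidden weights
    random_matrix(DIM, DIM, rng),    # W_rec: hidden-to-hidden weights
    random_matrix(VOCAB, DIM, rng),  # W_out: one output row per word
    [0.0] * VOCAB,                   # b_out: output biases
)
print(sequence_log_prob([1, 3, 0], params))
```

Because every step uses a normalized softmax, the probabilities of all sequences of a fixed length sum to one, which is a convenient sanity check for such a model.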
2.3 Neural Language Generation
3 Semantic Attribute Modulation
Besides the main texts, documents have semantic attributes, such as titles, authorships, tags, and sentiments, which convey important semantic information. In this section, we present SAM, the Semantic Attribute Modulation, which is built on an attention mechanism over a diversity of attributes. Then, we use SAM for language modeling and for style variation in language generation. Given the semantic attribute modulated representation $s$, the generative process of our model for each word $x_t$ is $p(x_t \mid x_{<t}, s)$, where $x_{<t}$ are the preceding words in the same document.
Because the semantic attributes take different forms, we use two methods to extract their representations.
The title is often carefully chosen by the author and is a compact abstract of a document. Given an $M$-length title sequence $(y_1, \dots, y_M)$, we use a recurrent neural network $f_y$ to extract the hidden state of every title word as
$$g_i = f_y(g_{i-1}, y_i), \quad i = 1, \dots, M,$$
where the dimension of the title word hidden state is $d_s$. Since the title words do not contribute equally to the whole context embedding, we use an attention mechanism for the title attribute and obtain a different title representation for each main text word as a weighted sum:
$$s^{\mathrm{title}}_t = \sum_{i=1}^{M} \alpha_{t,i}\, g_i,$$
where $\alpha_{t,i}$ is the attention value of the title word $y_i$ for the main text word $x_t$,
$$\alpha_{t,i} = \frac{\exp\big(a(h_{t-1}, g_i)\big)}{\sum_{j=1}^{M} \exp\big(a(h_{t-1}, g_j)\big)},$$
$h_{t-1}$ is the hidden state of the previous time step in the main text and $a(\cdot, \cdot)$ is an attention function which scores how the title word $y_i$ affects the main text word $x_t$.
With this title attention, we automatically learn different importance weights of the title words for each main text word.
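The title attention can be sketched in a few lines. The scoring function here is an assumption: the sketch uses a plain dot product between the previous main-text hidden state and each title word state, followed by a softmax, which is one common choice and not necessarily the one used in the paper. The name `title_attention` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def title_attention(title_states, prev_hidden):
    """Return (weights, context) for one main-text time step.

    title_states: one hidden-state vector per title word.
    prev_hidden:  hidden state of the previous main-text step.
    """
    # Assumed score: dot product between the query state and each title state.
    scores = [sum(g * h for g, h in zip(state, prev_hidden)) for state in title_states]
    weights = softmax(scores)
    dim = len(title_states[0])
    # Title representation: attention-weighted sum of the title word states.
    context = [sum(w * state[j] for w, state in zip(weights, title_states))
               for j in range(dim)]
    return weights, context

title_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy states for a 3-word title
prev_hidden = [2.0, 0.0]                             # toy main-text state
weights, context = title_attention(title_states, prev_hidden)
print(weights, context)
```

Since the weights are recomputed from the current main-text state at every step, each main text word receives its own title representation.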
Category attributes are commonly used in daily writing and speaking. Useful category attributes include document categories, authorships, sentiments, etc. We formulate the category attribute as a one-hot vector $v$, and the embedding of the category attribute is computed via an encoder of the one-hot vector,
$$s^{\mathrm{cat}} = W_c v,$$
where $W_c$ is a weight matrix which maps the one-hot vector to a continuous category embedding. We use the same embedding dimension for the category attributes as for the title embedding.
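Mapping a one-hot vector through a weight matrix amounts to selecting one learned vector per category. The following minimal sketch shows this equivalence; the matrix values, the category labels, and the names `one_hot` and `embed_category` are hypothetical, and the matrix is stored one row per category.

```python
def one_hot(index, size):
    """Return a one-hot list of the given size with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def embed_category(one_hot_vec, W_cat):
    """Linear encoder: multiplying a one-hot vector by W_cat selects one row."""
    dim = len(W_cat[0])
    return [sum(v * row[j] for v, row in zip(one_hot_vec, W_cat)) for j in range(dim)]

CATEGORIES = ["business", "politics", "sport", "tech"]  # hypothetical labels
W_cat = [  # toy weights: one 3-dimensional embedding per category
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]
sport = embed_category(one_hot(CATEGORIES.index("sport"), len(CATEGORIES)), W_cat)
print(sport)
```

In practice the product is therefore implemented as a table lookup, and the rows of the matrix are trained together with the rest of the model.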
3.2 Language Generation and Style Variation with SAM
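With the above semantic embedding extractions, we obtain a set of semantic attribute embeddings $\{s^{(1)}, \dots, s^{(K)}\}$. To weigh the importance of each attribute for a main content word $x_t$, we adopt another semantic attribute attention mechanism to learn the semantic attribute embedding for different main text words as
$$s_t = \sum_{k=1}^{K} \beta_{t,k}\, s^{(k)}, \qquad \beta_{t,k} = \frac{\exp\big(a_s(h_{t-1}, s^{(k)})\big)}{\sum_{j=1}^{K} \exp\big(a_s(h_{t-1}, s^{(j)})\big)},$$
where $a_s(\cdot, \cdot)$ is an attention function which scores how the attribute $s^{(k)}$ affects the main text word $x_t$.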
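The flexibility of attending over a set of attributes can be illustrated as follows. This is a sketch under assumptions: the score is again a dot product with the previous hidden state, the embeddings are toy values, and the function name `attend_attributes` is hypothetical. The point of the example is that the same function works for any subset of attributes, because the weights are renormalized over whatever is supplied.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_attributes(attributes, prev_hidden):
    """Combine named attribute embeddings into one joint embedding.

    attributes:  dict mapping an attribute name to its embedding vector.
    prev_hidden: hidden state of the previous main-text step (same length).
    """
    names = sorted(attributes)
    # Assumed score: dot product of each attribute embedding with the state.
    scores = [sum(a * h for a, h in zip(attributes[n], prev_hidden)) for n in names]
    weights = dict(zip(names, softmax(scores)))
    dim = len(prev_hidden)
    joint = [sum(weights[n] * attributes[n][j] for n in names) for j in range(dim)]
    return weights, joint

attributes = {"title": [0.9, 0.1], "category": [0.2, 0.7], "author": [0.4, 0.4]}
prev_hidden = [1.0, 0.0]

w_all, s_all = attend_attributes(attributes, prev_hidden)
# Dropping an attribute only changes the set the softmax normalizes over.
subset = {name: attributes[name] for name in ("title", "author")}
w_sub, s_sub = attend_attributes(subset, prev_hidden)
print(w_all, w_sub)
```

The returned weights are keyed by attribute name, so they can be inspected directly to see which attribute drives a given word prediction.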
We incorporate the obtained semantic attributes into the RNN framework. With the attribute attention mechanism, the transition of the RNN hidden state reads not only the current word $x_t$ but also the semantic attribute embedding $s_t$. Specifically, we concatenate the semantic attribute embedding $s_t$ and the input word embedding vector $E x_t$. Thus, the hidden state is updated as
$$h_t = f(h_{t-1}, [E x_t; s_t]).$$
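For the recurrent neural network function $f$, we use the gated recurrent unit (GRU). The GRU cell has two gates and a single memory cell. With $u_t = [E x_t; s_t]$ denoting the concatenated input, they are updated as:
$$z_t = \sigma(W_z u_t + U_z h_{t-1}),$$
$$r_t = \sigma(W_r u_t + U_r h_{t-1}),$$
$$\tilde{h}_t = \tanh\big(W_h u_t + U_h (r_t \odot h_{t-1})\big),$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$
where $\sigma$ is the sigmoid function and $\odot$ is the Hadamard product. Our model is trained by maximizing the log-likelihood of the corpus, using the back-propagation through time (BPTT) method.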
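A single hidden-state update with a concatenated word and attribute input can be sketched as below. This follows the standard GRU formulation with an update gate, a reset gate and a candidate state; bias terms are omitted for brevity, and the parameter names (`Wz`, `Uz`, `Wr`, `Ur`, `Wh`, `Uh`) and toy sizes are assumptions rather than the paper's notation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vec):
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def gru_step(word_embed, attr_embed, h_prev, p):
    """One GRU update whose input is the word embedding joined with the attribute embedding."""
    u = word_embed + attr_embed  # list concatenation of the two input vectors
    # Update gate and reset gate.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], u), matvec(p["Uz"], h_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], u), matvec(p["Ur"], h_prev))]
    # Candidate state reads the input and the reset-gated previous state.
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], u), matvec(p["Uh"], gated))]
    # Interpolate between the previous state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

HID, WORD, ATTR = 3, 2, 2  # toy sizes, chosen only for illustration
zero_params = {
    "Wz": zeros(HID, WORD + ATTR), "Uz": zeros(HID, HID),
    "Wr": zeros(HID, WORD + ATTR), "Ur": zeros(HID, HID),
    "Wh": zeros(HID, WORD + ATTR), "Uh": zeros(HID, HID),
}
h_prev = [0.2, -0.4, 0.6]
# With all-zero weights both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous state.
h_next = gru_step([0.5, -0.5], [1.0, 0.0], h_prev, zero_params)
print(h_next)
```

Because the attribute embedding enters the input at every step, its influence does not have to survive many recurrent updates, unlike information that is only placed in the initial state.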
As can be seen in Figure ?, we build a Semantic Attribute Modulated language generation model. Semantic attributes can be considered as the inputs that shape the generated outputs. By comparing the semantic attributes, users can interpret the corresponding outputs. Moreover, since some attributes reflect text styles, we realize text style variation by replacing an attribute with another related one. We give some generated variations of typical lyrics in the experiment part.
4 Discussions and Related Work
Neural Machine Translation
Neural machine translation (NMT) uses the encoder-decoder network to generate a specific response. In NMT, the encoder network reads source texts in one language and encodes them into continuous embeddings. Then the decoder network translates them into another language. NMT has also been used to generate poems after encoding some keywords. This is similar to our work in that texts are generated given some useful attributes. The difference is that our work uses a semantic attribute attention modulation to extract the semantic embedding instead of an encoder-decoder framework.
Our work is related to several contextual language modeling works. In [?], the titles and the keywords were represented as bag-of-words and used to build a conditional RNNLM. However, this work only involved text attributes and could not model discrete attributes. Discrete attributes, such as review ratings and document categories, were also used to control the content generation. A variational auto-encoder based model with a generator-discriminator scheme was also used for generating controllable texts, but its input attributes are limited to discrete categories.
There are several major advantages of our work over the above methods. First, we adopt a more diverse attribute set, including the widely used category attributes; the semantic information they carry brings the interpretability of SAM. Second, we use a better attribute representation method, namely a semantic attention mechanism, which gives flexibility. Third, by replacing the semantic attributes, our model realizes style variation for lyric generation.
|Prediction change|Words in the documents of the politics category|
|Improved|to, be, ireland, bush, one, chairman, fiscal, week, in, or, plan|
|Alike|both, general, many, both, but, is, N, in, been, the, said|
|Worse|of, gm, stock, orders, law, jerry|
|Prediction change|Words in the documents of the finance category|
|Improved|exchange, share, group, third-quarter, soared, from, is, profit|
|Alike|N, of, days, had, than, month, share, were, yield|
|Worse|reported, analysis, all, yield, vehicles, economics, gm, currently|
External Knowledge for Language Modeling
Some useful side information is external, such as the knowledge base and some text references [?]. Compared with these methods, our model concentrates on the document itself and does not depend on the external knowledge.
In this section, we first show that the Semantic Attribute Modulated language model gets better word predictions. The extensive qualitative analyses demonstrate the interpretability of the word predictions with regard to the input attributes. We then give several examples of the lyric style variation with SAM, which shows the flexibility of SAM.
We evaluate the proposed language model with semantic attribute attention on five different datasets with different attribute combinations. Among these datasets, TTNews, XLyrics and the titles of IMDB were collected by ourselves. We plan to release the collected corpora after resolving the copyright issues. For detailed statistics, see Table 1.
- Penn TreeBank (PTB) is a commonly used dataset for evaluating language models, and its texts are derived from the Wall Street Journal (WSJ). We use the preprocessed corpus by [?], which has 929k training tokens and a vocabulary of size 10k.
- WikiText2 is made of selected Wikipedia articles and has 2 million training tokens and a vocabulary of size 33k [?]. We also use a topic model with a fixed number of topics to extract a category attribute for each document. The semantic analysis of this category attribute is given in Appendix A.
- BBCNews is a formal English news dataset and contains 2,225 BBC news articles collected by [?].
- IMDB Movie Reviews (IMDB) is a movie review corpus [?] and has 75k training reviews and 25k testing reviews.
- TTNews is a Chinese news dataset crawled from several major Chinese media.
- XLyrics is a Chinese pop music lyric dataset crawled from the web. XLyrics has 4k lyrics, about 118k tokens and a vocabulary of size 3k.
We consider several variants of the proposed methods with different combinations of semantic attributes. In detail, we consider the language modeling with a) a category attribute, b) a title attribute and c) a title attribute plus a category attribute. In order to realize the style variation of the generations, we consider generating lyrics with an original title attribute and a fake author attribute.
We train a recurrent language model without any side information as a baseline method. We also report the results of a count-based n-gram model with Kneser-Ney smoothing.
For training, we use the ADAM method with an initial learning rate of 0.001 to maximize the log-likelihood and use early stopping based on the validation log-likelihood. The dimension of the word embedding is set to be the same as the hidden size of the RNN. The detailed parameter settings for each dataset are listed in Table 1.
5.3 Language Modeling Word Predictions
We first show that the Semantic Attribute Modulated language model gets better word predictions. Then we give some qualitative analysis to show the interpretability of SAM.
Language Modeling with Category-Attribute
Document categories are indicative of the discussed topics and therefore of the distribution over words. We first consider language modeling with a category attribute on two corpora, PTB and BBCNews. For the PTB dataset, we use the LDA topic model to analyze the semantic information, and for every document we set the category to the topic with the largest weight in LDA. The details of the PTB dataset pre-processing can be seen in Appendix A. For the BBCNews dataset, we use the provided news category labels as a discrete category attribute.
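Turning per-document topic weights into a single discrete category is a one-line operation. The sketch below assumes the topic weights have already been produced by some topic model; the weight values and the name `assign_category` are hypothetical.

```python
def assign_category(topic_weights):
    """Return the index of the topic with the largest weight for one document."""
    return max(range(len(topic_weights)), key=lambda k: topic_weights[k])

# Toy document-topic weights, one row per document (values are made up).
doc_topic_weights = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.15, 0.60, 0.10, 0.10],
    [0.20, 0.20, 0.20, 0.20, 0.20],  # ties resolve to the lowest topic index
]
categories = [assign_category(weights) for weights in doc_topic_weights]
print(categories)
```

Each document thus contributes exactly one discrete label to the model, rather than its full topic distribution.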
In Table 3, 5-Gram represents the count-based 5-gram model, RNN represents the conventional RNN model without any semantic attribute and SAM-Cat is our SAM model with a category attribute. As the results show, by adding a semantic category attribute, SAM-Cat outperforms the baseline models by achieving lower perplexities.
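For reference, per-word perplexity is the exponential of the negative average log-probability that a model assigns to the tokens, so lower values mean better predictions. The sketch below uses made-up log-probabilities; the function name `perplexity` and the example numbers are illustrative only and are not results from the paper.

```python
import math

def perplexity(token_log_probs):
    """Per-word perplexity from natural-log token probabilities."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that is uniform over 10 words has perplexity 10.
uniform = [math.log(1.0 / 10)] * 4

# Made-up log-probabilities for the same three tokens under two models.
baseline = [-5.0, -4.0, -6.0]
with_attribute = [-4.5, -3.5, -5.5]
print(perplexity(uniform), perplexity(baseline), perplexity(with_attribute))
```

A model that assigns higher probability to every token, as in the second toy list, always obtains a lower perplexity.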
Language Modeling with Title-Attribute
Document titles are carefully chosen by the authors to summarize the document contents and attract the attention of readers. In this part, we incorporate the title attribute to take advantage of the implicit word distribution represented by the title. We use four corpora for this task. BBCNews and TTNews are two formally published corpora, IMDB is a movie review corpus and XLyrics is a lyric corpus.
We implement the 5-gram model and the conventional RNN model on the corpus without titles. RNN-State is the conventional RNNLM with the title’s last hidden state as initialization. This means the title is treated as the first sentence but is not included in the computation of per-word perplexities. RNN-BOW is the conventional RNNLM incorporated with a bag-of-words representation of the title at each time step, which is a re-implementation of [?]. The SAM-Title-Att method is the SAM model with the title attribute and the attention mechanism. By adding the title’s last hidden state to SAM-Title-Att as initialization, we get the SAM-Title-Att-State method.
We show the word prediction perplexity results in Table ?. The RNN-based models with the title embedding have better perplexity results. Moreover, SAM-Title is better than RNN-State because, in RNN-State, the title information is injected only at initialization and fades after several nonlinear gating functions. The attention-based title attribute performs better than the one without attention, because the attention mechanism provides different importance weights for the title words.
Generally, our SAM model with the title attribute performs better on BBCNews than on IMDB. We believe the result is caused by the different genres of these datasets. For the title attribute to be useful, titles should convey refined summaries of documents. BBCNews, as a formal news corpus written by professional journalists, usually has titles of higher quality than the IMDB corpus.
Language Modeling with Title-Author-Attribute
In this part, we incorporate two different attributes, title and author. We will demonstrate that these two attributes are complementary.
We use the semantic attribute attention to combine the two attributes. The suffix ‘Au’ means that a method incorporates the author categorical attribute; otherwise, we keep the method notations used in the previous part. We show the word prediction perplexity results of several attribute combinations in Table ?. For the TTNews and XLyrics datasets, we can see that incorporating both the title and author attributes is better than incorporating a single one.
Qualitative Analysis on Interpretability of SAM
To discover why SAM-Cat outperforms traditional methods on the PTB dataset in Table 3, we show the words in each category with the largest and the smallest perplexity changes in Table 2. We mark in bold the words that carry strong semantic information of the specific category. For example, for the politics category, after adding the category attribute, the words with the largest prediction improvement are generally related to politics, such as ‘Ireland’, ‘bush’ and ‘chairman’. The words with the largest prediction degradation generally carry semantic meaning that is not related to politics, such as ‘gm’, ‘stock’ and ‘orders’. The words with the smallest prediction change are generally function words, such as ‘both’, ‘many’ and ‘but’. The word prediction changes in other categories are similar to those in the politics category. We put the results of the finance category in Table 2 and show the results of other categories in Appendix B due to the space limit.
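A word-level comparison of this kind can be sketched as follows. The exact statistic behind the table is not specified here, so the sketch assumes one plausible choice: the average gain in log-probability per word type between a baseline model and an attribute-conditioned model on the same tokens. All numbers and the names `word_level_changes` and `summarize` are hypothetical.

```python
from collections import defaultdict

def word_level_changes(tokens, base_log_probs, attr_log_probs):
    """Average log-probability gain per word type (positive means improved)."""
    gains = defaultdict(list)
    for token, base, attr in zip(tokens, base_log_probs, attr_log_probs):
        gains[token].append(attr - base)
    return {word: sum(vals) / len(vals) for word, vals in gains.items()}

def summarize(changes, k):
    """Pick the k most improved, least changed, and most degraded words."""
    by_gain = sorted(changes, key=lambda w: changes[w], reverse=True)
    by_magnitude = sorted(changes, key=lambda w: abs(changes[w]))
    return {
        "improved": by_gain[:k],
        "alike": by_magnitude[:k],
        "worse": by_gain[::-1][:k],
    }

# Made-up per-token log-probabilities for one short politics document.
tokens = ["bush", "the", "stock", "bush", "the"]
base_log_probs = [-6.0, -1.0, -4.0, -5.0, -1.2]
attr_log_probs = [-4.0, -1.0, -5.5, -4.0, -1.1]

changes = word_level_changes(tokens, base_log_probs, attr_log_probs)
summary = summarize(changes, k=1)
print(summary)
```

In this toy example the topical word gains the most, the function word barely changes, and the off-topic content word gets worse, which mirrors the three groups described in the analysis.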
To further investigate how attention values control the importance weights of the attributes, we visualize some of the attention values in Figure ?. The color depth shows the attention weights. The red rectangles show that the title word ‘Microsoft’ has a large effect on the content words ‘software’ and ‘unauthorized’. The title word ‘move’ has a large effect on the content word ‘prove’. This example shows that the attention mechanism works as a flexible selection of the attributes.
5.4 Flexible Style Variation with SAM
Many downstream applications of the language modeling can be enhanced with the proposed semantic attributes. For machine translation, the semantic attributes could also be titles, authors, and categories. For the speech recognition task, the semantic attributes include the age and the dialect of the speaker. For language generation tasks, such as the question-answering and the poem/lyric generation, the possible attributes are titles, authors, and even styles.
We use the SAM model to perform lyric generation with both the title and author attributes. Given an original lyric, we generate a new one with the same title but a fake author. We obtain several striking generation results, and the differences between the two lyrics are highly related to the title attribute. Here we give two concrete examples (one in Chinese and the other in English) and leave more examples in Appendix C.
For the English example in Fig. ?: The original lyric ‘Last Kiss’ is a popular song by Taylor Swift in the country pop style. After changing the authorship to Jason Mraz, we generate a new love song which looks like a rock lyric. The styles of the two lyrics match the styles of the two singers.
For the Chinese example in Fig. ?: The original lyric ‘Your Face’ is a sentimental love song written by Xiaotian Wang that recalls a past love. After changing the authorship to Lovely Sisters, a trending Chinese band, we generate a joyful love song about the happiness of falling in love.
In this paper, we propose SAM, the semantic attribute modulation for language modeling and style variation. The main idea is to take advantage of widely accessible and meaningful attributes to generate interpretable texts. Our model adopts a diversity of semantic attributes including titles, authors, and categories. With the attention mechanism, our model automatically scores the attributes in a flexible way and embeds the attribute representations into the hidden feature spaces as the generation model inputs. The diversity of the input attributes makes the model more powerful and interpretable, and the semantic attribute attention mechanism brings flexibility to the whole model. Extensive experiments demonstrate the effectiveness and the interpretability of our flexible Semantic Attribute Modulated language generation model.
In the future, we are interested in exploring more attributes which have semantic meaning for the language modeling task. In addition to the lyric generation task, other language generation tasks can also use our SAM model to utilize more semantic attributes. One possible example is to incorporate the geographic position attribute into the speech recognition task to model dialects. Another possible application is document-level machine translation.
Appendix A: Data Preparation of PTB
PTB is a commonly used benchmark corpus for the language modeling task. We use the LDA topic model to extract semantic category attributes. Admittedly, adding such a pseudo-category is a subtle matter for the language modeling task, because the topic model sees the words in advance before they are predicted. We argue that the pseudo-category still makes sense in the language modeling evaluation for the following two reasons. First, we only add one discrete assignment for each document, and no direct word distribution information is propagated. Second, the category assignments carry strong semantic information, and real category assignments are available for other datasets. The semantic analysis is as follows.
For the PTB dataset, we fix the number of topics and set the topic with the largest weight as each document’s category assignment. As can be seen in Table 5, the topics focus on corporate finance, politics, managers, the stock market and daily news.
For the WikiText2 dataset, we fix the number of topics and set the topic with the largest weight as each document’s category assignment. As can be seen in Table 5, the topics focus on sport games, the nation, the arts, disasters and daily words.
|Topic|Top words (PTB)|
|0|million billion share year company cents stock sales income revenue bonds profit corp.|
|1|its mr. federal company u.s. new government state court plan officials bill house|
|2|market stock trading prices stocks investors new price big index friday rates markets traders|
|3|its company mr. inc. new co. corp. president chief executive says group chairman business vice|
|4|mr. says when people years new time president work first few think good want city know back|
|Topic|Top words (WikiText2)|
|0|game season team against won player match second League year club final|
|1|city government state part area river along build national local century US|
|2|she song film album series episode video character wrote|
|3|storm British tropical ship force German attack French British Australia June force|
|4|may used found known This species century use years large often no form common death life|
Appendix B: More word predictions of the SAM-Cat on PTB dataset
In this part, we show more word predictions of our SAM-Cat model on the PTB dataset. After adding the category attribute, we obtain more semantically meaningful word prediction improvements. We show the results on the categories ‘stock’ and ‘management’ in Table 7. We mark in bold the words that carry strong semantic information of the specific category. After adding the category attribute, the words with the largest prediction improvement are generally related to the category information. The words with the largest prediction degradation generally carry semantic meaning that is not related to the category information. The words with the smallest prediction change are generally function words.
|Prediction change|Words in the documents of the stocking category|
|Improved|co, operating, an, markets, considered, commercial, stake|
|Alike|N, usa, the is, these, discussion, at, the, chicken|
|Worse|offering, million, money, read, communications, lines, issues, city|
|Prediction change|Words in the documents of the management category|
|Improved|market, about, results, orders, trading, dow, portfolio, price, market|
|Alike|N, likely, of, prepared, southeast, futures, see, group, the|
|Worse|bear, totaled, optimistic, executive, chief, manufacturers, about|
Appendix C: More Lyric Generation Variations
In this part, we give several lyric generation examples, two in English and one in Chinese. We observe that if two authors have different styles in the training data, the generations are very likely to have different styles. In the following figures, we give the detailed generations and the corresponding analyses in the figure captions.
- A famous song by Taylor Swift.
- http://ai.stanford.edu/~amaas/data/sentiment/
- http://www.ywnews.cn/, http://www.toutiao.com, http://www.huanqiu.com/, etc
- A neural probabilistic language model.
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. Journal of machine learning research, 3(Feb):1137–1155, 2003.
- A guide to recurrent neural networks and backpropagation.
Mikael Boden. the Dallas project, 2002.
- An empirical study of smoothing techniques for language modeling.
Stanley F Chen and Joshua Goodman. In ACL, pages 310–318, 1996.
- Empirical evaluation of gated recurrent neural networks on sequence modeling.
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. arXiv preprint arXiv:1412.3555, 2014.
- Learning phrase representations using rnn encoder-decoder for statistical machine translation.
Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. arXiv preprint arXiv:1406.1078, 2014.
- Topicrnn: A recurrent neural network with long-range semantic dependency.
Adji B Dieng, Chong Wang, Jianfeng Gao, and John Paisley. In ICLR, 2017.
- Improving neural language models with a continuous cache.
Edouard Grave, Armand Joulin, and Nicolas Usunier. In ICLR, 2017.
- Generating sequences with recurrent neural networks.
Alex Graves. arXiv preprint arXiv:1308.0850, 2013.
- Integrating topics and syntax.
Thomas L Griffiths, Mark Steyvers, David M Blei, and Joshua B Tenenbaum. In NIPS, 2004.
- Contextual lstm (CLSTM) models for large scale nlp tasks.
Shalini Ghosh, Oriol Vinyals, Brian Strope, Scott Roy, Tom Dean, and Larry Heck. arXiv preprint arXiv:1602.06291, 2016.
- Incorporating side information into recurrent neural network language models.
Cong Duy Vu Hoang, Trevor Cohn, and Gholamreza Haffari. In Proceedings of NAACL-HLT, pages 1250–1255, 2016.
- Kenlm: Faster and smaller language model queries.
Kenneth Heafield. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197. Association for Computational Linguistics, 2011.
- Long short-term memory.
Sepp Hochreiter and Jürgen Schmidhuber. Neural computation, 9(8):1735–1780, 1997.
- Toward controlled generation of text.
Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric Xing. In ICML, 2017.
- Document context language models.
Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, and Jacob Eisenstein. arXiv preprint arXiv:1511.03962, 2015.
- Adam: A method for stochastic optimization.
Diederik Kingma and Jimmy Ba. arXiv preprint arXiv:1412.6980, 2014.
- Globally coherent text generation with neural checklist models.
Chloé Kiddon, Luke Zettlemoyer, and Yejin Choi. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 329–339, 2016.
- Neural text generation from structured data with application to the biography domain.
Rémi Lebret, David Grangier, and Michael Auli. arXiv preprint arXiv:1603.07771, 2016.
- Dialog context language modeling with recurrent neural networks.
Bing Liu and Ian Lane. In ICASSP, 2017.
- Hierarchical recurrent neural network for document modeling.
Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, and Sheng Li. In EMNLP, pages 899–907, 2015.
- Capturing meaning in product reviews with character-level generative text models.
Zachary C Lipton, Sharad Vikram, and Julian McAuley. arXiv preprint arXiv:1511.03683, 2015.
- Coherent dialogue with attention-based language models.
Hongyuan Mei, Mohit Bansal, and Matthew R Walter. In AAAI, 2017.
- Learning word vectors for sentiment analysis.
Andrew L Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. In ACL, 2011.
- Recurrent neural network based language model.
Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. In Interspeech, volume 2, page 3, 2010.
- Extensions of recurrent neural network language model.
Tomáš Mikolov, Stefan Kombrink, Lukáš Burget, Jan Černocky, and Sanjeev Khudanpur. In ICASSP, pages 5528–5531. IEEE, 2011.
- Context dependent recurrent neural network language model.
Tomas Mikolov and Geoffrey Zweig. In SLT, pages 234–239, 2012.
- An empirical analysis of formality in online communication.
Ellie Pavlick and Joel Tetreault. Transactions of the Association for Computational Linguistics, 4:61–74, 2016.
- Building natural language generation systems.
Ehud Reiter and Robert Dale. Cambridge university press, 2000.
- Learning to generate reviews and discovering sentiment.
Alec Radford, Rafal Jozefowicz, and Ilya Sutskever. arXiv preprint arXiv:1704.01444, 2017.
- Controlling politeness in neural machine translation via side constraints.
Rico Sennrich, Barry Haddow, and Alexandra Birch. In HLT-NAACL, pages 35–40, 2016.
- Context-aware natural language generation with recurrent neural networks.
Jian Tang, Yifan Yang, Sam Carton, Ming Zhang, and Qiaozhu Mei. arXiv preprint arXiv:1611.09900, 2016.
- Inter-document contextual language model.
Quan Hung Tran, Ingrid Zukerman, and Gholamreza Haffari. In Proceedings of NAACL-HLT, pages 762–766, 2016.
- Larger-context language modelling with recurrent neural network.
Tian Wang and Kyunghyun Cho. In ACL, 2016.
- Chinese poetry generation with planning based neural network.
Zhe Wang, Wei He, Hua Wu, Haiyang Wu, Wei Li, Haifeng Wang, and Enhong Chen. In COLING, 2016.