# Style-aware Neural Model with Application in Authorship Attribution

University of Central Florida
fereshteh.jafari@knights.ucf.edu

Kien A. Hua
Computer Science Department
University of Central Florida
kienhua@cs.ucf.edu
###### Abstract

Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. In this paper we introduce a style-aware neural model to encode document information from three stylistic levels and evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences which contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature.

Style-aware Neural Model, Syntax encoding, Authorship Attribution

## I Introduction

Individuals express their thoughts in different ways due to many factors, including the conventions of the language, educational background, and the intended audience. In written language, the combination of consistent conscious or unconscious decisions in language production, known as writing style, has been studied widely [12, 15]. Stylistic features are generally content-independent: they are consistent across different documents written by a specific author (or author group). Lexical, syntactic, and structural features are the three main families of stylistic features. Lexical features represent an author's character and word use preferences, while syntactic features capture the syntactic patterns of the sentences in a document. Structural features reveal information about how an author organizes the sentences in a document.

To date, the existing approaches in the domain of authorship attribution fall into two categories. The first category adopts traditional machine learning techniques to identify the author of a given document. In this approach, stylistic features are engineered and extracted from the documents and subsequently used as the inputs of different kinds of classifiers [27, 7, 31, 37, 19, 28]. These features reveal statistical information about documents at the lexical, syntactic, and structural levels: for instance, the frequency of certain words, character distribution, function word distribution, frequency of part-of-speech tags, and the number of sentences per paragraph. A limitation of this approach is that the feature extraction process ignores the rich sequential information in the sentences and the document.

The second category of authorship attribution approaches builds upon neural network models [29, 8, 10]. In this approach, the sequence of words or characters is the input to a neural network, which allows the proposed models to utilize sequential information. However, the models in the literature mainly focus on lexical features, despite the fact that lexical-based language models have very limited scalability when dealing with datasets containing diverse topics. On the other hand, syntactic models, which are content-independent, are more robust against topic variance. Zhang et al. [40] introduce a strategy to incorporate syntactic information of documents into the authorship attribution task. They propose a novel scheme to encode a syntax tree into a learnable distributed representation, and then integrate the syntax representation into a Convolutional Neural Network (CNN)-based model. Different from their approach, we are interested in a neural model which encodes syntactic information without being equipped with an explicit structural representation such as a syntax parse tree. This is achieved by introducing a strategy to encode the syntactic information of sentences using only their Part-of-Speech (POS) tags. Furthermore, our motivation is to develop a neural model which preserves the stylistic information of documents from all three levels of language production: lexical, syntactic, and structural.

Our contribution in this paper is twofold. First, we show that encoding the syntactic information of sentences using only their part-of-speech tags is more computationally efficient and gives better results. Second, we employ a hierarchical neural network to encode the structural information of documents, which further enhances the performance of the proposed technique. In the proposed model, we use lexical and syntactic embeddings to build two different sentence representations. The lexical and syntactic representations of sentences are then independently fed into two parallel hierarchical neural networks to capture the semantic and syntactic structure of sentences in documents. A hierarchical attention network captures the hierarchical structure of documents by constructing representations of sentences and aggregating them into document representations [39]. We employ convolutional layers as the word-level encoder to represent each sentence by its important lexical and syntactic n-grams, independent of their position in the sentence. For the sentence-level encoder, we employ an attention-based recurrent neural network to capture the structural patterns of sentences in the document. The primary reason for adopting a recurrent architecture for the sentence encoder is that recurrent neural networks have been shown to be essential for capturing the underlying hierarchical structure of sequential data [35]. Hence, the sentence encoder in the proposed model is expected to capture the structural information of documents. The final document representation is constructed by summing up all the learned sentence vectors while rewarding the sentences which contribute more to the predictions. Ultimately, the lexical and syntactic representations are fused and fed into a softmax classifier to predict the probability distribution over the class labels.

## II Related Work

Style-based text classification was introduced by Argamon-Engelson et al. [2]. The authors used basic stylistic features (the frequency of function words and part-of-speech trigrams) to classify news documents based on the corresponding publisher (newspaper or magazine) as well as text genre (editorial or news item). Nowadays, computational stylometry has a wide range of applications in literary science [11, 36], forensics [6, 1, 38], and psycholinguistics [16, 17].

Syntactic n-grams have been shown to achieve promising results in different stylometric tasks, including author profiling [20] and author verification [13]. In particular, Raghavan et al. investigated the use of syntactic information by proposing a probabilistic context-free grammar for the authorship attribution purpose, using it as a language model for classification [21]. A combination of lexical and syntactic features has also been shown to enhance model performance. Sundararajan et al. argue that, although syntax can be helpful for cross-genre authorship attribution, combining syntax and lexical information can further boost the performance for cross-topic attribution and single-domain attribution [33]. Further studies which combine lexical and syntactic features include [30, 26, 14].

With recent advances in deep learning, there exists a large body of work in the literature which employs deep neural networks in the authorship attribution domain. For instance, Ge et al. used a feed-forward neural network language model on an authorship attribution task, achieving promising results compared to the n-gram baseline [9]. Bagnall employed a recurrent neural network with a shared recurrent state which outperformed the other proposed methods in the PAN 2015 task [3].

Shrestha et al. applied a CNN based on character n-grams to identify the authors of tweets. Given that tweets are short in nature, their approach shows that a sequence of character n-grams allows the architecture to capture character-level interactions, which can then be aggregated to learn higher-level patterns for modeling the style [29]. Hitschler et al. propose a CNN based on pre-trained word embeddings concatenated with a one-hot encoding of POS tags; however, they report no ablation study on the contribution of the POS tags to the final performance [10]. Zhang et al. introduce a syntax encoding approach using convolutional neural networks, combine it with a lexical model, and apply it to the domain of authorship attribution [40]. Their approach utilizes the syntax parse trees of sentences; however, we show in this paper that such explicit annotation of hierarchical syntax is not necessary for the authorship attribution task. We propose a simpler yet more effective way of encoding the syntactic information of documents for authorship attribution. Moreover, we employ a hierarchical neural network to capture the structural information of documents, and ultimately introduce a neural model which incorporates all three families of stylistic features: lexical, syntactic, and structural.

## III Style-aware Neural Model

We introduce a neural network which encodes the stylistic information of documents from three levels of language production (lexical, syntactic, and structural). We assume that each document is a sequence of N sentences and each sentence is a sequence of M words, where N and M are model hyperparameters whose best values are explored through the hyperparameter tuning phase (Section IV-C). First, we obtain both lexical and syntactic representations of words using lexical and syntactic embeddings, respectively. These two representations are fed into two identical hierarchical neural networks which encode the lexical and syntactic patterns of documents independently and in parallel. Ultimately, the two representations are aggregated into the final vector representation of the document, which is fed into a softmax layer to compute the probability distribution over class labels.

The hierarchical neural network comprises convolutional layers as the word-level encoder to obtain the sentence representations. These are then aggregated into a document representation using recurrent neural networks. Finally, we use an attention mechanism to reward the sentences which contribute more to the detection of the authorial writing style. The overall architecture of the proposed model is shown in Figure 2. We elaborate on each component in the following subsections.

### III-A Lexical and Syntax Encoding

We encode the semantic and syntactic information of documents independently using lexical and syntactic embeddings, as illustrated in Figure 1. These two representations are fed into two parallel hierarchical networks; hence the syntactic and semantic patterns of a document are learned independently from each other.

#### III-A1 Lexical Embedding

At the lexical level, we embed each word into a vector representation. We use pre-trained GloVe embeddings [18] and represent each sentence as the sequence of its corresponding word embeddings.

#### III-A2 Syntactic Embedding

Given a sentence, we convert each word into its corresponding POS tag in the sentence, and then embed each POS tag into a low-dimensional vector using a trainable lookup table E ∈ R^{|T|×d}, where T is the set of all possible POS tags in the language and d is the embedding dimension. We use the NLTK part-of-speech tagger [4] for tagging, with the following tag set:

T = { CC, CD, DT, EX, FW, IN, JJ, JJR, JJS, LS, MD, NN, NNS, NNP, NNPS, PDT, POS, PRP, PRP$, RB, RBR, RBS, RP, SYM, TO, UH, VB, VBD, VBG, VBN, VBP, VBZ, WDT, WP, WP$, WRB, ‘,’, ‘:’, ‘…’, ‘;’, ‘?’, ‘!’, ‘.’, ‘$’, ‘(’, ‘)’, ‘“’, ‘”’ }

One advantage of syntactic embedding over word embedding is its low-dimensional lookup table: the vocabulary of a large dataset usually surpasses 50K words, whereas the syntactic embedding lookup table is significantly smaller, fixed, and independent of the dataset, which makes the proposed representation less prone to the out-of-vocabulary problem.
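
As a concrete illustration, the lookup-table embedding described above can be sketched in a few lines of NumPy. This is a hypothetical sketch, not the paper's implementation: the tag set is truncated, the table is randomly initialized rather than trained, and the tag sequence that NLTK's `pos_tag` would produce is written by hand.

```python
import numpy as np

# Truncated subset of the tag set T from Section III-A2 (illustration only).
TAGS = ["CC", "CD", "DT", "EX", "FW", "IN", "JJ", "NN", "NNS", "PRP",
        "RB", "VB", "VBD", "VBZ", ".", ","]
TAG2ID = {t: i for i, t in enumerate(TAGS)}

d = 8  # embedding dimension (a hyperparameter)
rng = np.random.default_rng(0)
# Trainable lookup table E, |T| x d; random stand-in for learned weights.
lookup = rng.normal(size=(len(TAGS), d))

def embed_pos(pos_sequence):
    """Map a sequence of POS tags to a (length, d) matrix of embeddings."""
    ids = [TAG2ID[t] for t in pos_sequence]
    return lookup[ids]

# "The cat sat ."  ->  DT NN VBD .  (what nltk.pos_tag would roughly produce)
S = embed_pos(["DT", "NN", "VBD", "."])
print(S.shape)  # (4, 8)
```

Note how the table size depends only on |T|, not on the corpus vocabulary, which is the out-of-vocabulary advantage discussed above.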

### III-B Hierarchical Model

#### III-B1 Word-level Encoder

The outputs of the lexical and syntactic embedding layers go into two identical convolutional layers (lexical-CNN and syntactic-CNN), which learn the semantic and syntactic patterns of sentences in parallel. Due to the identical architecture of the two networks, we only elaborate on the syntactic-CNN in what follows.

Let S ∈ R^{M×d} be the matrix of embedding vectors of sentence i, and W ∈ R^{r×d} be a convolutional filter with receptive field size r. We apply a single layer of convolving filters with varying window sizes, each followed by the rectified linear unit function (relu) with a bias term b, and a temporal max-pooling layer which returns only the maximum value of each feature map C^r_i. Each sentence is then represented by its most important syntactic n-grams, independent of their position in the sentence. Varying receptive field sizes are used to compute vectors for different n-grams in parallel, and they are concatenated into a final feature vector h_i ∈ R^K, where K is the total number of filters:

 C^r_{ij} = relu(W^T S_{j:j+r−1} + b),  j ∈ [1, M−r+1],
 Ĉ^r_i = max{C^r_i},
 h_i = ⊕_r Ĉ^r_i,  ∀ r ∈ Z
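
The word-level encoder defined by the equations above can be sketched as follows. This is a minimal NumPy illustration with a single filter per receptive field size r ∈ {2, 3, 4}; the real model would use many trained filters per size, but the variable names mirror the equations.

```python
import numpy as np

def word_level_encoder(S, filters, biases):
    """For each receptive field size r, slide a filter over the sentence
    matrix S (M x d), apply relu, max-pool over time, and concatenate the
    pooled features into the sentence vector h_i."""
    M, d = S.shape
    pooled = []
    for W, b in zip(filters, biases):
        r = W.shape[0]                      # receptive field size
        feats = []
        for j in range(M - r + 1):          # j in [1, M-r+1]
            window = S[j:j + r].ravel()     # S_{j:j+r-1}
            feats.append(max(0.0, window @ W.ravel() + b))  # relu
        pooled.append(max(feats))           # temporal max-pooling
    return np.array(pooled)                 # h_i, one entry per filter

rng = np.random.default_rng(1)
S = rng.normal(size=(10, 8))                # M=10 words, d=8
filters = [rng.normal(size=(r, 8)) for r in (2, 3, 4)]
h = word_level_encoder(S, filters, biases=[0.0, 0.0, 0.0])
print(h.shape)  # (3,)
```

Because of the max-pooling, each pooled value reflects the strongest n-gram match anywhere in the sentence, which is what makes the representation position-independent.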

#### III-B2 Sentence-level Encoder

The sentence encoder learns the lexical/syntactic representation of a document from the sequence of sentence representations output by the word-level encoder. We use a bidirectional LSTM to capture how sentences with different syntactic patterns are structured in a document. The output of the sentence encoder is calculated as follows.

 →h^d_i = LSTM(h^s_i),  i ∈ [1, N],
 ←h^d_i = LSTM(h^s_i),  i ∈ [N, 1],
 h^d_i = [→h^d_i; ←h^d_i]
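
A minimal sketch of the bidirectional sentence encoder follows, with a plain tanh recurrence standing in for the LSTM cells (the gating machinery is omitted for brevity, and the weights are random stand-ins for learned parameters); the forward and backward states are concatenated per sentence, as in the equations above.

```python
import numpy as np

def birnn_encode(H_s, Wx, Wh):
    """Run a tanh recurrence forward and backward over the sequence of
    sentence vectors H_s (N x k) and concatenate the two hidden states
    for each position: h_i^d = [h_fwd_i ; h_bwd_i]."""
    N, k = H_s.shape
    h_dim = Wh.shape[0]
    fwd = np.zeros((N, h_dim))
    bwd = np.zeros((N, h_dim))
    h = np.zeros(h_dim)
    for i in range(N):                      # forward pass, i in [1, N]
        h = np.tanh(H_s[i] @ Wx + h @ Wh)
        fwd[i] = h
    h = np.zeros(h_dim)
    for i in range(N - 1, -1, -1):          # backward pass, i in [N, 1]
        h = np.tanh(H_s[i] @ Wx + h @ Wh)
        bwd[i] = h
    return np.concatenate([fwd, bwd], axis=1)   # N x 2*h_dim

rng = np.random.default_rng(2)
H_s = rng.normal(size=(5, 6))               # N=5 sentences, k=6
Wx = rng.normal(size=(6, 4))
Wh = rng.normal(size=(4, 4))
H_d = birnn_encode(H_s, Wx, Wh)
print(H_d.shape)  # (5, 8)
```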

Needless to say, not all sentences are equally informative about the authorial style of a document. Therefore, we incorporate an attention mechanism to reward the sentences that contribute more to detecting the writing style. We define a sentence-level context vector u_s and use it to measure the importance of each sentence as follows:

 u_i = tanh(W_s h^d_i + b_s)
 α_i = exp(u_i^T u_s) / Σ_i exp(u_i^T u_s)
 V = Σ_i α_i h^d_i

where u_s is a learnable vector, randomly initialized during the training process, and V is the vector representation of the document, a weighted sum of the vector representations of all its sentences.
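
The attention pooling above amounts to the following computation (a NumPy sketch with randomly initialized parameters; in the model, W_s, b_s, and u_s are learned):

```python
import numpy as np

def attention_pool(H_d, Ws, bs, us):
    """u_i = tanh(Ws h_i^d + bs); alpha_i = softmax_i(u_i^T us);
    V = sum_i alpha_i h_i^d."""
    U = np.tanh(H_d @ Ws + bs)              # N x a
    scores = U @ us                          # u_i^T u_s for each sentence
    alpha = np.exp(scores - scores.max())    # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ H_d, alpha                # document vector V, weights

rng = np.random.default_rng(3)
H_d = rng.normal(size=(5, 8))                # 5 sentence vectors
Ws = rng.normal(size=(8, 8))
bs = np.zeros(8)
us = rng.normal(size=8)                      # learnable context vector u_s
V, alpha = attention_pool(H_d, Ws, bs, us)
print(V.shape, round(alpha.sum(), 6))  # (8,) 1.0
```

The weights α sum to one, so sentences that align with u_s dominate the pooled document vector V.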

The primary reason for adopting a recurrent architecture for the sentence encoder is that recurrent neural networks have been shown to be essential for capturing the underlying hierarchical structure of sequential data [35]. With this approach, the sentence encoder is able to encode how sentences are structured in a document. Accordingly, the structural information of documents is incorporated into the final document representation.

### III-C Lexical and Syntactic Representations Fusion

In this phase, the semantic and syntactic representations of the document, learned independently by the two parallel hierarchical neural networks, are concatenated into the final vector representation.

 V_k = [V_lexical; V_syntactic]

### III-D Classification

The learned vector representation of each document is fed into a softmax classifier to compute the probability distribution over class labels. Suppose V_k is the final vector representation of document k output from the fusion layer. The prediction ỹ_k is the output of the softmax layer, computed as:

 ỹ_k = softmax(W_c V_k + b_c),

where W_c and b_c are the learnable weight and bias, respectively, and ỹ_k is a C-dimensional vector, where C is the number of classes. We use cross-entropy loss to measure the discrepancy between the predictions ỹ and the true labels y. The model parameters are optimized to minimize the cross-entropy loss over all documents in the training corpus. Hence, the regularized loss function over X documents is:

 J(θ) = −(1/X) Σ_{i=1}^{X} Σ_{k=1}^{C} y_{ik} log ỹ_{ik} + λ||θ||
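
A numeric sketch of the classification layer and regularized loss, with randomly initialized parameters and the L2 penalty restricted to W_c for brevity:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())     # numerically stable softmax
    return e / e.sum()

def regularized_loss(V_batch, Y, Wc, bc, lam):
    """Mean cross-entropy over X documents plus an L2 penalty,
    mirroring J(theta) above (penalty applied to Wc only here)."""
    loss = 0.0
    for V, y in zip(V_batch, Y):            # X documents
        y_hat = softmax(Wc @ V + bc)        # C-dimensional prediction
        loss -= np.log(y_hat[y])            # y_ik log y_hat_ik (one-hot y)
    return loss / len(V_batch) + lam * np.linalg.norm(Wc)

rng = np.random.default_rng(4)
C, dim = 3, 16                               # 3 candidate authors
Wc = rng.normal(size=(C, dim))
bc = np.zeros(C)
V_batch = rng.normal(size=(4, dim))          # 4 fused document vectors V_k
Y = [0, 2, 1, 0]                             # true author labels
print(regularized_loss(V_batch, Y, Wc, bc, lam=1e-4) > 0)  # True
```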

## IV Experimental Studies

First, we provide ablation studies to report the contribution of the three stylistic levels (lexical, syntactic, and structural) in the final results. Then we show the performance of our proposed method (Style-HAN) on several benchmark datasets in comparison with the existing baselines in the literature.

### IV-A Datasets

We evaluate the proposed approach on several benchmark datasets:

• CCAT10, CCAT50: Newswire stories from Reuters Corpus Volume 1 (RCV1) written by 10 and 50 authors, respectively [32].

• BLOGS10, BLOGS50: Posts written by the 10 and 50 top bloggers, respectively, drawn from a dataset of 681,288 blog posts by 19,320 bloggers on blogger.com [25].

Some statistics on the sentence length and document length for each dataset are presented in Table I.

### IV-B Baselines

We compare our method with various baseline approaches which represent the current state of the art in the authorship attribution problem, including SVM with affix and punctuation 3-grams [23], CNN-char [22], continuous n-gram representation [24], n-gram CNN [29], and Syntax-CNN [40]. Their results reported in this paper are obtained from the corresponding papers.

### IV-C Hyperparameter Tuning

The model hyperparameters include the number of sentences per document (N) and the number of words per sentence (M), with their best values obtained during the tuning phase. Table II shows the corresponding values for each dataset. The networks are trained using mini-batches of size 32. We use the Nadam optimizer [34] to optimize the cross-entropy loss over 50 epochs of training. We use 100-dimensional pre-trained GloVe embeddings [18] for the lexical layer and 100-dimensional randomly initialized embeddings for the syntactic layer. In order to reduce the effect of the out-of-vocabulary problem in the lexical layer, we retain only the 50,000 most frequent words. All performance metrics are the mean of the accuracy (on the test set) calculated over 10 runs with a 0.9/0.1 train/validation split.

### IV-D Performance Results

#### IV-D1 Syntactic Representation

First, we compare our proposed syntax encoding method (POS encoding) to the prior syntax tree (ST) encoding method [40]. In ST encoding, the syntax parse trees of sentences are utilized to encode their syntactic information: each word in the sentence is embedded through its corresponding path in the syntax tree. In this approach, the hierarchical structure of sentences is explicitly given as an input to the model. However, we question whether such explicit annotation is necessary for authorship attribution. In our proposed POS encoding model, each word is embedded by only its part-of-speech tag, and the neural model itself implicitly learns the dependencies between the parts of speech in the sentences. Furthermore, utilizing only the POS tags of words makes the model computationally less expensive compared to utilizing the syntax parse tree structure.

Table III reports the accuracy of the different syntactic representations on all the benchmark datasets. Since ST encoding uses a CNN-based neural model, we employ the identical network architecture proposed in that paper in order to provide a fair comparison of the two syntactic representations. The results for ST encoding are reported from the corresponding paper. The experimental results demonstrate that our proposed syntactic representation (POS-CNN) outperforms the previously proposed method (ST-CNN) by a large margin on all the benchmark datasets (38.6% in CCAT10, 30.80% in CCAT50, 19.62% in BLOGS10, 11.94% in BLOGS50). This improvement can be due to two factors. First, the model complexity of POS encoding is remarkably lower, which makes it more capable of generalization. Second, utilizing the syntax tree imposes the positional factor of syntactic units in the sentences, whereas the authorship attribution task aims to capture frequent syntactic patterns regardless of their position in the sentences. Our performance results confirm this insight, showing that low-level syntactic information is more revealing of writing style than a hierarchical notion of syntax.

#### IV-D2 Hierarchical Neural Model

We have employed a hierarchical attention network (HAN) in order to capture the structural information of documents. In order to understand the contribution of our network architecture to the performance, we compare our architecture (POS-HAN) to the previously proposed CNN-based model (POS-CNN) while keeping the syntactic representations identical. According to Table III, POS-HAN outperforms the POS-CNN model consistently across all the benchmark datasets (1.74% in CCAT10, 0.32% in CCAT50, 1.06% in BLOGS10, 2.91% in BLOGS50). This observation indicates that hierarchical neural models, which capture the hierarchical structure of documents, are a better choice for the authorship attribution task. This confirms our argument that the structural information of a document is important for revealing the authorial writing style.

#### IV-D3 Lexical and Syntactic Model

In order to understand the contributions of the lexical and syntactic models to the final predictions, we performed an ablation study. The results are reported in Table IV. In Syntactic-HAN, only the syntactic representation of the document (V_syntactic) is fed into the softmax layer to compute the final predictions. Similarly, in Lexical-HAN, only the lexical representation of the document (V_lexical) is fed into the softmax classifier. The final stylometry model, Style-HAN, fuses both representations and computes the class labels using a softmax classifier (Section III-C). According to the table, the lexical model consistently outperforms the syntactic model across all the benchmark datasets. Moreover, combining the two representations further improves the performance.

Figure 3 illustrates the training loss of the syntactic, lexical, and style encodings over 50 epochs of training for all the datasets. As we observe, the lexical model maintains a lower loss in the earlier epochs and converges faster than the syntactic model. However, combining them into the style model further reduces the loss and improves the performance.

Based on the observations from Figure 3 and Table IV, we find that even though Syntactic-HAN alone achieves comparable performance, combining it with Lexical-HAN slightly improves the overall performance (Style-HAN). This can be due to the fact that lexical-based recurrent neural networks alone are able to encode a significant amount of syntax even in the absence of explicit syntactic annotations [5]. However, explicit syntactic annotation further improves the performance when compared to the lexical-based model alone. As shown in Table IV, the improvement in accuracy is consistent across all the benchmark datasets (4.54% in CCAT10, 2.85% in CCAT50, 2.02% in BLOGS10, 1.42% in BLOGS50).

#### IV-D4 Training Syntactic and Lexical Networks

We examine two different approaches (combined and parallel) for fusing the lexical and syntactic encodings into the final style network. In the combined approach, we concatenate the syntactic and lexical embeddings to construct a unified representation for each word which contains both lexical and syntactic information; this representation is then fed to a hierarchical attention network to learn the final document representation. In the parallel approach, the lexical and syntactic embeddings are fed into two identical hierarchical neural networks, and the syntactic and lexical document representations, learned independently and in parallel, are concatenated into the final document representation. Figure 4 illustrates the two approaches. Table V reports the accuracy of the combined and parallel fusion approaches. According to these results, training two parallel networks for lexical and syntax encoding achieves higher accuracy than training a single network with the combined embeddings. This may be because the syntactic and lexical models contain largely complementary information (language structure and semantics, respectively), so training them independently delivers better results.

#### IV-D5 Style Encoding

We compare our proposed style-aware neural model (Style-HAN) with the other stylometric models in the literature. Table VI reports the accuracy of the models on the four benchmark datasets. All the results are obtained from the corresponding papers, with the dataset configuration kept identical for the sake of fair comparison. The best performance result for each dataset is highlighted in bold. Style-HAN outperforms the baselines by 2.38%, 1.35%, 8.73%, and 4.46% on the CCAT10, CCAT50, BLOGS10, and BLOGS50 datasets, respectively. This indicates the effectiveness of encoding document information at three stylistic levels: lexical, syntactic, and structural.

#### IV-D6 Sensitivity to Sentence Length

We examine our model’s sensitivity to the sentence length (M). We evaluate the performance of the model for sentence lengths of 10, 20, 30, and 40 words while the number of sentences per document (N) is kept constant. Figure 5 shows the performance results of Style-HAN on the four datasets. The model achieves the best performance on CCAT10 and CCAT50 when the sentence length is 30, while on BLOGS10 and BLOGS50 the highest performance is observed when the sentence length is 20. Table I shows that the average sentence lengths over all samples in CCAT10, CCAT50, BLOGS10, and BLOGS50 are 27, 26, 18, and 17, respectively. Accordingly, the model performs best when the sentence length is close to the average sentence length in the dataset. This is because a shorter sentence length results in information loss, while a longer sentence length leads to capturing misleading features; both situations result in lower accuracy.

#### IV-D7 Sensitivity to Document Length

We examine the model performance across different numbers of sentences per document (document length). Figure 6 illustrates the accuracy of the model when the number of sentences per document is set to 10, 20, 30, and 40, respectively. We observe that increasing the document length generally boosts the performance on all the datasets. This observation confirms that investigating writing style in short documents is more challenging [15].

## V Conclusion

In this paper we introduce a style-aware neural model which encodes document information from three stylistic levels, lexical, syntactic, and structural, in order to better capture the authorial writing style. First, we propose an efficient way to encode the syntactic patterns of sentences using only their corresponding part-of-speech tags. The lexical and syntactic embeddings of words are then used to create two different sentence representations. Subsequently, a hierarchical neural network, which takes both syntactic and lexical information as input, is employed to capture the structural patterns of sentences in the document. Finally, the syntactic and lexical representations of the document are concatenated in the fusion step to build the final document representation. Our experimental results on benchmark authorship attribution datasets confirm the benefits of encoding document information from all three stylistic levels, and show the performance advantages of our techniques.

## References

• [1] S. Afroz, M. Brennan, and R. Greenstadt (2012) Detecting hoaxes, frauds, and deception in writing style online. In Security and Privacy (SP), 2012 IEEE Symposium on, pp. 461–475. Cited by: §II.
• [2] S. Argamon-Engelson, M. Koppel, and G. Avneri (1998) Style-based text categorization: what newspaper am i reading. In Proc. of the AAAI Workshop on Text Categorization, pp. 1–4. Cited by: §II.
• [3] D. Bagnall (2016) Authorship clustering using multi-headed recurrent neural networks. arXiv preprint arXiv:1608.04485. Cited by: §II.
• [4] S. Bird, E. Klein, and E. Loper (2009) Natural language processing with python: analyzing text with the natural language toolkit. O’Reilly Media, Inc. Cited by: §III-A2.
• [5] T. Blevins, O. Levy, and L. Zettlemoyer (2018) Deep rnns encode soft hierarchical syntax. arXiv preprint arXiv:1805.04218. Cited by: §IV-D3.
• [6] M. Brennan, S. Afroz, and R. Greenstadt (2012) Adversarial stylometry: circumventing authorship recognition to preserve privacy and anonymity. ACM Transactions on Information and System Security (TISSEC) 15 (3), pp. 12. Cited by: §II.
• [7] E. Castillo, D. Vilarino, O. Cervantes, and D. Pinto (2015) Author attribution using a graph based representation. In 2015 International Conference on Electronics, Communications and Computers (CONIELECOMP), pp. 135–142. Cited by: §I.
• [8] E. Ferracane, S. Wang, and R. Mooney (2017) Leveraging discourse information effectively for authorship attribution. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1, pp. 584–593. Cited by: §I.
• [9] Z. Ge, Y. Sun, and M. J. Smith (2016) Authorship attribution using a neural network language model.. In AAAI, pp. 4212–4213. Cited by: §II.
• [10] J. Hitschler, E. van den Berg, and I. Rehbein (2017) Authorship attribution with convolutional neural networks and pos-eliding. In Proceedings of the Workshop on Stylistic Variation, pp. 53–58. Cited by: §I, §II.
• [11] J. Kabbara and J. C. K. Cheung (2016) Stylistic transfer in natural language generation systems using recurrent neural networks. In Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods, pp. 43–47. Cited by: §II.
• [12] M. Koppel, J. Schler, and S. Argamon (2009) Computational methods in authorship attribution. Journal of the American Society for information Science and Technology 60 (1), pp. 9–26. Cited by: §I.
• [13] M. Krause (2014) A behavioral biometrics based authentication method for mooc’s that is robust against imitation attempts. In Proceedings of the first ACM conference on Learning@ scale conference, pp. 201–202. Cited by: §II.
• [14] T. Kreutz and W. Daelemans (2018) Exploring classifier combinations for language variety identification. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pp. 191–198. Cited by: §II.
• [15] T. Neal, K. Sundararajan, A. Fatima, Y. Yan, Y. Xiang, and D. Woodard (2017) Surveying stylometry techniques and applications. ACM Computing Surveys (CSUR) 50 (6), pp. 86. Cited by: §I, §IV-D7.
• [16] M. L. Newman, J. W. Pennebaker, D. S. Berry, and J. M. Richards (2003) Lying words: predicting deception from linguistic styles. Personality and social psychology bulletin 29 (5), pp. 665–675. Cited by: §II.
• [17] J. W. Pennebaker and L. A. King (1999) Linguistic styles: language use as an individual difference.. Journal of personality and social psychology 77 (6), pp. 1296. Cited by: §II.
• [18] J. Pennington, R. Socher, and C. Manning (2014) Glove: global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543. Cited by: §III-A1, §IV-C.
• [19] S. R. Pillay and T. Solorio (2010) Authorship attribution of web forum posts. In 2010 eCrime Researchers Summit, pp. 1–7. Cited by: §I.
• [20] J. Posadas-Durán, I. Markov, H. Gómez-Adorno, G. Sidorov, I. Batyrshin, A. Gelbukh, and O. Pichardo-Lagunas (2015) Syntactic n-grams as features for the author profiling task. Working Notes Papers of the CLEF. Cited by: §II.
• [21] S. Raghavan, A. Kovashka, and R. Mooney (2010) Authorship attribution using probabilistic context-free grammars. In Proceedings of the ACL 2010 Conference Short Papers, pp. 38–42. Cited by: §II.
• [22] S. Ruder, P. Ghaffari, and J. G. Breslin (2016) Character-level and multi-channel convolutional neural networks for large-scale authorship attribution. arXiv preprint arXiv:1609.06686. Cited by: §IV-B.
• [23] U. Sapkota, S. Bethard, M. Montes, and T. Solorio (2015) Not all character n-grams are created equal: a study in authorship attribution. In Proceedings of the 2015 conference of the North American chapter of the association for computational linguistics: Human language technologies, pp. 93–102. Cited by: §IV-B.
• [24] Y. Sari, A. Vlachos, and M. Stevenson (2017) Continuous n-gram representations for authorship attribution. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 267–273. Cited by: §IV-B.
• [25] J. Schler, M. Koppel, S. Argamon, and J. W. Pennebaker (2006) Effects of age and gender on blogging. In AAAI spring symposium: Computational approaches to analyzing weblogs, Vol. 6, pp. 199–205. Cited by: 2nd item.
• [26] R. Schwartz, M. Sap, I. Konstas, L. Zilles, Y. Choi, and N. A. Smith (2017) The effect of different writing tasks on linguistic style: a case study of the roc story cloze task. arXiv preprint arXiv:1702.01841. Cited by: §II.
• [27] S. Segarra, M. Eisen, and A. Ribeiro (2015) Authorship attribution through function word adjacency networks. IEEE Transactions on Signal Processing 63 (20), pp. 5464–5478. Cited by: §I.
• [28] Y. Seroussi, I. Zukerman, and F. Bohnert (2011) Authorship attribution with latent dirichlet allocation. In Proceedings of the fifteenth conference on computational natural language learning, pp. 181–189. Cited by: §I.
• [29] P. Shrestha, S. Sierra, F. Gonzalez, M. Montes, P. Rosso, and T. Solorio (2017) Convolutional neural networks for authorship attribution of short texts. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 669–674. Cited by: §I, §II, §IV-B.
• [30] J. Soler and L. Wanner (2017) On the relevance of syntactic and discourse features for author profiling and identification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, Vol. 2, pp. 681–687. Cited by: §II.
• [31] E. Stamatatos and M. Koppel (2011) Plagiarism and authorship analysis: introduction to the special issue. Language Resources and Evaluation 45 (1), pp. 1–4. Cited by: §I.
• [32] E. Stamatatos (2008) Author identification: using text sampling to handle the class imbalance problem. Information Processing & Management 44 (2), pp. 790–799. Cited by: 1st item.
• [33] K. Sundararajan and D. Woodard (2018) What represents “style” in authorship attribution? In Proceedings of the 27th International Conference on Computational Linguistics, pp. 2814–2822. Cited by: §II.
• [34] I. Sutskever, J. Martens, G. Dahl, and G. Hinton (2013) On the importance of initialization and momentum in deep learning. In International conference on machine learning, pp. 1139–1147. Cited by: §IV-C.
• [35] K. Tran, A. Bisazza, and C. Monz (2018) The importance of being recurrent for modeling hierarchical structure. arXiv preprint arXiv:1803.03585. Cited by: §I, §III-B2.
• [36] C. van der Lee and A. van den Bosch (2017) Exploring lexical and syntactic features for language variety identification. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pp. 190–199. Cited by: §II.
• [37] P. Varela, E. Justino, and L. S. Oliveira (2011) Selecting syntactic attributes for authorship attribution. In The 2011 International Joint Conference on Neural Networks, pp. 167–172. Cited by: §I.
• [38] W. Y. Wang (2017) “Liar, liar pants on fire”: a new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648. Cited by: §II.
• [39] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy (2016) Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480–1489. Cited by: §I.
• [40] R. Zhang, Z. Hu, H. Guo, and Y. Mao (2018) Syntax encoding with application in authorship attribution. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2742–2753. Cited by: §I, §II, §IV-B, §IV-D1.