Contrastive Language Adaptation for Cross-Lingual Stance Detection

Mitra Mohtarami, James Glass, Preslav Nakov
MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
Qatar Computing Research Institute, HBKU, Doha, Qatar

We study cross-lingual stance detection, which aims to leverage labeled data in one language to identify the relative perspective (or stance) of a given document with respect to a claim in a different target language. In particular, we introduce a novel contrastive language adaptation approach applied to memory networks, which ensures accurate alignment of stances in the source and target languages, and can effectively deal with the challenge of limited labeled data in the target language. The evaluation results on public benchmark datasets and comparison against current state-of-the-art approaches demonstrate the effectiveness of our approach.

1 Introduction

The rise of social media has enabled the phenomenon of “fake news,” which could target specific individuals and can be used for deceptive purposes Lazer et al. (2018); Vosoughi et al. (2018). As manual fact-checking is a time-consuming and tedious process, computational approaches have been proposed as a possible alternative Popat et al. (2017); Wang (2017); Mihaylova et al. (2018), based on information sources such as social media Ma et al. (2017), Wikipedia Thorne et al. (2018), and knowledge bases Huynh and Papotti (2018). Fact-checking is a multi-step process Vlachos and Riedel (2014): (i) checking the reliability of media sources, (ii) retrieving potentially relevant documents from reliable sources as evidence for each target claim, (iii) predicting the stance of each document with respect to the target claim, and finally (iv) making a decision based on the stances from (iii) for all documents from (ii).

Here, we focus on stance detection which aims to identify the relative perspective of a document with respect to a claim, typically modeled using labels such as agree, disagree, discuss, and unrelated.

Current approaches to stance detection Bar-Haim et al. (2017); Dungs et al. (2018); Kochkina et al. (2018); Inkpen et al. (2017); Mohtarami et al. (2018) are well-studied in mono-lingual settings, in particular for English, but less attention has been paid to other languages and to cross-lingual settings. This is partially due to domain differences and to the lack of training data in other languages.

We aim to bridge this gap by proposing a cross-lingual model for stance detection. Our model leverages resources of a source language (e.g., English) to train a model for a target language (e.g., Arabic). Furthermore, we propose a novel contrastive language adaptation approach that effectively aligns samples with similar or dissimilar stances across source and target languages using task-specific loss functions. We apply our language adaptation approach to memory networks Sukhbaatar et al. (2015), which have been found effective for mono-lingual stance detection Mohtarami et al. (2018).

Our model can explain its predictions about the stances of documents against claims in a different (target) language by extracting relevant text snippets from the documents of the target language as evidence. We use evidence extraction as a measure to evaluate the transferability of our model. This is because more accurate evidence extraction indicates that the model can better learn semantic relations between claims and pieces of evidence, and consequently can better transfer knowledge to the target language.

The contributions of this paper are summarized as follows:

  • We propose a novel language adaptation approach based on contrastive stance alignment that aligns the class labels between source and target languages for effective cross-lingual stance detection.

  • Our model is able to extract accurate text snippets as evidence to explain its predictions in the target language (results are in Section 4.2).

  • To the best of our knowledge, this is the first work on cross-lingual stance detection.

We conducted our experiments on English (as source language) and Arabic (as target language). In particular, we used the Fake News Challenge dataset Hanselowski et al. (2018) as source data and an Arabic benchmark dataset Baly et al. (2018) as target data. The evaluation results show absolute improvements in macro-F1 and weighted accuracy for stance detection over the current state-of-the-art mono-lingual baseline, as well as absolute improvements in precision at several ranks for extracting evidence snippets. Furthermore, a key finding in our investigation is that, in contrast to other tasks Devlin et al. (2018); Peters et al. (2018), pre-training with large amounts of source data is less effective for cross-lingual stance detection. We show that this is because pre-training can considerably bias the model toward the source language.

Figure 1: The architecture of our cross-lingual memory network for stance detection.

2 Method

Assume that we are given a training dataset for a source language, which contains a set of triplets: each triplet consists of a claim-document pair together with the corresponding label indicating the stance of the document with respect to the claim. In addition, we are given a very small training dataset for the target language, consisting of claim-document pairs with stance labels, where the number of target samples is much smaller than the number of source samples. In reality, (i) the size of the target dataset is very small, (ii) claims and documents in the source and target languages are from different domains, and (iii) the only commonality between the source and target datasets is in their stance labels, i.e., the two datasets share the same label space.

We develop a language adaptation approach to effectively use the commonality between the source and the target datasets in their label space and to deal with the limited size of the target training data. We apply our language adaptation approach to end-to-end memory networks Sukhbaatar et al. (2015) for cross-lingual stance detection.

We use memory networks as they have achieved state-of-the-art performance for mono-lingual stance detection Mohtarami et al. (2018). However, our language adaptation approach can be applied to any other type of neural network. The architecture of our cross-lingual stance detection model is shown in Figure 1. It has two main components: (i) Memory Networks, indicated with two dashed boxes for the source and the target languages, and (ii) a Contrastive Language Adaptation component. In what follows, we first explain our memory network model for cross-lingual stance detection (Section 2.1) and then present our contrastive language adaptation approach (Section 2.2).

2.1 Memory Networks

Memory networks are designed to remember past information Sukhbaatar et al. (2015) and have been successfully applied to NLP tasks ranging from dialog Bordes and Weston (2016) to question answering Xiong et al. (2016) and mono-lingual stance detection Mohtarami et al. (2018). They include components that can potentially use different learning models and inference strategies. Our source and target memory networks follow the same architecture as depicted in Figure 1:

A memory network consists of six components. The network takes as input a document and a claim and encodes them in the input representation component. These representations are stored in the memory component for future processing. The relevant parts of the input are identified in the inference component and used by the generalization component to update the memory. Finally, the output component generates an output from the updated memory, which is encoded into the desired format by the response component using a prediction function, e.g., a softmax for classification tasks. We elaborate on these components below.

Input representation component:

It encodes documents and claims into corresponding representations. Each document is divided into a sequence of paragraphs, where each paragraph is encoded both with an LSTM network and with a CNN; these representations are stored in the memory component. Note that while LSTMs are designed to capture and memorize their inputs Tan et al. (2016), CNNs emphasize the local interactions between individual words in sequences, which is important for obtaining good representations Kim (2014).

Thus, our input representation component uses both LSTM and CNN representations. It also uses a separate LSTM and CNN, each with their own parameters, to represent each input claim.

We consider each paragraph as a single piece of evidence because a paragraph usually represents a coherent argument, unified under one or more inter-related topics. We thus use the terms paragraph and evidence interchangeably.

Inference component:

Our inference component computes LSTM- and CNN-based similarity between each claim and evidence as follows:

where the two scores measure claim-evidence similarity based on the LSTM and the CNN representations of the claim and the evidence, respectively, and the corresponding similarity matrices are trained to map claims and paragraphs into the same space. The rationale behind using these similarity matrices is that, in the memory network, we seek a transformation of the input claim that brings it as close as possible to the relevant evidence.

Additionally, we compute another semantic similarity vector by applying cosine similarity between the TF.IDF Spärck Jones (2004) representations of the claim and the evidence. This is particularly useful for stance detection, as it can help filter out unrelated pieces of evidence.
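The two kinds of similarity can be sketched as follows; this is a minimal numpy illustration with our own variable names (`M_lstm`, `E_lstm`, etc.), not the paper's exact parameterization:

```python
import numpy as np

def bilinear_similarity(claim_vec, evidence_mat, M):
    """One similarity score per evidence paragraph via a trainable
    bilinear map, normalized with a softmax."""
    scores = evidence_mat @ (M @ claim_vec)   # shape: (num_paragraphs,)
    e = np.exp(scores - scores.max())         # numerically stable softmax
    return e / e.sum()

def tfidf_cosine(claim_tfidf, evidence_tfidf):
    """Cosine similarity between the TF.IDF vectors of the claim and
    each evidence paragraph; useful for filtering unrelated evidence."""
    num = evidence_tfidf @ claim_tfidf
    den = np.linalg.norm(evidence_tfidf, axis=1) * np.linalg.norm(claim_tfidf)
    return num / (den + 1e-9)

# Toy shapes: 4 paragraphs, 8-dim LSTM states, 10-term vocabulary.
rng = np.random.default_rng(0)
c_lstm = rng.normal(size=8)                   # LSTM claim representation
E_lstm = rng.normal(size=(4, 8))              # LSTM evidence representations
M_lstm = rng.normal(size=(8, 8))              # similarity matrix (learned)
p_lstm = bilinear_similarity(c_lstm, E_lstm, M_lstm)
x_tfidf = tfidf_cosine(np.abs(rng.normal(size=10)),
                       np.abs(rng.normal(size=(4, 10))))
```

In the full model the similarity matrix is trained jointly with the rest of the network; here it is random only to make the sketch self-contained.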

Memory and Generalization components:

Our memory component stores representations, and the generalization component improves their quality by filtering out unrelated evidence. For example, the LSTM representations of the paragraphs are updated by scaling them with the claim-evidence similarity scores. This transformation helps filter out evidence that is unrelated to the claim. The updated LSTM representations are then used by the inference component to compute the CNN-based similarities as explained above, which are in turn used to update the CNN representations in memory in the same way. Finally, the updated representations are used to compute the output of the memory.
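A minimal sketch of this gating update (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(4, 8))                  # paragraph vectors in memory
alpha = np.array([0.7, 0.05, 0.2, 0.05])     # claim-evidence similarity weights

# Gating update: scale each paragraph representation by its similarity to
# the claim, so unrelated evidence is pushed toward zero in memory.
P_updated = alpha[:, None] * P
```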

Output representation component:

This component computes the output of the memory by concatenating the average vector of the updated paragraph representations with the maximum and the average of the claim-evidence similarity vectors. The maximum helps identify the parts of the document that are most similar to the claim, while the average estimates the overall document-claim similarity.
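The concatenation can be sketched as follows (toy values, names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
P_updated = rng.normal(size=(4, 8))          # updated evidence memory
s_lstm = np.array([0.7, 0.05, 0.2, 0.05])    # LSTM claim-evidence similarities
s_cnn = np.array([0.5, 0.1, 0.3, 0.1])       # CNN claim-evidence similarities

# Output: average evidence vector plus max/mean of each similarity vector.
# The max picks out the best-matching paragraph; the mean summarizes the
# overall document-claim similarity.
o = np.concatenate([
    P_updated.mean(axis=0),
    [s_lstm.max(), s_lstm.mean()],
    [s_cnn.max(), s_cnn.mean()],
])
```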

Response generation component:

This component computes the final stance of a document with respect to a claim. For this purpose, the output of the output representation component is concatenated with the claim representations and fed into a softmax layer to predict the stance of the document with respect to the claim.

All the memory network parameters, including those of the CNN and the LSTM in the input representation component, the similarity matrices in the inference component, and the classifier parameters in the response component, are jointly learned during training together with our language adaptation.

2.2 Contrastive Language Adaptation

Figure 2: Illustration of stance equal alignment (SEA), stance separation alignment (SSA), and classification alignment (CA) constraints. Different shapes indicate different stance labels and colors specify source (blue) and target (green) languages.
Algorithm 1. Cross-Lingual Stance Detection Model
   Input: a set of claim-document pairs with stance labels in the source language
   Input: a small set of claim-document pairs with stance labels in the target language
   Output: stance labels assigned to given unlabeled target pairs
Cross-Lingual model:
Pair up source and target samples, and set a binary indicator to 1 if the source and the target sample have the same stance label, and to 0 otherwise.
Loop for a fixed number of epochs:
   1. Pass the source sample to the source memory network to create its representation.
   2. Pass the target sample to the target memory network to create its representation.
   3. Pass the source representation to the classifier to compute its classification loss.
   4. Pass both representations to the language adaptation component to compute the stance alignment loss.
   5. Compute the total loss.
   6. Repeat steps 1-5 with one change in step 3: pass the target sample to the classifier instead of the source sample, and compute its classification loss.
   7. Jointly optimize all parameters of the model using the average loss.
Table 1: Cross-lingual stance detection model.

Memory networks are effective for stance detection in mono-lingual settings Mohtarami et al. (2018) when there is sufficient training data. However, we show that these models have limited transferability to target languages with limited data. This could be due to the discrepancy between the underlying data distributions of the source and the target languages. We show that the performance of these networks can be improved when the model, pre-trained on source data, is fine-tuned using small amounts of labeled target data. We further develop a contrastive language adaptation approach that can exploit the labeled source data to perform well on the target data.

Our contrastive adaptation approach:

  • encourages pairs from the source language and pairs from the target language with the same stance labels to be nearby in the embedding space. We call this mapping Stance Equal Alignment (SEA), illustrated with dotted lines in Figure 2. Note that documents and claims in the two languages are often semantically different and are not translations of each other.

  • encourages pairs from the source language and pairs from the target language with different stance labels to be far apart in the embedding space. We call this mapping Stance Separation Alignment (SSA), shown with dashed lines in Figure 2.

  • encourages pairs from the source language and pairs from the target language to be correctly classified with their respective stance labels. We call this Classification Alignment (CA), shown with solid lines in Figure 2.

We make complete use of the stance labels in the cross-lingual setting by parameterizing our model according to the distance between the source and the target samples in the embedding space.

For the stance equal alignment (SEA) constraint, the objective is to minimize the distance between pairs of source and target data with the same stance labels. We achieve this using the following loss:


where g maps its input pair to an embedding space using our memory network (or any mono-lingual model), and D computes the Euclidean distance.

For stance separation alignment (SSA), the goal is to maximize the distance between pairs with different stance labels. We use the following loss:


where we maximize the distance between pairs with different stance labels up to a margin.

The margin parameter specifies the extent of separability in the embedding space.
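The equation bodies did not survive in this copy of the text; a standard contrastive formulation consistent with the description above (and with the supervised adaptation losses of Motiian et al., 2017) would be:

```latex
\mathcal{L}_{\mathrm{SEA}} \;=\; \sum_{y_s = y_t} \tfrac{1}{2}\, D\big(g(x_s),\, g(x_t)\big)^2,
\qquad
\mathcal{L}_{\mathrm{SSA}} \;=\; \sum_{y_s \neq y_t} \tfrac{1}{2}\, \max\big(0,\; m - D(g(x_s),\, g(x_t))\big)^2,
```

where $g$ maps a claim-document pair into the embedding space, $D$ is the Euclidean distance, and $m$ is the margin; these symbols are our reconstruction, not necessarily the paper's notation.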

We can further use any classification loss to enforce classification alignment (CA). We use categorical cross-entropy and refer to it as the Classification Alignment loss.

We develop our overall language adaptation loss, named the Contrastive Stance Alignment (CSA) loss, by combining the SEA and SSA losses as follows:


Finally, the total loss of our cross-lingual stance detection model is defined as follows:


where a weighting parameter controls the balance between the classification and the language adaptation losses; we optimize it on the validation dataset.
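Since the equation bodies are missing in this copy, the following sketch assumes the standard contrastive form; function and variable names are ours, not the paper's:

```python
import numpy as np

def csa_loss(z_src, z_tgt, same_label, margin=1.0):
    """Contrastive stance alignment: pull same-stance source/target
    embeddings together (SEA) and push different-stance pairs apart
    up to the margin (SSA)."""
    d = np.linalg.norm(z_src - z_tgt)
    if same_label:
        return 0.5 * d ** 2                      # SEA term
    return 0.5 * max(0.0, margin - d) ** 2       # SSA term

def total_loss(classification_loss, csa, weight):
    """Total loss: classification loss plus weighted adaptation loss."""
    return classification_loss + weight * csa

z_s, z_t = np.array([0.0, 0.0]), np.array([3.0, 4.0])   # distance 5
sea = csa_loss(z_s, z_t, same_label=True)                # 0.5 * 5**2 = 12.5
ssa = csa_loss(z_s, z_t, same_label=False, margin=1.0)   # 0.0: already beyond margin
```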

Information Flow:

Our overall cross-lingual model for stance detection is shown in Figure 1, and a summary of the algorithm is presented in Table 1. As Figure 1 shows, each source and target pair is passed to the source and to the target memory networks, respectively, to obtain the corresponding representations. The source representation and its gold stance label are passed to the classifier to compute the classification loss. In addition, the source and the target representations, in conjunction with a binary indicator (1 if the source and the target have the same stance label, and 0 otherwise), are passed to the language adaptation component to compute the contrastive stance alignment loss. Finally, the total loss is computed based on Equation (4).

The classifier also uses the labeled target samples to create a shared embedding space and to fine-tune itself with respect to the target language. For this purpose, we repeat the above steps, switching the target and the source pipelines. Finally, we compute the average of all losses and use it to optimize the parameters of our model.
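The symmetric two-pass update described above can be sketched as follows, with trivial stand-in encoders rather than the actual memory networks; all names and the loss weight are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def encode(x, W):
    """Stand-in encoder; the full model uses a memory network here."""
    return np.tanh(W @ x)

def cross_entropy(z, label, C):
    """Softmax classifier loss over the shared label space."""
    logits = C @ z
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[label] + 1e-12)

def csa(z_s, z_t, same, margin=1.0):
    """Contrastive stance alignment loss (standard contrastive form)."""
    d = np.linalg.norm(z_s - z_t)
    return 0.5 * d ** 2 if same else 0.5 * max(0.0, margin - d) ** 2

W_src, W_tgt = rng.normal(size=(4, 6)), rng.normal(size=(4, 6))
C = rng.normal(size=(4, 4))                   # shared 4-way stance classifier
x_src, y_src = rng.normal(size=6), 2          # source claim-document pair
x_tgt, y_tgt = rng.normal(size=6), 2          # target claim-document pair
weight = 0.1                                   # adaptation loss weight

z_s, z_t = encode(x_src, W_src), encode(x_tgt, W_tgt)
align = csa(z_s, z_t, same=(y_src == y_tgt))
loss_src_pass = cross_entropy(z_s, y_src, C) + weight * align
loss_tgt_pass = cross_entropy(z_t, y_tgt, C) + weight * align  # pipelines switched
avg_loss = 0.5 * (loss_src_pass + loss_tgt_pass)  # optimizes all parameters
```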

Pre-training for Language Adaptation:

Pre-training has been found effective in many language adaptation settings Tzeng et al. (2017). To investigate the effect of pre-training, we first pre-train the source memory network and the classifier using the source data only (the top pipeline in Figure 1), and then we apply language adaptation with the full model.

Methods Weigh. Acc. Acc. Macro-F1 F1 (agree, disagree, discuss, unrelated)
1.    All-unrelated 34.8 68.1 20.3 0 / 0 / 0 / 81.0
2.    All-agree 40.2 15.6 6.7 27.0 / 0 / 0 / 0
3.    Gradient Boosting Baly et al. (2018) 55.6 72.4 41.0 60.4 / 9.0 / 10.4 / 84.0
4.    TFMLP Riedel et al. (2017) 49.3 66.0 37.1 47.0 / 7.8 / 13.4 / 80.0
5.    EnrichedMLP Hanselowski et al. (2018) 55.1 70.5 41.3 59.1 / 9.2 / 14.1 / 82.3
6.    Ensemble  Baird et al. (2017) 53.6 71.6 37.2 57.5 / 2.1 / 6.2 / 83.2
7.    MN (target only) Mohtarami et al. (2018) 55.3 70.9 41.7 60.0 / 15.0 / 08.5 / 83.1
8.    MN (source only) 53.2 64.2 36.0 40.4 / 02.0 / 19.1 / 82.4
9.    MN (source pretraining + target fine-tuning) 57.3 65.0 42.5 58.0 / 12.2 / 20.9 / 79.0
10.  ADMN (adversarial) 58.6 58.4 43.4 60.2 / 16.1 / 24.2 / 72.9
11.  CLMN (contrastive) 61.3 71.6 45.2 65.1 / 11.6 / 20.5 / 83.7
Table 2: Evaluation results on the target Arabic test dataset.

3 Experiments

Data and Settings.

As source data, we use the Fake News Challenge dataset, which contains claim-document pairs in English with {agree, disagree, discuss, unrelated} as stance labels. As target data, we use the Arabic claim-document pairs developed in Baly et al. (2018).

We perform cross-validation on the Arabic dataset, using each fold in turn for testing and splitting the remaining data into training and development sets. We use pretrained cross-lingual Wikipedia word embeddings from MUSE Conneau et al. (2017). We use LSTM units and CNN feature maps with a fixed filter width, and we consider the first paragraphs of each document, up to the median number of paragraphs in the source documents. We optimize all hyper-parameters on the validation data using Adam Kingma and Ba (2014).

Evaluation Measures.

We use the following measures:

  • Accuracy: The fraction of correctly classified examples. For multi-class classification, accuracy is equivalent to micro-averaged F1 Manning et al. (2008).

  • Macro-F1: The average of the F1 scores calculated for each class separately.

  • Weighted Accuracy: This is a hierarchical metric, which first awards points if the model correctly predicts a document-claim pair as related (i.e., agree, disagree, or discuss) or unrelated. If the pair is related, additional points are awarded if the model correctly predicts it as agree, disagree, or discuss. The goal of this weighting scheme is to balance out the large number of unrelated examples Hanselowski et al. (2018).
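As an illustration, here is a minimal implementation assuming the Fake News Challenge scoring scheme (0.25 points for the correct related/unrelated decision, plus 0.75 for the exact stance of a related pair); the exact point values are our assumption, as they are not stated in the text:

```python
RELATED = {"agree", "disagree", "discuss"}

def pair_score(gold, pred):
    """Hierarchical score for one document-claim pair (assumed FNC scheme)."""
    score = 0.0
    if (gold in RELATED) == (pred in RELATED):
        score += 0.25                 # related/unrelated decision correct
    if gold in RELATED and gold == pred:
        score += 0.75                 # exact stance of a related pair correct
    return score

def weighted_accuracy(golds, preds):
    """Achieved score divided by the maximum achievable score, which
    balances out the large number of unrelated examples."""
    best = sum(pair_score(g, g) for g in golds)
    return sum(pair_score(g, p) for g, p in zip(golds, preds)) / best

wa = weighted_accuracy(["unrelated", "agree", "discuss"],
                       ["unrelated", "agree", "disagree"])
```

Note that a perfectly classified unrelated pair contributes less to the maximum score than a related one, which is what makes the metric robust to the dominant unrelated class.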


We consider the following baselines:

  • Heuristic: Given the imbalanced nature of our data, we use two heuristic baselines where all test examples are labeled as unrelated or agree. The former is a majority-class baseline favoring accuracy and macro-F1, while the latter is better for weighted accuracy.

  • Gradient Boosting Baly et al. (2018): This is a Gradient Boosting classifier with n-gram features as well as indicator features for refutation and polarity.

  • TFMLP Riedel et al. (2017): This is an MLP with normalized bag-of-words features enriched with a single TF.IDF-based similarity feature for each claim-document pair.

  • EnrichedMLP Hanselowski et al. (2018): This model combines five MLPs, each with six hidden layers and advanced features from topic models, latent semantic analysis, etc.

  • Ensemble Baird et al. (2017): An ensemble based on a weighted average of a deep convolutional neural network and a gradient-boosted decision tree; this was the best model at the Fake News Challenge.

  • Mono-lingual Memory Network Mohtarami et al. (2018): This model is the current state-of-the-art for stance detection on our source dataset. It is an end-to-end memory network which incorporates both CNN and LSTM for prediction.

  • Adversarial Memory Network: We use adversarial domain adaptation Ganin et al. (2016) instead of contrastive language adaptation in our cross-lingual memory network.


Table 2 shows the performance of all models on the target Arabic test set. The All-unrelated and All-agree baselines perform poorly across evaluation measures; All-unrelated performs better than All-agree because unrelated is by far the dominant class.

Rows 3-6 show that Gradient Boosting and EnrichedMLP yield similar results, while TFMLP performs the worst. We attribute this to the advanced features used in the former two models. Gradient Boosting has better accuracy due to its better performance on the dominant class. Note that Ensemble performs poorly because the limited labeled data is insufficient to train a good CNN model.

Rows 7-9 show the results for the mono-lingual memory network (MN) from Mohtarami et al. (2018). The performance of this model when trained on Arabic data only (row 7) is comparable to the previous baselines (rows 3-6). However, it performs poorly when trained on the source English data and tested on the Arabic test data (row 8). The model performs best (in terms of weighted accuracy and macro-F1) when first pretrained on the source data and then fine-tuned on the target training data (row 9).

Row 10 in Table 2 shows the results for the adversarial memory network (ADMN). It improves over the mono-lingual MN on weighted accuracy and macro-F1, but its accuracy drops significantly. This is because adversarial approaches give higher weights to samples of the majority class (i.e., unrelated), which makes classification more challenging for the discriminator Montahaei et al. (2018).

Row 11 shows the results for our cross-lingual memory network (CLMN) with the best value of the loss-weight parameter, which controls the balance between the classification and the language adaptation losses (tuned using validation data). CLMN outperforms the other baselines in terms of weighted accuracy and macro-F1, while showing comparable accuracy. We show that the improvement is due to language adaptation being able to effectively transfer knowledge from the source to the target language (see Section 4.2).

The last column in Table 2 shows that the unrelated examples are the easiest ones. Also, although the agree and the discuss classes have roughly the same size, the results for agree are notably higher. This is mainly because the documents that discuss a claim often share the same topic with the claim, but do not take a stance. In addition, the disagree examples are the most difficult ones; this class is by far the smallest one.

Figure 3: Impact of pretraining on macro-F1 across loss-weight values. The Y axes show average results on the validation datasets under cross-validation.
Methods Weigh. Acc. Acc. Macro-F1
CLMN (with pretraining) 60.2 69.8 43.2
CLMN (without pretraining) 61.3 71.6 45.2
Table 3: CLMN results on the target test dataset.

4 Discussion

4.1 Effect of Pretraining

Table 3 shows that CLMN without pretraining performs better on the target test data than CLMN with pretraining; recall that the loss-weight parameter controls the balance between the classification and the language adaptation losses. Our further analysis shows that pretraining biases the model toward the source language. Figure 3 shows the impact of pretraining on the macro-average F1 score for CLMN across different values of the loss weight on validation data. While the model without pretraining achieves its best performance with a large loss weight, the model with pretraining performs well with a smaller one. This suggests that our model can capture the characteristics of the source dataset via pretraining when using only weak supervision from language adaptation (i.e., a small loss weight). However, pretraining introduces a bias toward the source space, and the performance drops when larger weights are given to language adaptation; see the results with pretraining in Figure 3.

4.2 Assessment of Model Transferability

The improvements of the CLMN model over the mono-lingual MN models that use the target only, the source only, or both the target and the source (rows 7-9 in Table 2, respectively) indicate its transferability. We further estimate transferability by measuring the accuracy of the models in extracting evidence that supports their predictions. A more accurate model should better transfer knowledge to the target language by accurately learning the relations between claims and pieces of evidence.

Our target data has annotations (in terms of binary labels) for each piece of evidence (here, a paragraph) that indicate whether it is a rationale for the agree or for the disagree class. Moreover, our inference component produces a claim-evidence similarity vector, which can be used to rank the pieces of evidence in a target document against the target claim.

We use the gold data and the rankings produced by our model to measure its precision in extracting evidence that supports its predictions. Figure 4 shows that our CLMN model achieves higher precision than the mono-lingual MN models across ranks. This indicates that CLMN can better generalize and transfer knowledge to the target language by learning the relations between pieces of evidence and claims.
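Precision at rank k over the ranked evidence can be computed as follows; the example labels are hypothetical, not taken from the dataset:

```python
def precision_at_k(ranked_rationale_labels, k):
    """Fraction of the top-k ranked evidence snippets that are gold
    rationales (binary labels, ordered by claim-evidence similarity)."""
    top = ranked_rationale_labels[:k]
    return sum(top) / len(top)

ranked = [1, 1, 0, 1, 0]          # hypothetical ranking of 5 paragraphs
p1 = precision_at_k(ranked, 1)    # 1.0: the top snippet is a rationale
p3 = precision_at_k(ranked, 3)    # 2 of the top 3 are rationales
```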

Figure 4: Transferability of our cross-lingual model.

4.3 Effect of Language Adaptation

Figures 4(a) and 4(b) show the classification loss versus the contrastive stance alignment (CSA) loss obtained from our best language adaptation model (i.e., without pretraining) across training epochs and loss-weight values. The results are averaged over the validation data under cross-validation. As Figure 4(a) shows, there is a greater reduction in the classification loss for smaller loss weights, i.e., when the classification loss contributes more to the overall loss; see Equation (4). On the other hand, Figure 4(b) shows that the CSA loss decreases with larger loss weights, as the model then pays more attention to the CSA loss; see the red and green lines in Figure 4(b). These results indicate that our language adaptation model can find a good balance between the classification loss and the CSA loss at the best-performing loss weight.

(a) Classification loss
(b) CSA loss
Figure 5: Classification vs. contrastive stance alignment (CSA) losses across training epochs and loss-weight values.
(a) without pretraining
(b) with pretraining
Figure 6: Classification loss vs. contrastive stance alignment loss (CSA) vs. total loss during training.

Figure 6 compares the classification, contrastive stance alignment, and total losses obtained by our CLMN model on the validation dataset across training epochs, with the loss-weight parameter set to its best value. Figures 5(a) and 5(b) show the results without and with pretraining, respectively. Without pretraining (Figure 5(a)), the classification (light-blue line) and CSA (dark-blue line) losses both decrease up to a certain epoch, after which the classification loss keeps decreasing but the CSA loss starts increasing. With pretraining (Figure 5(b)), the CSA loss rapidly decreases during the first few epochs (even though it has a small effect due to the small loss weight), and then continues with a smooth trend. This is because, during the initial training epochs, the model is biased toward the source embedding space due to pretraining, and therefore the source and the target examples are far from each other. Our language adaptation model then aligns the source and the target examples to form a much better shared embedding space, and this alignment yields the rapid decrease of the CSA loss in the first few epochs. In contrast to the CSA loss, the classification loss increases in the first few epochs. This is because the model enforces alignment between the source and the target samples due to the large initial distances. Finally, the total loss (orange line) indicates a good balance between the classification and the language adaptation losses, and it consistently decreases during training.

5 Related Work

Domain Adaptation.

Previous work has presented several domain adaptation techniques. Unsupervised domain adaptation approaches Ganin and Lempitsky (2015); Long et al. (2016); Muandet et al. (2013); Gong et al. (2012) attempt to align the distribution of features in the embedding space mapped from the source and the target domains. A limitation of such approaches is that, even with perfect alignment, there is no guarantee that same-label examples from different domains will map nearby in the embedding space. Supervised domain adaptation Daumé III and Marcu (2006); Becker et al. (2013); Bergamo and Torresani (2010) attempts to encourage same-label examples from different domains to map nearby in the embedding space. While supervised approaches perform better than unsupervised ones, recent work Motiian et al. (2017) has demonstrated superior performance by additionally encouraging class separation, meaning that examples from different domains with different labels should be projected as far apart as possible in the embedding space. Here, we combined both types of alignment for stance detection.

Stance Detection.

Mohammad et al. (2016) and Zarrella and Marsh (2016) worked on stances regarding target propositions, e.g., entities or events, labeled as in-favor, against, or neither. Most commonly, stance detection has been defined with respect to a claim as agree, disagree, discuss, or unrelated. Previous work mostly developed models with rich hand-crafted features such as words, word embeddings, and sentiment lexicons Riedel et al. (2017); Baird et al. (2017); Hanselowski et al. (2018). More recently, Mohtarami et al. (2018) presented a mono-lingual, feature-light memory network for stance detection. In this paper, we built on this work to extend previous efforts in stance detection to a cross-lingual setting.

6 Conclusion and Future Work

We proposed an effective language adaptation approach that aligns class labels in the source and target languages for accurate cross-lingual stance detection. Moreover, we investigated the behavior of our model in detail and showed that it offers sizable performance gains over a number of competing approaches. In future work, we will extend our language adaptation model to document retrieval and to check-worthy claim detection.


We thank the anonymous reviewers for their insightful comments. This research was supported in part by the Qatar Computing Research Institute, HBKU (this research is part of the Tanbih project, which aims to limit the effect of “fake news”, propaganda, and media bias by making users aware of what they are reading) and by DSTA of Singapore.


  • S. Baird, D. Sibley, and Y. Pan (2017) Talos targets disinformation with fake news challenge victory. Note: talos-fake-news-challenge.html Cited by: Table 2, 5th item, §5.
  • R. Baly, M. Mohtarami, J. Glass, L. Màrquez, A. Moschitti, and P. Nakov (2018) Integrating stance detection and fact checking in a unified corpus. In Proceedings of NAACL-HLT, New Orleans, LA, USA. Cited by: §1, Table 2, 2nd item, §3.
  • R. Bar-Haim, I. Bhattacharya, F. Dinuzzo, A. Saha, and N. Slonim (2017) Stance classification of context-dependent claims. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’17, pp. 251–261. Cited by: §1.
  • C. J. Becker, C. M. Christoudias, and P. Fua (2013) Non-linear domain adaptation with boosting. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.), pp. 485–493. Cited by: §5.
  • A. Bergamo and L. Torresani (2010) Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach. In Advances in Neural Information Processing Systems 23, pp. 181–189. Cited by: §5.
  • A. Bordes and J. Weston (2016) Learning end-to-end goal-oriented dialog. CoRR abs/1605.07683. Cited by: §2.1.
  • A. Conneau, G. Lample, M. Ranzato, L. Denoyer, and H. Jégou (2017) Word translation without parallel data. arXiv preprint arXiv:1710.04087. Cited by: §3.
  • H. Daumé III and D. Marcu (2006) Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research 26 (1), pp. 101–126. External Links: ISSN 1076-9757 Cited by: §5.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1.
  • S. Dungs, A. Aker, N. Fuhr, and K. Bontcheva (2018) Can rumour stance alone predict veracity?. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, pp. 3360–3370. Cited by: §1.
  • Y. Ganin and V. Lempitsky (2015) Unsupervised domain adaptation by backpropagation. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ICML’15, pp. 1180–1189. Cited by: §5.
  • Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky (2016) Domain-adversarial training of neural networks. Journal of Machine Learning Research 17 (1), pp. 2096–2030. External Links: ISSN 1532-4435 Cited by: 7th item.
  • B. Gong, Y. Shi, F. Sha, and K. Grauman (2012) Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’12, pp. 2066–2073. External Links: ISBN 978-1-4673-1226-4 Cited by: §5.
  • A. Hanselowski, A. PVS, B. Schiller, F. Caspelherr, D. Chaudhuri, C. M. Meyer, and I. Gurevych (2018) A retrospective analysis of the fake news challenge stance-detection task. In Proceedings of the International Conference on Computational Linguistics, COLING ’18, Santa Fe, NM, USA, pp. 1859–1874. Cited by: §1, Table 2, 3rd item, 4th item, §5.
  • V. Huynh and P. Papotti (2018) Towards a benchmark for fact checking with knowledge bases. In Companion Proceedings of the The Web Conference 2018, WWW ’18, Lyon, France, pp. 1595–1598. Cited by: §1.
  • D. Inkpen, X. Zhu, and P. Sobhani (2017) A dataset for multi-target stance detection. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’17, pp. 551–557. Cited by: §1.
  • Y. Kim (2014) Convolutional neural networks for sentence classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’14, Doha, Qatar, pp. 1746–1751. Cited by: §2.1.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. CoRR abs/1412.6980. Cited by: §3.
  • E. Kochkina, M. Liakata, and A. Zubiaga (2018) All-in-one: multi-task learning for rumour verification. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, pp. 3402–3413. Cited by: §1.
  • D. M.J. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook, D. Rothschild, M. Schudson, S. A. Sloman, C. R. Sunstein, E. A. Thorson, D. J. Watts, and J. L. Zittrain (2018) The science of fake news. Science 359 (6380), pp. 1094–1096. External Links: ISSN 0036-8075 Cited by: §1.
  • M. Long, H. Zhu, J. Wang, and M. I. Jordan (2016) Unsupervised domain adaptation with residual transfer networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, USA, pp. 136–144. External Links: ISBN 978-1-5108-3881-9 Cited by: §5.
  • J. Ma, W. Gao, and K. Wong (2017) Detect rumors in microblog posts using propagation structure via kernel learning. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL ’17, Vancouver, Canada, pp. 708–717. Cited by: §1.
  • C. D. Manning, P. Raghavan, and H. Schütze (2008) Introduction to information retrieval. Cambridge University Press, New York, NY, USA. External Links: ISBN 0521865719, 9780521865715 Cited by: 1st item.
  • T. Mihaylova, P. Nakov, L. Màrquez, A. Barrón-Cedeño, M. Mohtarami, G. Karadzhov, and J. Glass (2018) Fact checking in community forums. In Proceedings of AAAI, New Orleans, LA, USA. Cited by: §1.
  • S. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, and C. Cherry (2016) SemEval-2016 task 6: detecting stance in tweets. In Proceedings of SemEval, Berlin, Germany, pp. 31–41. Cited by: §5.
  • M. Mohtarami, R. Baly, J. Glass, P. Nakov, L. Màrquez, and A. Moschitti (2018) Automatic stance detection using end-to-end memory networks. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL-HLT ’18, New Orleans, LA, USA. Cited by: §1, §1, §2.1, §2.2, Table 2, §2, 6th item, §3, §5.
  • E. Montahaei, M. Ghorbani, M. S. Baghshah, and H. R. Rabiee (2018) Adversarial classifier for imbalanced problems. CoRR abs/1811.08812. Cited by: §3.
  • S. Motiian, M. Piccirilli, D. A. Adjeroh, and G. Doretto (2017) Unified deep supervised domain adaptation and generalization. In IEEE International Conference on Computer Vision, Cited by: §5.
  • K. Muandet, D. Balduzzi, and B. Schölkopf (2013) Domain generalization via invariant feature representation. In Proceedings of the 30th International Conference on Machine Learning, S. Dasgupta and D. McAllester (Eds.), Proceedings of Machine Learning Research, Vol. 28, Atlanta, GA, USA, pp. 10–18. Cited by: §5.
  • M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer (2018) Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT ’2018, New Orleans, LA, USA, pp. 2227–2237. Cited by: §1.
  • K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum (2017) Where the truth lies: explaining the credibility of emerging claims on the Web and social media. In Proceedings of the Conference on World Wide Web, WWW ’17, Perth, Australia, pp. 1003–1012. External Links: ISBN 978-1-4503-4914-7 Cited by: §1.
  • B. Riedel, I. Augenstein, G. P. Spithourakis, and S. Riedel (2017) A simple but tough-to-beat baseline for the Fake News Challenge stance detection task. ArXiv:1707.03264. Cited by: Table 2, 3rd item, §5.
  • K. Spärck Jones (2004) IDF term weighting and IR research lessons. Journal of Documentation 60 (5), pp. 521–523. Cited by: §2.1.
  • S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus (2015) End-to-end memory networks. In Proceedings of NIPS, Montreal, Canada, pp. 2440–2448. Cited by: §1, §2.1, §2.
  • M. Tan, C. dos Santos, B. Xiang, and B. Zhou (2016) Improved representation learning for question answer matching. In Proceedings of ACL, Berlin, Germany, pp. 464–473. Cited by: §2.1.
  • J. Thorne, A. Vlachos, C. Christodoulopoulos, and A. Mittal (2018) FEVER: a large-scale dataset for fact extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT ’18, New Orleans, LA, USA, pp. 809–819. Cited by: §1.
  • E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko (2017) Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR), Cited by: §2.2.
  • A. Vlachos and S. Riedel (2014) Fact checking: task definition and dataset construction. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, Baltimore, MD, USA, pp. 18–22. Cited by: §1.
  • S. Vosoughi, D. Roy, and S. Aral (2018) The spread of true and false news online. Science 359 (6380), pp. 1146–1151. External Links: ISSN 0036-8075 Cited by: §1.
  • W. Y. Wang (2017) “Liar, liar pants on fire”: a new benchmark dataset for fake news detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL ’17, Vancouver, Canada, pp. 422–426. Cited by: §1.
  • C. Xiong, S. Merity, and R. Socher (2016) Dynamic memory networks for visual and textual question answering. In Proceedings of the 33rd International Conference on Machine Learning, ICML ’16, New York, NY, USA, pp. 2397–2406. Cited by: §2.1.
  • G. Zarrella and A. Marsh (2016) MITRE at SemEval-2016 Task 6: transfer learning for stance detection. In Proceedings of SemEval, San Diego, CA, USA, pp. 458–463. Cited by: §5.