ECOL: Early Detection of COVID Lies Using Content, Prior Knowledge and Source Information


Abstract

Social media platforms are vulnerable to fake news dissemination, which can cause severe consequences in the healthcare domain, such as panic and incorrect medication. It is therefore important to automatically detect fake news at an early stage, before it spreads widely. This paper analyzes the impact of incorporating content information, prior knowledge, and the credibility of sources into models for the early detection of fake news. We propose a framework that models these features using the BERT language model and external sources, namely Simple English Wikipedia and source reliability tags. Experiments on the CONSTRAINT datasets demonstrate the benefit of integrating these features for the early detection of fake news in the healthcare domain.

Keywords:
Fake news detection · Deep learning · Prior knowledge

1 Introduction

Social media is replacing traditional media as a source of information due to its ease of access, fast sharing, and the freedom to create content. However, social media is also responsible for spreading massive amounts of fake news [31]. Fake news propagation can manipulate significant events such as political elections or severely damage society during crises [14]. For example, a rumor that originated in a UK tabloid claimed that drinking neat alcohol could cure COVID-19. As a consequence of the spread of this rumor, hundreds of Iranians lost their lives to alcohol poisoning3. Therefore, it is crucial to detect potentially false claims early, before they reach large audiences and cause damage.

Since the 2016 U.S. presidential election, tremendous effort has been devoted by the research community to automating fake news detection. Most prior studies rely on propagation information, user engagement, and the content of news or social media posts [39, 26, 11]. However, methods relying on propagation information [37] and/or user engagement [16, 3, 2, 27] are not applicable for detecting fake news at an early stage, since these signals only become available once the news starts disseminating. Methods based solely on content (e.g. [2, 33]) can be misled by claims that require additional context for their interpretation. For instance, the post in Figure 1 may sound plausible to readers who know the relationship between the politicians mentioned in the post. However, the source is a satirical website, which indicates that the post is fake.

Fake claims and news that partially share similar content commonly appear across different sources over a relatively long time span. For example, in the healthcare domain, the most common fake claims concern unproven and alternative cures (e.g. using alcohol) against diseases [28, 34]. These types of fake news have also been observed during COVID-19 [18]. The assumption is that earlier published claims have been fact-checked and can be employed to detect later ones. Therefore, investigating previously published claims and news can provide important information for determining the truthfulness of posts [12, 25]. Specifically, encoding previously published fake news in the healthcare domain as prior knowledge, together with content and source information, could help detect posts that would otherwise be missed by solely content-based methods.

Figure 1: System architecture of ECOL when metadata is unavailable.

In this paper, we investigate and analyse these intrinsic and extrinsic features to detect fake news in the healthcare domain. To this end, we propose a neural network model that integrates (1) a contextual representation of the news content, (2) a representation of the relationship to similar validated fake news, and (3) a source representation that embeds the reliability of the source. The main contributions of this paper are as follows:

  • We introduce a classification framework that models prior knowledge, content and source information by using BERT [7] and reliability tags to predict the truthfulness of social media posts in the healthcare domain. We share our source code and trained models on Github4.

  • To evaluate the effectiveness of the proposed model, we conducted extensive experiments on the CONSTRAINT dataset. According to the obtained F1 score, ECOL ranked 14th among 167 submissions in the CONSTRAINT competition [21].

The rest of the paper is organized as follows. Section 2 briefly presents related work. Section 3 describes the proposed framework. Section 4 presents the dataset, baselines, ablation models and implementation details. Section 5 presents and discusses the results of the experiment. Finally, Section 6 concludes the paper and gives insights on future work.

2 Related Work

This section presents related work on textual content-based methods and on approaches that use external information to assess truthfulness.

2.1 Textual Content based Fake News Detection on Social Media

Textual content-based methods for fake news detection on social media [39, 26, 11] range from traditional machine learning models [3, 13, 38] to neural networks [23, 2, 7]. For instance, the methods in [3, 13] leverage features such as sentiment and metadata (e.g. replies and likes of social media posts), in addition to statistics derived from both posts and metadata. Zhou et al. [38] investigate features derived from theories in social and forensic psychology. As examples of neural network models, CNNs [33], RNNs [4] and, most recently, context-aware language models such as ELMo [23, 2] and BERT [9, 7] have been used. While CNN and RNN models ignore context information, BERT and ELMo can learn different meanings of words depending on their context. Among context-aware language models, BERT has shown state-of-the-art results on many NLP tasks [7]. Therefore, in our study, we encode content information with BERT.

2.2 Extrinsic Features for Determining Truthfulness of Claims

As an extrinsic feature, encoding the top relevant evidence pages retrieved by commercial search engines (e.g. Google) has been a widely used approach [29, 1, 15] to determine the truthfulness of claims or posts. However, Augenstein et al. [1] point out that this approach affects veracity assessments when the search results change over time. Firstly, this drawback hampers reproducibility. Secondly, it is not applicable to posts that cannot be supported or denied with evidence at the time they first appear.

Another extrinsic feature is information about previously analyzed claims. Similarity to previously fact-checked claims in the political domain has been studied as part of fact-checking systems [12, 6, 36] or as an information retrieval task [25]. Those studies aim to find claims or posts reporting the same event. In contrast, we aim to learn the similarity of posts to previously detected fake news in the healthcare domain, not necessarily reporting the same event.

Lastly, the credibility of user profiles [16, 37] and source websites [17, 8, 20, 10] are also strong extrinsic features for determining truthfulness at an early stage [19]. To detect rumors and fake news on social media, Yuan et al. [37] used user credibility as a weak signal in their graph-based neural network model. Li et al. [16] combined the credibility of users with post features and post embeddings for rumor detection. Both studies require propagation and metadata information to encode the aforementioned features. Other studies [17, 20, 10] focus on determining the credibility of sources that mostly report political news. Using only the credibility information of political news sources would be limiting. Therefore, we leverage the content of Simple English Wikipedia and the source credibility information by Gruppi et al. [10] in our study.

In summary, ECOL utilizes the content of a post, its similarity to prior knowledge, and the credibility of the source URLs embedded in it to detect its truthfulness early.

3 Ecol Approach

The overall architecture of the ECOL framework is illustrated in Figure 1. First, as a pre-processing step, the framework (1) lowercases all words, (2) fixes Unicode errors, (3) transliterates the text to the closest ASCII representation and (4) replaces exclusive content with tags. Specifically, URLs are replaced with <URL>, emails with <EMAIL>, numbers with <NUMBER>, digits with <DIGIT>, phone numbers with <PHONE> and currency symbols with <CUR>. Second, the framework encodes (1) content information from the post itself (Section 3.1), (2) the relation to the top 10 relevant fake news articles in the health domain (Section 3.2) and (3) the sources embedded within the post, by concatenating reliability tags and their Simple Wiki descriptions (Section 3.3). Next, it concatenates the encoded features and feeds them into a fully connected layer. Finally, a softmax layer classifies the post as fake or real.
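To make the pre-processing step concrete, the snippet below gives a minimal Python sketch; the regular expressions and the preprocess helper are illustrative assumptions rather than the exact rules used by ECOL, and the <DIGIT>, <PHONE> and <CUR> replacements are omitted for brevity.

```python
import re
import unicodedata

# Illustrative patterns; ECOL's actual rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
NUMBER_RE = re.compile(r"\b\d[\d,.]*\b")

def preprocess(text: str) -> str:
    """Lowercase, transliterate to ASCII and mask URLs, emails and numbers with tags."""
    text = text.lower()
    # Normalise Unicode and drop characters without an ASCII equivalent.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = URL_RE.sub("<URL>", text)
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = NUMBER_RE.sub("<NUMBER>", text)
    return text

print(preprocess("Drinking alcohol CURES COVID-19! https://t.co/abc123"))
# -> "drinking alcohol cures covid-<NUMBER>! <URL>"
```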

3.1 Content (C)

In this unit, ECOL tries to capture the writing style of fake and true posts. To learn the content information, we encode the texts with BERT [7], a context-aware language model based on the transformer network [30] that has been pre-trained on a massive text corpus. BERT is adapted to specific NLP tasks by fine-tuning its pre-trained models. To obtain the post representations, we encode the input sequences of the post with the uncased base version of pre-trained BERT using the transformers library [35]. Uncased base BERT consists of 12 layers and 12 attention heads and outputs a 768-dimensional vector for each token in the post. The first token of the input sequence is the special classification token [CLS]; its final hidden state is used as the content representation. We set the maximum sequence length to 128 and pad shorter texts.
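A minimal sketch of obtaining this [CLS] representation with the Hugging Face transformers library is shown below, assuming a recent library version; it only illustrates feature extraction and omits the fine-tuning that ECOL performs.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def encode_content(post: str) -> torch.Tensor:
    """Return the 768-dimensional final hidden state of the [CLS] token."""
    inputs = tokenizer(post, truncation=True, padding="max_length",
                       max_length=128, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    # The first token of the last hidden layer corresponds to [CLS].
    return outputs.last_hidden_state[:, 0, :]   # shape: (1, 768)
```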

3.2 Prior Knowledge (PK)

To leverage the relation of a post to previously published and validated fake news, we encode the post's relation to a set of similar fake news disseminated before COVID-19. Given a post as a query, an ad-hoc semantic search engine retrieves the top 10 fake news articles from a repository indexed with fake news in the health domain. To obtain a fake news vector (FN), we encode each retrieved article with BERT and take their average. Afterwards, we compute the cosine similarity between FN and the post representation P, s = cos(FN, P) = (FN · P) / (‖FN‖ ‖P‖), where s is the one-dimensional relatedness feature.
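A short sketch of this relatedness feature is given below; post_vec and retrieved_vecs are assumed to be BERT [CLS] vectors produced as in Section 3.1.

```python
import torch
import torch.nn.functional as F

def prior_knowledge_feature(post_vec: torch.Tensor,
                            retrieved_vecs: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between a post and the average of retrieved fake news vectors.

    post_vec:       (768,) BERT vector of the post P
    retrieved_vecs: (10, 768) BERT vectors of the top-10 retrieved fake news articles
    """
    fn = retrieved_vecs.mean(dim=0)                  # averaged fake news vector FN
    s = F.cosine_similarity(post_vec, fn, dim=0)     # scalar relatedness score
    return s.unsqueeze(0)                            # one-dimensional feature
```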

3.2.1 Ad Hoc Search and Indexing

To obtain similar fake news, we used Elasticsearch5 to retrieve validated fake news from FakeHealth [5], a dataset containing real and fake news stories and releases published between 2009 and 2018 by the health fact-checking organization HealthNewsReview6. The dataset covers diseases such as cancer, Alzheimer's, etc.
We indexed the title and article of each news item in text format and, to add semantic retrieval capability to the search engine, we also encoded the title and article with pre-trained Sentence-BERT [24]. The search engine retrieves the top 10 documents whose fields match the query and have high cosine similarity with it. If fewer than 10 documents are retrieved, the list is padded with empty strings.
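The sketch below illustrates one way to combine lexical matching with Sentence-BERT similarity in Elasticsearch; the index name, field names and dense-vector mapping are assumptions, not the exact configuration used for ECOL.

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch(hosts=["http://localhost:9200"])
encoder = SentenceTransformer("bert-base-nli-mean-tokens")  # pre-trained Sentence-BERT

def retrieve_fake_news(post: str, k: int = 10):
    """Retrieve the top-k FakeHealth articles matching the post lexically and semantically."""
    query_vec = encoder.encode(post).tolist()
    body = {
        "size": k,
        "query": {
            "script_score": {
                # Lexical match on the indexed title and article fields.
                "query": {"multi_match": {"query": post, "fields": ["title", "article"]}},
                # Re-score by cosine similarity to the indexed title embedding
                # (assumes a dense_vector field named 'title_vector' in the mapping).
                "script": {
                    "source": "cosineSimilarity(params.qv, 'title_vector') + 1.0",
                    "params": {"qv": query_vec},
                },
            }
        },
    }
    hits = es.search(index="fakehealth", body=body)["hits"]["hits"]
    docs = [hit["_source"]["article"] for hit in hits]
    return docs + [""] * (k - len(docs))   # pad with empty strings if fewer than k hits
```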

3.3 Source (S)

To encode the information of the sources, for each source we first unshorten any shortened links, such as the URL in Figure 1. Then, we extract the name of the source (e.g. thespoof). We then retrieve the reliability tag and the Simple Wiki description of the source, and vectorize the source information by concatenating the retrieved information.
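A hedged sketch of this step follows; the use of requests and tldextract is an assumption about how unshortening and source-name extraction could be implemented, not necessarily how ECOL implements it.

```python
import requests
import tldextract

def extract_source(url: str) -> str:
    """Unshorten a link by following redirects and return the source name (e.g. 'thespoof')."""
    try:
        resolved = requests.head(url, allow_redirects=True, timeout=5).url
    except requests.RequestException:
        resolved = url   # fall back to the original URL if resolution fails
    return tldextract.extract(resolved).domain

# extract_source("https://bit.ly/xyz") would return e.g. "thespoof"
# if the shortened link redirects to a thespoof.com page.
```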

3.3.1 Reliability

NELA-GT-2019 is a dataset covering 260 news sources proposed by Gruppi et al. [10]. The dataset contains source reliability labels from various assessment websites such as Media Bias/Fact Check (MBFC)7, Politifact8, etc. The authors aggregated the labels from MBFC by assigning the label unreliable to sources with low factual reporting or listed as conspiracy/pseudoscience sources, and the label reliable to sources with high factual reporting. We determined source reliability by combining these aggregated labels with the satire sources from MBFC. In the end, the source types used for reliability are reliable, unreliable, and satire. We assign the na (not available) tag to sources that do not occur in NELA-GT-2019. Afterwards, we vectorize the reliability of each source with a one-hot encoder. For posts that have no URLs or fewer than five URLs, we pad the source list with zero vectors whose size equals the number of reliability tags.
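The following sketch shows how the reliability tags can be one-hot encoded and padded to a fixed number of sources; the tag order and the five-source limit follow the description above, while the helper itself is illustrative.

```python
import numpy as np

RELIABILITY_TAGS = ["reliable", "unreliable", "satire", "na"]
MAX_SOURCES = 5   # at most five URLs per post are encoded

def encode_reliability(source_tags):
    """One-hot encode the reliability tags of a post's sources, padding with zero vectors."""
    vectors = np.zeros((MAX_SOURCES, len(RELIABILITY_TAGS)), dtype=np.float32)
    for i, tag in enumerate(source_tags[:MAX_SOURCES]):
        vectors[i, RELIABILITY_TAGS.index(tag)] = 1.0
    return vectors.flatten()   # fixed-size (20,) feature vector

print(encode_reliability(["satire", "reliable"]))
```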

3.3.2 Simple Wiki Source Descriptions

The Simple English Wikipedia aims to provide an English encyclopedia for non-native English speakers and children. The entity descriptions in Simple English Wikipedia can be a clue to the trustworthiness of sources. We downloaded the February 2019 dump of Simple English Wikipedia9 to ensure that the contents were written before COVID-19. Using the WikiExtractor tool10, we constructed a dictionary mapping entities to their descriptions. For simplicity, we ignored ambiguous entities that have more than one wiki page and mapped a description to a source in a post only if a key of the dictionary exactly matched the source name. Afterwards, we encoded each source description with BERT.
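A minimal sketch of the description lookup is given below; the simplewiki_descriptions.json file is an assumed artifact built from the WikiExtractor output, not a file shipped with the paper.

```python
import json

# Assumed artifact: a dictionary built from the WikiExtractor output that maps
# lower-cased Simple English Wikipedia titles to their page descriptions.
with open("simplewiki_descriptions.json") as f:
    WIKI_DESCRIPTIONS = json.load(f)

def describe_source(source_name: str) -> str:
    """Return the Simple Wiki description of a source on an exact title match, else ''."""
    return WIKI_DESCRIPTIONS.get(source_name.lower(), "")

# describe_source("thespoof") -> description text if a matching page exists, otherwise ""
```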

4 Experiments

4.1 Dataset

The CONSTRAINT dataset [22] contains fake and real news about COVID-19 in the form of social media posts. Fake news samples were collected from various fact-checking websites and tools such as Politifact and the IFCN chatbot. Real news samples were collected from verified Twitter accounts and manually checked by the organizers. The dataset is split into train, dev, and test sets. Table 1 presents statistics of the dataset.

        Real            Fake            Total
        w       w/o     w       w/o
Train   2321    1039    1002    2058    6420
Dev     780     340     327     693     2140
Test    779     341     319     701     2140
Table 1: Statistics of the task's dataset [22]. w: posts with links, w/o: posts without links.

4.2 Baselines

We compared the ECOL framework against the baseline classifiers provided by the organizers: Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosting (GB), and Decision Tree (DT). The baseline classifiers are trained on term frequency-inverse document frequency (tf-idf) features.
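For reference, a minimal scikit-learn sketch of such a tf-idf baseline is shown below; it is not the organizers' exact implementation, and the tiny example texts are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data; the real baselines are trained on the CONSTRAINT train split.
train_texts = ["drinking alcohol cures covid", "who issues new guidance on mask usage"]
train_labels = ["fake", "real"]

baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(train_texts, train_labels)
print(baseline.predict(["alcohol kills the coronavirus"]))
```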

4.3 Models

We compare and analyse the following variants of the ECOL framework; a sketch of the shared feature-fusion head is given after the list:

  • C uses only the content representation as a feature.

  • PK uses only the prior knowledge representation as a feature.

  • C_PK uses the concatenation of content and prior knowledge features.

  • C_S uses the concatenation of content and source features.

  • C_PK_S uses the concatenation of content, prior knowledge and source features.
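The sketch below illustrates the feature-fusion head shared by these variants; the feature dimensions (768 for content, 1 for prior knowledge, and five sources with a reliability tag plus a 768-dimensional description each) are assumptions derived from the descriptions in Section 3, not reported hyperparameters.

```python
import torch
import torch.nn as nn

class EcolClassifier(nn.Module):
    """Concatenate content, prior knowledge and source features and classify fake vs. real."""

    def __init__(self, content_dim=768, pk_dim=1, source_dim=5 * (4 + 768)):
        super().__init__()
        fused_dim = content_dim + pk_dim + source_dim
        self.fc = nn.Linear(fused_dim, 2)   # fully connected layer over the fused features

    def forward(self, content, prior_knowledge, source):
        # Full C_PK_S forward pass; ablation variants concatenate only a subset of the features.
        fused = torch.cat([content, prior_knowledge, source], dim=-1)
        return torch.softmax(self.fc(fused), dim=-1)   # probabilities for [fake, real]
```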

4.4 Implementation

We implemented the ECOL models using PyTorch Lightning11. We trained the models with random seeds 42, 0, and 36, for three epochs, with a batch size of one, on an NVIDIA TITAN RTX 16 GB GPU.

5 Results and Discussion

We present the experimental results on the development and test sets in Table 2. We report Precision (P), Recall (R), and F1 scores per class, as well as accuracy and weighted P, R, and F1 scores as the models' overall performance. For each feature configuration (C, PK, C_PK, C_S, C_PK_S), the table contains two rows: an ensemble that averages the predictions of the models trained with random seeds 42, 36, and 0, and a single model trained with random seed 42, which gave the highest F1 scores in our experiments. We entered the CONSTRAINT shared task with three entries: an average over the three C_PK_S models and the two C_PK_S models with the highest F1 scores (random seeds 42 and 36). The best performing model, with random seed 42, ranked 14th among 167 submissions [21].
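A small sketch of the prediction averaging used for the ensemble rows follows; the dummy probabilities stand in for the softmax outputs of the three seed runs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Dummy softmax outputs of three runs (seeds 42, 36, 0) over four posts, classes [fake, real].
seed_probs = rng.dirichlet([1.0, 1.0], size=(3, 4))
ensemble_probs = seed_probs.mean(axis=0)          # average the predicted probabilities
labels = np.where(ensemble_probs[:, 0] > ensemble_probs[:, 1], "fake", "real")
print(labels)
```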

Set Model Fake Real Overall
P R F1 P R F1 Acc P R F1
Dev SVM 92.07 94.41 93.22 94.79 92.59 93.68 93.46 93.48 93.46 93.46
LR 91.07 94.02 92.52 94.39 91.61 92.98 92.76 92.79 92.76 92.75
GB 83.41 90.20 86.67 90.36 83.66 86.88 86.78 87.03 86.78 86.77
DT 85.53 83.43 84.47 85.24 87.14 86.18 85.37 85.42 85.37 85.38
C 98.12 97.06 97.59 97.35 98.30 97.82 97.71 97.72 97.71 97.71
PK 97.69 95.20 96.43 95.72 97.95 96.82 96.63 96.67 96.64 96.64
C_PK 98.99 95.78 97.36 96.27 99.11 97.67 97.52 97.57 97.52 97.53
C_S 98.39 95.98 97.17 96.42 98.57 97.48 97.34 97.37 97.34 97.34
C_PK_S 98.51 97.25 97.88 97.53 98.66 98.09 97.99 98.00 97.99 97.99
C 99.28 95.20 97.20 95.78 99.38 97.55 97.38 97.47 97.38 97.39
PK 99.17 93.53 96.27 94.40 99.29 96.78 96.54 96.70 96.54 96.55
C_PK 99.49 94.71 97.04 95.38 99.55 97.42 97.24 97.35 97.24 97.24
C_S 99.49 95.10 97.24 95.71 99.55 97.59 97.43 97.52 97.43 97.43
C_PK_S 99.09 96.08 97.56 96.52 99.20 97.84 97.71 97.75 97.71 97.71
Test SVM 92.20 93.92 93.05 94.37 92.77 93.56 93.32 93.33 93.32 93.32
LR 90.08 93.43 91.72 93.81 90.62 92.19 91.96 92.01 91.96 91.96
GB 83.39 90.59 86.84 90.70 83.57 86.99 86.92 87.20 86.92 86.91
DT 85.39 84.22 84.80 85.80 86.88 86.34 85.61 85.62 85.61 85.61
C 98.21 96.96 97.58 97.26 98.39 97.83 97.71 97.72 97.71 97.71
PK 97.78 95.20 96.47 95.73 98.04 96.87 96.68 96.72 96.68 96.68
C_PK 99.29 95.59 97.40 96.11 99.38 97.72 97.57 97.64 97.57 97.57
C_S 98.69 96.08 97.37 96.51 98.84 97.66 97.52 97.56 97.52 97.53
C_PK_S 99.10 96.96 98.02 97.29 99.20 98.23 98.13 98.15 98.13 98.13
C 99.49 94.80 97.09 95.46 99.55 97.47 97.29 97.40 97.30 97.29
PK 98.56 94.02 96.24 94.77 98.75 96.72 96.50 96.60 96.50 96.50
C_PK 99.38 93.82 96.52 94.65 99.46 97.00 96.78 96.93 96.78 96.78
C_S 99.79 94.90 97.29 95.56 99.82 97.64 97.48 97.59 97.48 97.48
C_PK_S 99.59 95.29 97.39 95.88 99.64 97.72 97.57 97.66 97.57 97.57
Table 2: Precision, recall and F1-score of the baseline and proposed models, trained on the CONSTRAINT datasets. Highlighted scores indicate the highest values for each metric.
(a) Fake news posts with and without links
(b) Real news posts with and without links
Figure 2: F1 scores of the models when predicting posts with links (blue) and without links (red).

Applying a t-test at the 0.01 significance level, we observe that the proposed models significantly outperform the baselines. Comparing the content-based models (the ensemble and single-seed variants of C) with the other proposed models, we first see that prior knowledge and content information (C_PK) complement the source information (C_S). Moreover, prior knowledge and content information help to identify fake news, whereas source information helps to identify real news. Therefore, among the proposed models, the C_PK_S variants achieve the highest F1 scores by balancing the predictions between real and fake news. However, the improvement is not significant: for instance, C misclassified only 49 samples, while C_PK_S classified just 9 more samples correctly.

PK and C_PK, which incorporate prior knowledge, are the least successful of the proposed models. We observed that the indexing method of the retrieval unit yields false-positive predictions. However, these models still outperform the official baselines, which implies that prior knowledge can be useful for fake news detection. For better retrieval in the healthcare domain, we plan to improve the indexing schema with semantic concepts that define health claim types, such as treatment and alternative medicine.

We also analyzed how the presence of links in posts changes the model predictions, and present the F1 scores of the models grouped into posts with and without links in Figure 2. The presence of links in fake news drastically degrades the performance of the models (Figure 2a). For example, while C_S reaches an F1 score of 98.77 on fake posts without links, its score drops to 93.22 on fake posts with links. However, encoding source information (C_PK_S and C_S) improves the identification of fake news posts with links. When we analyse the links, we see that they point to Twitter accounts, medical websites and deleted links that are present neither in Simple English Wikipedia nor in the reliability dictionary. We also found some samples in the test set that link to a fact-checking website (Politifact) but were annotated as fake, potentially yielding false predictions.

As seen in Figure 2b, content information is the only key feature for identifying real news. Prior knowledge and source information could not improve the prediction of real news posts with links. When we analyse the real news posts misclassified by the models, we see that although the posts are written in reporting language, they also contain judgemental language. For example, "Coronavirus: Donald Trump ignores COVID-19 rules with 'reckless and selfish' indoor rally [URL]" might confuse the models by combining the two language types.

6 Conclusion

In this paper, we presented a promising framework for the early detection of fake news on social media. The framework encodes content, prior knowledge, and the credibility of sources extracted from the URLs in posts. We analyzed the impact of each type of encoded information on detecting fake news in the healthcare domain and observed that combining the three perspectives leads to a more precise distinction between fake and real news. In future work, we will improve the source linking by using structured data such as Wikidata [32] in order to encode more source knowledge. For better retrieval in the healthcare domain, we also plan to index prior knowledge by categorizing it into semantic concepts such as cure, treatment and symptoms.

Footnotes

  1. email: {ibaris,boukhers}@uni-koblenz.de
  2. email: {ibaris,boukhers}@uni-koblenz.de
  3. https://www.independent.co.uk/news/world/middle-east/iran-coronavirus-methanol-drink-cure-deaths-fake-a9429956.html
  4. https://github.com/isspek/FakeNewsDetectionFramework
  5. docker.elastic.co/elasticsearch/elasticsearch:7.6.1
  6. https://www.healthnewsreview.org/
  7. https://mediabiasfactcheck.com/
  8. https://www.politifact.com/
  9. https://archive.org/details/simplewiki-20190201
  10. https://github.com/attardi/wikiextractor
  11. https://pytorch-lightning.readthedocs.io/en/0.7.1/introduction_guide.html

References

  1. I. Augenstein, C. Lioma, D. Wang, L. C. Lima, C. Hansen, C. Hansen and J. G. Simonsen (2019) MultiFC: A real-world multi-domain dataset for evidence-based fact checking of claims. In EMNLP/IJCNLP (1), pp. 4684–4696. Cited by: §2.2.
  2. I. Baris, L. Schmelzeisen and S. Staab (2019) CLEARumor at semeval-2019 task 7: convolving elmo against rumors. In SemEval@NAACL-HLT, pp. 1105–1109. Cited by: §1, §2.1.
  3. C. Castillo, M. Mendoza and B. Poblete (2011) Information credibility on twitter. In WWW, pp. 675–684. Cited by: §1, §2.1.
  4. T. Chen, X. Li, H. Yin and J. Zhang (2018) Call attention to rumors: deep attention based recurrent neural networks for early rumor detection. In PAKDD (Workshops), Lecture Notes in Computer Science, Vol. 11154, pp. 40–52. Cited by: §2.1.
  5. E. Dai, Y. Sun and S. Wang (2020) Ginger cannot cure cancer: battling fake health news with a comprehensive data repository. In ICWSM, pp. 853–862. Cited by: §3.2.
  6. R. Denaux and J. M. Gómez-Pérez (2020) Linked credibility reviews for explainable misinformation detection. In ISWC (1), Lecture Notes in Computer Science, Vol. 12506, pp. 147–163. Cited by: §2.2.
  7. J. Devlin, M. Chang, K. Lee and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT (1), pp. 4171–4186. Cited by: 1st item, §2.1, §3.1.
  8. D. Esteves, A. J. Reddy, P. Chawla and J. Lehmann (2018) Belittling the source: trustworthiness indicators to obfuscate fake news on the web. CoRR abs/1809.00494. Cited by: §2.2.
  9. M. Fajcik, P. Smrz and L. Burget (2019) BUT-FIT at semeval-2019 task 7: determining the rumour stance with pre-trained deep bidirectional transformers. In SemEval@NAACL-HLT, pp. 1097–1104. Cited by: §2.1.
  10. M. Gruppi, B. D. Horne and S. Adali (2020) NELA-GT-2019: A large multi-labelled news dataset for the study of misinformation in news articles. CoRR abs/2003.08444. Cited by: §2.2, §3.3.
  11. B. Guo, Y. Ding, L. Yao, Y. Liang and Z. Yu (2020) The future of false information detection on social media: new perspectives and trends. ACM Comput. Surv. 53 (4), pp. 68:1–68:36. Cited by: §1, §2.1.
  12. N. Hassan, G. Zhang, F. Arslan, J. Caraballo, D. Jimenez, S. Gawsane, S. Hasan, M. Joseph, A. Kulkarni, A. K. Nayak, V. Sable, C. Li and M. Tremayne (2017) ClaimBuster: the first-ever end-to-end fact-checking system. Proc. VLDB Endow. 10 (12), pp. 1945–1948. Cited by: §1, §2.2.
  13. S. Kwon, M. Cha, K. Jung, W. Chen and Y. Wang (2013) Prominent features of rumor propagation in online social media. In ICDM, pp. 1103–1108. Cited by: §2.1.
  14. D. M. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook and D. Rothschild (2018) The science of fake news. Science 359 (6380), pp. 1094–1096. Cited by: §1.
  15. Q. Li and W. Zhou (2020) Connecting the dots between fact verification and fake news detection. In COLING, pp. 1820–1825. Cited by: §2.2.
  16. Q. Li, Q. Zhang and L. Si (2019) Rumor detection by exploiting user credibility information, attention and multi-task learning. In ACL (1), pp. 1173–1179. Cited by: §1, §2.2.
  17. M. Mensio and H. Alani (2019) News source credibility in the eyes of different assessors. In TTO, Cited by: §2.2.
  18. S. B. Naeem, R. Bhatti and A. Khan (2020) An exploration of how fake news is taking over social media and putting public health at risk. Health Information & Libraries Journal. Cited by: §1.
  19. P. Nakov (2020) Can we spot the ”fake news” before it was even written?. CoRR abs/2008.04374. Cited by: §2.2.
  20. J. Nørregaard, B. D. Horne and S. Adali (2019) NELA-GT-2018: A large multi-labelled news dataset for the study of misinformation in news articles. In ICWSM, pp. 630–638. Cited by: §2.2.
  21. P. Patwa, M. Bhardwaj, V. Guptha, G. Kumari, S. Sharma, S. PYKL, A. Das, A. Ekbal, M. S. Akhtar and T. Chakraborty (2021) Overview of constraint 2021 shared tasks: detecting english covid-19 fake news and hindi hostile posts. In Proceedings of the First Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation (CONSTRAINT), Cited by: 2nd item, §5.
  22. P. Patwa, S. Sharma, P. Srinivas, V. Guptha, G. Kumari, Md. S. Akhtar, A. Ekbal, A. Das and T. Chakraborty (2020) Fighting an infodemic: COVID-19 fake news dataset. CoRR abs/2011.03327. Cited by: §4.1, Table 1.
  23. M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee and L. Zettlemoyer (2018) Deep Contextualized Word Representations. In NAACL-HLT, pp. 2227–2237. Cited by: §2.1.
  24. N. Reimers and I. Gurevych (2019) Sentence-bert: sentence embeddings using siamese bert-networks. In EMNLP/IJCNLP (1), pp. 3980–3990. Cited by: §3.2.
  25. S. Shaar, N. Babulkov, G. D. S. Martino and P. Nakov (2020) That is a known lie: detecting previously fact-checked claims. In ACL, pp. 3607–3618. Cited by: §1, §2.2.
  26. K. Shu, A. Sliva, S. Wang, J. Tang and H. Liu (2017) Fake news detection on social media: A data mining perspective. SIGKDD Explor. 19 (1), pp. 22–36. Cited by: §1, §2.1.
  27. K. Shu, G. Zheng, Y. Li, S. Mukherjee, A. H. Awadallah, S. Ruston and H. Liu (2020) Leveraging multi-source weak social supervision for early detection of fake news. CoRR abs/2004.01732. Cited by: §1.
  28. The Lancet Oncology (2018) Oncology, "fake" news, and legal liability. The Lancet Oncology 19 (9), pp. 1135. Cited by: §1.
  29. J. Thorne, A. Vlachos, O. Cocarascu, C. Christodoulopoulos and A. Mittal (2018) The fact extraction and verification (FEVER) shared task. CoRR abs/1811.10971. Cited by: §2.2.
  30. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin (2017) Attention is all you need. Advances in neural information processing systems 30, pp. 5998–6008. Cited by: §3.1.
  31. S. Vosoughi, D. Roy and S. Aral (2018) The spread of true and false news online. Science 359 (6380), pp. 1146–1151. Cited by: §1.
  32. D. Vrandečić and M. Krötzsch (2014) Wikidata: a free collaborative knowledgebase. Communications of the ACM 57 (10), pp. 78–85. Cited by: §6.
  33. W. Y. Wang (2017) ”Liar, liar pants on fire”: A new benchmark dataset for fake news detection. In ACL (2), pp. 422–426. Cited by: §1, §2.1.
  34. P. M. Waszak, W. Kasprzycka-Waszak and A. Kubanek (2018) The spread of medical fake news in social media–the pilot quantitative study. Health policy and technology 7 (2), pp. 115–118. Cited by: §1.
  35. T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest and A. M. Rush (2020-10) Transformers: state-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, pp. 38–45. External Links: Link Cited by: §3.1.
  36. V. Woloszyn, F. Schaeffer, B. Boniatti, E. G. Cortes, S. Mohtaj and S. Möller (2020) Untrue.news: A new search engine for fake stories. CoRR abs/2002.06585. Cited by: §2.2.
  37. C. Yuan, Q. Ma, W. Zhou, J. Han and S. Hu (2020) Early detection of fake news by utilizing the credibility of news, publishers, and users based on weakly supervised learning. In COLING, pp. 5444–5454. Cited by: §1, §2.2.
  38. X. Zhou, A. Jain, V. V. Phoha and R. Zafarani (2020) Fake news early detection: a theory-driven model. Digital Threats: Research and Practice 1 (2), pp. 1–25. Cited by: §2.1.
  39. X. Zhou and R. Zafarani (2020-09) A survey of fake news: fundamental theories, detection methods, and opportunities. ACM Comput. Surv. 53 (5). External Links: ISSN 0360-0300, Link, Document Cited by: §1, §2.1.