Toward Safer Crowdsourced Content Moderation

BRANDON DANG, MARTIN J. RIEDL, and MATTHEW LEASE, University of Texas at Austin. The first two authors contributed equally.


1 Introduction

While most user-generated content posted on social media platforms is benign, some image, video, and text posts violate terms of service and/or platform norms (e.g., due to nudity or obscenity). At the extreme, such content can include child pornography and violent acts, such as murder, suicide, and animal abuse Krause and Grassegger (2016); Roe (2017). Ideally, algorithms would automatically detect and filter out such content, and machine learning approaches toward this end are certainly being pursued. Unfortunately, algorithmic accuracy today remains unequal to the task, making it necessary to fall back on human labor. While social platforms could ask their own users to help police such content, such practice is typically considered untenable, since these platforms want to guarantee their users a safe, protected Internet experience within the confines of their curated platforms.

Consequently, the task of filtering out such content often falls today to a global workforce of paid human laborers who are willing to undertake the job of commercial content moderation Roberts (2014, 2016) to flag user-posted images which do not comply with platform rules. To more reliably moderate user content, social media companies hire internal reviewers, contract specialized workers from third parties, or outsource to online labor markets Gillespie (2018); Roberts (2016). While such work might be expected to be unpleasant, there is increasing awareness and recognition that long-term or extensive viewing of such disturbing content can incur significant health consequences for those engaged in such labor Chen (2012b); Ghoshal (2017), somewhat akin to working as a 911 operator in the USA, albeit one with potentially less institutional recognition and/or support for the detrimental mental health effects of the work. It is rather ironic, therefore, that precisely the sort of task one would most wish to automate (since algorithms could not be “upset” by viewing such content) is what the “technological advance” of Internet crowdsourcing is now enabling: shifting such work away from automated algorithms to more capable human workers Barr and Cabrera (2006).

In a court case scheduled to be heard at the King County Superior Court in Seattle, Washington in October 2018 Roe (2017), Microsoft was sued by two content moderators who said they developed post-traumatic stress disorder Ghoshal (2017). Recently, there has been an influx of academic and industry attention to these issues, as manifest in conferences organized on content moderation (e.g., All Things in Moderation, atm-ucla2017.net) Civeris (2018). While this attention suggests increasing professional and research interest in the work of content moderators, few empirical studies have been conducted.

In this work, we aim to answer the following research question: How can we reveal the minimum amount of information to a human reviewer such that an objectionable image can still be correctly identified? Assuming such human labor will continue to be employed in order to meet platform requirements, we seek to preserve the accuracy of human moderation while making it safer for the workers who engage in it. Specifically, we experiment with blurring entire images to different extents such that low-level pixel details are eliminated but the image remains sufficiently recognizable to accurately moderate. We further implement tools for workers to partially reveal blurred regions in order to help them successfully moderate images that have been too heavily blurred. Beyond merely reducing exposure, putting finer-grained tools in the hands of the workers provides them with a higher degree of control in limiting their exposure: how much they see, when they see it, and for how long.

Preliminary pilot data collection and analysis on Amazon Mechanical Turk (AMT), conducted as part of a class project, asked workers to moderate a set of “safe” images, collected judgment confidence, and queried workers regarding their expected emotional exhaustion or discomfort were this their full-time job. We have further refined our approach based on these findings and next plan to proceed to primary data collection, which will measure how the degree of blur and the provided controls for partial unblurring affect the moderation experience with respect to classification accuracy and emotional wellbeing.

2 Related Work

Content-based pornography and nudity detection via computer vision approaches is a well-studied problem Ries and Lienhart (2012); Shayan et al. (2015). Violence detection in images and videos using computer vision is another active area of research Deniz et al. (2014); Gao et al. (2016). Hate speech detection is another common moderation task for humans and machines Schmidt and Wiegand (2017).

Crowdsourcing of privacy-sensitive materials also remains an open challenge Kittur et al. (2013). Several methods have been proposed in which workers interact with obfuscations of the original content, thereby allowing for the completion of the task at hand while still protecting the privacy of the content’s owners. Examples of such systems include those by Little and Sun Little and Sun (2011), Kokkalis et al. Kokkalis et al. (2013), Lasecki et al. Lasecki et al. (2013), Kaur et al. Kaur et al. (2017), and Swaminathan et al. Swaminathan et al. (2017). The crowdsourcing of obfuscated images has also been done in the computer vision community for the purpose of annotating object locations and salient regions von Ahn et al. (2006); Deng et al. (2013); Das et al. (2016).

Our experimental process and designs are inspired by Das et al. Das et al. (2016), in which crowd workers are shown blurred images and click regions to sharpen (i.e., unblur) them, incrementally revealing information until a visual question can be accurately answered.

3 Method

3.1 Dataset


Table 3.1: The distribution of images across categories and types. Our final filtered dataset contains a total of 785 images.

                    realistic   synthetic   total
  sex and nudity       152         148        300
  graphic content      123         116        239
  safe content         108         138        246
  total                383         402        785

We have collected images from Google Images depicting realistic and synthetic (e.g., cartoons) pornography, violence/gore, as well as “safe” content which we do not believe would be offensive to general audiences. We manually filtered out duplicates, as well as anything categorically ambiguous, too small or low quality, etc., resulting in a dataset of 785 images. Adopting category names from Facebook moderation guidelines for crowd workers on oDesk Chen (2012a, b), we label pornographic images as sex and nudity and violent/gory images as graphic content. Table 3.1 shows the final distribution of images across each category and type (i.e., realistic, synthetic).

3.2 AMT Human Intelligence Task (HIT) Design

Rather than only having workers indicate whether an image is acceptable or not, we task them with identifying additional information which could be useful for training automatic detection systems. Aside from producing richer labeled data, companies may also require moderators to report and escalate content depicting specific categories of abuse, such as child pornography. At the same time, we wish to protect the moderators from such exposure. We design our task as follows.

Figure 1: Images will be shown to workers at varying levels of obfuscation. Exemplified from left to right, we plan to blur images using a Gaussian filter with increasing standard deviation for different iterations of the experiment.

3.2.1 Moderation

Our HIT is divided into two parts. The first part is the moderation portion, in which workers are presented images to classify as belonging to the categories in Section 3.1. We use this set-up for six stages of the experiment with minor variations. Stage 1: we do not obfuscate the images at all; the results from this iteration serve as the baseline. Stage 2: we blur the images using a Gaussian filter at a moderate standard deviation. Stage 3: we increase the level of blur to a higher standard deviation. Figure 1 shows examples of images blurred at these levels. Stage 4: we again use the heavier blur but additionally allow workers to click regions of images to reveal them. Stage 5: similarly, we use the heavier blur but additionally allow workers to mouse-over regions of images to temporarily unblur them. Stage 6: workers are shown heavily blurred images but can decrease the level of blur using a sliding bar.
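As an illustration of the obfuscation underlying these stages, Gaussian blurring can be sketched as a separable convolution whose standard deviation controls how much low-level detail survives. The toy implementation below operates on a grayscale grid using only the standard library; the function names and the specific standard deviations are our own for illustration, as the study's exact parameters and imaging library are not specified here.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Build a normalized 1D Gaussian kernel for separable blurring."""
    if radius is None:
        radius = max(1, int(3 * sigma))  # ~3 sigma captures most of the mass
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Blur a grayscale image (list of rows of floats) with a separable
    Gaussian filter; edge pixels are handled by clamping indices."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(v, lo, hi):
        return max(lo, min(hi, v))

    # Horizontal pass, then vertical pass.
    tmp = [[sum(kernel[k] * image[y][clamp(x + k - radius, 0, w - 1)]
                for k in range(len(kernel)))
            for x in range(w)] for y in range(h)]
    return [[sum(kernel[k] * tmp[clamp(y + k - radius, 0, h - 1)][x]
                 for k in range(len(kernel)))
             for x in range(w)] for y in range(h)]

# A sharp vertical edge: heavier blur spreads it across more pixels,
# mirroring how higher-sigma stages remove more low-level detail.
img = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
light, heavy = blur(img, 1.0), blur(img, 3.0)
```

The Stage 6 slider would simply map slider position to the sigma passed to such a filter, re-rendering the image as the worker adjusts it.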

By gradually increasing the level of blur, we reveal less and less information to the worker. While this may better protect workers from harmful images, we anticipate that this will also make it harder to properly evaluate the content of images. By providing unblurring features in later stages, we allow workers to reveal more information, if necessary, to complete the task.
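The click-to-reveal and mouse-over interactions of Stages 4 and 5 amount to compositing sharp pixels from the original image back over the blurred base. A minimal sketch of this compositing step follows; the function and variable names are ours, not from the study's implementation.

```python
def reveal_region(blurred, original, x0, y0, x1, y1):
    """Return a copy of `blurred` with the rectangle [x0, x1) x [y0, y1)
    replaced by the corresponding sharp pixels from `original`,
    mimicking a click-to-reveal interaction."""
    out = [row[:] for row in blurred]  # copy so the blurred base is untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = original[y][x]
    return out

# 4x4 toy example: a uniformly "blurred" image and a sharp original.
original = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
blurred = [[0.5] * 4 for _ in range(4)]
partial = reveal_region(blurred, original, 1, 1, 3, 3)
```

A mouse-over variant would re-render the fully blurred base once the cursor leaves the region, making the reveal temporary rather than permanent.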

3.2.2 Survey

We also ask workers to take a survey about their subjective experience completing the task. The survey contains questions measuring various variables including positive and negative experience Diener et al. (2010) and affect Watson et al. (1988); Thompson (2007), emotional exhaustion Wharton (1993); Coates and Howe (2015), and perceived ease of use/perceived usefulness of the blurring interface Davis (1989); Venkatesh and Davis (2000); Davis et al. (1989). As our goal is to alleviate the psychological burden which may accompany content moderation, these measures will help us evaluate the extent to which obfuscating images successfully relieves workers.

4 Conclusion

By designing a system to help content moderators better complete their work, we seek to minimize possible risks associated with content moderation, while still ensuring accuracy in human judgments. Our experiment will mix blurred and unblurred adult content and safe images for moderation by human participants on AMT. This will enable us to observe the impact of obfuscation of images on participants’ content moderation experience with respect to moderation accuracy, usability measures, and worker comfort/wellness. Our overall goal is to develop methods to alleviate potentially negative psychological impact of content moderation and ameliorate content moderator working conditions.

Acknowledgments. This work is supported in part by National Science Foundation grant No. 1253413. Any opinions, findings, and conclusions or recommendations expressed by the authors are entirely their own and do not represent those of the sponsoring agencies.


  • Barr and Cabrera (2006) Jeff Barr and Luis Felipe Cabrera. 2006. AI gets a brain. Queue 4, 4 (2006), 24.
  • Chen (2012a) Adrian Chen. 2012a. Facebook releases new content guidelines, now allows bodily fluids. (2012).
  • Chen (2012b) Adrian Chen. 2012b. Inside Facebook’s outsourced anti-porn and gore brigade, where ‘camel toes’ are more offensive than ‘crushed heads’. (2012).
  • Civeris (2018) George Civeris. 2018. The new ’billion-dollar problem’ for platforms and publishers. Columbia Journalism Review (2018).
  • Coates and Howe (2015) Dominiek D. Coates and Deborah Howe. 2015. The design and development of staff wellbeing initiatives: Staff stressors, burnout and emotional exhaustion at children and young people’s mental health in Australia. Administration and Policy in Mental Health and Mental Health Services Research 42, 6 (2015), 655–663.
  • Das et al. (2016) Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, and Dhruv Batra. 2016. Human attention in visual question answering: Do humans and deep networks look at the same regions? arXiv preprint arXiv:1606.03556 (2016).
  • Davis (1989) Fred D. Davis. 1989. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS quarterly (1989), 319–340.
  • Davis et al. (1989) Fred D. Davis, Richard P. Bagozzi, and Paul R. Warshaw. 1989. User acceptance of computer technology: a comparison of two theoretical models. Management Science 35, 8 (1989), 982–1003.
  • Deng et al. (2013) Jia Deng, Jonathan Krause, and Li Fei-Fei. 2013. Fine-grained crowdsourcing for fine-grained recognition. In 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 580–587.
  • Deniz et al. (2014) Oscar Deniz, Ismael Serrano, Gloria Bueno, and Tae-Kyun Kim. 2014. Fast violence detection in video. In 2014 International Conference on Computer Vision Theory and Applications (VISAPP), Vol. 2. IEEE, 478–485.
  • Diener et al. (2010) Ed Diener, Derrick Wirtz, William Tov, Chu Kim-Prieto, Dong-won Choi, Shigehiro Oishi, and Robert Biswas-Diener. 2010. New well-being measures: Short scales to assess flourishing and positive and negative feelings. Social Indicators Research 97, 2 (2010), 143–156.
  • Gao et al. (2016) Yuan Gao, Hong Liu, Xiaohu Sun, Can Wang, and Yi Liu. 2016. Violence detection using oriented violent flows. Image and Vision Computing 48 (2016), 37–41.
  • Ghoshal (2017) Abhimanyu Ghoshal. 2017. Microsoft sued by employees who developed PTSD after reviewing disturbing content. The Next Web. (2017).
  • Gillespie (2018) Tarleton Gillespie. 2018. Governance of and by platforms. In SAGE Handbook of Social Media, Jean Burgess, Thomas Poell, and Alice Marwick (Eds.). SAGE.
  • Kaur et al. (2017) Harmanpreet Kaur, Mitchell Gordon, Yiwei Yang, Jeffrey P. Bigham, Jaime Teevan, Ece Kamar, and Walter S. Lasecki. 2017. CrowdMask: Using crowds to preserve privacy in crowd-powered systems via progressive filtering. In Proceedings of AAAI Human Computation (HCOMP).
  • Kittur et al. (2013) Aniket Kittur, Jeffrey V. Nickerson, Michael Bernstein, Elizabeth Gerber, Aaron Shaw, John Zimmerman, Matt Lease, and John Horton. 2013. The future of crowd work. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work. ACM, 1301–1318.
  • Kokkalis et al. (2013) Nicolas Kokkalis, Thomas Köhn, Carl Pfeiffer, Dima Chornyi, Michael S. Bernstein, and Scott R. Klemmer. 2013. EmailValet: Managing email overload through private, accountable crowdsourcing. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work. ACM, 1291–1300.
  • Krause and Grassegger (2016) Till Krause and Hannes Grassegger. 2016. Inside Facebook. Sueddeutsche Zeitung. (2016).
  • Lasecki et al. (2013) Walter S. Lasecki, Young Chol Song, Henry Kautz, and Jeffrey P. Bigham. 2013. Real-time crowd labeling for deployable activity recognition. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work. ACM, 1203–1212.
  • Little and Sun (2011) Greg Little and Yu-An Sun. 2011. Human OCR: Insights from a complex human computation process. In ACM CHI Workshop on Crowdsourcing and Human Computation, Services, Studies and Platforms.
  • Ries and Lienhart (2012) Christian X. Ries and Rainer Lienhart. 2012. A survey on visual adult image recognition. Multimedia Tools and Applications 69 (2012), 661–688.
  • Roberts (2014) Sarah T. Roberts. 2014. Behind the screen: The hidden digital labor of commercial content moderation. Doctoral dissertation. University of Illinois at Urbana-Champaign.
  • Roberts (2016) Sarah T. Roberts. 2016. Commercial content moderation: Digital laborers’ dirty work. In The intersectional internet: Race, sex, class and culture online, Safiya Umoja Noble and Brendesha M. Tynes (Eds.). Peter Lang, 147–160.
  • Roe (2017) Rebecca Roe. 2017. Dark shadows, dark web. In Keynote at All Things in Moderation: The People, Practices and Politics of Online Content Review – Human and Machine — UCLA December 6-7 2017. Los Angeles, CA.
  • Schmidt and Wiegand (2017) Anna Schmidt and Michael Wiegand. 2017. A survey on hate speech detection using natural language processing. In Proceedings of the 5th International Workshop on Natural Language Processing for Social Media. Association for Computational Linguistics, 1–10.
  • Shayan et al. (2015) Jafar Shayan, Shahidan M Abdullah, and Sasan Karamizadeh. 2015. An overview of objectionable image detection. In 2015 International Symposium on Technology Management and Emerging Technologies (ISTMET). IEEE, 396–400.
  • Swaminathan et al. (2017) Saiganesh Swaminathan, Raymond Fok, Fanglin Chen, Ting-Hao Kenneth Huang, Irene Lin, Rohan Jadvani, Walter S. Lasecki, and Jeffrey P. Bigham. 2017. WearMail: On-the-go access to information in your email with a privacy-preserving human computation workflow. (2017).
  • Thompson (2007) Edmund R. Thompson. 2007. Development and validation of an internationally reliable short-form of the Positive and Negative Affect Schedule (PANAS). Journal of Cross-Cultural Psychology 38, 2 (2007), 227–242.
  • Venkatesh and Davis (2000) Viswanath Venkatesh and Fred D. Davis. 2000. A theoretical extension of the technology acceptance model: Four longitudinal field studies. Management Science 46, 2 (2000), 186–204.
  • von Ahn et al. (2006) Luis von Ahn, Ruoran Liu, and Manuel Blum. 2006. Peekaboom: A game for locating objects in images. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 55–64.
  • Watson et al. (1988) David Watson, Lee Anna Clark, and Auke Tellegen. 1988. Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology 54, 6 (1988), 1063–1070.
  • Wharton (1993) Amy S. Wharton. 1993. The affective consequences of service work: Managing emotions on the job. Work and Occupations 20, 2 (1993), 205–232.