DEtection of TOXicity in comments In Spanish

The aim of the DETOXIS task is the detection of toxicity in comments posted in Spanish in response to different online news articles related to immigration. 

The DETOXIS task is divided into two related classification subtasks:

  • Subtask 1: Toxicity detection is a binary classification task that consists of classifying the content of a comment as toxic (toxic=yes) or not toxic (toxic=no).

  • Subtask 2: Toxicity level detection is a more fine-grained classification task in which the aim is to identify the level of toxicity of a comment (0 = not toxic; 1 = mildly toxic; 2 = toxic; 3 = very toxic).
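
As an illustration of how the two label sets relate, the following minimal Python sketch encodes the four toxicity levels and derives a Subtask 1 label from a Subtask 2 level. The names `ToxicityLevel` and `to_binary_label` are hypothetical, and the rule that any level above 0 maps to toxic=yes is an assumption inferred from level 0 being described as "not toxic"; it is not an official part of the task definition.

```python
from enum import IntEnum


class ToxicityLevel(IntEnum):
    """Subtask 2 labels, as listed in the task description."""
    NOT_TOXIC = 0
    MILDLY_TOXIC = 1
    TOXIC = 2
    VERY_TOXIC = 3


def to_binary_label(level: int) -> str:
    """Derive a Subtask 1 label ("yes" or "no") from a Subtask 2 level.

    Assumption (not stated in the task description): a comment counts
    as toxic whenever its toxicity level is above 0.
    Raises ValueError if the level is not one of 0, 1, 2 or 3.
    """
    return "yes" if ToxicityLevel(level) > ToxicityLevel.NOT_TOXIC else "no"
```

Under this assumption, a system that solves Subtask 2 also yields a Subtask 1 prediction, whereas the reverse does not hold, since a binary label cannot distinguish between levels 1, 2 and 3.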

A comment is toxic when it attacks, threatens, insults, offends, denigrates or disqualifies a person or group of people on the basis of characteristics such as race, ethnicity, nationality, political ideology, religion, gender and sexual orientation, among others. This attack can be expressed in different ways, either explicitly (through insult, mockery and inappropriate humor) or implicitly (for instance, through sarcasm), and at different levels of intensity, that is, at different levels of toxicity (from impolite and offensive comments to the most aggressive ones, which incite hate or even physical violence). We use toxicity as an umbrella term that covers different definitions used in the literature to describe hate speech (Nockleby, 2000; Waseem and Hovy, 2016; Schmidt and Wiegand, 2017; Davidson et al., 2017) and abusive (Nobata et al., 2016), aggressive (Kumar et al., 2018), toxic (Kolhatkar et al., 2019) or offensive (Zampieri et al., 2020) language. In fact, these different terms address different aspects of toxic language (Poletto et al., 2020).

The detection of toxicity, and especially its classification into different levels, is a difficult task because whether a comment is toxic can depend not only on its linguistic content (what is being said and the way in which it is conveyed), but also on the contextual information (i.e., the conversational thread) and the extralinguistic context, which is related to real-world knowledge.


Comment: Contra la desinformación datos. En el INE tenéis los datos de los inmigrantes. Delitos, cotizaciones, prestaciones, etc.


Toxicity level=0

"Against disinformation, data. At the INE you have the data on immigrants: crimes, contributions, benefits, etc."

Comment: Esta gentuza se carga al país en dos telediarios


Toxicity level=1

"This rabble will wreck the country in no time" (literally, "in two TV news broadcasts")

Comment: Lo que importa en realidad es sentirse mejor con uno mismo. Sumar karma. A los putos negros les pueden joder bien.


Toxicity level=2

"What really matters is feeling better about yourself. Accumulating karma. Fucking blacks can get fucked up pretty good."

Comment: A estos putos animales sarnosos que los encierren y tiren la llave.


Toxicity level=3

"With these mangy fucking animals, lock them up and throw away the key."


  • Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated hate speech detection and the problem of offensive language. Proceedings of the eleventh international conference on web and social media, AAAI (pp. 512-515).

  • Kolhatkar, V., Wu, H., Cavasso, L., Francis, E., Shukla, K., & Taboada, M. (2019). The SFU opinion and comments corpus: A corpus for the analysis of online news comments. Corpus Pragmatics, (pp. 1-36).

  • Kumar, R., Ojha, A. K., Malmasi, S., & Zampieri, M. (2018). Benchmarking aggression identification in social media. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018) (pp. 1-11).

  • Kumar, R., Ojha, A. K., Malmasi, S., & Zampieri, M. (2020). Evaluating aggression identification in social media. Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (pp. 1-5).

  • Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language detection in online user content. Proceedings of the 25th international conference on world wide web (WWW’16) (pp. 145-153).

  • Nockleby, J. T. (2000). Hate speech. Encyclopedia of the American constitution, 3(2), 1277-1279.

  • Poletto, F., Basile, V., Sanguinetti, M., Bosco, C. & Patti, V. (2020). Resources and benchmark corpora for hate speech detection: a systematic review. Language Resources and Evaluation, (https:/ 

  • Schmidt, A., & Wiegand, M. (2017). A survey on hate speech detection using natural language processing. Proceedings of the Fifth International workshop on natural language processing for social media (pp. 1-10), Association for Computational Linguistics.

  • Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? predictive features for hate speech detection on twitter. Proceedings of the NAACL student research workshop (pp. 88-93).

  • Zampieri, M., Nakov, P., Rosenthal, S., Atanasova, P., Karadzhov, G., Mubarak, H., Derczynski, L., Pitenis, Z. & Çöltekin, Ç. (2020). SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020). Proceedings of the 14th international workshop on semantic evaluation. arXiv preprint arXiv:2006.07235.