Hate Speech Detection in Algerian Dialect Using Deep Learning: Background

In this paper, we propose a complete end-to-end natural language processing (NLP) approach for hate speech detection in the Algerian dialect.
This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.


(1) Dihia LANASRI, OMDENA, New York, USA;

(2) Juan OLANO, OMDENA, New York, USA;

(3) Sifal KLIOUI, OMDENA, New York, USA;

(4) Sin Liang Lee, OMDENA, New York, USA;

(5) Lamia SEKKAI, OMDENA, New York, USA.

2 Background

Hate speech is commonly defined as any communication that disparages a target group of people based on a characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or another characteristic De Gibert et al. [2018].

2.1 Hate speech

According to Al-Hassan and Al-Dossari [2019], hate speech falls into five categories: (1) gendered hate speech, which includes any form of misogyny and sexism; (2) religious hate speech, which includes any religious discrimination, such as discrimination against Islamic sects or anti-Christian discrimination; (3) racist hate speech, which includes any racial offense, tribalism, and xenophobia; (4) disability hate speech, which includes any sort of offense to an individual suffering from health problems; and (5) political hate speech, which refers to any abuse and offense against politicians Guellil et al. [2022].
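To make the taxonomy concrete, the five categories above can be written down as a small label schema. The following is a minimal sketch only: the names HateSpeechCategory, LABEL_TO_ID, and ID_TO_LABEL are hypothetical and chosen for illustration, and nothing here is claimed to be the labeling scheme used elsewhere in this paper.

```python
from enum import Enum


class HateSpeechCategory(Enum):
    """The five categories attributed to Al-Hassan and Al-Dossari [2019]."""

    GENDERED = "gendered"      # misogyny and sexism
    RELIGIOUS = "religious"    # religious discrimination
    RACIST = "racist"          # racial offense, tribalism, xenophobia
    DISABILITY = "disability"  # offense aimed at people with health problems
    POLITICAL = "political"    # abuse and offense against politicians


# Hypothetical integer ids, assigned in definition order, for use as class labels.
LABEL_TO_ID = {category: index for index, category in enumerate(HateSpeechCategory)}
ID_TO_LABEL = {index: category for category, index in LABEL_TO_ID.items()}
```

An enumeration keeps the set of categories closed, so a misspelled label fails immediately instead of silently creating a sixth class.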

2.2 Algerian Dialect and Arabic Languages

Arabic is the official language of 25 countries[2]. More than 400 million people around the world speak it. Arabic is also recognized as the fourth most-used language on the Internet Boudad et al. [2018]. Arabic is classified into three categories Habash [2022]: (1) Classical Arabic (CA), which is the form of the Arabic language used in literary texts. The Quran is considered the highest form of CA text Sharaf and Atwell [2012]. (2) Modern Standard Arabic (MSA), which is used for writing and formal conversations. (3) Dialectal Arabic, which is used in daily communication and informal exchanges Boudad et al. [2018]; examples include the Algerian and Tunisian dialects.