
Hate Speech Detection in Algerian Dialect Using Deep Learning: Experiments and Results


Too Long; Didn't Read

In this paper, we proposed a complete end-to-end natural language processing (NLP) approach for hate speech detection in the Algerian dialect.

This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.


(1) Dihia LANASRI, OMDENA, New York, USA;

(2) Juan OLANO, OMDENA, New York, USA;

(3) Sifal KLIOUI, OMDENA, New York, USA;

(4) Sin Liang Lee, OMDENA, New York, USA;

(5) Lamia SEKKAI, OMDENA, New York, USA.

5 Experiments and Results

To train and evaluate our models, we used the TensorFlow and PyTorch deep learning frameworks, with Google Colab and Kaggle GPUs to accelerate the experiments. Table ?? presents the detailed results we obtained.

Linear Support Vector Classifier (LinearSVC): The LinearSVC model achieved competitive accuracy but struggled with recall on the hate speech class. It exhibited high precision and recall for class 0 (non-hate) but left room for improvement on class 1 (hate), particularly in terms of recall. This precision/recall trade-off suggests the model has difficulty capturing the subtle nuances that distinguish hate from non-hate speech in the dialect: it is quite good at identifying class 0 but could be improved at identifying class 1.
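A baseline of this kind is typically a TF-IDF vectorizer feeding a linear SVM. The sketch below is illustrative only: the texts, labels, and n-gram settings are placeholders, not the paper's corpus or configuration; character n-grams and balanced class weights are shown as common mitigations for dialectal spelling variation and the class-1 recall gap discussed above.

```python
# Illustrative TF-IDF + LinearSVC baseline for binary hate-speech classification.
# The tiny dataset below is a stand-in, not the authors' Algerian-dialect corpus.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

texts = ["what a lovely day", "you are worthless", "thanks for the help", "I despise people like you"]
labels = [0, 1, 0, 1]  # 0 = non-hate, 1 = hate

clf = Pipeline([
    # Character n-grams are robust to the spelling variation common in dialects
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    # class_weight="balanced" counters class imbalance that hurts recall on class 1
    ("svc", LinearSVC(C=1.0, class_weight="balanced")),
])
clf.fit(texts, labels)
preds = clf.predict(texts)
print(classification_report(labels, preds, target_names=["non-hate", "hate"]))
```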

gzip + KNN: This is one of the weakest models in terms of capability. Although it diverges from the baseline, it is unclear whether these results would hold on out-of-distribution data, especially since there is no underlying mechanism in the model that captures semantic representations of the documents.
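The gzip + KNN approach classifies a document by compression distance rather than learned features: documents that compress well together are assumed similar, and a k-nearest-neighbors vote over those distances assigns the label. A minimal sketch, with placeholder training texts:

```python
# Compression-based classification: normalized compression distance (NCD)
# via gzip, plus a k-nearest-neighbors majority vote. No semantic model is
# learned, which is consistent with the weaknesses noted above.
import gzip

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two strings."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_predict(test_text, train_texts, train_labels, k=3):
    # Rank training documents by compression distance to the test document
    nearest = sorted(range(len(train_texts)), key=lambda i: ncd(test_text, train_texts[i]))
    votes = [train_labels[i] for i in nearest[:k]]
    return max(set(votes), key=votes.count)  # majority vote

# Placeholder data, not the paper's corpus
train_texts = ["I despise you and your kind", "you are terrible and stupid",
               "what a lovely day outside", "I really enjoyed the film"]
train_labels = [1, 1, 0, 0]
pred = knn_predict("you are so stupid", train_texts, train_labels, k=3)
```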

Dziribert-FT-HEAD: The model exhibits a noteworthy precision score, signifying its accuracy in correctly classifying instances as hate speech or not. However, the relatively lower recall suggests that it missed some hate speech instances. This discrepancy might be attributed to the model's lack of specialized handling for the nuances of the Algerian dialect, potentially causing it to overlook hate speech patterns unique to that context.

Despite this, the model’s overall accuracy remains commendably high, indicating its robust performance in making accurate predictions. Additionally, the balanced precision and recall values underline its ability to strike a reasonable trade-off between minimizing false positives and false negatives, a crucial aspect in hate speech detection.

The F1 Score, being the harmonic mean of precision and recall, further validates the model’s capacity to effectively identify positive samples while avoiding misclassification of negative ones. The model consistently demonstrates strong performance across multiple evaluation metrics, especially in terms of accuracy and F1 score. These results reaffirm the practicality and effectiveness of employing deep learning techniques for the challenging task of hate speech detection.
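Head-only fine-tuning of this kind keeps the pretrained encoder frozen and trains only a small classification head on its pooled output. The sketch below stubs the encoder with random features for self-containment; in the actual setup the pooled vector would come from the pretrained DziriBERT encoder (loaded, e.g., via Hugging Face Transformers), and the hidden size of 768 is the standard BERT-base assumption, not a detail confirmed by the paper.

```python
# Hypothetical sketch of head-only fine-tuning: the encoder is frozen and only
# a small classification head is trained. The encoder output is stubbed with
# random features here; in practice it would be DziriBERT's pooled [CLS] vector.
import torch
import torch.nn as nn

HIDDEN = 768  # standard BERT-base hidden size (assumption)

head = nn.Sequential(
    nn.Dropout(0.1),
    nn.Linear(HIDDEN, 2),  # 2 classes: non-hate / hate
)

# Stand-in for the frozen encoder's pooled output for a batch of 4 texts
pooled = torch.randn(4, HIDDEN)
labels = torch.tensor([0, 1, 0, 1])

optimizer = torch.optim.AdamW(head.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

logits = head(pooled)            # only the head's parameters receive gradients
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```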

LSTM and BiLSTM with FastText-DZ: Unfortunately, the results of these models are among the worst. The literature shows the strength of LSTM and BiLSTM architectures for this kind of NLP task, but that did not hold for this project. The low precision stems from the model's inability to correctly classify the hate class. FastText is a strong word embedding model that captures the context and semantics of a document; however, it underperforms here because of our fine-tuning approach, in which we took an Arabic FastText model and fine-tuned it on an Algerian dataset written in Arabic characters.
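The typical shape of such a model is an embedding layer initialized from the FastText vectors, a bidirectional LSTM, and a linear classifier over the concatenated final hidden states. A minimal PyTorch sketch, in which the vocabulary size, dimensions, and randomly initialized embeddings are placeholders rather than the paper's FastText-DZ setup:

```python
# Minimal BiLSTM text classifier. The embedding layer here is randomly
# initialized; in the paper's setup it would be loaded from the fine-tuned
# FastText-DZ vectors (see the commented line below).
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=300, hidden=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # self.embedding.weight.data.copy_(fasttext_matrix)  # load pretrained vectors here
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # 2x: forward + backward states

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        # Concatenate the last forward and backward hidden states
        final = torch.cat([h_n[0], h_n[1]], dim=1)
        return self.fc(final)

model = BiLSTMClassifier()
logits = model(torch.randint(0, 1000, (4, 20)))  # batch of 4 sequences of length 20
```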

DZiriBert with Peft+LoRA: We fine-tuned DZiriBERT, a model specifically adapted to the Algerian dialect, using PEFT and LoRA. These techniques allowed us to build a highly effective and efficient model for hate speech detection in the Algerian dialect while keeping computational costs to a minimum.
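LoRA keeps the pretrained weight matrix W frozen and learns only a low-rank update scaled by alpha/r, so the number of trainable parameters drops from d_in * d_out to r * (d_in + d_out). The sketch below illustrates the mechanism for a single linear layer in plain PyTorch; in practice this would be applied to DZiriBERT's attention layers via the Hugging Face `peft` library (LoraConfig + get_peft_model), and the r and alpha values shown are illustrative defaults, not the paper's hyperparameters.

```python
# Illustrative LoRA layer: the frozen pretrained weight W is augmented with a
# trainable low-rank update scale * (B @ A). Only A and B receive gradients.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        # Frozen "pretrained" weight (random here, for self-containment)
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable low-rank factor
        self.B = nn.Parameter(torch.zeros(d_out, r))        # zero-init so the update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return x @ self.weight.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in layer.parameters() if not p.requires_grad)
# trainable = 8 * 768 + 768 * 8 = 12288, vs 589824 frozen weights
```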

Multilingual-E5-base Fine Tuned and sbert-distill-multilingual Fine Tuned: The outcomes obtained from these models are noteworthy; nonetheless, their performance pales in comparison with the parameter-efficient fine-tuning of the DZiriBERT model.

DzaraShield: The results returned by this model are satisfying given the relatively small quantity of data it was fine-tuned on. This further shows that pretraining plays the major role in downstream tasks such as classification in our case, especially since the base model is an encoder-only architecture that captures contextual information from the input, making it useful for a wide range of text classification tasks.

AraT5v2-HateDetect: The results are slightly inferior to DzaraShield's. One possible explanation is the increased complexity of the architecture compared to the DzaraBERT base model: fine-tuning becomes a more intricate task due to the larger hyperparameter search space and the limited resources in terms of computing power and data availability. As a result, it is reasonable to expect that these models would perform similarly in real-world scenarios.

5.1 Results Discussion

The DzaraShield model has demonstrated remarkable capability in detecting hate speech in the Algerian dialect. Its outstanding precision score highlights its reliability in accurately identifying instances of hate speech. Additionally, it maintains a balanced precision and recall, indicating that it does not excessively sacrifice precision to achieve its higher recall. Such a balanced model holds considerable advantages, particularly when both false positives and false negatives carry significant consequences.

For the other models, mainly LSTM or BiLSTM with Dziri FastText, more fine-tuning should be performed to enhance the results. Moreover, future work may include hyperparameter tuning, class balancing techniques, or the integration of more complex models to improve performance across both classes.

The disparity between precision and recall in certain models warrants further investigation. Delving deeper into this issue could yield valuable insights into specific aspects of the dialect that might be contributing to this imbalance. Future experiments should prioritize understanding and addressing these discrepancies, with the goal of enhancing recall without compromising precision.

The results from various experimental models underscore the intricacies involved in hate speech detection in the Algerian dialect. While traditional machine learning and deep learning approaches provided some valuable insights, they fell short in capturing the dialect’s nuanced characteristics. In contrast, the DzaraShield model emerged as the most successful approach, emphasizing the pivotal role of Encoder-only models in the realm of projects of this nature.

These findings offer valuable insights for future work in this area and underscore the potential of leveraging domain-specific knowledge, advanced fine-tuning techniques, and sophisticated architectures for the effective detection of hate speech in under-studied and complex dialects such as Algerian.