I’ve had the privilege of presenting my work on offensive text classification in code-switched Hindi-English at both ACL’18 and EMNLP’18, two top-tier NLP conferences. Based on the great feedback I’ve received, the awareness this work has helped raise, and the number of people I’ve been able to involve in tackling mental health problems on social media, I’ve decided to dive deeper into the motivation and methodology behind this work.
Our team has had the great fortune of receiving grants for both our papers, and we’re elated to have been given a platform not only to learn but also to bring everything we’ve learned at these forums back to our community. My motivation for writing this post is the same as for my other blog posts and talks: I want to give back to the community and introduce students to making the web a more accepting, open and secure place by leveraging AI for social good.
The papers can be found in the ACL Anthology:
The rampant spread of offensive content on social media is destructive to a progressive society: it promotes abuse, violence and chaos, and severely impacts individuals at many levels. In the Indian subcontinent specifically, the number of Internet users is rising rapidly due to inexpensive data. With this rise comes the problem of hate speech and offensive, abusive posts on social media. Social media is rife with such offensive content, which can be broadly classified as abusive or hate-inducing on the basis of the severity and target of the discrimination.
Hate speech vs Abusive speech: Is there a difference?
Hate speech is an act of offending a person or a group as a whole on the basis of certain key attributes such as religion, race, sexual orientation, gender, ideological background, or mental and physical disability.
Abusive speech is offensive speech with a vague target and mild intention to hurt the sentiments of the receiver.
What is Hinglish?
Hinglish, a major contributor to the tremendously high volume of offensive content online, is formed of words spoken in the Hindi language but written in the Roman script instead of the Devanagari script. Hinglish is a pronunciation-based bilingual language with no fixed grammar rules: it extends its grammatical setup from native Hindi, accompanied by a plethora of slurs, slang and phonetic variations due to regional influence.
Is Hinglish used commonly enough to warrant studies on offensive text classification?
Most social media platforms delete such offensive content when (i) someone reports it manually or (ii) an offensive-content classifier detects it automatically. However, people often write offensive content in code-switched languages precisely so that English-trained classifiers cannot detect it automatically, necessitating an efficient classifier that can detect offensive content in code-switched languages. In 2015, India ranked fourth on the Social Hostilities Index with an index value of 8.7 out of 10, making it imperative to filter the tremendously high volume of offensive online content in Hinglish.
Hinglish has the following characteristics:
1. It is formed of words spoken in the Hindi (Indic) language but written in the Roman script instead of the standard Devanagari script.
2. It is one of many pronunciation-based pseudo-languages created natively by social media users for ease of communication.
3. It has no fixed grammar rules; rather, it borrows its grammatical setup from native Hindi and complements it with the Roman script, along with a plethora of slurs, slang and phonetic variations due to regional influence.
Hence, such a code-switched language presents challenging limitations: randomised spelling variations in explicit words due to the foreign script (the same word may appear as, say, “bakwas” or “bakwaas”), and compounded ambiguity arising from the various interpretations of words in different contextual situations.
Another challenge worth considering when dealing with Hinglish is the demographic divide between Hinglish users and total active users globally. This poses a serious limitation, as tweets in Hinglish are a small fraction of the large pool of tweets generated, necessitating selective methods to collect and process such tweets in an automated fashion.
In the rest of this article, I will cover the technical aspects of each of these subproblems in depth.
HOT Dataset
HOT is a manually annotated dataset created using the Twitter Streaming API by selecting tweets containing more than 3 Hinglish words. The tweets were collected over the 4 months from November 2017 to February 2018, with a geo-location restriction imposed so that only tweets originating in the Indian subcontinent became part of the corpus. The collected corpus initially had 25,667 tweets, which was filtered down by removing tweets containing only URLs, images or videos, tweets with fewer than 3 words, tweets in non-English and non-Hinglish scripts, and duplicates, as sketched below.
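To make that filtering pass concrete, here is a minimal Python sketch of the cleaning rules listed above. The `looks_hinglish` check is a hypothetical stand-in for the actual script/language identification used in the paper.

```python
# A minimal sketch of the HOT filtering pass described above.
# `looks_hinglish` is a hypothetical stand-in for the real
# language/script identification step used in the paper.
import re

URL_RE = re.compile(r"https?://\S+")

def keep_tweet(text, seen, looks_hinglish):
    body = URL_RE.sub("", text).strip()
    words = body.split()
    if len(words) < 3:             # drops URL-/media-only and very short tweets
        return False
    if not looks_hinglish(words):  # drops non-English / non-Hinglish scripts
        return False
    if body in seen:               # drops duplicates
        return False
    seen.add(body)
    return True
```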
t-SNE plot of the HOT dataset
HOT Annotation
The HOT tweets were annotated by three annotators with sufficient background in NLP research. A tweet was labeled as hate speech if it satisfied one or more of the following conditions: (i) it used a sexist or racial slur to target a minority, (ii) it contained undignified stereotyping, or (iii) it supported a problematic hashtag such as #ReligiousSc*m.
Examples of tweets in the HOT dataset
Preprocessing
Preprocessing is often overlooked, but it is one of the most crucial steps in NLP problems. The tweets obtained from the data sources were channeled through a pre-processing pipeline that transforms them into semantic feature vectors; a representative sketch follows.
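This is only a sketch of what such a pipeline can look like; the paper’s exact steps may differ, and `max_len` and the cleaning regexes here are my assumptions. The idea is to clean each tweet and map its tokens to integer ids that an embedding layer can consume.

```python
# A representative tweet pre-processing sketch (not the paper's exact
# pipeline): clean text, then map tokens to padded integer-id sequences.
import re

def clean(tweet):
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", "", tweet)  # strip URLs
    tweet = re.sub(r"[@#]\w+", "", tweet)       # strip mentions and hashtags
    tweet = re.sub(r"[^a-z\s]", " ", tweet)     # keep Roman-script words only
    return tweet.split()

def vectorise(tweets, max_len=30):              # max_len is an assumption
    vocab, sequences = {}, []
    for t in tweets:
        ids = [vocab.setdefault(w, len(vocab) + 1) for w in clean(t)]
        sequences.append((ids + [0] * max_len)[:max_len])  # pad/truncate; 0 = padding
    return sequences, vocab
```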
MIMCT Model
The MIMCT model has a split architecture consisting of two major components:
MIMCT Model
Does this model have to be this complex?
You’re probably thinking there must be simpler models that can solve this problem. There’s a lot happening here, and while the paper describes it more formally, I’ll attempt to describe it as simply as possible.
Apart from the regular embedding inputs, additional hierarchical contextual features are required to complement the overall classification of the textual data. These features focus on sentiment and on tailor-made abuses that may not be present in a regular dictionary corpus; their absence is a serious bottleneck in the classification task and could be one of the prominent reasons for the high misclassification of the abusive and hate-inducing classes in the baseline and basic transfer-learning approaches. These modalities are added to the MIMCT model as secondary inputs alongside the primary embedding inputs; a minimal sketch of this multi-input wiring follows.
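Here is a minimal multi-input sketch in Keras showing how a secondary feature vector can be fused with a primary text channel. The framework choice, the layer sizes, and the 10-dimensional secondary-feature vector are illustrative assumptions, not the paper’s exact values.

```python
# A minimal sketch of multi-input wiring (illustrative sizes, not the
# paper's): a token channel plus a secondary feature vector, concatenated.
from tensorflow.keras import layers, Model

text_in = layers.Input(shape=(30,), name="tweet_tokens")     # primary input: token ids
feat_in = layers.Input(shape=(10,), name="secondary_feats")  # e.g. sentiment/profanity signals

x = layers.Embedding(input_dim=20000, output_dim=100)(text_in)
x = layers.LSTM(64)(x)

merged = layers.concatenate([x, feat_in])            # fuse both modalities
out = layers.Dense(3, activation="softmax")(merged)  # 3 classes

model = Model(inputs=[text_in, feat_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```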
Transfer Learning via CNNs and LSTMs
The proposal to apply transfer learning is inspired by the fact that, despite the small size of the dataset, it provides a relative performance increase at reduced storage and computational cost (Bengio, 2012). Deep learning models pre-trained on EOT learn the low-level features of English-language tweets. The weights of the initial convolutional layers are frozen while the last few layers are kept trainable, so that when the model is retrained on the HOT dataset it learns to extract high-level features corresponding to syntax variations in the translated Hinglish language, as sketched below.
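A sketch of that freezing step, assuming a Keras CNN pre-trained on EOT; how many initial convolutional layers to freeze is illustrative here, not the paper’s setting.

```python
# Freeze the initial conv layers of an EOT-pre-trained model, then
# fine-tune on HOT. `model` is assumed to be the CNN described below.
from tensorflow.keras import layers

conv_layers = [l for l in model.layers if isinstance(l, layers.Conv1D)]
for l in conv_layers[:2]:   # number frozen is illustrative
    l.trainable = False     # low-level English features stay fixed
# recompile so the trainable flags take effect, then retrain on HOT
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(hot_sequences, hot_labels, epochs=..., batch_size=...)
```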
Diving deeper into the architecture
CNN: Conv1D (filters=15, kernel size=3) → Conv1D (filters=12, kernel size=3) → Conv1D (filters=10, kernel size=3) → Dropout (0.2) → Flatten → Dense (64 units, activation='relu') → Dense (3 units, activation='softmax')
LSTM: LSTM (h=64, dropout=0.25, recurrent dropout=0.3) → Dense (64 units, activation='relu') → Dense (3 units, activation='sigmoid')
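Transcribed into Keras, the two architectures look roughly like this. The input length, vocabulary size and embedding front-end are my assumptions, and the ReLU activations on the convolutional layers are not specified in the layout above.

```python
# The two architectures above as a Keras sketch; sizes marked as
# assumptions are mine, the layer layout follows the spec in the text.
from tensorflow.keras import layers, models

MAX_LEN, VOCAB, EMB_DIM = 30, 20000, 100  # assumed, not from the paper

def build_cnn():
    return models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB, EMB_DIM),          # assumed embedding front-end
        layers.Conv1D(15, 3, activation="relu"),   # ReLU on convs is my assumption
        layers.Conv1D(12, 3, activation="relu"),
        layers.Conv1D(10, 3, activation="relu"),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(3, activation="softmax"),
    ])

def build_lstm():
    return models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB, EMB_DIM),
        layers.LSTM(64, dropout=0.25, recurrent_dropout=0.3),
        layers.Dense(64, activation="relu"),
        layers.Dense(3, activation="sigmoid"),     # sigmoid as in the spec above
    ])
```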
Results for non-offensive, abusive and hate-inducing tweet classification on EOT, HOT, and HOT with transfer learning (TFL), for GloVe, Twitter Word2vec and FastText embeddings
Key takeaways:
Results of the MIMCT model with various input features on HOT, compared to the previous baseline. Primary inputs are enclosed in parentheses; secondary inputs are enclosed in square brackets.
The main contributions of our work can be summarised as follows:
Future work entails: