This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Shreyash Mishra, IIIT Sri City, India (equal contribution);
(2) S Suryavardan, IIIT Sri City, India (equal contribution);
(3) Megha Chakraborty, University of South Carolina, USA;
(4) Parth Patwa, UCLA, USA;
(5) Anku Rani, University of South Carolina, USA;
(6) Aman Chadha, Stanford University, USA and Amazon AI, USA (work does not relate to his position at Amazon);
(7) Aishwarya Reganti, CMU, USA;
(8) Amitava Das, University of South Carolina, USA;
(9) Amit Sheth, University of South Carolina, USA;
(10) Manoj Chinnakotla, Microsoft, USA;
(11) Asif Ekbal, IIT Patna, India;
(12) Srijan Kumar, Georgia Tech, USA.
Memotion 3 is the third iteration of the Memotion shared task. The challenge consists of three sub-tasks (a code sketch of the full label space follows the list):
1. Task A - Sentiment analysis of memes: Given a meme, the system must classify its sentiment as positive, negative, or neutral.
2. Task B - Overall emotion analysis of memes: The goal is to identify the emotions associated with a given meme. The system should indicate whether a meme is humorous, sarcastic, offensive, or motivating; a meme can belong to multiple categories.
3. Task C - Classifying the intensity of meme emotions: The task is to determine the degree to which a particular emotion is expressed. The intensity scales are as follows:
a) Humour: not funny, funny, very funny and hilarious
b) Sarcasm: not sarcastic, little sarcastic, very sarcastic and extremely sarcastic
c) Offensive: not offensive, slightly offensive, very offensive and hateful offensive
d) Motivation: not motivational, motivational
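For concreteness, the label spaces of the three sub-tasks can be summarised in code. This is a minimal sketch: the label strings mirror the task description above and may differ from the exact strings in the released annotation files.

```python
# Label spaces for the three Memotion 3 sub-tasks, as described above.
# Strings are illustrative; the released annotation files may use
# different spellings or casing.

TASK_A_LABELS = ["positive", "negative", "neutral"]  # single-label

TASK_B_LABELS = ["humorous", "sarcastic", "offensive", "motivational"]  # multi-label

TASK_C_SCALES = {  # ordered from lowest to highest intensity
    "humour": ["not funny", "funny", "very funny", "hilarious"],
    "sarcasm": ["not sarcastic", "little sarcastic",
                "very sarcastic", "extremely sarcastic"],
    "offensive": ["not offensive", "slightly offensive",
                  "very offensive", "hateful offensive"],
    "motivation": ["not motivational", "motivational"],
}
```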
The tasks were conducted on the Memotion 3 dataset [5]. It consists of Hindi-English code-mixed memes, collected with a Selenium-based web crawler from various public platforms such as Reddit and Google Images, and annotated manually. The dataset contains 10,000 meme images, divided into a train-val-test split of 7000-1500-1500. Each meme is annotated for its sentiment, emotion, and emotion intensity. Each image also comes with its OCR text, extracted using the Google Vision API, and its source URL. For more details of the dataset, please refer to [5].
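A single annotation record might therefore look like the hypothetical sketch below. The field names and values are illustrative assumptions, not the official schema from [5]; they only reflect the fields just described (source URL, OCR text, and labels for all three tasks).

```python
# Hypothetical Memotion 3 annotation record; field names are illustrative.
sample = {
    "image_url": "https://example.com/meme_0001.jpg",  # placeholder URL
    "ocr_text": "monday morning... smile and wave",    # Google Vision API OCR output
    "sentiment": "negative",                           # Task A (single-label)
    "emotions": ["humorous", "sarcastic"],             # Task B (multi-label)
    "intensity": {                                     # Task C (one level per emotion)
        "humour": "funny",
        "sarcasm": "very sarcastic",
        "offensive": "not offensive",
        "motivation": "not motivational",
    },
}

# Split sizes reported in the paper: 7,000 train / 1,500 val / 1,500 test.
```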
As mentioned previously, there are three tasks. Each task is scored separately and has its own leaderboard. For each task, we use the weighted-average F1 score to measure model performance. Participants had access to only the train and validation sets. They could make at most 3 submissions on the test set for each task, the best of which was used for the leaderboard.
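The weighted-average F1 score can be computed with scikit-learn, as in the short sketch below. It is shown for Task A's sentiment labels; the toy predictions are made up for illustration.

```python
from sklearn.metrics import f1_score

# Toy Task A predictions, for illustration only.
y_true = ["positive", "neutral", "negative", "positive", "neutral"]
y_pred = ["positive", "negative", "negative", "positive", "neutral"]

# average="weighted" computes per-class F1 and weights each class by
# its support, so frequent classes contribute proportionally more.
score = f1_score(y_true, y_pred, average="weighted")
print(f"Weighted F1: {score:.4f}")
```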
For multi-modal data, it is crucial to take into account both visual and textual properties, particularly for memes, where the context can only be captured by combining both elements. For textual features, we employ a multilingual variant of BERT, namely Hinglish-BERT (BERT base-multilingual-cased) [66], fine-tuned on Hinglish data. A pretrained Vision Transformer (ViT) model provides the visual features [67]. The Hinglish-BERT embedding is concatenated with the pooled output of the ViT model. The combined features are passed through an MLP and then classified by a final classification layer. For more details about the baseline, please refer to [5].
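The sketch below illustrates this fusion baseline in PyTorch. It is a minimal sketch, not the exact implementation from [5]: the public bert-base-multilingual-cased and google/vit-base-patch16-224-in21k checkpoints stand in for the fine-tuned Hinglish-BERT and ViT weights, and the MLP width and dropout rate are assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, ViTModel

class MemotionBaseline(nn.Module):
    """Concatenates pooled text and image features, then classifies."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Stand-in for Hinglish-BERT (mBERT fine-tuned on Hinglish data).
        self.text_encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
        self.image_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        fused_dim = (self.text_encoder.config.hidden_size
                     + self.image_encoder.config.hidden_size)  # 768 + 768
        # Illustrative MLP; the width and dropout are assumptions.
        self.mlp = nn.Sequential(
            nn.Linear(fused_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.1),
        )
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, input_ids, attention_mask, pixel_values):
        text_out = self.text_encoder(input_ids=input_ids,
                                     attention_mask=attention_mask)
        img_out = self.image_encoder(pixel_values=pixel_values)
        # Pooled text embedding and the ViT [CLS] token as pooled image features.
        fused = torch.cat([text_out.pooler_output,
                           img_out.last_hidden_state[:, 0]], dim=-1)
        return self.classifier(self.mlp(fused))
```

With num_classes set to 3 this covers Task A; the same backbone would be re-headed (multi-label or per-emotion intensity outputs) for Tasks B and C.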