
Is GPT Powerful Enough to Analyze the Emotions of Memes?


Too Long; Didn't Read

This project investigates GPT-3.5's proficiency in sentiment analysis of internet memes and hate detection, exploring its capabilities and limitations in handling subjective tasks. Using datasets from SemEval-2020 Task 8 and Facebook hateful memes, the study provides insights into GPT's performance and its role in understanding cultural contexts.
Memeology: Leading Authority on the Study of Memes


(1) Jingjing Wang, School of Computing, Clemson University, Clemson, South Carolina, USA;

(2) Joshua Luo, The Westminster Schools, Atlanta, Georgia, USA;

(3) Grace Yang, South Windsor High School, South Windsor, Connecticut, USA;

(4) Allen Hong, D.W. Daniel High School, Clemson, South Carolina, USA;

(5) Feng Luo, School of Computing, Clemson University, Clemson, South Carolina, USA.

Abstract & Introduction

Related Work


Experiment Result

Discussion and References


Large Language Models (LLMs), representing a significant achievement in artificial intelligence (AI) research, have demonstrated their abilities across a multitude of tasks. This project aims to explore the capabilities of GPT-3.5, a leading example of LLMs, in sentiment analysis of Internet memes. Memes, which include both verbal and visual aspects, act as a powerful yet complex tool for expressing ideas and sentiments, demanding an understanding of societal norms and cultural contexts. Notably, the detection and moderation of hateful memes pose a significant challenge due to their implicitly offensive nature. This project investigates GPT’s proficiency in such subjective tasks, revealing its strengths and potential limitations. The tasks include the classification of meme sentiment, determination of humor type, and detection of implicit hate in memes. The performance evaluation, using datasets from SemEval-2020 Task 8 and the Facebook hateful memes dataset, offers a comparative understanding of GPT responses against human annotations. Despite GPT’s remarkable progress, our findings underscore the challenges these models face in handling subjective tasks, which are rooted in their inherent limitations, including contextual understanding, interpretation of implicit meanings, and data biases. This research contributes to the broader discourse on the applicability of AI in handling complex, context-dependent tasks, and offers valuable insights for future advancements.

Index Terms—memotion analysis, hateful memes detection, GPT model


Large Language Models (LLMs), representing a significant achievement in artificial intelligence (AI) research, have recently garnered substantial interest from both academia and industry. Due to their exceptional capacity to comprehend, generate, and engage in complex linguistic tasks, these models are revolutionizing the development and application of AI algorithms. Equipped with sophisticated algorithms and vast training datasets, LLMs have demonstrated advanced conversational capabilities, processing textual input and output to participate in detailed and meaningful dialogues [6], [10], [11].

OpenAI’s ChatGPT [12] is a prominent representative among these LLMs. It is an AI chatbot backed by the generative pretrained transformer (GPT) model and has attracted broad societal interest. The proficiency of ChatGPT is particularly noticeable in the realm of question answering, spanning multiple sectors including, but not limited to, healthcare [7] and finance [8]. This model uses its linguistic capabilities to understand questions and generate human-like answers, marking a crucial milestone in AI-driven communication. Nevertheless, despite these impressive feats, LLMs, including GPT, confront considerable challenges when addressing subjective tasks, one such task being the interpretation and annotation of internet memes.

Fig. 1. Hateful and non-hateful meme examples from the Facebook hateful memes dataset.

Internet memes, modern digital artifacts, encompass a variety of cultural and sociological attitudes, and serve as a type of social shorthand among online communities [1], [2]. These memes blend verbal and graphic elements to express complex concepts or feelings in a concise manner. However, the multimodal nature of memes, which combine various images with informal, often humorous or ironic text, introduces an intricate layer of complexity. To comprehend and analyze memes, one must manage the subtle interplay of visual clues, linguistic content, cultural allusions, and common community knowledge. This complexity, combined with the inherent subjectivity of memes, makes it particularly difficult for LLMs to accurately interpret memes and analyze their sentiment.

Fig. 2. Meme examples from the Multimodal Memotion Analysis dataset. The scores for humorous, sarcastic, and offensive, from left to right, are (3, 1, 0), (0, 3, 0), and (0, 3, 3).
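The score triples in Fig. 2 follow the Memotion annotation scheme, in which each meme receives an intensity score from 0 to 3 along each of the humour, sarcasm, and offense axes. A minimal sketch of decoding such a triple into textual labels is shown below; the label wordings approximate the SemEval-2020 Task 8 scales and are illustrative, not verbatim:

```python
# Illustrative decoding of Memotion-style intensity triples (0-3 per axis).
# Label wordings approximate the SemEval-2020 Task 8 scheme (assumption).
HUMOUR = ("not funny", "funny", "very funny", "hilarious")
SARCASM = ("not sarcastic", "general", "twisted meaning", "very twisted")
OFFENSE = ("not offensive", "slight", "very offensive", "hateful offensive")

def decode_scores(humour: int, sarcasm: int, offense: int) -> dict:
    """Map a (humour, sarcasm, offense) score triple to textual labels."""
    for score in (humour, sarcasm, offense):
        if not 0 <= score <= 3:
            raise ValueError("intensity scores must lie in 0..3")
    return {
        "humour": HUMOUR[humour],
        "sarcasm": SARCASM[sarcasm],
        "offense": OFFENSE[offense],
    }
```

Under this reading, the leftmost meme in Fig. 2, scored (3, 1, 0), would decode to a hilarious, mildly sarcastic, non-offensive meme.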

Sentiment Analysis (SA) has primarily concentrated on interpreting user sentiments from textual content. However, given the proliferation and pervasiveness of multimodal social media data, such as online memes, SA must diversify to handle multimodal data sources. This diversification becomes even more complex when considering “hateful memes”, which cleverly combine text and images to spread offensive or harmful messages under the guise of humor [3]–[5]. Detecting and moderating such content requires not only understanding the text-image interaction but also a grasp of the cultural context and, in many cases, specific sub-cultural expertise. These challenges call for ongoing research and development in the field, pushing the boundaries and exploring the potential of LLMs like GPT.

In this project, we focus on the capabilities and limitations of LLMs like GPT in dealing with subjective and nuanced tasks, particularly internet meme sentiment analysis. This exploration forms the crux of this research, contributing to the broader discussion on the evolving potential and limitations of AI. We delve into the intricacies of subjectivity, assessing GPT’s performance in tasks necessitating an understanding of societal norms and cultural contexts. Our investigation revolves around two research questions: 1. Can GPT accurately detect implicitly hateful memes given a specific prompt? 2. Can GPT effectively conduct sentiment analysis of memes, including classifying the sentiment of each meme as positive or negative and categorizing the type of humor into three sub-classes: sarcastic, humorous, and offensive? We draw upon datasets from Facebook’s hateful memes [3] and SemEval-2020 Task 8: Multimodal Memotion Analysis [9] for this project. The experimental section of the study presents a comparative analysis highlighting the constraints and limitations of GPT’s responses.
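To make the two research questions concrete, the sketch below shows one way a classification query for a meme might be assembled for an LLM, and how its free-text answer could be reduced to the labels used in this study. The prompt wording, the keyword-matching parser, and the idea of sending only the meme's caption text are our illustrative assumptions, not the paper's exact protocol; an actual GPT-3.5 API call would sit between the two functions.

```python
# Illustrative sketch of a meme-classification round trip with an LLM.
# Everything here is an assumption for exposition; the actual GPT-3.5
# call (e.g. via OpenAI's API) would go between building and parsing.
SENTIMENTS = ("positive", "negative")
HUMOUR_TYPES = ("sarcastic", "humorous", "offensive")

def build_prompt(meme_text: str) -> str:
    """Compose a hypothetical classification prompt from a meme's caption."""
    return (
        "Consider an internet meme whose caption reads:\n"
        f'"{meme_text}"\n'
        "1. Is the overall sentiment positive or negative?\n"
        "2. Is the humor sarcastic, humorous, or offensive?\n"
        "Answer each question with a single word."
    )

def parse_reply(reply: str) -> dict:
    """Extract the first sentiment and humor-type keyword from a reply."""
    lowered = reply.lower()
    sentiment = next((s for s in SENTIMENTS if s in lowered), None)
    humour = next((h for h in HUMOUR_TYPES if h in lowered), None)
    return {"sentiment": sentiment, "humour_type": humour}
```

Keyword matching is deliberately simple here: because LLM answers are free-form, any evaluation against human annotations first has to normalize the model's text into the same discrete label space as the dataset.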

This paper is available on arxiv under CC 4.0 license.