This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Jingjing Wang, School of Computing;
(2) Joshua Luo, The Westminster Schools;
(3) Grace Yang, South Windsor High School;
(4) Allen Hong, D.W. Daniel High School;
(5) Feng Luo, School of Computing.
[1] C. Bauckhage, “Insights into Internet Memes”, ICWSM, vol. 5, no. 1, pp. 42-49, Aug. 2021.
[2] L. Shifman, “Memes in a digital world: Reconciling with a conceptual troublemaker”, Journal of computer-mediated communication, vol. 18, no. 3, pp. 362–377, 2013, Oxford University Press Oxford, UK.
[3] D. Kiela, H. Firooz, A. Mohan, V. Goswami, A. Singh, P. Ringshia, and D. Testuggine, “The hateful memes challenge: Detecting hate speech in multimodal memes,” Advances in neural information processing systems, vol. 33, pp. 2611–2624, 2020.
[4] S. Suryawanshi, B. R. Chakravarthi, M. Arcan, and P. Buitelaar, “Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text,” in Proceedings of the second workshop on trolling, aggression and cyberbullying, 2020, pp. 32–41.
[5] A. Williams, C. Oliver, K. Aumer, and C. Meyers, “Racial microaggressions and perceptions of Internet memes,” Computers in Human Behavior, vol. 63, pp. 424–432, 2016, Elsevier.
[6] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, et al., “A survey of large language models,” arXiv preprint arXiv:2303.18223, 2023.
[7] K. Jeblick, B. Schachtner, J. Dexl, A. Mittermeier, A. T. Stüber, J. Topalis, T. Weber, P. Wesp, B. Sabel, J. Ricke, et al., “Chatgpt makes medicine easy to swallow: An exploratory case study on simplified radiology reports,” arXiv preprint arXiv:2212.14882, 2022.
[8] B. Guo, X. Zhang, Z. Wang, M. Jiang, J. Nie, Y. Ding, J. Yue, Y. Wu, “How close is chatgpt to human experts? comparison corpus, evaluation, and detection,” arXiv preprint arXiv:2301.07597, 2023.
[9] C. Sharma, D. Bhageria, W. Scott, S. Pykl, A. Das, T. Chakraborty, V. Pulabaigari, B. Gambäck, “SemEval-2020 Task 8: Memotion Analysis – The Visuo-Lingual Metaphor!,” arXiv preprint arXiv:2008.03781, 2020.
[10] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al., “Training language models to follow instructions with human feedback,” Advances in Neural Information Processing Systems, vol. 35, pp. 27730–27744, 2022.
[11] P. Liang, R. Bommasani, T. Lee, D. Tsipras, D. Soylu, M. Yasunaga, Y. Zhang, D. Narayanan, Y. Wu, A. Kumar, et al., “Holistic evaluation of language models,” arXiv preprint arXiv:2211.09110, 2022.
[12] OpenAI, “Introducing chatgpt,” 2023.
[13] K. Zhou, J. Yang, C. C. Loy, Z. Liu, “Conditional prompt learning for vision-language models,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16816–16825, 2022.
[14] Z. Yang, L. Li, J. Wang, K. Lin, E. Azarnasab, F. Ahmed, Z. Liu, C. Liu, M. Zeng, L. Wang, “Mm-react: Prompting chatgpt for multimodal reasoning and action,” arXiv preprint arXiv:2303.11381, 2023.
[15] P. Lu, S. Mishra, T. Xia, L. Qiu, K. W. Chang, S. C. Zhu, O. Tafjord, P. Clark, A. Kalyan, “Learn to explain: Multimodal reasoning via thought chains for science question answering,” Advances in Neural Information Processing Systems, vol. 35, pp. 2507–2521, 2022.
[16] W. Huang, P. Abbeel, D. Pathak, I. Mordatch, “Language models as zero-shot planners: Extracting actionable knowledge for embodied agents,” International Conference on Machine Learning, pp. 9118–9147, 2022.
[17] B. Paranjape, S. Lundberg, S. Singh, H. Hajishirzi, L. Zettlemoyer, M. T. Ribeiro, “ART: Automatic multi-step reasoning and tool-use for large language models,” arXiv preprint arXiv:2303.09014, 2023.
[18] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020.
[19] S. Pramanick, S. Sharma, D. Dimitrov, M. S. Akhtar, P. Nakov, T. Chakraborty, “MOMENTA: A multimodal framework for detecting harmful memes and their targets,” arXiv preprint arXiv:2109.05184, 2021.
[20] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022.
[21] S. Huang, L. Dong, W. Wang, F. Wei, M. Lapata, B. Chang, X. Yan, “Language Models as Meta-Prompts: Multimodal Meta-Learning through Prompt Engineering,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[22] S. Uppal, S. Bhagat, D. Hazarika, N. Majumder, S. Poria, R. Zimmermann, A. Zadeh, “Multimodal research in vision and language: A review of current and emerging trends,” Information Fusion, vol. 77, pp. 149–171, 2022, Elsevier.
[23] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., “Learning transferable visual models from natural language supervision,” International conference on machine learning, pp. 8748–8763, 2021, PMLR.
[24] Y. Hu, H. Hua, Z. Yang, W. Shi, N. A. Smith, J. Luo, “Promptcap: Prompt-guided task-aware image captioning,” arXiv preprint arXiv:2211.09699, 2022.
[25] K.-L. Chiu, A. Collins, R. Alexander, “Detecting hate speech with gpt3,” arXiv preprint arXiv:2103.12407, 2021.
[26] S. Sharma, F. Alam, M. S. Akhtar, D. Dimitrov, G. D. S. Martino, H. Firooz, A. Halevy, F. Silvestri, P. Nakov, T. Chakraborty, “Detecting and understanding harmful memes: A survey,” arXiv preprint arXiv:2205.04274, 2022.
[27] K. Tanaka, H. Yamane, Y. Mori, Y. Mukuta, T. Harada, “Learning to Evaluate Humor in Memes Based on the Incongruity Theory,” Proceedings of the Second Workshop on When Creative AI Meets Conversational AI, pp. 81–93, 2022.
[28] K. Scott, “Memes as multimodal metaphors: A relevance theory analysis,” Pragmatics & Cognition, vol. 28, no. 2, pp. 277–298, 2021, John Benjamins Publishing Company Amsterdam/Philadelphia.
[29] S. Sharma, A. Kulkarni, T. Suresh, H. Mathur, P. Nakov, M. S. Akhtar, T. Chakraborty, “Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim?,” arXiv preprint arXiv:2301.11219, 2023.
[30] A. Zeng, M. Attarian, B. Ichter, K. Choromanski, A. Wong, S. Welker, F. Tombari, A. Purohit, M. Ryoo, V. Sindhwani, et al., “Socratic models: Composing zero-shot multimodal reasoning with language,” arXiv preprint arXiv:2204.00598, 2022.
[31] S. Pramanick, S. Sharma, D. Dimitrov, M. S. Akhtar, P. Nakov, T. Chakraborty, “MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets,” Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 4439–4455, Nov. 2021, Punta Cana, Dominican Republic, Association for Computational Linguistics.