Table of Links
- Related Work
- Preliminaries
- Methodology
  - 4.1 Auto-Generation of Typographic Attack
2.2 Transferable Adversarial Attacks
Adversarial attacks are most harmful when they can be developed in a closed setting with public frameworks yet still be realized against unseen, closed-source models. The literature on such transferable attacks is dominated by gradient-based strategies. Against Vision-LLMs, our research focuses on exploring the transferability of typographic attacks.
Gradient-based Attacks. Since Szegedy et al. introduced the concept of adversarial examples, gradient-based methods have become the cornerstone of adversarial attacks [23, 24]. Goodfellow et al. proposed the Fast Gradient Sign Method (FGSM [25]), which generates adversarial examples in a single step by perturbing the input along the sign of the loss gradient obtained via backpropagation. Kurakin et al. later improved FGSM with iterative optimization, resulting in Iterative FGSM (I-FGSM) [26]. Projected Gradient Descent (PGD [27]) further enhances I-FGSM by adding random noise initialization, leading to stronger attacks. Gradient-based transfer attacks typically use a known surrogate model, leveraging its parameters and gradients to generate adversarial examples that are then used to attack a black-box target model. These methods often rely on multi-step iterative optimization such as PGD and employ various data augmentation strategies to enhance transferability [28, 29, 30, 31, 32]. However, gradient-based methods suffer from limited adversarial transferability due to the disparity between the surrogate and target models and the tendency of adversarial examples to overfit the surrogate model [33, 34].
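For concreteness, the following is a minimal sketch (not taken from the cited works) of how FGSM and PGD perturb an input under an L∞ budget, assuming a PyTorch classifier `model`, an input batch `x` in [0, 1], labels `y`, and hypothetical hyperparameters `eps`, `alpha`, and `steps`.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM: move x along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps, alpha, steps):
    """Multi-step PGD: random start, iterative signed-gradient steps, projection back into the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the L-inf ball around x
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

In a transfer setting, `model` would be the white-box surrogate, and the resulting `x_adv` would then be submitted to the black-box target model.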
Typographic Attacks. The development of large-scale pretrained vision-language models such as CLIP [11, 12] introduced a form of typographic attack that can impair their zero-shot performance. A concurrent work [13] has shown that such typographic attacks also extend to language reasoning tasks of Vision-LLMs, such as multiple-choice question-answering and image-level open-vocabulary recognition. Similarly, another work [14] developed a benchmark by prompting a Vision-LLM to recommend an attack against itself given an image, a question, and its answer on classification datasets. Several defense mechanisms [15, 16] have been proposed that prompt the Vision-LLM to perform step-by-step reasoning. Our research differs from existing works in studying autonomous typographic attacks across question-answering scenarios of recognition, action reasoning, and scene understanding, particularly against Vision-LLMs in AD systems. Our work also discusses how these attacks affect reasoning capabilities at the image level and region level, and even across multiple reasoning tasks. Furthermore, we discuss how such attacks can be realized in the physical world, particularly against AD systems.
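To illustrate the attack surface (this is not the paper's generation pipeline), the sketch below overlays attacker-chosen text onto an image with Pillow; the text string, font path, placement, and file names are hypothetical placeholders.

```python
from PIL import Image, ImageDraw, ImageFont

def overlay_typographic_text(image_path, text, out_path, font_path="arial.ttf", font_size=32):
    """Paste attacker-chosen text onto an image, mimicking a printed sign or sticker."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    try:
        font = ImageFont.truetype(font_path, font_size)
    except OSError:
        font = ImageFont.load_default()  # fall back if the font file is unavailable
    # Place the text near the bottom-left corner; a real attack would choose location,
    # size, and color to maximize the victim model's error while remaining physically plausible.
    x, y = 10, img.height - font_size - 10
    draw.text((x, y), text, fill=(255, 255, 255), font=font, stroke_width=2, stroke_fill=(0, 0, 0))
    img.save(out_path)

# Hypothetical usage: mislead a scene-understanding query about a traffic sign.
# overlay_typographic_text("frame.png", "SPEED LIMIT 100", "frame_attacked.png")
```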
Authors:
(1) Nhat Chung, CFAR and IHPC, A*STAR, Singapore and VNU-HCM, Vietnam;
(2) Sensen Gao, CFAR and IHPC, A*STAR, Singapore and Nankai University, China;
(3) Tuan-Anh Vu, CFAR and IHPC, A*STAR, Singapore and HKUST, HKSAR;
(4) Jie Zhang, Nanyang Technological University, Singapore;
(5) Aishan Liu, Beihang University, China;
(6) Yun Lin, Shanghai Jiao Tong University, China;
(7) Jin Song Dong, National University of Singapore, Singapore;
(8) Qing Guo, CFAR and IHPC, A*STAR, Singapore and National University of Singapore, Singapore.
This paper is