Table of Links

Abstract and 1. Introduction

Related Work
2.1 Vision-LLMs
2.2 Transferable Adversarial Attacks

Preliminaries
3.1 Revisiting Auto-Regressive Vision-LLMs
3.2 Typographic Attacks in Vision-LLMs-based AD Systems

Methodology
4.1 Auto-Generation of Typographic Attack
4.2 Augmentations of Typographic Attack
4.3 Realizations of Typographic Attacks

Experiments

Conclusion and References

3.2 Typographic Attacks in Vision-LLMs-based AD Systems

The integration of Vision-LLMs into end-to-end AD systems has shown promising results so far [9]: Vision-LLMs can enhance user trust by explicitly reasoning about the scene step by step. On the one hand, language reasoning can elevate the capabilities of AD systems by drawing on the commonsense knowledge learned by LLMs, while also communicating proficiently with users.
On the other hand, exposing Vision-LLMs to public traffic scenarios not only makes them more vulnerable to typographic attacks that misdirect the reasoning process, but can also prove harmful if their outputs feed into decision-making, judgment, and control processes.

Authors:
(1) Nhat Chung, CFAR and IHPC, A*STAR, Singapore and VNU-HCM, Vietnam;
(2) Sensen Gao, CFAR and IHPC, A*STAR, Singapore and Nankai University, China;
(3) Tuan-Anh Vu, CFAR and IHPC, A*STAR, Singapore and HKUST, HKSAR;
(4) Jie Zhang, Nanyang Technological University, Singapore;
(5) Aishan Liu, Beihang University, China;
(6) Yun Lin, Shanghai Jiao Tong University, China;
(7) Jin Song Dong, National University of Singapore, Singapore;
(8) Qing Guo, CFAR and IHPC, A*STAR, Singapore and National University of Singapore, Singapore.

This paper is available on arXiv under the CC BY 4.0 DEED license.