The Dual-Edged Sword of Vision-LLMs in AD: Reasoning Capabilities vs. Attack Vulnerabilities

Written by textgeneration | Published 2025/09/30
Tech Story Tags: vision-language-models | vision-llms | typographic-attacks | llm-security | end-to-end-ad-systems | reasoning-in-ai | llms | vision-llm-based-ad-systems

TLDRThis article analyzes the critical safety trade-off of integrating Vision-LLMs into autonomous driving (AD) systems.via the TL;DR App

Table of Links

Abstract and 1. Introduction

  1. Related Work

    2.1 Vision-LLMs

    2.2 Transferable Adversarial Attacks

  2. Preliminaries

    3.1 Revisiting Auto-Regressive Vision-LLMs

    3.2 Typographic Attacks in Vision-LLMs-based AD Systems

  3. Methodology

    4.1 Auto-Generation of Typographic Attack

    4.2 Augmentations of Typographic Attack

    4.3 Realizations of Typographic Attacks

  4. Experiments

  5. Conclusion and References

3.2 Typographic Attacks in Vision-LLMs-based AD Systems

The integration of Vision-LLMs into end-to-end AD systems has brought promising results thus far [9], where Vision-LLMs can enhance user trust through explicit reasoning steps of the scene. On the one hand, language reasoning in AD systems can elevate their capabilities by utilizing the learned commonsense of LLMs, while being able to proficiently communicate to users. On the other hand, exposing Vision-LLMs to public traffic scenarios not only makes them more vulnerable to typographic attacks that misdirect the reasoning process but can also prove harmful if their results are connected with decision-making, judgment, and control processes.

Authors:

(1) Nhat Chung, CFAR and IHPC, A*STAR, Singapore and VNU-HCM, Vietnam;

(2) Sensen Gao, CFAR and IHPC, A*STAR, Singapore and Nankai University, China;

(3) Tuan-Anh Vu, CFAR and IHPC, A*STAR, Singapore and HKUST, HKSAR;

(4) Jie Zhang, Nanyang Technological University, Singapore;

(5) Aishan Liu, Beihang University, China;

(6) Yun Lin, Shanghai Jiao Tong University, China;

(7) Jin Song Dong, National University of Singapore, Singapore;

(8) Qing Guo, CFAR and IHPC, A*STAR, Singapore and National University of Singapore, Singapore.


This paper is available on arxiv under CC BY 4.0 DEED license.


Written by textgeneration | Text Generation
Published by HackerNoon on 2025/09/30