This story draft by @escholar has not been reviewed by an editor, YET.

Universal Visual Role-play

EScholar: Electronic Academic Papers for Scholars HackerNoon profile picture
0-item

Table of Links

Abstract and 1. Introduction

  1. Related Works

  2. Methodology and 3.1 Preliminary

    3.2 Query-specific Visual Role-play

    3.3 Universal Visual Role-play

  3. Experiments and 4.1 Experimental setups

    4.2 Main Results

    4.3 Ablation Study

    4.4 Defense Analysis

    4.5 Integrating VRP with Baseline Techniques

  4. Conclusion

  5. Limitation

  6. Future work and References


A. Character Generation Detail

B. Ethics and Broader Impact

C. Effect of Text Moderator on Text-based Jailbreak Attack

D. Examples

E. Evaluation Detail

3.3 Universal Visual Role-play

To verify the generalizability of VRP, we further extend this method to “universal” scenarios. In fact, the universal concept has been widely explored in jailbreak attacks, such as AutoDAN [34] and GCG [80], which refer to an attack strategy that employs minimal and straightforward manipulation of queries during execution. For jailbreak attacks, the universal properties are very important because they enable an attack to be applicable across a broad range of scenarios without requiring extensive modifications or customization. In the context of MLLMs, “universal” jailbreak attacks typically involve the simple aggregation of queries into a predefined format or directly printing the queries as typography onto images. Unfortunately, existing structure-based jailbreak attacks on LLMs have overlooked this problem, as they require computation for each query, especially when dealing with large datasets, making them hard to use.


To address this issue, we introduce the concept of “universal visual role-play.” The core principle of universal visual role-play is to leverage the optimization capabilities of LLMs [67] to generate candidate characters universally, followed by the selection of the best universal character. Many role-play attacks [34] can be performed in a universal setting. To obtain a universal visual role-play template, we generate multiple rounds of candidate roles, each round optimized based on previous rounds, harnessing LLMs’ optimization ability [67]. We split the entire malicious query dataset into train, validation, and test sets.



With the universal character obtained through the aforementioned process, we use it to perform universal VRP on the test set.



Authors:

(1) Siyuan Ma, University of Wisconsin–Madison ([email protected]);

(2) Weidi Luo, The Ohio State University ([email protected]);

(3) Yu Wang, Peking University ([email protected]);

(4) Xiaogeng Liu, University of Wisconsin-Madison ([email protected]).


This paper is available on arxiv under CC BY 4.0 DEED license.


L O A D I N G
. . . comments & more!

About Author

EScholar: Electronic Academic Papers for Scholars HackerNoon profile picture
EScholar: Electronic Academic Papers for Scholars@escholar
We publish the best academic work (that's too often lost to peer reviews & the TA's desk) to the global tech community

Topics

Around The Web...

Trending Topics

blockchaincryptocurrencyhackernoon-top-storyprogrammingsoftware-developmenttechnologystartuphackernoon-booksBitcoinbooks