Methodology and Preliminary


Table of Links

Abstract and 1. Introduction

  2. Related Works

  3. Methodology and 3.1 Preliminary

    3.2 Query-specific Visual Role-play

    3.3 Universal Visual Role-play

  4. Experiments and 4.1 Experimental setups

    4.2 Main Results

    4.3 Ablation Study

    4.4 Defense Analysis

    4.5 Integrating VRP with Baseline Techniques

  5. Conclusion

  6. Limitation

  7. Future work and References


A. Character Generation Detail

B. Ethics and Broader Impact

C. Effect of Text Moderator on Text-based Jailbreak Attack

D. Examples

E. Evaluation Detail

3 Methodology

In this section, we first define the jailbreak attack task for multimodal large language models (MLLMs) in Sec. 3.1. We then introduce the Visual Role-play (VRP) pipeline in a query-specific setting in Sec. 3.2. In Sec. 3.3, we extend VRP to a universal setting and obtain a universal role-play character.

3.1 Preliminary


Adversarial Capabilities. This paper considers a black-box attack: the adversary has no knowledge of the MLLM's internals, such as its parameters or hidden states, and cannot manipulate the model, for example through fine-tuning. The adversary only needs the ability to query the model and receive its textual responses. Each interaction is limited to a single turn with no prior dialogue history, apart from any predetermined system prompt. The attacker can neither observe nor control the internal states of the generation process, and cannot adjust the model's parameters.
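The access constraints above can be pictured as a narrow interface. The following is a minimal sketch, not the authors' code: the names `Query`, `BlackBoxMLLM`, and `echo_model` are hypothetical, and `echo_model` is a harmless stand-in for a real model so that the example runs. The wrapper exposes only a single-turn `query` call that returns text. It offers no route to parameters, hidden states, fine-tuning, or dialogue history.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Query:
    """One single-turn input: an image plus a text instruction (hypothetical type)."""
    image: bytes
    text: str


class BlackBoxMLLM:
    """Hypothetical wrapper exposing only what the black-box setting allows.

    The wrapped model is an opaque callable. This class provides no method to
    read parameters or hidden states, to fine-tune, or to keep dialogue history.
    """

    def __init__(
        self,
        generate: Callable[[Optional[str], Query], str],
        system_prompt: Optional[str] = None,
    ) -> None:
        self._generate = generate
        # Assumed to be fixed in advance by the model provider.
        self._system_prompt = system_prompt
        self.num_queries = 0

    def query(self, q: Query) -> str:
        """Send one independent single-turn query and return the text response."""
        self.num_queries += 1
        return self._generate(self._system_prompt, q)


def echo_model(system_prompt: Optional[str], q: Query) -> str:
    """Stand-in for a real MLLM, used only so the sketch is runnable."""
    return f"[{system_prompt}] {len(q.image)} image bytes, text={q.text!r}"


model = BlackBoxMLLM(echo_model, system_prompt="You are a helpful assistant.")
first = model.query(Query(image=b"\x00\x01", text="Describe the image."))
second = model.query(Query(image=b"\x00\x01", text="Describe the image."))
```

Because no state is carried between calls, `first` and `second` are identical here, which mirrors the single-turn, no-history assumption.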


Authors:

(1) Siyuan Ma, University of Wisconsin–Madison;

(2) Weidi Luo, The Ohio State University;

(3) Yu Wang, Peking University;

(4) Xiaogeng Liu, University of Wisconsin–Madison.


This paper is available on arXiv under the CC BY 4.0 DEED license.

