4.5 Integrating VRP with Baseline Techniques
We experimentally combine the VRP approach with established baseline techniques to evaluate their combined effect on jailbreak performance, as detailed in Tab. 5. The integration is straightforward: we replace VRP's question typography with the baseline's image input and concatenate the text inputs of VRP and the baseline. Integrating VRP significantly elevates the ASR of both the FigStep and Query relevant methods, indicating that adding a role-playing element to these structure-based jailbreak methods reinforces their effectiveness. This finding underscores the potential of role-play-based enhancements in structure-based jailbreak scenarios.
Authors:
(1) Siyuan Ma, University of Wisconsin–Madison;
(2) Weidi Luo, The Ohio State University;
(3) Yu Wang, Peking University;
(4) Xiaogeng Liu, University of Wisconsin–Madison.