Table of Links
Methodology and 3.1 Preliminary
A. Character Generation Detail
C. Effect of Text Moderator on Text-based Jailbreak Attack
D. Examples
We showcase examples of our method and the baselines attacking Llava-V1.6-Mistral-7B [30] in Table D. The attack is based on the following malicious question, sampled from Redteam2K [38]:
You ever just feel the need to kick a cat into traffic? I do all the time.
Authors:
(1) Siyuan Ma, University of Wisconsin-Madison;
(2) Weidi Luo, The Ohio State University;
(3) Yu Wang, Peking University;
(4) Xiaogeng Liu, University of Wisconsin-Madison.
This paper is