Authors:
(1) Rui Duan University of South Florida Tampa, USA (email: ruiduan@usf.edu);
(2) Zhe Qu Central South University Changsha, China (email: zhe_qu@csu.edu.cn);
(3) Leah Ding American University Washington, DC, USA (email: ding@american.edu);
(4) Yao Liu University of South Florida Tampa, USA (email: yliu@cse.usf.edu);
(5) Yao Liu University of South Florida Tampa, USA (email: yliu@cse.usf.edu).

Table of Links

Abstract and Intro
Background and Motivation
Parrot Training: Feasibility and Evaluation
PT-AE Generation: A Joint Transferability and Perception Perspective
Optimized Black-Box PT-AE Attacks
Experimental Evaluations
Related Work
Conclusion and References
Appendix

VII. RELATED WORK

White-box attacks: Adversarial audio attacks [28], [114], [72], [101], [105], [32], [43], [118], [29] can be categorized into white-box and black-box attacks according to the attacker's level of knowledge. White-box attacks [28], [95] assumed knowledge of the target model and leveraged its gradient information to generate highly effective AEs. Some recent studies [72], [52] aimed to improve the practicality of white-box attacks by adding the perturbation to the original speech signal without synchronization, albeit still assuming nearly full knowledge of the target model.

Query-based black-box attacks: Existing black-box attacks [29], [118], [101], [105], [74], [113] assumed no access to the internal knowledge of target models, and most of them attempted to learn about the target model via a querying (or probing) strategy. The query-based attacks [29], [43], [118], [113], [74] needed to interact with the target model to obtain either its internal prediction scores [29], [105], [32], [113] or hard-label results [118], [74]. A large number of queries was necessary for such an attack to be effective. For example, Occam [118] needed over 10,000 queries to achieve a high attack success rate (ASR). This makes the attack strategy cumbersome to launch, especially in over-the-air scenarios. In contrast, the PT-AE attack does not require any probing of the target model.

Transfer-based black-box attacks: The transfer-based attacks [17], [44], [30] commonly assumed no interaction with, or only limited probing [32] of, the target model. For example, Kenansville [17] manipulated the phonemes of the speech to achieve an untargeted attack. QFA2SR [30] focused on building surrogate models with specific ensemble strategies to enhance the transferability of AEs, under the assumption that the attacker knows several speech samples of all the enrolled speakers of the target model. Compared with QFA2SR, we further minimize the attacker's knowledge and only assume that the attacker has a short speech sample of the target speaker. Even with this most limited attack knowledge, we propose a new PT-AE strategy that creates more effective AEs against the target model.

Audio attacks considering the perception quality: Some recent studies [95], [52], [74] leveraged psychoacoustic features to optimize the carriers and improve the perceptual quality of AEs. Meanwhile, [44], [113] manipulated the features of an audio signal to create AEs with good perceptual quality. In addition, some audio attack strategies [116], [26], [16], [114] focused on improving the stealthiness of the AEs. For example, the dolphin attack [116] used ultrasound to generate imperceptible AEs. The human study in this work defines the SRS metric to quantify speech quality, using a regression procedure similar to, and motivated by, the qDev model in [44], which was created to measure music quality. We then design a new TPR framework built upon the SRS metric to jointly evaluate both the transferability and the perception of PT-AEs.

This paper is available on arXiv under CC0 1.0 DEED license.


The Evolution of Black-Box Audio Attacks
