Authors:
(1) Seokil Ham, KAIST;
(2) Jungwuk Park, KAIST;
(3) Dong-Jun Han, Purdue University;
(4) Jaekyun Moon, KAIST.

Table of Links
Abstract and 1. Introduction
2. Related Works
3. Proposed NEO-KD Algorithm and 3.1 Problem Setup: Adversarial Training in Multi-Exit Networks
3.2 Algorithm Description
4. Experiments and 4.1 Experimental Setup
4.2. Main Experimental Results
4.3. Ablation Studies and Discussions
5. Conclusion, Acknowledgement and References
A. Experiment Details
B. Clean Test Accuracy and C. Adversarial Training via Average Attack
D. Hyperparameter Tuning
E. Discussions on Performance Degradation at Later Exits
F. Comparison with Recent Defense Methods for Single-Exit Networks
G. Comparison with SKD and ARD and H. Implementations of Stronger Attacker Algorithms

4.3 Ablation Studies and Discussions

Effect of each component of NEO-KD. In Table 6, we observe the effects of our individual components, NKD and EOKD. Combining NKD and EOKD improves performance beyond the sum of their individual gains. Because the two components play different roles, their combination enables multi-exit networks to achieve state-of-the-art performance under adversarial attacks.

Effect of the type of ensembles in NKD. In the proposed NKD, we consider only the neighbor exits to distill the knowledge of clean data. What if we consider fewer or more exits than the neighboring exits? If the number of ensembled exits is too small, the scheme does not distill high-quality features. If the number of ensembled exits is too large, the dependencies among submodels increase, resulting in high adversarial transferability. To see this effect, in Table 7, we measure the adversarial test accuracy of three ensembling methods that differ in the number of exits used to construct the ensemble: no ensembling, ensemble neighbors (NKD), and ensemble all exits. In the no ensembling approach, we distill the knowledge of each exit on clean data to the output of the exit at the same position for adversarial examples. In contrast, the ensemble all exits scheme averages the knowledge of all exits on clean data and provides it to all exits for adversarial examples. The ensemble neighbors approach corresponds to our NKD. The results show that the proposed NEO-KD with neighbor ensembling distills high-quality features while lowering dependencies among submodels, confirming our intuition.

Robustness against stronger adversarial attacks. We evaluate NEO-KD against stronger adversarial attacks; we perform the average attack based on PGD-100 [21], Carlini and Wagner (CW) [2], and AutoAttack [5]. Table 8 shows that NEO-KD achieves higher adversarial test accuracy than Adv. w/o Distill [12] in most cases. In single-exit networks, the CW attack and AutoAttack are typically stronger than the PGD attack. In multi-exit networks, however, these attacks become weaker than the PGD attack when all exits are taken into account. Details on generating the stronger adversarial attacks are described in the Appendix.

Additional results. Other results, including clean test accuracy, results with average-attack-based adversarial training, results with varying hyperparameters, and results with another baseline used in single-exit networks, are provided in the Appendix.

This paper is available on arXiv under CC 4.0 license.
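
As a rough illustration of the three ensembling schemes compared in Table 7, the sketch below builds one clean-data teacher target per exit under each scheme. It is not the authors' implementation: the function name `distillation_targets`, the scheme labels, and in particular the neighbor window (exit i averaged with its immediately adjacent exits, clipped at the first and last exit) are assumptions made for illustration, and the distillation loss itself is left out.

```python
import numpy as np


def distillation_targets(clean_probs, scheme):
    """Build one teacher target per exit from clean-data predictions.

    clean_probs: array of shape (num_exits, num_classes) holding each
    exit's prediction on a clean sample. Returns the same shape.
    """
    clean_probs = np.asarray(clean_probs, dtype=float)
    num_exits = clean_probs.shape[0]

    if scheme == "none":
        # No ensembling: exit i is taught by its own clean prediction.
        return clean_probs.copy()

    if scheme == "all":
        # Ensemble all exits: every exit receives the same averaged target,
        # so all submodels are pulled toward one shared prediction.
        mean = clean_probs.mean(axis=0, keepdims=True)
        return np.repeat(mean, num_exits, axis=0)

    if scheme == "neighbors":
        # ASSUMPTION: the neighborhood of exit i is {i-1, i, i+1},
        # truncated at the boundaries. The text above does not spell
        # out the exact window.
        targets = np.empty_like(clean_probs)
        for i in range(num_exits):
            lo, hi = max(0, i - 1), min(num_exits, i + 2)
            targets[i] = clean_probs[lo:hi].mean(axis=0)
        return targets

    raise ValueError(f"unknown scheme: {scheme}")


# Toy example: 4 exits, 2 classes, alternating one-hot predictions.
probs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
for name in ("none", "neighbors", "all"):
    print(name, distillation_targets(probs, name).round(3).tolist())
```

Under these assumptions, the schemes differ only in how many exits contribute to each target: one, a small local window, or all of them. That count is the knob the paragraph above ties to feature quality on one side and dependency among submodels on the other.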

How Ensemble Strategies Impact Adversarial Robustness in Multi-Exit Networks
