Knowledge-Distillation-Based Adversarial Training for Robust Multi-Exit Neural Networks

Too Long; Didn't Read

NEO-KD is a novel adversarial training strategy for multi-exit neural networks, using neighbor and exit-wise orthogonal knowledge distillation to improve robustness against attacks and reduce adversarial transferability across submodels.

Authors:

(1) Seokil Ham, KAIST;

(2) Jungwuk Park, KAIST;

(3) Dong-Jun Han, Purdue University;

(4) Jaekyun Moon, KAIST.

Abstract and 1. Introduction

2. Related Works

3. Proposed NEO-KD Algorithm and 3.1 Problem Setup: Adversarial Training in Multi-Exit Networks

3.2 Algorithm Description

4. Experiments and 4.1 Experimental Setup

4.2. Main Experimental Results

4.3. Ablation Studies and Discussions

5. Conclusion, Acknowledgement and References

A. Experiment Details

B. Clean Test Accuracy and C. Adversarial Training via Average Attack

D. Hyperparameter Tuning

E. Discussions on Performance Degradation at Later Exits

F. Comparison with Recent Defense Methods for Single-Exit Networks

G. Comparison with SKD and ARD and H. Implementations of Stronger Attacker Algorithms

Abstract

While multi-exit neural networks are regarded as a promising solution for efficient inference via early exits, defending them against adversarial attacks remains a challenging problem. In multi-exit networks, due to the high dependency among different submodels, an adversarial example targeting a specific exit not only degrades the performance of the target exit but also reduces the performance of all other exits concurrently. This makes multi-exit networks highly vulnerable to simple adversarial attacks. In this paper, we propose NEO-KD, a knowledge-distillation-based adversarial training strategy that tackles this fundamental challenge through two key contributions. NEO-KD first uses neighbor knowledge distillation to guide the outputs of adversarial examples toward the ensemble outputs of neighbor exits on clean data. NEO-KD also employs exit-wise orthogonal knowledge distillation to reduce adversarial transferability across different submodels. The result is significantly improved robustness against adversarial attacks. Experimental results on various datasets/models show that our method achieves the best adversarial accuracy with reduced computation budgets, compared to the baselines relying on existing adversarial training or knowledge distillation techniques for multi-exit networks.

1 Introduction

Multi-exit neural networks are receiving significant attention [9, 13, 26, 27, 28, 32] for their ability to make dynamic predictions in resource-constrained applications. Instead of making predictions at the final output of the full model, a faster prediction can be made at an earlier exit depending on the current time budget or computing budget. In this sense, a multi-exit network can be viewed as an architecture having multiple submodels, where each submodel consists of parameters from the input of the model to the output of a specific exit. These submodels are highly correlated as they share some model parameters. It is also well-known that the performance of all submodels can be improved by distilling the knowledge of the last exit to other exits, i.e., via self-distillation [15, 20, 24, 27]. There have also been efforts to address the adversarial attack issues in the context of multi-exit networks [3, 12].
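The submodel view described above can be made concrete with a small sketch. It is only a toy illustration in plain Python: the class `ToyMultiExitNet`, its method names, and the rule of picking the deepest exit that fits a block budget are assumptions made for this example, not the architecture or exit policy used in the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class ToyMultiExitNet:
    """Hypothetical multi-exit model: shared blocks, one classifier head per exit."""

    def __init__(self, blocks, heads):
        assert len(blocks) == len(heads)
        self.blocks = blocks  # shared by every submodel that reaches them
        self.heads = heads    # heads[i] is the classifier attached to exit i

    def forward_until(self, x, exit_index):
        """Run submodel `exit_index`: blocks 0..exit_index, then that exit's head."""
        h = x
        for block in self.blocks[: exit_index + 1]:
            h = block(h)
        return softmax(self.heads[exit_index](h))

    def predict_with_budget(self, x, budget):
        """Assumed policy: use the deepest exit reachable with `budget` blocks."""
        exit_index = max(0, min(budget, len(self.blocks)) - 1)
        return exit_index, self.forward_until(x, exit_index)

# Toy blocks and heads standing in for real layers (illustrative only).
blocks = [
    lambda h: [2.0 * v for v in h],
    lambda h: [v + 1.0 for v in h],
    lambda h: [v * v for v in h],
]
heads = [
    lambda h: [h[0], h[1]],
    lambda h: [h[0] - h[1], h[1]],
    lambda h: [h[0], -h[1]],
]
net = ToyMultiExitNet(blocks, heads)
early_exit, early_probs = net.predict_with_budget([0.5, 1.0], budget=1)
full_exit, full_probs = net.predict_with_budget([0.5, 1.0], budget=10)
```

Because every submodel reuses the first blocks, a perturbation that changes the output of `blocks[0]` affects every exit, which is the parameter sharing the text refers to.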


Providing robustness against adversarial attacks is especially challenging in multi-exit networks: since different submodels have high correlations by sharing parameters, an adversarial example targeting a specific exit can significantly degrade the performance of other submodels. In other words, an adversarial example can have strong adversarial transferability across different submodels, making the model highly vulnerable to simple adversarial attacks (e.g., an adversarial attack targeting a single exit).
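One common way to quantify this kind of transferability is a source-to-target matrix: craft adversarial examples against exit i and record how often they also fool exit j. The sketch below assumes that per-sample outcomes are already available as booleans; the helper names `transfer_matrix` and `mean_off_diagonal` are hypothetical, and this is not necessarily the metric used in the paper.

```python
def transfer_matrix(fooled):
    """Build a transfer-rate matrix.

    fooled[i][j] is a list of booleans, one per sample: True when an
    adversarial example crafted against exit i also fools exit j.
    Entry [i][j] of the result is the fraction of such samples.
    """
    num_exits = len(fooled)
    return [
        [sum(fooled[i][j]) / len(fooled[i][j]) for j in range(num_exits)]
        for i in range(num_exits)
    ]

def mean_off_diagonal(matrix):
    """Average transfer rate between distinct exits (source != target)."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)

# Made-up outcomes for two exits and two samples (illustrative only).
fooled = [
    [[True, True], [True, False]],   # attacks crafted against exit 0
    [[False, True], [True, True]],   # attacks crafted against exit 1
]
rates = transfer_matrix(fooled)
cross_exit_rate = mean_off_diagonal(rates)
```

A high off-diagonal average means that attacking one exit is enough to hurt the others, which is the vulnerability described above.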


Motivation. Only a few prior works have focused on adversarial defense strategies for multi-exit networks [3, 12]. The authors of [12] focused on generating adversarial examples tailored to multi-exit networks (e.g., generating samples via a max-average attack), and trained the model to minimize the sum of the clean and adversarial losses of all exits. Given the adversarial examples constructed in [12], the authors of [3] proposed a regularization term to reduce the weights of the classifier at each exit during training. However, existing adversarial defense strategies [3, 12] do not directly handle the high correlations among different submodels, resulting in high adversarial transferability and limited robustness in multi-exit networks. To tackle this difficulty, we take a knowledge-distillation-based approach in a fashion orthogonal to prior works [3, 12]. Some previous studies [8, 23, 33, 34] have shown that knowledge distillation can be utilized for improving the robustness of the model in conventional single-exit networks. However, although there are extensive existing works on self-distillation for training multi-exit networks using clean data [15, 20, 24, 27], it is currently unknown how distillation techniques should be utilized for adversarial training of multi-exit networks. Moreover, when the existing distillation-based schemes are applied to multi-exit networks, the dependencies among submodels become higher since the same output (e.g., the knowledge of the last exit) is distilled to all submodels. Motivated by these limitations, we pose the following questions: How can we take advantage of knowledge distillation to improve the adversarial robustness of multi-exit networks? At the same time, how can we reduce adversarial transferability across different submodels in multi-exit networks?


Main contributions. To handle these questions, we propose NEO-KD, a knowledge-distillation-based adversarial training strategy highly tailored to robust multi-exit neural networks. Our solution is two-pronged: neighbor knowledge distillation and exit-wise orthogonal knowledge distillation.


• Given a specific exit, the first part of our solution, neighbor knowledge distillation (NKD), distills the ensembled prediction of the neighbor exits on clean data into the prediction of the adversarial example at the corresponding exit, as shown in Figure 1a. This method guides the outputs of adversarial examples to follow the outputs of clean data, improving robustness against adversarial attacks. By ensembling the neighbor predictions on clean data before distillation, NKD provides higher-quality features to the corresponding exits than a scheme that distills from only one exit at the same position.


• The second focus of our solution, exit-wise orthogonal knowledge distillation (EOKD), mainly aims at reducing adversarial transferability across different submodels. This part is another unique contribution of our work compared to existing methods on robust multi-exit networks [3, 12] (which suffer from high adversarial transferability) or self-distillation-based multi-exit networks [15, 20, 24, 27] (which further increase adversarial transferability). In our EOKD, the output of clean data at the i-th exit is distilled to the output of the adversarial sample at the i-th exit, in an exit-wise manner. During this exit-wise distillation process, we encourage the non-ground-truth predictions of individual exits to be mutually orthogonal, by providing orthogonal soft labels to each exit as illustrated in Figure 1b. By weakening the dependencies among different exit outputs, EOKD reduces the adversarial transferability across all submodels in the network, which leads to improved robustness against adversarial attacks.
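The two components described in the bullets above can be sketched as distillation targets computed from per-exit class probabilities. This is a minimal illustration under stated assumptions, not the authors' exact formulation: the neighborhood is taken to be an exit plus its adjacent exits, the orthogonal soft labels are built by giving each non-ground-truth class to exactly one exit in round-robin order and renormalizing, and cross-entropy is used as the distillation loss. All function names and the weights `alpha` and `beta` are hypothetical.

```python
import math

def neighbor_ensemble(clean_probs, i):
    """NKD-style target: average the clean predictions of exit i and its
    adjacent exits (assumed definition of the neighborhood)."""
    lo, hi = max(0, i - 1), min(len(clean_probs) - 1, i + 1)
    group = clean_probs[lo : hi + 1]
    num_classes = len(group[0])
    return [sum(p[c] for p in group) / len(group) for c in range(num_classes)]

def orthogonal_soft_label(clean_prob, label, exit_index, num_exits):
    """EOKD-style target: keep the ground-truth mass plus a slice of the
    non-ground-truth classes that no other exit keeps, then renormalize."""
    others = [c for c in range(len(clean_prob)) if c != label]
    kept = {c for k, c in enumerate(others) if k % num_exits == exit_index}
    raw = [p if (c == label or c in kept) else 0.0 for c, p in enumerate(clean_prob)]
    total = sum(raw)
    if total == 0.0:  # degenerate case: fall back to a one-hot label
        return [1.0 if c == label else 0.0 for c in range(len(clean_prob))]
    return [v / total for v in raw]

def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def neo_kd_style_loss(clean_probs, adv_probs, label, alpha=1.0, beta=1.0):
    """Average over exits of the weighted NKD-style and EOKD-style terms."""
    num_exits = len(clean_probs)
    total = 0.0
    for i in range(num_exits):
        nkd_target = neighbor_ensemble(clean_probs, i)
        eokd_target = orthogonal_soft_label(clean_probs[i], label, i, num_exits)
        total += alpha * cross_entropy(nkd_target, adv_probs[i])
        total += beta * cross_entropy(eokd_target, adv_probs[i])
    return total / num_exits

# Made-up clean and adversarial predictions for 3 exits and 4 classes.
clean = [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1], [0.5, 0.2, 0.2, 0.1]]
adv = [[0.4, 0.3, 0.2, 0.1], [0.3, 0.3, 0.2, 0.2], [0.4, 0.2, 0.2, 0.2]]
loss = neo_kd_style_loss(clean, adv, label=0)
```

In this sketch the non-ground-truth parts of any two exits' soft labels have zero inner product, which is one simple way to realize the mutual orthogonality the text describes. In practice, terms like these would be added to the usual clean and adversarial classification losses.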


The NKD and EOKD components of our architectural solution work together to reduce adversarial transferability across different submodels in the network while correctly guiding the predictions of the adversarial examples at each exit. Experimental results on various datasets show that the proposed strategy achieves the best adversarial accuracy with reduced computation budgets, compared to existing adversarial training methods for multi-exit networks. Our solution is a plug-and-play method, which can be used in conjunction with existing training strategies tailored to multi-exit networks.


This paper is available on arXiv under a CC 4.0 license.