This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Ali Ghanbari, Dept. of Computer Science, Iowa State University;
(2) Deepak-George Thomas, Dept. of Computer Science, Iowa State University;
(3) Muhammad Arbab Arshad, Dept. of Computer Science, Iowa State University;
(4) Hridesh Rajan, Dept. of Computer Science, Iowa State University.
Neuralint [12] uses graph transformations [65] to abstract away unnecessary details in the model and check the bug patterns directly on the graph. While Neuralint is orders of magnitude faster than deepmufl, it proved to be less effective than deepmufl in our dataset.
DeepLocalize [11] and DeepDiagnosis [8] intercept the training process looking for known bug patterns such as numerical errors. DeepDiagnosis pushes the envelope by implementing a decision tree that gives actionable fix suggestions based on the detected symptoms. A closely related technique, UMLAUT [34], works by applying heuristic static checks on, and injecting dynamic checks in, various parts of the DNN program. deepmufl outperforms DeepLocalize, DeepDiagnosis, and UMLAUT in terms of the number of bugs detected.
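To make the idea of intercepting training to look for numerical errors concrete, here is a minimal sketch in Python. It is not the implementation of DeepLocalize, DeepDiagnosis, or UMLAUT; the function name `find_numerical_errors`, the layer names, and the snapshot format (a mapping from layer name to a flat list of values captured after one training step) are assumptions made purely for illustration.

```python
import math

def find_numerical_errors(layer_outputs):
    """Return (layer_name, index, value) for every NaN or infinite entry.

    `layer_outputs` is a hypothetical snapshot: layer name -> list of floats
    captured after a training step.
    """
    problems = []
    for name, values in layer_outputs.items():
        for i, v in enumerate(values):
            if math.isnan(v) or math.isinf(v):
                problems.append((name, i, v))
    return problems

# Hypothetical snapshot of intermediate values after one training step.
snapshot = {
    "dense_1": [0.2, 1.5, -0.3],
    "dense_2": [float("nan"), 0.1],
    "softmax": [float("inf")],
}
report = find_numerical_errors(snapshot)
flagged = [(name, index) for name, index, _ in report]
```

In this sketch, the first flagged layer in network order would be the natural starting point for diagnosis, since NaN and infinite values tend to propagate to later layers.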
DeepFD [66] is a recent technique that frames fault localization as a learning problem. MODE [25] and DeepFault [26] implement white-box DNN testing techniques that use suspiciousness values, obtained via an implementation of spectrum-based fault localization, to increase the hit spectrum of neurons and to identify suspicious neurons whose weights have not been calibrated correctly and are thus considered responsible for inadequate DNN performance. MODE was not publicly available. DeepFault was, but it was hard-coded to the examples shipped with its replication package, so we could not make the tool work without substantial modifications. Moreover, these techniques work best on ReLU-based networks, so applying them to most of the bugs in our dataset would not make much sense.
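The notion of a neuron's suspiciousness value under spectrum-based fault localization can be illustrated with a small sketch. It uses the Ochiai formula, one common spectrum-based measure, and is not claimed to be the exact formula or procedure used by MODE or DeepFault. The activation threshold, the pass/fail labeling of inputs, and the names `ochiai` and `rank_neurons` are assumptions for illustration.

```python
import math

def ochiai(active_fail, active_pass, total_fail):
    """Ochiai score: high when a neuron is active mostly on failing inputs."""
    denom = math.sqrt(total_fail * (active_fail + active_pass))
    return active_fail / denom if denom > 0 else 0.0

def rank_neurons(activations, passed, threshold=0.0):
    """Rank neurons of one layer by suspiciousness, most suspicious first.

    activations: one list of neuron outputs per input (hypothetical format).
    passed: one bool per input, True if the DNN classified it correctly.
    A neuron counts as "active" when its output exceeds `threshold`.
    """
    total_fail = sum(1 for p in passed if not p)
    scores = []
    for j in range(len(activations[0])):
        af = sum(1 for row, p in zip(activations, passed)
                 if row[j] > threshold and not p)
        ap = sum(1 for row, p in zip(activations, passed)
                 if row[j] > threshold and p)
        scores.append((j, ochiai(af, ap, total_fail)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Four inputs, three neurons; the first two inputs are misclassified.
activations = [
    [1.0, 0.0, 0.5],
    [0.9, 0.0, 0.0],
    [0.0, 0.8, 0.4],
    [0.0, 0.7, 0.0],
]
passed = [False, False, True, True]
ranking = rank_neurons(activations, passed)
```

Here neuron 0 is active only on the failing inputs and therefore ranks first, while neuron 1 is active only on passing inputs and ranks last. Because "active" is defined by a positive-output threshold, the sketch also hints at why such measures fit ReLU-based networks most naturally.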
Other related works are as follows. PAFL [67] operates on RNN models by converting them into probabilistic finite automata (PFAs) and localizing faulty sequences of state transitions on the PFAs. Sun et al. [68] propose DeepCover, which uses a variant of spectrum-based fault localization for DNN explainability.