Date: 2025-02-07

Authors: (1) Jarrod Haas, SARlab, Department of Engineering Science, Simon Fraser University; Digitalist Group Canada and [email protected]; (2) William Yolland, MetaOptima and [email protected]; (3) Bernhard Rabus, SARlab, Department of Engineering Science, Simon Fraser University and [email protected].





Abstract and 1 Introduction

2 Background 2.1 Problem Definition 2.2 Related Work 2.3 Deep Deterministic Uncertainty 2.4 L2 Normalization of Feature Space and Neural Collapse

3 Methodology 3.1 Models and Loss Functions 3.2 Measuring Neural Collapse

4 Experiments 4.1 Faster and More Robust OoD Results 4.2 Linking Neural Collapse with OoD Detection

5 Conclusion and Future Work, and References A Appendix A.1 Training Details A.2 Effect of L2 Normalization on Softmax Scores for OoD Detection A.3 Fitting GMMs on Logit Space A.4 Overtraining with L2 Normalization A.5 Neural Collapse Measurements for NC Loss Intervention A.6 Additional Figures



5 Conclusion and Future Work

We propose a simple, one-line-of-code modification of the Deep Deterministic Uncertainty benchmark that provides superior OoD detection and classification accuracy results in a fraction of the training time. We also establish that L2 normalization induces NC faster than regular training, and that NC is linked to OoD detection performance under the DDU method. Although we do not suggest that NC is the sole explanation for OoD performance, we do expect that its simple structure can provide insight into the complex and poorly understood behaviour of uncertainty in deep neural networks. We believe that this connection is a compelling area of future research into uncertainty and robustness in DNNs.
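To make the modification concrete, the following is a minimal sketch of L2 normalization over feature space, written in NumPy so that it runs without a deep learning framework. The function name `l2_normalize`, the feature width of 512 and the batch size of 4 are illustrative assumptions, as is the choice to apply the operation to the feature vectors that feed the final linear layer. In PyTorch, the same operation is available as `torch.nn.functional.normalize(features, dim=1)`.

```python
import numpy as np


def l2_normalize(features, eps=1e-12):
    """Scale each row (one feature vector per sample) to unit L2 norm."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # eps guards against division by zero for an all-zero feature vector.
    return features / np.maximum(norms, eps)


# Hypothetical batch of 4 feature vectors of width 512 (illustrative values).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 512))
normalized = l2_normalize(features)
```

After this step every feature vector lies on the unit hypersphere, so only its direction, not its magnitude, reaches the classifier.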

References

A Appendix

A.1 Training Details

All models (except those explicitly noted in the ablation study) use spectral normalization, leaky ReLUs and Global Average Pooling (GAP), as these produce the strongest baselines. Each experiment was conducted with fifteen randomly initialized model parameter sets; no fixed seeds were used at any time for initialization. We set the batch size to 1024 for all training runs, except the NC intervention models, which were more stable when training with a batch size of 2048. All training was conducted on four NVIDIA V100 GPUs in PyTorch 1.10.1 (Paszke et al., 2019).





Stochastic gradient descent (SGD) with an initial learning rate of 1e-1 was used as the optimizer for all experiments. We used a learning rate schedule that decreased the learning rate by one order of magnitude at epochs 150 and 250 for the 350 epoch models, as per the DDU benchmark. We adjusted the learning rate at epochs 75 and 90 for the 100 epoch ResNet50 models, and at epochs 40 and 50 for the 60 epoch ResNet18 models. Models were trained on the standard CIFAR-10 training data set, with 10% held out as a validation set using a fixed random seed.
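The step schedule described above can be sketched as a small pure-Python function. The names `step_lr` and `SCHEDULES` are hypothetical, and the sketch assumes 0-indexed epochs with each drop taking effect at the start of the milestone epoch, which the text does not specify. PyTorch's `torch.optim.lr_scheduler.MultiStepLR` with `gamma=0.1` expresses the same rule.

```python
def step_lr(epoch, milestones, base_lr=0.1, gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed, an assumption)."""
    # Count how many milestones have already been reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# Milestones taken from the training details; the dictionary keys are labels only.
SCHEDULES = {
    "350 epochs": (150, 250),
    "ResNet50, 100 epochs": (75, 90),
    "ResNet18, 60 epochs": (40, 50),
}

if __name__ == "__main__":
    for name, milestones in SCHEDULES.items():
        rates = [step_lr(e, milestones) for e in (0, milestones[0], milestones[1])]
        print(name, rates)
```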

A.2 Effect of L2 Normalization on Softmax Scores for OoD Detection









Table 4: OoD detection results using (a) log probabilities from a GMM fitted over feature space and (b) softmax scores. ResNet18 and ResNet50 models were used, 15 seeds per experiment, trained on CIFAR10, with SVHN, CIFAR100 and Tiny ImageNet test sets used as OoD data. For all models, we indicate whether L2 normalization over feature space was used (L2/No L2) and how many training epochs occurred (60/100/350), and compare against baseline (No L2 350). There is no clear pattern of behaviour when using softmax scores for OoD detection, but using GMMs provides superior results.
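The two scores compared in the table can be illustrated with a short NumPy sketch. It assumes one full-covariance Gaussian component per class and a small diagonal jitter for numerical stability; neither detail is stated in this section. The function names and the two-dimensional toy data are hypothetical and stand in for real network features and logits.

```python
import numpy as np


def fit_class_gaussians(features, labels, jitter=1e-6):
    """Fit one Gaussian per class: (log prior, mean, covariance)."""
    params = []
    for c in np.unique(labels):
        x = features[labels == c]
        cov = np.cov(x, rowvar=False) + jitter * np.eye(x.shape[1])
        params.append((np.log(len(x) / len(features)), x.mean(axis=0), cov))
    return params


def gmm_log_density(features, params):
    """log sum_c pi_c N(z | mu_c, Sigma_c) for each row z of `features`."""
    d = features.shape[1]
    per_class = []
    for log_prior, mean, cov in params:
        diff = features - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        per_class.append(log_prior - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    stacked = np.stack(per_class, axis=1)
    top = stacked.max(axis=1, keepdims=True)  # log-sum-exp, computed stably
    return (top + np.log(np.exp(stacked - top).sum(axis=1, keepdims=True)))[:, 0]


def max_softmax(logits):
    """Maximum softmax probability per row, the softmax-based score."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return probs.max(axis=1)


# Toy data: two tight class clusters in a 2-D "feature space".
rng = np.random.default_rng(0)
means = np.array([[3.0, 0.0], [-3.0, 0.0]])
labels = np.repeat([0, 1], 200)
train = means[labels] + 0.3 * rng.normal(size=(400, 2))
params = fit_class_gaussians(train, labels)

in_dist_scores = gmm_log_density(means, params)
far_score = gmm_log_density(np.array([[0.0, 8.0]]), params)
```

In this toy setting, a point far from both clusters receives a much lower log density than points at the class means, and thresholding that density is what flags an input as OoD.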

A.3 Fitting GMMs on Logit Space

















Table 4 shows the results of experiments with GMMs fit over logit space. This approach performs worse than GMMs fit over feature space in all cases. Intuitively, this makes sense: even under perfect NC, we would expect OoD inputs to increase the variability of class clusters in arbitrary dimensions of feature space. A Singular Value Decomposition (SVD) over feature space supports this intuition. In Figure 6, we show the SVD of all training embeddings for CIFAR10, along with the singular values for the test set and the SVHN OoD test set projected onto the same basis used for the training singular values. As we would expect, the first 10 singular values contain nearly all of the information. However, the remaining 502 singular values contain significantly more information in the OoD case. This information is critical to identifying OoD examples in feature space and, because the projection to logit space reduces dimensionality, it is largely lost there.
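The projection described above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the procedure used for Figure 6: the embeddings are synthetic, the width of 32 with 4 dominant directions is a small stand-in for the 512-dimensional features with 10 dominant singular values, no mean-centering is applied, and the helper names `spectrum_in_basis` and `tail_fraction` are hypothetical.

```python
import numpy as np


def spectrum_in_basis(embeddings, basis):
    """Norm of the embeddings along each basis direction (rows of `basis`)."""
    return np.linalg.norm(embeddings @ basis.T, axis=0)


def tail_fraction(spectrum, k):
    """Share of squared spectrum that lies beyond the first k directions."""
    energy = spectrum ** 2
    return energy[k:].sum() / energy.sum()


rng = np.random.default_rng(0)
d, k = 32, 4  # hypothetical sizes, chosen to keep the example small
scales = np.concatenate([np.full(k, 10.0), np.full(d - k, 0.1)])
train = rng.normal(size=(500, d)) * scales  # variance concentrated in k directions
ood = rng.normal(size=(500, d))             # variance spread over all directions

# Rows of `basis` are the right singular vectors of the training embeddings.
_, train_singular_values, basis = np.linalg.svd(train, full_matrices=False)

train_spectrum = spectrum_in_basis(train, basis)
ood_spectrum = spectrum_in_basis(ood, basis)
```

Projecting the training embeddings onto their own basis recovers the training singular values exactly, while the synthetic OoD data places far more of its energy in the trailing directions, mirroring the qualitative pattern described in the text.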









A.4 Overtraining with L2 Normalization

Table 6 shows the results of overtraining with L2 normalization (L2 350). While there is not a substantial penalty for overtraining by 10 to 100 epochs (Figure 5, Right), training for the full 350 epochs (as with the DDU baseline) starts to reduce OoD performance by a few percentage points. We note that there is a trade-off: classification accuracy does increase when overtraining to 350 epochs.









Table 6: OoD detection (a) and classification accuracy results (b) for ResNet18 and ResNet50 models, 15 seeds per experiment, trained on CIFAR10, with SVHN, CIFAR100 and Tiny ImageNet test sets used as OoD data. For all models, we indicate whether L2 normalization over feature space was used (L2/No L2) and how many training epochs occurred (60/100/350), and compare against baseline (No L2 350).

A.5 Neural Collapse Measurements for NC Loss Intervention

A.6 Additional Figures





This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.



