Table of Links
- Preliminary
- Methods
- Experiments
- Related Works
- Conclusion and References
- Details of Experiments
- Additional Experimental Results
8 Additional Experimental Results
8.1 Validation Experiments
We provide additional validation experiments with m = 6 and m = 12 in Figure 6. As m increases from 6 to 12, the accuracy improves from 66.86% to 68.68%.
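To make the role of m concrete, the sketch below composes a convolutional layer from m shared filter atoms, freezing the atom coefficients and leaving only the atoms ∆D trainable. This is a minimal illustration under our reading of the filter-atom decomposition; the module name, initialization scale, and shapes are assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FilterAtomConv(nn.Module):
    """Sketch: conv filters built as linear combinations of m shared k x k atoms.
    During parameter-efficient fine-tuning, only the atoms (Delta D) are updated."""
    def __init__(self, in_ch, out_ch, k=3, m=6):
        super().__init__()
        self.atoms = nn.Parameter(torch.randn(m, k, k) * 0.02)           # m filter atoms (trainable)
        self.coeff = nn.Parameter(torch.randn(out_ch, in_ch, m) * 0.02)  # atom coefficients
        self.coeff.requires_grad = False                                 # coefficients stay frozen

    def forward(self, x):
        # Compose full filters: (out_ch, in_ch, m) x (m, k, k) -> (out_ch, in_ch, k, k)
        weight = torch.einsum('oim,mkl->oikl', self.coeff, self.atoms)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)
```

Increasing m from 6 to 12 in this sketch only adds m extra k x k atoms per layer, which is why the parameter overhead stays small while the accuracy changes reported above can still be observed.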
8.2 Additional Experiments of Discriminative Tasks
Performance Comparisons on Full Dataset Fine-tuning.
Implementation details. For CIFAR-100 and ImageNet-1K, we follow the fine-tuning setting of ConvNeXt [30]. We employ the AdamW [33] optimizer to fine-tune models for 100 epochs on CIFAR-100 and 30 epochs on ImageNet-1K. We adopt a cosine decay schedule for the learning rate, with a linear warm-up during the first 10 epochs for CIFAR-100 and the first 5 epochs for ImageNet-1K.
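For reference, a minimal PyTorch sketch of this schedule is shown below, using the CIFAR-100 setting of 100 epochs with a 10-epoch linear warm-up; the base learning rate, weight decay, and stand-in model are illustrative placeholders, not values reported in the paper.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = torch.nn.Linear(768, 100)        # stand-in model for illustration
total_epochs, warmup_epochs = 100, 10    # CIFAR-100 setting; use (30, 5) for ImageNet-1K

optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)  # illustrative values
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=1e-3, total_iters=warmup_epochs),  # linear warm-up
        CosineAnnealingLR(optimizer, T_max=total_epochs - warmup_epochs),   # cosine decay
    ],
    milestones=[warmup_epochs],
)

for epoch in range(total_epochs):
    # ... one training epoch over the dataset ...
    scheduler.step()
```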
We compare our approach with other baseline methods, and report the results on CIFAR-100 and ImageNet-1K in Table 6. With full-dataset fine-tuning, full fine-tuning achieves the highest accuracy, outperforming the parameter-efficient fine-tuning methods. One possible reason is that both datasets contain sufficient data to prevent over-fitting. Our method achieves higher accuracy than LoRA while requiring far fewer parameters (1.2M vs. 21M). In contrast, the VTAB-1k benchmark provides only a limited amount of data (e.g., only 1,000 training images per task), which might cause over-fitting under full fine-tuning.
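Trainable-parameter counts such as the 1.2M vs. 21M above can be verified by counting only the parameters that require gradients; the stand-in model below is purely illustrative.

```python
import torch.nn as nn

def count_trainable_params(model: nn.Module) -> int:
    """Count parameters that will be updated during fine-tuning."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g., freeze a backbone and count only the adapter / head parameters
model = nn.Sequential(nn.Linear(768, 768), nn.Linear(768, 100))  # stand-in model
for p in model[0].parameters():
    p.requires_grad = False
print(f"{count_trainable_params(model) / 1e6:.2f}M trainable parameters")
```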
Visualization of Generalization Error. To better understand how different fine-tuning methods affect the generalization of pre-trained models, Figure 7 plots the generalization error of discriminative models trained on CIFAR-100 and Diabetic Retinopathy as a function of the number of fine-tuned parameters.
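A minimal sketch of how such a plot can be produced is given below, assuming generalization error is measured as the train-test accuracy gap; the helper name and axis choices are our own, and all measured values are supplied by the caller rather than hard-coded here.

```python
import matplotlib.pyplot as plt

def plot_generalization_error(num_params, train_acc, test_acc, label):
    """Plot the train/test accuracy gap against the number of fine-tuned parameters.
    All inputs are measured values provided by the caller."""
    gap = [tr - te for tr, te in zip(train_acc, test_acc)]  # generalization error per setting
    plt.plot(num_params, gap, marker="o", label=label)
    plt.xscale("log")
    plt.xlabel("number of fine-tuned parameters")
    plt.ylabel("train - test accuracy (generalization error)")
    plt.legend()
```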
8.3 Results of Few-shot Generative Tasks
We provide more experimental results of few-shot generative learning in Tables 7 and 8. In this experiment, we also include LoRA, LoHa, and LoKr with different configurations; a configuration sketch is given after the next paragraph.
Images generated by the different fine-tuning methods are shown in Figures 8 and 9.
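For context, the sketch below shows how a LoRA baseline of this kind is typically configured with the Hugging Face peft library on a diffusion UNet; the base checkpoint, rank, alpha, and target-module names are assumptions for illustration, not the exact configurations evaluated in Tables 7 and 8 (LoHa and LoKr follow the same pattern with their respective config classes).

```python
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

# Placeholder base model; the paper's exact checkpoint may differ.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
lora_cfg = LoraConfig(
    r=4,                                                    # illustrative rank
    lora_alpha=4,                                           # illustrative scaling
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],    # UNet attention projections
)
unet = get_peft_model(unet, lora_cfg)
unet.print_trainable_parameters()   # only the low-rank adapters remain trainable
```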
8.4 Visualization of Generated Images
We visualize images generated by the models trained on each of the VTAB tasks in Figures 10 to 25.
8.5 Grad-CAM
To understand the underlying reason for the effectiveness of our approach on convolution-based models, we apply Grad-CAM [9] to the first block of ResNet50 models fine-tuned on the CUB dataset [67] using the same experimental setting as above. For our method, we compare the setting with m = 9, i.e., 9 filter atoms ∆D, against the setting with (m, m1) = (9, 4), i.e., 36 filter atoms ∆D1.
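The heatmaps are obtained with standard Grad-CAM; a minimal sketch on the first block (layer1) of a torchvision ResNet-50 is shown below. The checkpoint loading, input image, and target class are placeholders, not the exact fine-tuned models behind Figures 26 and 27.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()      # load the fine-tuned weights here
feats, grads = {}, {}

def fwd_hook(_, __, output):
    feats["layer1"] = output               # activations of the first block

def bwd_hook(_, __, grad_output):
    grads["layer1"] = grad_output[0]       # gradients w.r.t. those activations

model.layer1.register_forward_hook(fwd_hook)
model.layer1.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)            # placeholder input image tensor
score = model(x)[0].max()                  # score of the predicted class
score.backward()

weights = grads["layer1"].mean(dim=(2, 3), keepdim=True)      # channel-wise importance
cam = F.relu((weights * feats["layer1"]).sum(dim=1))          # weighted sum over channels
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # normalized [0, 1] heatmap
```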
Based on the Grad-CAM visualizations in Figure 26, our method exhibits larger active regions compared with LoRA. This observation indicates that our approach benefits from preserving the spatial structure of convolutional layers. When utilizing ∆D1, which expands the number of filter atoms, we observe more active regions in the Grad-CAM heatmap, suggesting that the extra filter atoms capture a wider range of feature maps.
We provide more heatmap visualizations of Grad-CAM from the first block of ResNet50 in Figure 27.
Authors:
(1) Wei Chen, Purdue University, IN, USA ([email protected]);
(2) Zichen Miao, Purdue University, IN, USA ([email protected]);
(3) Qiang Qiu, Purdue University, IN, USA ([email protected]).