
Large Convolutional Model Tuning via Filter Subspace: Additional Experimental Results


Table of Links

Abstract and 1. Introduction

  2. Preliminary
  3. Methods
  4. Experiments
  5. Related Works
  6. Conclusion and References
  7. Details of Experiments
  8. Additional Experimental Results

8 Additional Experimental Results

8.1 Validation Experiments

We provide additional experiments with m = 6, 12 in Figure 6. As we increase m from 6 to 12, the accuracy improves from 66.86% to 68.68%.
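For readers unfamiliar with the filter-subspace formulation, the sketch below illustrates the general idea as we read it from the text: each convolutional layer is spanned by m small filter atoms, and only those atoms are updated during fine-tuning. The class name, shapes, and initialization are our own illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FilterAtomConv(nn.Module):
    """Illustrative sketch (not the authors' code): a conv layer whose filters
    are linear combinations of m trainable filter atoms."""
    def __init__(self, c_in, c_out, k=3, m=9):
        super().__init__()
        self.k = k
        # combination coefficients, kept frozen (e.g., fitted to the pre-trained filters)
        self.alpha = nn.Parameter(torch.randn(c_out, c_in, m), requires_grad=False)
        # the m filter atoms; the only parameters updated during fine-tuning
        self.atoms = nn.Parameter(0.01 * torch.randn(m, k, k))

    def forward(self, x):
        # reconstruct the full filter bank: (c_out, c_in, m) x (m, k, k) -> (c_out, c_in, k, k)
        w = torch.einsum("oim,mhw->oihw", self.alpha, self.atoms)
        return F.conv2d(x, w, padding=self.k // 2)

layer = FilterAtomConv(c_in=64, c_out=64, m=9)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 9 atoms x 3 x 3 = 81 trainable values for this layer
```

Under this reading, increasing m (e.g., from 6 to 12 as in Figure 6) enlarges the subspace the fine-tuned filters can reach, at the cost of a few extra parameters per layer.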

8.2 Additional Experiments of Discriminative Tasks

Performance Comparisons on Full Dataset Fine-tuning.


Implementation details. For CIFAR-100 and ImageNet-1K, we follow the fine-tuning setting of ConvNeXt in [30]. We employ the AdamW [33] optimizer to fine-tune models for 100 epochs on CIFAR-100 and 30 epochs on ImageNet-1K. The learning rate follows a cosine decay schedule, with a linear warm-up over the first 10 epochs for CIFAR-100 and the first 5 epochs for ImageNet-1K.
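As a concrete reference for the schedule described above, here is a minimal PyTorch sketch of AdamW with linear warm-up followed by cosine decay. The base learning rate, weight decay, and model are placeholders, since the excerpt does not list those hyper-parameters.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine(optimizer, warmup_epochs, total_epochs):
    """Linear warm-up for `warmup_epochs`, then cosine decay to zero."""
    def lr_lambda(epoch):
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)

model = torch.nn.Linear(10, 10)  # stand-in for the parameters being fine-tuned
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)  # placeholder values
scheduler = warmup_cosine(optimizer, warmup_epochs=10, total_epochs=100)  # CIFAR-100 setting

for epoch in range(100):
    # ... one training epoch over CIFAR-100 goes here ...
    optimizer.step()   # stands in for the per-batch updates
    scheduler.step()   # learning rate stepped once per epoch
```

For the ImageNet-1K setting, the same sketch would use warmup_epochs=5 and total_epochs=30.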


We compare the performance of our approach with other baseline methods; the results on CIFAR-100 and ImageNet-1K are shown in Table 6. With full-dataset fine-tuning, full fine-tuning achieves the highest accuracy, outperforming the parameter-efficient fine-tuning methods. One possible reason is that both datasets contain enough data to prevent over-fitting of the model. Our method achieves higher accuracy than LoRA while requiring only a small number of parameters (1.2M vs. 21M). In contrast, on the VTAB-1k benchmark the amount of data is small (e.g., only 1,000 training images), which makes the model prone to over-fitting under full fine-tuning.


Table 6: Performance comparisons on CIFAR-100 and ImageNet-1K with ConvNeXt models pre-trained on ImageNet-21K.


Visualization of Generalization Error. To examine how various fine-tuning methods affect the generalization of pre-trained models, Figure 7 plots the generalization error of a discriminative task trained on the CIFAR-100 and Diabetic Retinopathy datasets against the number of fine-tuned parameters.


Fig. 7: Generalization error of (a) CIFAR-100 and (b) Diabetic Retinopathy.
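A plot like Figure 7 can be assembled from simple bookkeeping. The snippet below is a minimal sketch assuming the generalization error is measured as the train/test accuracy gap (the excerpt does not spell out the exact definition); the parameter counts follow the numbers quoted above, while the accuracy values are made-up placeholders, not the paper's results.

```python
import matplotlib.pyplot as plt

def plot_generalization_error(runs):
    """runs: {method_name: (num_finetuned_params, train_acc, test_acc)}."""
    for name, (params, train_acc, test_acc) in runs.items():
        gap = train_acc - test_acc          # train/test gap as generalization error
        plt.scatter(params, gap, label=name)
    plt.xscale("log")
    plt.xlabel("number of fine-tuned parameters")
    plt.ylabel("generalization error (train acc - test acc)")
    plt.legend()
    plt.show()

# Parameter counts follow the text above (1.2M for ours, 21M for LoRA);
# the accuracies are arbitrary placeholders to be replaced with measured values.
plot_generalization_error({
    "ours": (1.2e6, 0.99, 0.90),
    "LoRA": (21e6,  0.99, 0.88),
})
```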

8.3 Results of Few-shot Generative Tasks

We provide more experimental results of few-shot generative learning in Tables 7 and 8. In this experiment, we also include LoRA, LoHa, and LoKr with different configurations.
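For context on these baselines, the sketch below contrasts how the three families parameterize the weight update: LoRA uses a single low-rank product, LoHa a Hadamard product of two low-rank products, and LoKr a Kronecker product. The shapes and ranks are illustrative only and do not correspond to the configurations evaluated in Tables 7 and 8.

```python
import torch

c_out, c_in, r = 320, 320, 8  # illustrative shapes and rank, not the paper's configs

# LoRA: low-rank update  Delta W = B @ A
A = torch.randn(r, c_in)
B = torch.randn(c_out, r)
delta_lora = B @ A                     # (c_out, c_in), 2 * r * c_in params here (c_in == c_out)

# LoHa: Hadamard product of two low-rank products  Delta W = (B1 A1) * (B2 A2)
A1, B1 = torch.randn(r, c_in), torch.randn(c_out, r)
A2, B2 = torch.randn(r, c_in), torch.randn(c_out, r)
delta_loha = (B1 @ A1) * (B2 @ A2)     # same shape, twice the low-rank factors

# LoKr: Kronecker-factored update  Delta W = A kron B
A = torch.randn(c_out // 16, c_in // 16)
B = torch.randn(16, 16)
delta_lokr = torch.kron(A, B)          # (c_out, c_in) from far fewer parameters

print(delta_lora.shape, delta_loha.shape, delta_lokr.shape)
```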


Images generated by the different fine-tuning methods are shown in Figures 8 and 9.


Table 7: Evaluation of different approaches in learning the concept <castle>.


Table 8: Evaluation of different approaches in learning the concept <canal>.

8.4 Visualization of Generated Images

We visualize images generated by the models trained on each of the VTAB tasks in Figures 10 to 25.

8.5 Grad-CAM

To understand the underlying reason for the effectiveness of our approach on convolution-based models, we apply Grad-CAM [9] to the first block of ResNet50 fine-tuned on the CUB dataset [67] using the same experimental setting as above. For our method, we compare the setting with m = 9, i.e., 9 filter atoms ∆D, against the setting with (m, m1) = (9, 4), i.e., 36 filter atoms ∆D1.
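To make the procedure concrete, here is a minimal Grad-CAM sketch using forward and backward hooks on the first residual stage (layer1) of a torchvision ResNet50. The ImageNet weights, random input, and predicted-class score are placeholders for the CUB-fine-tuned checkpoints used in the paper.

```python
import torch
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1").eval()  # placeholder for a fine-tuned checkpoint

feats, grads = {}, {}
model.layer1.register_forward_hook(lambda m, i, o: feats.update(a=o))
model.layer1.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed CUB image
score = model(x)[0].max()         # score of the top predicted class
score.backward()

weights = grads["a"].mean(dim=(2, 3), keepdim=True)   # global-average-pooled gradients
cam = torch.relu((weights * feats["a"]).sum(dim=1))   # weighted sum of activations
cam = cam / (cam.max() + 1e-8)                        # normalize to [0, 1] for overlay
print(cam.shape)                                      # torch.Size([1, 56, 56]) for a 224x224 input
```

Upsampling `cam` to the input resolution and overlaying it on the image gives the heatmaps shown in Figures 26 and 27.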


Based on the Grad-CAM visualization in Figure 26, our method exhibits larger active regions than LoRA. This observation indicates that our approach benefits from preserving the spatial structure of convolutional layers. When utilizing ∆D1, which expands the number of filter atoms, we observe even more active regions in the Grad-CAM heatmap, suggesting that the additional filter atoms capture a wider range of feature maps.


We provide more heatmap visualizations of Grad-CAM from the first block of ResNet50 in Figure 27.


Fig. 8: Images sampled from Stable Diffusion [49] checkpoints fine-tuned with different approaches. The text prompts used to generate images from top to bottom are: “The <castle> stands against a backdrop of snow-capped mountains”, “A <castle> surrounded by a lush, vibrant forest”, “A peacock in front of the <castle>”, and “The <castle> overlooks a serene lake, where a family of geese swims”.


Fig. 9: Images sampled from Stable Diffusion [49] checkpoints fine-tuned with different approaches. The text prompts used to generate images from top to bottom are: “The <castle> stands against a backdrop of snow-capped mountains”, “A <castle> surrounded by a lush, vibrant forest”, “A peacock in front of the <castle>”, and “The <castle> overlooks a serene lake, where a family of geese swims”.


Fig. 10: Images sampled from Stable Diffusion checkpoints fine-tuned on the Caltech101.


Fig. 11: Images sampled from Stable Diffusion checkpoints fine-tuned on the CIFAR-100.


Fig. 12: Images sampled from Stable Diffusion checkpoints fine-tuned on the SUN397.


Fig. 13: Images sampled from Stable Diffusion checkpoints fine-tuned on the SVHN.


Fig. 14: Images sampled from Stable Diffusion checkpoints fine-tuned on the Flowers102.


Fig. 15: Images sampled from Stable Diffusion checkpoints fine-tuned on the Pets.


Fig. 16: Images sampled from Stable Diffusion checkpoints fine-tuned on the DTD.


Fig. 17: Images sampled from Stable Diffusion checkpoints fine-tuned on the EuroSAT.


Fig. 18: Images sampled from Stable Diffusion checkpoints fine-tuned on the Resisc45.


Fig. 19: Images sampled from Stable Diffusion checkpoints fine-tuned on the Patch Camelyon.


Fig. 20: Images sampled from Stable Diffusion checkpoints fine-tuned on the Diabetic Retinopathy.


Fig. 21: Images sampled from Stable Diffusion checkpoints fine-tuned on the KITTI.


Fig. 22: Images sampled from Stable Diffusion checkpoints fine-tuned on the SmallNORB.


Fig. 23: Images sampled from Stable Diffusion checkpoints fine-tuned on the dSprites.


Fig. 24: Images sampled from Stable Diffusion checkpoints fine-tuned on the CLEVR.


Fig. 25: Images sampled from Stable Diffusion checkpoints fine-tuned on the DMLab.


Fig. 26: The Grad-CAM heatmap comparisons between our method and LoRA reveal that our approach exhibits larger active regions. The heatmaps are generated from ResNet50 [13] fine-tuned on the CUB dataset [67]. Fine-tuning the model with ∆D1 involves additional filter atoms, which leads to larger active regions in the heatmap compared with fine-tuning ∆D only. (a) Grad-CAM from the first block of ResNet50. (b-d) Grad-CAM from the second to fourth blocks of ResNet50.


Fig. 27: Additional Grad-CAM heatmap comparisons between our method and LoRA from the first block of ResNet50.


Authors:

(1) Wei Chen, Purdue University, IN, USA ([email protected]);

(2) Zichen Miao, Purdue University, IN, USA ([email protected]);

(3) Qiang Qiu, Purdue University, IN, USA ([email protected]).


This paper is available on arXiv under a CC BY 4.0 DEED license.

