
A Self-explaining Neural Architecture for Generalizable Concept Learning: Appendix


Authors:

(1) Sanchit Sinha, University of Virginia ([email protected]);

(2) Guangzhi Xiong, University of Virginia ([email protected]);

(3) Aidong Zhang, University of Virginia ([email protected]).

Table of Links

Abstract and 1 Introduction

2 Related Work

3 Methodology and 3.1 Representative Concept Extraction

3.2 Self-supervised Contrastive Concept Learning

3.3 Prototype-based Concept Grounding

3.4 End-to-end Composite Training

4 Experiments and 4.1 Datasets and Networks

4.2 Hyperparameter Settings

4.3 Evaluation Metrics and 4.4 Generalization Results

4.5 Concept Fidelity and 4.6 Qualitative Visualization

5 Conclusion and References

Appendix

A Appendix

Structure of Appendix


Following the discussion in the main text, the Appendix is organized as follows:


• Dataset descriptions and visual samples


• Detailed discussion around RCE and algorithmic details for CCL and PCG (Pseudocode)


• More experimental results on key hyperparameters utilized in RCE and PCG


• Concept Fidelity Analysis


• Details on Baseline Replication


• Additional visual results - selected prototypes


• Additional visual results - domain-aligned prototypes

A.1 Dataset Description

A few examples from the training sets of the datasets utilized in our approach are shown in Figures 7 (Digits), 8 (VisDA), 9 (DomainNet) and 10 (OfficeHome).


Figure 7: Visual examples of the same digit classes (top: 0, bottom: 9) from the digit classification datasets - MNIST, USPS and SVHN. All samples are drawn from the training set of each dataset.


Figure 8: Visual examples from the VisDA dataset corresponding to three classes - airplane, car and train. The top row shows computer-rendered 3D images from the training set, while the bottom row shows three real images from the same classes.

A.2 Training Procedure - Details

Algorithm 1 depicts the overall pseudocode to train the Representative Concept Extraction (RCE) framework with Contrastive (CCL) and Prototype-based Grounding (PCG) regularization. The finer details of each part are listed as follows:


Figure 9: Visual examples from the DomainNet dataset corresponding to three classes - apple, binoculars and diamond. The rows show images sampled from the Real (R), Clipart (C), Painting (P) and Sketch (S) domains.


Figure 10: Some visual examples from the OfficeHome dataset corresponding to three classes - Alarm Clock, Calculator and Kettle. The rows demonstrate sample images from Real (R), Art (A), Clipart (C) and Product (P).


RCE: For networks F and H, we utilize a ResNet34 architecture initialized with pre-trained ImageNet1k weights. For network A, we first compute the element-wise product of the outputs of F and H, and then pass the result through A, a shallow 2-layer fully connected network. For network T, we utilize a 3-layer fully connected network that outputs a prediction from the necessary and sufficient concepts. The final prediction is a weighted sum of the outputs of networks A and T, followed by a softmax layer. The prediction loss is the standard cross-entropy loss.
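To make the wiring above concrete, below is a minimal PyTorch sketch of the RCE head. The backbone choice, the element-wise product, the layer counts, and the cross-entropy loss follow the description above; the hidden-layer widths, the mixing weight between the A and T heads, and the choice to feed T the same fused features are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class RCE(nn.Module):
    """Minimal sketch of Representative Concept Extraction (RCE).

    F and H are ImageNet1k-pretrained ResNet34 backbones. Their features are
    fused with an element-wise product; A is a shallow 2-layer head, T is a
    3-layer head, and the final prediction is a weighted sum of the two
    followed by softmax. Widths, the mixing weight `alpha`, and T's input
    are illustrative assumptions.
    """

    def __init__(self, num_classes: int, alpha: float = 0.5):
        super().__init__()
        self.F = models.resnet34(weights="IMAGENET1K_V1")
        self.H = models.resnet34(weights="IMAGENET1K_V1")
        self.F.fc = nn.Identity()  # expose 512-d backbone features
        self.H.fc = nn.Identity()
        # A: shallow 2-layer fully connected network
        self.A = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                               nn.Linear(256, num_classes))
        # T: 3-layer fully connected network
        self.T = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                               nn.Linear(256, 128), nn.ReLU(),
                               nn.Linear(128, num_classes))
        self.alpha = alpha

    def forward(self, x):
        fused = self.F(x) * self.H(x)  # element-wise product of F and H outputs
        logits = self.alpha * self.A(fused) + (1 - self.alpha) * self.T(fused)
        # Softmax yields the final prediction; applying nn.CrossEntropyLoss to
        # the logits during training gives the standard cross-entropy loss.
        return logits, torch.softmax(logits, dim=-1)
```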


PCG: For the selection of prototypical samples, we draw from both the source and target domains: 5 samples from the source domain and 1 from the target domain. Note that the PCG regularization starts after the first training step. The grounding ensures that outlying concept representations in the target domain are pulled towards the abundant source domain representations. For the concept bank, we utilize these 6 (5+1) prototypes for each class.
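A minimal sketch of the prototype bank and the grounding penalty is shown below, assuming the grounding acts as an L2 pull towards the nearest prototype in concept space. The 5+1 prototype composition and the λ1 weighting follow the text; the function names and the nearest-prototype reduction are illustrative assumptions.

```python
import torch

def build_prototype_bank(concept_fn, source_protos, target_protos):
    """Concept representations of the selected prototypes for one class:
    5 samples from the source domain and 1 from the target domain."""
    with torch.no_grad():
        bank = torch.cat([concept_fn(source_protos),
                          concept_fn(target_protos)], dim=0)
    return bank  # shape: (6, concept_dim)

def pcg_penalty(batch_concepts, bank, lambda1):
    """Grounding term added to the composite loss (only after the first
    training step, per the text): pull each concept representation
    towards its nearest prototype representation."""
    dists = torch.cdist(batch_concepts, bank)   # (batch_size, 6)
    return lambda1 * dists.min(dim=1).values.mean()
```

In the composite objective, this term is combined with the prediction loss and the contrastive (CCL) loss.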


A.3 Results on Key Hyperparameters

Number of Concepts


The first 3 columns of Table 6 list the domain adaptation performance on the OfficeHome dataset across 12 different data settings (listed in rows). We evaluate the performance by varying the number of concepts C (and, by extension, the relevance scores S). We choose the base setting of the number of concepts equal to the number of classes because we want each class to be represented by at least one concept. We observe that increasing the number of concepts has no significant effect on the performance. This observation indicates that the relevant concept information is encoded in a small number of concepts; in other words, the concept vector is sparse.


Concept Dimensionality


The last 3 columns of Table 6 list the performance obtained by varying the concept dimensionality d (dim). Note that non-unit-dimensional concepts are not directly interpretable and remain an active area of research [Sarkar et al., 2022]. Nevertheless, we report the performance numbers by varying the concept dimensionality. We observe that with increasing concept dimensionality, the performance on target domains increases in almost all settings. This observation is expected for two reasons: 1) increasing the concept dimensionality increases the richness of the information encoded in each concept during contrastive learning, and 2) increased dimensionality increases the expressiveness of the architecture itself.


Table 6: Effect of the most important hyperparameters - the number of concepts (left) and the dimensionality of concepts (right) - on the domain adaptation performance. The asterisk (*) indicates that non-unit concept dimensionalities are not directly interpretable.


Size of Representative set for PCG


Table 7 shows the performance on the OfficeHome dataset for two settings of the pre-selected representative prototypes. For all experiments in the main paper, we utilize 5 prototypes from the source domain and 1 from the target domain - a total of 6 prototype samples for grounding. Note that it is not usually possible to use many prototypes from the target domain, as our setting corresponds to the 3-shot setting in [Yu and Lin, 2023]. We show the performance with 5 and 7 selected prototypes from the source domain in Table 7. We observe that increasing the number of prototypes does not improve performance: in most cases performance does not change significantly, and in a few cases it drops. This implies that only a minimal, sufficient set of grounding prototypes is required, which is consistent with intuition because selecting more than the requisite number of prototypes only increases the computation time for concept representations.


Distances from the Concept Representation prototypes


Table 7 also lists the average normalized distance of the target-domain concept representations from the concept representations of the selected prototypes for varying values of λ1, which controls the strength of supervision in PCG. Among the values we evaluate, λ1 = 0 corresponds to no regularization and λ1 = 1 to very high regularization. We observe that generalization performance suffers in both cases, implying a tradeoff between regularization and generalization.
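For reference, the distance reported in Table 7 can be computed along the lines of the sketch below; treating "normalized" as L2-normalization of the concept vectors and measuring against the nearest prototype are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def avg_normalized_distance(target_concepts, prototype_concepts):
    """Average distance from each target-domain concept representation to
    its nearest prototype representation, computed on L2-normalized
    vectors (the exact normalization is an illustrative assumption)."""
    t = F.normalize(target_concepts, dim=-1)
    p = F.normalize(prototype_concepts, dim=-1)
    return torch.cdist(t, p).min(dim=1).values.mean()
```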

A.4 Concept Fidelity Analysis

Table 8 lists the consolidated concept fidelity scores for all four datasets. Note that this table is the complete version of Table 5 in the main text. We see that on all datasets the concept overlap is highest for either our approach or BotCL, both of which employ explicit fidelity regularization. This demonstrates the efficacy of our approach in maintaining concept fidelity.
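The fidelity metric itself is defined in the main text; as a rough illustration of the kind of intra-class concept overlap being measured, a simplified, assumed variant could look like the following sketch (the top-k selection and Jaccard overlap are our assumptions, not the paper's exact metric).

```python
import torch

def intra_class_concept_fidelity(concepts, labels, top_k=5):
    """Assumed, simplified intra-class concept overlap score: for each
    class, the mean pairwise Jaccard overlap of the top-k activated
    concept indices across its samples, averaged over classes."""
    scores = []
    for c in labels.unique():
        cls = concepts[labels == c]               # (n_c, num_concepts)
        topk = cls.topk(top_k, dim=1).indices     # (n_c, top_k)
        sets = [set(row.tolist()) for row in topk]
        pair = [len(a & b) / len(a | b)
                for i, a in enumerate(sets) for b in sets[i + 1:]]
        if pair:
            scores.append(sum(pair) / len(pair))
    return sum(scores) / max(len(scores), 1)
```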

A.5 Baseline Replication

We compare our approach against 4 baselines - SENN [Alvarez-Melis and Jaakkola, 2018], DiSENN [Elbaghdadi, 2020], BotCL [Wang, 2023] and UnsupervisedCBM [Sawada, 2022b]. Even though none of these approaches use domain adaptation as an evaluation setting, we apply each proposed methodology directly in our settings. Proper care has been taken to stay close to the intended use of each method; the modifications we introduce are listed below:


• SENN and DiSENN: We utilize the well-tested publicly available code [2] as the basic framework. We modify SENN and DiSENN to use ResNet34 (for objects), and the decoder is kept the same for all setups discussed. Because the computation of the robustness loss Lh is very slow on larger networks like ResNet34, we only compute it once every 10 steps (see the sketch after this list).


• BotCL: We utilize the publicly available code [3]. We utilize the same network architectures - LeNet for digits and ResNet34 for objects. Additionally, we amend the LeNet architecture to fit the BotCL framework.


• UnsupervisedCBM: Unsupervised CBM is hard to train as it combines supervised and unsupervised concepts. Since our approach does not utilize concept supervision, we only consider the unsupervised concepts and replace the supervised concepts with the one-hot encoding of the image classes. We utilize a fully connected layer for the discriminator network while simultaneously training a decoder. Although a publicly available implementation of UnsupCBM does not exist, we successfully replicate its main results.
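Referring to the SENN/DiSENN modification above, a minimal sketch of computing the robustness loss only once every 10 steps is shown below. The loss weighting, the model's output signature, and `robustness_loss_fn` are illustrative assumptions; only the every-10-steps scheduling comes from our replication details.

```python
import torch

def training_step(model, x, y, step, criterion, robustness_loss_fn,
                  lambda_rob=2e-4, rob_every=10):
    """One SENN/DiSENN training step in our replication (sketch).

    The robustness loss Lh involves input Jacobians of the concept encoder
    and is very slow on ResNet34, so it is added only once every
    `rob_every` (=10) steps; other steps use only the prediction loss.
    """
    preds, concepts, relevances = model(x)  # assumed output signature
    loss = criterion(preds, y)
    if step % rob_every == 0:
        loss = loss + lambda_rob * robustness_loss_fn(model, x, concepts, relevances)
    return loss
```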


Table 7: Effect of the most important hyperparameters: the size of the selected prototype set (5 and 7 prototypes). The Dist columns report the average normalized L2 distance between the concept representations and the concept prototype representations in concept space, while Perf reports the performance. Note the balance between prototype-representation distance and generalization: very low distances (λ1 = 1) fail to generalize effectively to the target domain, while no distance regularization (λ1 = 0) performs very close to unregularized approaches, implying the need for PCG.


Table 8: Average Intra-class Concept Fidelity scores for each domain for all settings where the domain is the target. Rows S, D, B and U respectively correspond to SENN, DiSENN, BotCL and UnsupCBM. Similarly, R, P and C correspond to RCE, RCE+PCG and RCE+PCG+CCL. The columns show the domains in each dataset.

A.6 Additional Visual Results - Selected Prototypes

Figures 11a and 11b showcase the top-5 most important prototypes for a given query image on the OfficeHome and DomainNet datasets respectively. In each row, both the query and the prototypes are in the target domain. We show results on all domains of both datasets to demonstrate that our proposed approach generalizes to all domains. Note that RCE is an overparameterized version of SENN, hence its performance relative to the baselines remains identical. Our proposed approach explains each query image with relevant prototypes, whereas the prototypes selected by the baselines are barely relevant.

A.7 Additional Visual Results - Domain Aligned

Figures 12a and 12b showcase the top-5 most important prototypes for the Digits and VisDA datasets respectively. In each row, the left side shows the prototypes in the source domain and the right side shows those in the target domain. Our proposed approach explains each concept with relevant prototypes across domains.


Figure 11: Selected prototypes for the (a) OfficeHome and (b) DomainNet datasets respectively.




Figure 12: Domain aligned prototype selection for [TOP] Digits - MNIST and USPS and [BOTTOM] PACS dataset.




This paper is available on arxiv under CC BY 4.0 DEED license.


[2] https://github.com/AmanDaVinci/SENN


[3] https://github.com/wbw520/BotCL
