HyperTransformer: C Additional Supervised Experiments



Too Long; Didn't Read

In this paper we propose a new few-shot learning approach that allows us to decouple the complexity of the task space from the complexity of individual tasks.

This paper is available on arXiv under a CC 4.0 license.


(1) Andrey Zhmoginov, Google Research;

(2) Mark Sandler, Google Research;

(3) Max Vladymyrov, Google Research.


While the advantage of decoupling the parameters of the weight generator from those of the generated CNN model is expected to vanish as the CNN model grows, we compared our approach against two other methods, LGM-Net (Li et al., 2019b) and LEO (Rusu et al., 2019), to verify that it matches their performance on sufficiently large models.

For our comparison with LGM-Net, we used the same image augmentation technique as Li et al. (2019b), applied at both the training and evaluation stages (Ma, 2019). We also used the same CNN architecture: 4 learned 64-channel convolutional layers followed by two generated convolutional layers and a final logits layer. In our weight generator, we used 2-layer transformers with local feature extractors based on 48-channel convolutional layers, and did not use any global features. We trained this model end to end on the MINIIMAGENET 1-shot-5-way task and obtained a test accuracy of 69.3% ± 0.3%, nearly identical to the 69.1% accuracy reported in Li et al. (2019b).
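To make the hybrid architecture concrete, the following is a minimal numpy sketch of the forward pass: four learned convolutional layers whose weights are ordinary trainable parameters, followed by two convolutional layers and a logits layer whose weights would instead be produced by the transformer-based generator. The function names, the naive 3×3 convolution, and all tensor shapes here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def conv_relu(x, w):
    # Naive 3x3 "same" convolution with ReLU.
    # x: (H, W, Cin), w: (3, 3, Cin, Cout)  ->  (H, W, Cout)
    H, W, _ = x.shape
    Cout = w.shape[-1]
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W, Cout))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + 3, j:j + 3, :]      # (3, 3, Cin)
            out[i, j] = np.tensordot(patch, w, axes=3)
    return np.maximum(out, 0.0)

def forward(image, learned_ws, generated_ws, logits_w):
    # learned_ws: weights trained directly (4 x 64-channel conv layers).
    # generated_ws / logits_w: weights that the HyperTransformer-style
    # generator would emit per episode (here just passed in as arrays).
    h = image
    for w in learned_ws + generated_ws:
        h = conv_relu(h, w)
    feat = h.mean(axis=(0, 1))                   # global average pooling
    return feat @ logits_w                       # per-class logits
```

In this split, only `learned_ws` would receive gradient updates as shared parameters; `generated_ws` and `logits_w` are re-generated for every few-shot episode from the support set.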

We also carried out a comparison with LEO by using our method to generate a fully-connected layer on top of TIEREDIMAGENET embeddings pre-computed with the WideResNet-28 model employed by Rusu et al. (2019). For these experiments, we used a simpler 1-layer transformer model with 2 heads, omitting the final fully-connected layer and nonlinearity. We also applied L2 regularization to the generated fully-connected weights, setting the regularization weight to 10⁻³. Training this model yielded test accuracies of 66.2% ± 0.2% and 81.6% ± 0.2% on the 1-shot-5-way and 5-shot-5-way TIEREDIMAGENET tasks, respectively. These results are nearly identical to the 66.3% and 81.4% accuracies reported in Rusu et al. (2019).
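The episode-level objective in this setting can be sketched as a cross-entropy loss on the query set plus an L2 penalty on the generated fully-connected weights. The sketch below is a simplified numpy illustration under assumed shapes; `episode_loss` and its signature are hypothetical names, and the actual weights `W`, `b` would come from the transformer generator rather than being fixed arrays.

```python
import numpy as np

def softmax_xent(logits, label):
    # Numerically stable softmax cross-entropy for one example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def episode_loss(query_embs, query_labels, W, b, reg=1e-3):
    # query_embs: (N, D) pre-computed embeddings (e.g. from WideResNet-28).
    # W: (D, C), b: (C,) -- the generated fully-connected layer.
    # reg: L2 weight on the generated parameters (10^-3 in the text).
    loss = 0.0
    for emb, y in zip(query_embs, query_labels):
        loss += softmax_xent(emb @ W + b, y)
    loss /= len(query_labels)
    loss += reg * np.sum(W * W)          # L2 penalty on generated weights
    return loss
```

Because the backbone embeddings are pre-computed and frozen, the only per-episode computation is generating `W` and `b` and evaluating this loss, which keeps the comparison focused on the quality of the generated layer itself.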