HyperTransformer: F. Visualization of The Generated CNN Weights



Too Long; Didn't Read

In this paper we propose a new few-shot learning approach that allows us to decouple the complexity of the task space from the complexity of individual tasks.

This paper is available on arXiv under a CC 4.0 license.


(1) Andrey Zhmoginov, Google Research;

(2) Mark Sandler, Google Research;

(3) Max Vladymyrov, Google Research.


Figures 9 and 10 show examples of the CNN kernels generated by a single-head, 1-layer transformer for a simple 2-layer CNN model with 9 × 9 stride-4 kernels. The two figures correspond to different approaches to re-assembling the weights from the generated slices: “output” allocation and “spatial” allocation (see Section 3.1 in the main text for more information). Notice that “spatial” weight allocation produces more homogeneous kernels for the first layer than “output” allocation does. In both figures we show the final generated kernels for 3 variants: both layers generated; one generated and one trained; and both trained.
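The difference between the two allocation schemes can be illustrated with a minimal sketch. The function below is a hypothetical reconstruction (the paper's actual slice layout may differ): under "output" allocation, consecutive generated slices fill consecutive output channels, while under "spatial" allocation consecutive slices fill consecutive spatial positions, so that every output channel shares values from the same slice at each position.

```python
import numpy as np

def assemble_kernel(slices, out_ch, in_ch, k, mode="output"):
    """Reassemble a conv kernel of shape (out_ch, in_ch, k, k) from
    flat generated slices. Hypothetical layout, for illustration only."""
    flat = np.concatenate([np.asarray(s).ravel() for s in slices])
    assert flat.size == out_ch * in_ch * k * k, "slice sizes must match"
    if mode == "output":
        # consecutive slices fill consecutive output channels
        return flat.reshape(out_ch, in_ch, k, k)
    if mode == "spatial":
        # consecutive slices fill consecutive spatial positions; all
        # output channels draw the weight at (i, j) from the same slice
        return flat.reshape(k, k, out_ch, in_ch).transpose(2, 3, 0, 1)
    raise ValueError(f"unknown mode: {mode}")
```

Under this toy layout, "spatial" allocation ties the spatial structure of all output channels to the same slices, which is one plausible reason the first-layer kernels look more homogeneous than with "output" allocation.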

Trained layers are fixed at inference time for all episodes, whereas the generated layers vary, albeit not significantly. In Figures 11 and 12 we show the generated kernels for two different episodes and, on the right, the difference between them. The generated convolutional kernels appear to change within 10 − 15% from episode to episode.
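One simple way to quantify such episode-to-episode variation (the paper does not specify its exact metric, so this is an assumption) is the relative L2 norm of the kernel difference:

```python
import numpy as np

def relative_change(w_a, w_b):
    """Relative L2 difference between kernels generated for two
    episodes: ||w_a - w_b|| / ||w_a||. A value of ~0.1-0.15 would
    correspond to the 10-15% variation reported in the text."""
    w_a, w_b = np.asarray(w_a), np.asarray(w_b)
    return np.linalg.norm(w_a - w_b) / np.linalg.norm(w_a)
```

Applied to the kernels generated for two different support sets, this yields a single scalar summarizing how much the hypernetwork's output drifts between episodes.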