HyperTransformer: A Example of a Self-Attention Mechanism For Supervised Learning

This paper is available on arxiv under CC 4.0 license. Authors: (1) Andrey Zhmoginov, Google Research & {azhmogin,sandler,mxv}@google.com; (2) Mark Sandler, Google Research & {azhmogin,sandler,mxv}@google.com; (3) Max Vladymyrov, Google Research & {azhmogin,sandler,mxv}@google.com. Table of Links Abstract and Introduction Problem Setup and Related Work HyperTransformer Experiments Conclusion and References A Example of a Self-Attention Mechanism For Supervised Learning B Model Parameters C Additional Supervised Experiments D Dependence On Parameters and Ablation Studies E Attention Maps of Learned Transformer Models F Visualization of The Generated CNN Weights G Additional Tables and Figures A EXAMPLE OF A SELF-ATTENTION MECHANISM FOR SUPERVISED LEARNING Self-attention in its rudimentary form can implement a cosine-similarity-based sample weighting, which can also be viewed as a simple 1-step MAML-like learning algorithm. This can be seen by considering a simple classification model [5] here we use only local features hφl (ei) of the sample embedding vectors ei This paper is available on arxiv under CC 4.0 license. Authors: (1) Andrey Zhmoginov, Google Research & {azhmogin,sandler,mxv}@google.com; (2) Mark Sandler, Google Research & {azhmogin,sandler,mxv}@google.com; (3) Max Vladymyrov, Google Research & {azhmogin,sandler,mxv}@google.com. This paper is available on arxiv under CC 4.0 license. Authors: (1) Andrey Zhmoginov, Google Research & {azhmogin,sandler,mxv}@google.com; (2) Mark Sandler, Google Research & {azhmogin,sandler,mxv}@google.com; (3) Max Vladymyrov, Google Research & {azhmogin,sandler,mxv}@google.com. Table of Links Abstract and Introduction Problem Setup and Related Work HyperTransformer Experiments Conclusion and References A Example of a Self-Attention Mechanism For Supervised Learning B Model Parameters C Additional Supervised Experiments D Dependence On Parameters and Ablation Studies E Attention Maps of Learned Transformer Models F Visualization of The Generated CNN Weights G Additional Tables and Figures Abstract and Introduction Abstract and Introduction Problem Setup and Related Work Problem Setup and Related Work HyperTransformer HyperTransformer Experiments Experiments Conclusion and References Conclusion and References A Example of a Self-Attention Mechanism For Supervised Learning A Example of a Self-Attention Mechanism For Supervised Learning B Model Parameters B Model Parameters C Additional Supervised Experiments C Additional Supervised Experiments D Dependence On Parameters and Ablation Studies D Dependence On Parameters and Ablation Studies E Attention Maps of Learned Transformer Models E Attention Maps of Learned Transformer Models F Visualization of The Generated CNN Weights F Visualization of The Generated CNN Weights G Additional Tables and Figures G Additional Tables and Figures A EXAMPLE OF A SELF-ATTENTION MECHANISM FOR SUPERVISED LEARNING Self-attention in its rudimentary form can implement a cosine-similarity-based sample weighting, which can also be viewed as a simple 1-step MAML-like learning algorithm. This can be seen by considering a simple classification model [5] here we use only local features hφl (ei) of the sample embedding vectors ei