This paper is available on arxiv under CC 4.0 license.
Authors:
(1) Andrey Zhmoginov, Google Research & {azhmogin,sandler,mxv}@google.com;
(2) Mark Sandler, Google Research & {azhmogin,sandler,mxv}@google.com;
(3) Max Vladymyrov, Google Research & {azhmogin,sandler,mxv}@google.com.
Self-attention in its rudimentary form can implement a cosine-similarity-based sample weighting, which can also be viewed as a simple 1-step MAML-like learning algorithm. This can be seen by considering a simple classification model
[5] here we use only local features hφl (ei) of the sample embedding vectors ei