
The Abstraction and Reasoning Corpus: Attention Masks for Core Geometry Priors


Too Long; Didn't Read

State-of-the-art machine learning models struggle with generalization, which can only be achieved by properly accounting for core knowledge priors.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Mattia Atzeni, EPFL, Switzerland and [email protected];

(2) Mrinmaya Sachan, ETH Zurich, Switzerland;

(3) Andreas Loukas, Prescient Design, Switzerland.

3. Attention Masks for Core Geometry Priors

This section lays the theoretical groundwork for LATFORMER, our approach to learning the transformations of lattice symmetry groups in the form of attention masks. It defines attention masks and explains how they can be leveraged to incorporate geometry priors when solving group-action learning problems on sequences and images.
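As a rough, self-contained illustration of the idea behind soft masking (a sketch under our own conventions, not the paper's exact formulation; the elementwise product and the row renormalization are assumptions), the snippet below modulates standard scaled dot-product attention with a mask whose entries lie in [0, 1].

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(Q, K, V, mask, eps=1e-9):
    """Scaled dot-product attention whose weights are modulated by a soft mask.

    Q, K, V: (n, d) arrays; mask: (n, n) array with entries in [0, 1].
    A mask of all ones recovers standard attention; a (near-)permutation mask
    forces each query to attend to a single key, which is how the geometric
    transformations discussed below can be expressed.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) attention logits
    weights = softmax(scores, axis=-1) * mask     # elementwise soft masking
    weights = weights / (weights.sum(axis=-1, keepdims=True) + eps)  # renormalize rows
    return weights @ V
```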


3.1. Modulating Attention Weights with Soft Masking


3.2. Existence of Attention Masks Implementing Lattice Symmetry Actions


3.3. Representing Attention Masks for Lattice Transformation

To facilitate the learning of lattice symmetries, one needs a way to parameterize the set of feasible group elements. Fortunately, as made precise in the following theorem, the attention masks considered in Theorem 3.1 can all be expressed conveniently under the same general formulation.


Figure 2: Examples of attention masks implementing transformations in two dimensions, including: (a) translation by 1 pixel on both axes, (b) rotation by 90° counterclockwise, (c) vertical reflection, and (d) horizontal reflection around the center. White represents the value 1 and black the value 0.
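To make the figure concrete, the sketch below builds such masks as permutation matrices acting on flattened pixel indices. The helper name, the cyclic wrap-around for translation, and the rotation convention are illustrative assumptions rather than the paper's exact definitions.

```python
import numpy as np

def mask_from_index_map(h, w, source_of):
    """Binary attention mask over flattened pixel indices of an h x w lattice.

    source_of maps a target pixel (i, j) to the source pixel it should attend
    to; the resulting mask is a permutation matrix with a single 1 per row.
    """
    mask = np.zeros((h * w, h * w))
    for i in range(h):
        for j in range(w):
            si, sj = source_of(i, j)
            mask[i * w + j, (si % h) * w + (sj % w)] = 1.0
    return mask

n = 4
# Cyclic translation by 1 pixel on both axes (wrap-around is an assumption).
translation = mask_from_index_map(n, n, lambda i, j: (i - 1, j - 1))
# Rotation by 90 degrees counterclockwise (numpy's rot90 convention).
rotation = mask_from_index_map(n, n, lambda i, j: (j, n - 1 - i))
# Horizontal reflection around the center (mirror the columns).
reflection = mask_from_index_map(n, n, lambda i, j: (i, n - 1 - j))
```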



Although not strictly a symmetry operation, scaling transformations of the lattice can also be defined in terms of attention masks under the same general formulation of Theorem 3.2, as reported in Table 1. Therefore, for completeness, we also consider scaling transformations in our experiments.
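As a small companion sketch (again under assumed conventions; the nearest-neighbor variant is our choice, not necessarily the paper's), a scaling of the lattice fits the same row-wise one-hot construction, even though the resulting mask is no longer a permutation matrix.

```python
import numpy as np

def scaling_mask(n, s=2):
    """Binary attention mask for scaling an n x n lattice by a factor s.

    Target pixel (i, j) attends to source pixel (i // s, j // s), so rows are
    still one-hot but several targets share a source: unlike the symmetry
    masks, this is not a permutation matrix.
    """
    mask = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            mask[i * n + j, (i // s) * n + (j // s)] = 1.0
    return mask
```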


Notice that Theorem 3.2 gives us a way to compute the attention masks. In particular, we can express our attention masks as a convolution operation on the identity, as stated below.
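The following minimal sketch illustrates this fact for the sequence case (the kernel size, its placement, and the scipy-based implementation are our assumptions, not the paper's construction): convolving the identity matrix with a one-hot kernel produces a shifted diagonal, i.e. a translation mask.

```python
import numpy as np
from scipy.signal import convolve2d

n = 6
identity = np.eye(n)

# One-hot 3x3 kernel with its single 1 placed one step below the center.
kernel = np.zeros((3, 3))
kernel[2, 1] = 1.0

# Convolving the identity with this kernel shifts its diagonal by one step
# (zero-padded at the boundary), which is exactly the attention mask of a
# one-step translation in the sequence case.
mask = convolve2d(identity, kernel, mode="same")
print(np.allclose(mask, np.eye(n, k=-1)))  # True
```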