This paper is available on arXiv under CC 4.0 license.
(1) Mattia Atzeni, EPFL, Switzerland (mattia.atzeni@outlook.it);
(2) Mrinmaya Sachan, ETH Zurich, Switzerland;
(3) Andreas Loukas, Prescient Design, Switzerland.
Motivated by the long-term, ambitious goal of infusing core knowledge priors into neural networks, this paper focused on helping deep learning models learn geometric transformations efficiently. Specifically, we proposed incorporating lattice symmetry biases into attention mechanisms by modulating the attention weights with learned soft masks. We showed that attention masks implementing the actions of the symmetry group of a hypercubic lattice exist, and we provided a way to represent these masks. This motivated us to introduce LATFORMER, a model that uses a CNN to generate attention masks corresponding to lattice symmetry priors. Our results on synthetic tasks show that our model generalizes better than both standard Transformers and the same attention modules without masking. Moreover, the performance of our method on a subset of ARC provides the first evidence that deep learning can be used on this dataset, which is widely considered an important open challenge for artificial intelligence research.
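The claim that lattice symmetries can be expressed as attention masks can be illustrated with a minimal Python sketch. It assumes a square n×n grid (a 2D hypercubic lattice) flattened in row-major order, one scalar value per cell, and hard-coded 0/1 masks. In LATFORMER the masks are soft and generated by a CNN, so this is not the paper's implementation. All function names are hypothetical, and both the modulation rule (multiply the softmax weights by the mask, then renormalize each row) and the periodic boundary used for translations are assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple

Grid = List[List[float]]
Matrix = List[List[float]]
Action = Callable[[int, int], Tuple[int, int]]  # output cell (i, j) -> source cell


def symmetry_mask(n: int, action: Action) -> Matrix:
    """Hard (n*n) x (n*n) mask: each output cell attends only to its source cell."""
    size = n * n
    mask = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            si, sj = action(i, j)
            mask[i * n + j][si * n + sj] = 1.0  # row-major flattening
    return mask


def masked_attention(scores: Matrix, mask: Matrix, values: List[float]) -> List[float]:
    """Row-wise softmax, elementwise mask, renormalize, then mix the values.

    Multiply-then-renormalize is an illustrative modulation rule (assumption).
    """
    out = []
    for row_s, row_m in zip(scores, mask):
        top = max(row_s)
        exp = [math.exp(s - top) for s in row_s]  # numerically stable softmax
        z = sum(exp)
        w = [e / z * m for e, m in zip(exp, row_m)]  # modulate by the mask
        total = sum(w)
        w = [x / total for x in w]  # renormalize the surviving weights
        out.append(sum(x * v for x, v in zip(w, values)))
    return out


def rotate90(n: int) -> Action:
    # Counter-clockwise rotation: out[i][j] = in[j][n-1-i].
    return lambda i, j: (j, n - 1 - i)


def flip_lr(n: int) -> Action:
    # Reflection across the vertical axis: out[i][j] = in[i][n-1-j].
    return lambda i, j: (i, n - 1 - j)


def shift_right(n: int, k: int = 1) -> Action:
    # Translation by k columns with wrap-around (periodic boundary assumed).
    return lambda i, j: (i, (j - k) % n)


def apply_action(grid: Grid, make_action: Callable[[int], Action]) -> Grid:
    """Transform a square grid by attending with the action's mask."""
    n = len(grid)
    values = [float(x) for row in grid for x in row]
    scores = [[0.0] * (n * n) for _ in range(n * n)]  # uninformative content scores
    flat = masked_attention(scores, symmetry_mask(n, make_action(n)), values)
    return [flat[i * n:(i + 1) * n] for i in range(n)]


if __name__ == "__main__":
    grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(apply_action(grid, rotate90))  # [[3.0, 6.0, 9.0], [2.0, 5.0, 8.0], [1.0, 4.0, 7.0]]
```

Because each row of a hard symmetry mask is one-hot, the renormalized weights collapse to a permutation matrix, so the output depends only on the mask and not on the content scores. A soft mask with fractional entries would instead blend several source cells.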
