
The Abstraction and Reasoning Corpus: Abstract & Introduction


Too Long; Didn't Read

State-of-the-art machine learning models struggle with the extreme generalization these tasks require, which can only be achieved by properly accounting for core knowledge priors.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Mattia Atzeni, EPFL, Switzerland and [email protected];

(2) Mrinmaya Sachan, ETH Zurich, Switzerland;

(3) Andreas Loukas, Prescient Design, Switzerland.

Abstract

The Abstraction and Reasoning Corpus (ARC) (Chollet, 2019) and its most recent language-complete instantiation (LARC) have been postulated as important steps towards general AI. Yet, even state-of-the-art machine learning models struggle to achieve meaningful performance on these problems, falling behind non-learning-based approaches. We argue that solving these tasks requires extreme generalization that can only be achieved by properly accounting for core knowledge priors. As a step towards this goal, we focus on geometry priors and introduce LATFORMER, a model that incorporates lattice symmetry priors in attention masks. We show that, for any transformation of the hypercubic lattice, there exists a binary attention mask that implements that group action. Hence, our study motivates a modification to the standard attention mechanism, where attention weights are scaled using soft masks generated by a convolutional network. Experiments on synthetic geometric reasoning show that LATFORMER requires two orders of magnitude less data than standard attention and transformers. Moreover, our results on ARC and LARC tasks that incorporate geometric priors provide preliminary evidence that these complex datasets do not lie beyond the reach of deep learning models.


1. Introduction

Infusing inductive biases and knowledge priors in neural networks is regarded as a critical step to improve their sample efficiency (Battaglia et al., 2018; Bengio, 2017; Lake et al., 2017; Lake & Baroni, 2018; Bahdanau et al., 2019). The Core Knowledge priors for human intelligence have been studied extensively in developmental science (Spelke & Kinzler, 2007), following the theory that humans are endowed with a small number of separable systems of core knowledge, so that new flexible skills and belief systems can build on these core foundations. Recent research in artificial intelligence (AI) has postulated that the same priors should be incorporated in AI systems (Chollet, 2019), but how to do so in neural networks remains an open question.


Following this chain of thought, the Abstraction and Reasoning Corpus (ARC) (Chollet, 2019) was proposed as an AI benchmark built on top of the Core Knowledge priors from developmental science. Chollet (2019) posits that developing a domain-specific approach based on the Core Knowledge priors is a challenging first step and that “solving this specific subproblem is critical to general AI progress”. Further, he argues that ARC “cannot be meaningfully approached by current machine learning techniques, including Deep Learning”.


An important category of Core Knowledge priors includes geometry and topology priors. Indeed, significant attention has been devoted to incorporating such priors in deep learning architectures by rendering neural networks invariant (or equivariant) to transformations represented through group actions (Bronstein et al., 2021). However, group-invariant learning helps to build models that systematically ignore specific transformations applied to the input (such as translations or rotations).


We take a complementary perspective and aim to help neural networks learn functions that incorporate geometric transformations of their input (rather than being invariant to such transformations). In particular, we focus on group actions that belong to the symmetry group of a lattice. These transformations are pervasive in machine learning applications, as basic transformations of sequences, images, and other higher-dimensional regular grids fall in this category. While attention and transformers can in principle learn these kinds of group actions, we show that they require a significant amount of training data to do so.
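To make this concrete, the following NumPy sketch (ours, not the authors' code) illustrates the simplest case: a cyclic translation of a 1-D sequence is an action of the symmetry group of the lattice Z/nZ, and it can be written as a binary matrix, a shifted identity. Using that matrix as a hard attention pattern makes attention reproduce the translation exactly.

```python
import numpy as np

# Illustrative sketch: a cyclic shift of a 1-D sequence as a binary matrix.
n, shift = 6, 2
x = np.arange(n)                                   # toy sequence [0, 1, ..., 5]

# M[i, j] = 1 iff j = (i + shift) mod n: each output position attends
# to exactly one input position, the one `shift` steps ahead.
M = np.roll(np.eye(n, dtype=int), shift, axis=1)

# Applying M is the same as translating the sequence.
assert np.array_equal(M @ x, np.roll(x, -shift))
```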


To address this sample complexity issue, we introduce LATFORMER, a model that relies on attention masks in order to learn actions belonging to the symmetry group of a lattice, such as translation, rotation, reflection, and scaling, in a differentiable manner. We show that, for any such action, there exists an attention mask such that an untrained self-attention mechanism initialized to the identity function performs that action. We further prove that these attention masks can be expressed as convolutions of the identity, which motivates a modification to the standard attention module where the attention weights are modulated by a mask generated by a convolutional neural network (CNN).
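The PyTorch sketch below is our own minimal rendering of this idea, with hypothetical names (MaskedAttention, mask_cnn) and an illustrative mask parameterization; the actual LATFORMER architecture may differ. It shows the core modification: the softmax attention weights are scaled by a soft mask that a small CNN produces from an identity matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedAttention(nn.Module):
    """Sketch of attention whose weights are scaled by a soft mask produced
    by a small CNN. Names and sizes are illustrative, not the paper's exact
    LATFORMER implementation."""

    def __init__(self, dim: int, mask_channels: int = 8, kernel_size: int = 3):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # CNN that deforms an identity-like mask; this echoes the result that
        # masks implementing lattice symmetries are convolutions of the identity.
        self.mask_cnn = nn.Sequential(
            nn.Conv2d(1, mask_channels, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.Conv2d(mask_channels, 1, kernel_size, padding=kernel_size // 2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, n, d = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / d ** 0.5            # (b, n, n)

        # Start from the identity mask and let the CNN turn it into a soft
        # approximation of a lattice group action; sigmoid keeps it in [0, 1].
        identity = torch.eye(n, device=x.device).expand(b, 1, n, n)
        soft_mask = torch.sigmoid(self.mask_cnn(identity)).squeeze(1)  # (b, n, n)

        # Scale the attention weights by the mask, then renormalize.
        weights = F.softmax(scores, dim=-1) * soft_mask
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return weights @ v
```

Renormalizing the scaled weights is one possible design choice in this sketch; the key point is that the geometric transformation is carried by the mask, which is generated by convolutions starting from the identity, rather than learned from scratch by the attention weights themselves.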


Figure 1: We consider problems that involve learning a geometric transformation on the input data as a sub-problem. The displayed task (taken from ARC) entails learning to map, for each pair, the left image to the right image.


Our experiments focus on abstract geometric reasoning and, more specifically, on ARC and its variants, as they are widely regarded as challenging benchmarks for machine learning models (Acquaviva et al., 2021; Chollet, 2019). On these datasets, we aim to reduce the gap between neural networks and hand-engineered search algorithms. To probe the sample efficiency of our method, we compared its ability to learn synthetic geometric transformations against transformers and attention modules. Then, we annotated ARC tasks based on the knowledge priors they require and evaluated LATFORMER on the tasks requiring geometric knowledge priors. Finally, we performed experiments on the more recent Language-complete ARC (LARC) (Acquaviva et al., 2021), which enriches ARC tasks with natural-language descriptions, and compared our model against strong baselines based on neural program synthesis. Our results provide evidence that LATFORMER can learn geometric transformations with two orders of magnitude less training data than transformers and attention. We also significantly reduce the gap between neural and classical approaches on ARC, providing the first neural network that reaches good performance on ARC tasks requiring geometric knowledge priors.