The Abstraction and Reasoning Corpus: A. Additional Details on the Model


by EScholar: Electronic Academic Papers for Scholars
March 11th, 2024

Too Long; Didn't Read

State-of-the-art machine learning models struggle with generalization, which can only be achieved by properly accounting for core knowledge priors.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Mattia Atzeni, EPFL, Switzerland (mattia.atzeni@outlook.it);

(2) Mrinmaya Sachan, ETH Zurich, Switzerland;

(3) Andreas Loukas, Prescient Design, Switzerland.

A. Additional Details on the Model

This section describes the LATFORMER architecture, providing additional details that were not covered in Section 4.1. As mentioned in Section 4.1, it is possible to design convolutional neural networks that perform all the considered transformations of the lattice. Figure 6 shows the architecture of the four expert models that generate the translation, rotation, reflection, and scaling masks.
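As an illustration of this idea (a minimal sketch, not the authors' code), the snippet below uses a convolution with a fixed one-hot kernel to translate a grid by one cell; rotations, reflections, and scalings admit analogous fixed-kernel constructions.

import torch
import torch.nn.functional as F

# A 5x5 grid with a single active cell in the centre.
grid = torch.zeros(1, 1, 5, 5)
grid[0, 0, 2, 2] = 1.0

# Fixed one-hot 3x3 kernel: each output cell copies its left neighbour,
# so the whole grid is translated by one cell to the right.
kernel = torch.zeros(1, 1, 3, 3)
kernel[0, 0, 1, 0] = 1.0

shifted = F.conv2d(grid, kernel, padding=1)
print(shifted[0, 0])  # the active cell has moved from (2, 2) to (2, 3)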


Figure 6: Model architecture of all the mask experts that we considered.


All models are CNNs applied to the identity matrix. In the figure, we use the following notation:


• M_T^(δ) denotes an attention mask implementing a translation by δ along one dimension;

• M_R^(90) denotes an attention mask implementing a rotation by 90°;

• M_F denotes an attention mask implementing a reflection along one dimension;

• M_S^(h) denotes an attention mask implementing an upscaling by h along one dimension.
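To make this notation concrete, here is a small NumPy sketch (the helper names are ours, not from the paper) of what these masks look like on a 1-D lattice of size n; translations are shown as cyclic shifts purely for brevity. In the actual model, such masks are generated by applying the frozen convolutional layers to the identity matrix, as in Figure 6.

import numpy as np

def translation_mask(n, delta):
    # M_T^(delta): shifts lattice positions by delta (cyclic here for brevity).
    return np.roll(np.eye(n), delta, axis=0)

def reflection_mask(n):
    # M_F: reverses the order of lattice positions.
    return np.eye(n)[::-1]

def upscaling_mask(n, h):
    # M_S^(h): maps n positions to h*n positions, repeating each one h times.
    return np.repeat(np.eye(n), h, axis=0)

x = np.arange(4.0)                  # lattice values [0, 1, 2, 3]
print(translation_mask(4, 1) @ x)   # [3, 0, 1, 2]
print(reflection_mask(4) @ x)       # [3, 2, 1, 0]
print(upscaling_mask(4, 2) @ x)     # [0, 0, 1, 1, 2, 2, 3, 3]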


Using Corollary 3.3, we can derive the kernels of the convolutional layers shown in Figure 6. These kernels are frozen at training time; the model only learns the gating function, denoted as σ in the figure. Notice that all the models follow the same overall structure. However, for scaling, we also learn an additional gate, denoted as σ(M_S, M_S^⊤) in Figure 6. This gate allows the model to transpose the mask and serves the purpose of implementing down-scaling operations (down-scaling is the transpose of up-scaling).
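The following PyTorch sketch (parameter names are ours) illustrates the scaling expert's extra gate: the frozen mask M_S is softly combined with its transpose through a single learned parameter, so the same expert can express both up-scaling and down-scaling. It assumes a square mask over a fixed-size lattice so that the mask and its transpose have the same shape.

import torch
import torch.nn as nn

class ScalingExpert(nn.Module):
    def __init__(self, mask: torch.Tensor):
        super().__init__()
        self.register_buffer("mask", mask)               # frozen mask M_S
        self.gate_logit = nn.Parameter(torch.zeros(1))   # only trainable parameter

    def forward(self) -> torch.Tensor:
        g = torch.sigmoid(self.gate_logit)
        # g close to 1 keeps M_S (up-scaling); g close to 0 selects its
        # transpose M_S^T (down-scaling).
        return g * self.mask + (1.0 - g) * self.mask.T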


The composition of multiple actions can be obtained by combining different experts, either by chaining the experts or by multiplying their masks together. In preliminary experiments, we did not notice any significant difference in performance between the two options, so we rely on the latter in our implementation.
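As a quick sanity check of this equivalence (illustrative NumPy only, reusing the mask helpers sketched above), chaining two experts and multiplying their masks first produce the same result, since matrix multiplication is associative.

import numpy as np

n = 4
translate = np.roll(np.eye(n), 1, axis=0)   # translation mask
reflect = np.eye(n)[::-1]                   # reflection mask
x = np.arange(float(n))

chained = reflect @ (translate @ x)         # chaining the experts
composed = (reflect @ translate) @ x        # multiplying the masks first
assert np.allclose(chained, composed)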
