The Abstraction and Reasoning Corpus: A. Additional Details on the Model


by EScholar: Electronic Academic Papers for Scholars
March 11th, 2024

Too Long; Didn't Read

State-of-the-art machine learning models struggle with generalization, which can only be achieved by properly accounting for core knowledge priors.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Mattia Atzeni, EPFL, Switzerland (mattia.atzeni@outlook.it);

(2) Mrinmaya Sachan, ETH Zurich, Switzerland;

(3) Andreas Loukas, Prescient Design, Switzerland.

A. Additional Details on the Model

This section describes the LATFORMER architecture, providing additional details that were not covered in Section 4.1. As mentioned in Section 4.1, it is possible to design convolutional neural networks that perform all the considered transformations of the lattice. Figure 6 shows the architecture of the four expert models that generate the translation, rotation, reflection, and scaling masks.
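As an illustration of this idea (a minimal sketch, not the authors' code), the snippet below uses a convolution with a fixed one-hot kernel to translate a grid by one cell; rotations, reflections, and scalings admit analogous fixed-kernel constructions.

import torch
import torch.nn.functional as F

# A 5x5 grid with a single active cell in the centre.
grid = torch.zeros(1, 1, 5, 5)
grid[0, 0, 2, 2] = 1.0

# Fixed one-hot 3x3 kernel: each output cell copies its left neighbour,
# so the whole grid is translated by one cell to the right.
kernel = torch.zeros(1, 1, 3, 3)
kernel[0, 0, 1, 0] = 1.0

shifted = F.conv2d(grid, kernel, padding=1)
print(shifted[0, 0])  # the active cell has moved from (2, 2) to (2, 3)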


Figure 6: Model architecture of all the mask experts that we considered.


All models are CNNs applied to the identity matrix. In the figure, we use the following notation:


• M_T^(δ) denotes an attention mask implementing a translation by δ along one dimension;

• M_R^(90) denotes an attention mask implementing a rotation by 90°;

• M_F denotes an attention mask implementing a reflection along one dimension;

• M_S^(h) denotes an attention mask implementing an upscaling by h along one dimension.
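To make this notation concrete, here is a small NumPy sketch (the helper names are ours, not from the paper) of what these masks look like on a 1-D lattice of size n; translations are shown as cyclic shifts purely for brevity. In the actual model, such masks are generated by applying the frozen convolutional layers to the identity matrix, as in Figure 6.

import numpy as np

def translation_mask(n, delta):
    # M_T^(delta): shifts lattice positions by delta (cyclic here for brevity).
    return np.roll(np.eye(n), delta, axis=0)

def reflection_mask(n):
    # M_F: reverses the order of lattice positions.
    return np.eye(n)[::-1]

def upscaling_mask(n, h):
    # M_S^(h): maps n positions to h*n positions, repeating each one h times.
    return np.repeat(np.eye(n), h, axis=0)

x = np.arange(4.0)                  # lattice values [0, 1, 2, 3]
print(translation_mask(4, 1) @ x)   # [3, 0, 1, 2]
print(reflection_mask(4) @ x)       # [3, 2, 1, 0]
print(upscaling_mask(4, 2) @ x)     # [0, 0, 1, 1, 2, 2, 3, 3]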


Using Corollary 3.3, we can derive the kernels of the convolutional layers shown in Figure 6. These kernels are frozen at training time; the model only learns the gating function, denoted as σ in the figure. Notice that all the models follow the same overall structure. However, for scaling, we also learn an additional gate, denoted as σ(M_S, M_S^⊤) in Figure 6. This gate allows the model to transpose the mask and serves the purpose of implementing down-scaling operations (down-scaling is the transpose of up-scaling).
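The following PyTorch sketch (parameter names are ours) illustrates the scaling expert's extra gate: the frozen mask M_S is softly combined with its transpose through a single learned parameter, so the same expert can express both up-scaling and down-scaling. It assumes a square mask over a fixed-size lattice so that the mask and its transpose have the same shape.

import torch
import torch.nn as nn

class ScalingExpert(nn.Module):
    def __init__(self, mask: torch.Tensor):
        super().__init__()
        self.register_buffer("mask", mask)               # frozen mask M_S
        self.gate_logit = nn.Parameter(torch.zeros(1))   # only trainable parameter

    def forward(self) -> torch.Tensor:
        g = torch.sigmoid(self.gate_logit)
        # g close to 1 keeps M_S (up-scaling); g close to 0 selects its
        # transpose M_S^T (down-scaling).
        return g * self.mask + (1.0 - g) * self.mask.T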


The composition of multiple actions can be obtained by combining different experts, either by chaining the experts or by multiplying their masks together. In preliminary experiments, we did not notice any significant difference in performance between the two options, so we rely on the latter in our implementation.
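As a quick sanity check of this equivalence (illustrative NumPy only, reusing the mask helpers sketched above), chaining two experts and multiplying their masks first produce the same result, since matrix multiplication is associative.

import numpy as np

n = 4
translate = np.roll(np.eye(n), 1, axis=0)   # translation mask
reflect = np.eye(n)[::-1]                   # reflection mask
x = np.arange(float(n))

chained = reflect @ (translate @ x)         # chaining the experts
composed = (reflect @ translate) @ x        # multiplying the masks first
assert np.allclose(chained, composed)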
