This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Mattia Atzeni, EPFL, Switzerland and [email protected];
(2) Mrinmaya Sachan, ETH Zurich, Switzerland;
(3) Andreas Loukas, Prescient Design, Switzerland.
Motivated by the ambitious long-term goal of infusing core knowledge priors into neural networks, this paper focused on helping deep learning models learn geometric transformations efficiently. Specifically, we proposed to incorporate lattice symmetry biases into attention mechanisms by modulating the attention weights with learned soft masks. We showed that attention masks implementing the actions of the symmetry group of a hypercubic lattice exist, and we provided a way to represent them. This motivated us to introduce LATFORMER, a model that generates attention masks corresponding to lattice symmetry priors using a CNN. Our results on synthetic tasks show that our model generalizes better than Transformers and than the same attention modules without masking. Moreover, the performance of our method on a subset of ARC provides the first evidence that deep learning can be applied to this dataset, which is widely considered an important open challenge for research on artificial intelligence.
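To make the mechanism summarized above concrete, the sketch below shows one way a soft mask produced by a small CNN could modulate scaled dot-product attention weights. This is a minimal illustration under assumptions of our own: the class and module names (MaskedAttention, mask_cnn) are hypothetical, and the CNN input here is an arbitrary stand-in rather than the lattice-symmetry parameterization used by LATFORMER.

```python
# Illustrative sketch only (not the authors' implementation): attention weights
# are modulated elementwise by a learned soft mask produced by a small CNN,
# standing in for the lattice-symmetry masks described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedAttention(nn.Module):
    def __init__(self, dim, mask_channels=8):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Hypothetical mask generator: a small CNN over the n x n score grid.
        # In the paper, the mask encodes actions of the lattice symmetry group.
        self.mask_cnn = nn.Sequential(
            nn.Conv2d(1, mask_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(mask_channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # soft mask with values in [0, 1]
        )

    def forward(self, x):
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)  # (B, n, n)
        mask = self.mask_cnn(scores.unsqueeze(1)).squeeze(1)    # (B, n, n)
        attn = F.softmax(scores, dim=-1) * mask                 # modulate weights
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return attn @ v


x = torch.randn(2, 16, 32)      # batch of two 16-token sequences, dim 32
out = MaskedAttention(32)(x)
print(out.shape)                # torch.Size([2, 16, 32])
```

The key design point is that the mask multiplies the attention weights rather than replacing them, so the prior softly restricts which positions attend to each other while remaining fully differentiable.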