Authors:
(1) Albert Gu, Machine Learning Department, Carnegie Mellon University (equal contribution) (agu@cs.cmu.edu);
(2) Tri Dao, Department of Computer Science, Princeton University (equal contribution) (tri@tridao.me).
3 Selective State Space Models and 3.1 Motivation: Selection as a Means of Compression
3.2 Improving SSMs with Selection
3.3 Efficient Implementation of Selective SSMs
3.4 A Simplified SSM Architecture
3.5 Properties of Selection Mechanisms
4 Empirical Evaluation and 4.1 Synthetic Tasks
4.4 Audio Modeling and Generation
4.5 Speed and Memory Benchmarks
6 Conclusion, Acknowledgments and References
A Discussion: Selection Mechanism
B Related Work and B.1 S4 Variants and Derivatives
B.4 Linear Attention and B.5 Long Context Models
D Hardware-aware Algorithm For Selective SSMs
E Experimental Details and Additional Results and E.1 Synthetic Tasks
We discuss related work, limitations, and some future directions.
Related Work. Appendix A discusses how the selection mechanism relates to similar concepts. Appendix B provides an extended discussion of related work on SSMs and other related models.
No Free Lunch: Continuous-Discrete Spectrum. Structured SSMs were originally defined as discretizations of continuous systems (1), and have had a strong inductive bias toward continuous-time data modalities such as perceptual signals (e.g. audio, video). As discussed in Sections 3.1 and 3.5, the selection mechanism overcomes their weaknesses on discrete modalities such as text and DNA; but this conversely can impede their performance on data that LTI SSMs excel on. Our ablations on audio waveforms examine this tradeoff in more detail.
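For readers without the earlier sections at hand, the continuous-to-discrete view mentioned above can be summarized as follows. This is only a recap in standard structured-SSM notation (the zero-order hold rule shown is the one conventionally used in this line of work; exact symbols may differ slightly from the main text):

$$h'(t) = \mathbf{A}\,h(t) + \mathbf{B}\,x(t), \qquad y(t) = \mathbf{C}\,h(t)$$

$$h_t = \overline{\mathbf{A}}\,h_{t-1} + \overline{\mathbf{B}}\,x_t, \qquad y_t = \mathbf{C}\,h_t, \qquad \overline{\mathbf{A}} = \exp(\Delta\mathbf{A}), \quad \overline{\mathbf{B}} = (\Delta\mathbf{A})^{-1}\big(\exp(\Delta\mathbf{A}) - \mathbf{I}\big)\cdot\Delta\mathbf{B}$$

An LTI SSM keeps $(\Delta, \mathbf{A}, \mathbf{B}, \mathbf{C})$ fixed across time steps, which suits continuous signals; the selection mechanism instead makes $\Delta$, $\mathbf{B}$, and $\mathbf{C}$ functions of the input, which helps on discrete modalities but gives up the time invariance that benefits data such as raw audio.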
Downstream Affordances. Transformer-based foundation models (particularly LLMs) have a rich ecosystem of properties and modes of interaction with pretrained models, such as fine-tuning, adaptation, prompting, in-context learning, instruction tuning, RLHF, quantization, and so on. We are particularly interested in whether Transformer alternatives such as SSMs have similar properties and affordances.
Scaling. Our empirical evaluation is limited to small model sizes, below the threshold of most strong open source LLMs (e.g. Llama (Touvron et al. 2023)) as well as other recurrent models such as RWKV (B. Peng et al. 2023) and RetNet (Y. Sun et al. 2023), which have been evaluated at the 7B parameter scale and beyond. It remains to assess whether Mamba still compares favorably at these larger sizes. We also note that scaling SSMs may involve further engineering challenges and adjustments to the model that are not discussed in this paper.
We introduce a selection mechanism to structured state space models, allowing them to perform context-dependent reasoning while scaling linearly in sequence length. When incorporated into a simple attention-free architecture, Mamba achieves state-of-the-art results on a diverse set of domains, where it matches or exceeds the performance of strong Transformer models. We are excited about the broad applications of selective state space models to build foundation models for different domains, especially in emerging modalities requiring long context such as genomics, audio, and video. Our results suggest that Mamba is a strong candidate to be a general sequence model backbone.
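To make the mechanism summarized above concrete, the following is a minimal illustrative sketch, not the paper's hardware-aware implementation: a sequential reference scan in which the SSM parameters B, C, and the step size Delta are recomputed from the input at every time step. All function and parameter names here (selective_ssm_scan, W_B, W_C, W_delta, b_delta) are hypothetical, and the input term uses a simplified Euler-style discretization.

```python
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C, W_delta, b_delta):
    """Sequential reference scan for a selective SSM (illustrative sketch only).

    Shapes (L = sequence length, D = channels, N = state size):
      x:        (L, D)  input sequence
      A:        (D, N)  diagonal-style state parameters, fixed over time
      W_B, W_C: (N, D)  projections making B_t and C_t functions of the input
      W_delta:  (D, D)  projection for the input-dependent step size Delta_t
      b_delta:  (D,)    bias for Delta_t
    Returns y of shape (L, D). Runs in O(L * D * N) time with O(D * N) state.
    """
    L, D = x.shape
    h = np.zeros_like(A)                                  # hidden state, (D, N)
    y = np.zeros((L, D))
    for t in range(L):
        xt = x[t]                                         # (D,)
        delta = np.log1p(np.exp(W_delta @ xt + b_delta))  # softplus -> positive step sizes, (D,)
        B_t = W_B @ xt                                    # (N,) input-dependent input projection
        C_t = W_C @ xt                                    # (N,) input-dependent output projection
        A_bar = np.exp(delta[:, None] * A)                # (D, N) discretized transition
        B_bar = delta[:, None] * B_t[None, :]             # (D, N) simplified Euler-style input term
        h = A_bar * h + B_bar * xt[:, None]               # recurrent state update
        y[t] = h @ C_t                                    # (D,) readout
    return y

# Tiny usage example with random (hypothetical) parameters.
rng = np.random.default_rng(0)
L, D, N = 16, 4, 8
y = selective_ssm_scan(
    x=rng.standard_normal((L, D)),
    A=-np.exp(rng.standard_normal((D, N))),               # negative entries keep the scan stable
    W_B=0.1 * rng.standard_normal((N, D)),
    W_C=0.1 * rng.standard_normal((N, D)),
    W_delta=0.1 * rng.standard_normal((D, D)),
    b_delta=np.zeros(D),
)
print(y.shape)  # (16, 4)
```

The explicit loop makes the linear-in-sequence-length cost visible; the actual implementation replaces it with a fused parallel scan executed in SRAM, as described in Section 3.3 and Appendix D.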
We thank Karan Goel, Arjun Desai, and Kush Bhatia for helpful feedback on the draft.
This paper is available on arXiv under the CC BY 4.0 DEED license.