Authors:
(1) Albert Gu, Machine Learning Department, Carnegie Mellon University and with equal contribution;
(2) Tri Dao, Department of Computer Science, Princeton University and with equal contribution. Table of Links Abstract and 1 Introduction 2 State Space Models 3 Selective State Space Models and 3.1 Motivation: Selection as a Means of Compression 3.2 Improving SSMs with Selection 3.3 Efficient Implementation of Selective SSMs 3.4 A Simplified SSM Architecture 3.5 Properties of Selection Mechanisms 3.6 Additional Model Details 4 Empirical Evaluation and 4.1 Synthetic Tasks 4.2 Language Modeling 4.3 DNA Modeling 4.4 Audio Modeling and Generation 4.5 Speed and Memory Benchmarks 4.6 Model Ablations 5 Discussion 6 Conclusion and References A Discussion: Selection Mechanism B Related Work C Mechanics of Selective SSMs D Hardware-aware Algorithm For Selective SSMs E Experimental Details and Additional Results 3.6 Additional Model Details Real vs. Complex. Most prior SSMs use complex numbers in their state ℎ, which is necessary for strong performance on many tasks (Gu, Goel, and Ré 2022). However, it has been empirically observed that completely real-valued SSMs seem to work fine, and possibly even better, in some settings (Ma et al. 2023). We use real values as the default, which work well for all but one of our tasks; we hypothesize that the complex-real tradeoff is related to the continuous-discrete spectrum in data modalities, where complex numbers are helpful for continuous modalities (e.g. audio, video) but not discrete (e.g. text, DNA). Initialization. Most prior SSMs also suggest special initializations, particularly in the complex-valued case, which can help in several settings such as low-data regimes. Our default initialization for the complex case is S4D-Lin and for the real case is S4D-Real (Gu, Gupta, et al. 2022), which is based on the HIPPO theory (Gu, Dao, et al. 2020). These define the n-th element of A as −1∕2 + ni and −(n + 1) respectively. However, we expect many initializations to work fine, particularly in the large-data and real-valued SSM regimes; some ablations are considered in Section 4.6. Remark 3.1. For brevity in our experimental results, we sometimes abbreviate selective SSMs as S6 models, because they are S4 models with a selection mechanism and computed with a scan. This paper is available on arxiv under CC BY 4.0 DEED license. Authors: (1) Albert Gu, Machine Learning Department, Carnegie Mellon University and with equal contribution; (2) Tri Dao, Department of Computer Science, Princeton University and with equal contribution. Authors: Authors: (1) Albert Gu, Machine Learning Department, Carnegie Mellon University and with equal contribution; (2) Tri Dao, Department of Computer Science, Princeton University and with equal contribution. Table of Links Abstract and 1 Introduction Abstract and 1 Introduction 2 State Space Models 2 State Space Models 3 Selective State Space Models and 3.1 Motivation: Selection as a Means of Compression 3 Selective State Space Models and 3.1 Motivation: Selection as a Means of Compression 3.2 Improving SSMs with Selection 3.2 Improving SSMs with Selection 3.3 Efficient Implementation of Selective SSMs 3.3 Efficient Implementation of Selective SSMs 3.4 A Simplified SSM Architecture 3.4 A Simplified SSM Architecture 3.5 Properties of Selection Mechanisms 3.5 Properties of Selection Mechanisms 3.6 Additional Model Details 3.6 Additional Model Details 4 Empirical Evaluation and 4.1 Synthetic Tasks 4 Empirical Evaluation and 4.1 Synthetic Tasks 4.2 Language Modeling 4.2 Language Modeling 4.3 DNA Modeling 4.3 DNA Modeling 4.4 Audio Modeling and Generation 4.4 Audio Modeling and Generation 4.5 Speed and Memory Benchmarks 4.5 Speed and Memory Benchmarks 4.6 Model Ablations 4.6 Model Ablations 5 Discussion 5 Discussion 6 Conclusion and References 6 Conclusion and References A Discussion: Selection Mechanism A Discussion: Selection Mechanism B Related Work B Related Work C Mechanics of Selective SSMs C Mechanics of Selective SSMs D Hardware-aware Algorithm For Selective SSMs D Hardware-aware Algorithm For Selective SSMs E Experimental Details and Additional Results E Experimental Details and Additional Results 3.6 Additional Model Details Real vs. Complex. Most prior SSMs use complex numbers in their state ℎ, which is necessary for strong performance on many tasks (Gu, Goel, and Ré 2022). However, it has been empirically observed that completely real-valued SSMs seem to work fine, and possibly even better, in some settings (Ma et al. 2023). We use real values as the default, which work well for all but one of our tasks; we hypothesize that the complex-real tradeoff is related to the continuous-discrete spectrum in data modalities, where complex numbers are helpful for continuous modalities (e.g. audio, video) but not discrete (e.g. text, DNA). Real vs. Complex. Initialization . Most prior SSMs also suggest special initializations, particularly in the complex-valued case, which can help in several settings such as low-data regimes. Our default initialization for the complex case is S4D-Lin and for the real case is S4D-Real (Gu, Gupta, et al. 2022), which is based on the HIPPO theory (Gu, Dao, et al. 2020). These define the n-th element of A as −1∕2 + ni and −(n + 1) respectively. However, we expect many initializations to work fine, particularly in the large-data and real-valued SSM regimes; some ablations are considered in Section 4.6. Initialization Remark 3.1. For brevity in our experimental results, we sometimes abbreviate selective SSMs as S6 models, because they are S4 models with a selection mechanism and computed with a scan. Remark 3.1. For brevity in our experimental results, we sometimes abbreviate selective SSMs as S6 models, because they are S4 models with a selection mechanism and computed with a scan. This paper is available on arxiv under CC BY 4.0 DEED license. This paper is available on arxiv under CC BY 4.0 DEED license. available on arxiv

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

The Key Differences Between Real and Complex-Valued State Space Models

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

A Simplified State Space Model Architecture

Princeton and CMU Push AI Boundaries with the Mamba Sequence Model

How State Space Models Improve AI Sequence Modeling Efficiency

Why Compressing Information Helps AI Work Better

How Selection Mechanisms Transform State Space Models

Cutting-Edge Techniques That Speed Up AI Without Extra Costs

A Simplified State Space Model Architecture

Princeton and CMU Push AI Boundaries with the Mamba Sequence Model

How State Space Models Improve AI Sequence Modeling Efficiency

Why Compressing Information Helps AI Work Better

How Selection Mechanisms Transform State Space Models

Cutting-Edge Techniques That Speed Up AI Without Extra Costs

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps