A Simplified State Space Model Architecture

Written by serialization | Published 2024/12/16

TL;DR: A simplified state space model (SSM) architecture merges the linear attention and MLP components into a single, unified block, inspired by the gated attention unit (GAU). This design reduces complexity while improving performance on sequence transformation tasks.

Authors:

(1) Albert Gu, Machine Learning Department, Carnegie Mellon University (equal contribution);

(2) Tri Dao, Department of Computer Science, Princeton University (equal contribution).

Table of Links

Abstract and 1 Introduction

2 State Space Models

3 Selective State Space Models and 3.1 Motivation: Selection as a Means of Compression

3.2 Improving SSMs with Selection

3.3 Efficient Implementation of Selective SSMs

3.4 A Simplified SSM Architecture

3.5 Properties of Selection Mechanisms

3.6 Additional Model Details

4 Empirical Evaluation and 4.1 Synthetic Tasks

4.2 Language Modeling

4.3 DNA Modeling

4.4 Audio Modeling and Generation

4.5 Speed and Memory Benchmarks

4.6 Model Ablations

5 Discussion

6 Conclusion and References

A Discussion: Selection Mechanism

B Related Work

C Mechanics of Selective SSMs

D Hardware-aware Algorithm For Selective SSMs

E Experimental Details and Additional Results

3.4 A Simplified SSM Architecture

As with structured SSMs, selective SSMs are standalone sequence transformations that can be flexibly incorporated into neural networks. The H3 architecture is the basis for the most well-known SSM architectures (Section 2), which are generally composed of a block inspired by linear attention interleaved with an MLP (multi-layer perceptron) block. We simplify this architecture by combining these two components into one, which is stacked homogeneously (Figure 3). This is inspired by the gated attention unit (GAU) (Hua et al. 2022), which did something similar for attention.
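To make the design concrete, here is a minimal sketch of such a unified block in PyTorch. It is illustrative, not the paper's implementation: the selective scan is written as a naive sequential loop rather than the hardware-aware kernel of Section 3.3, the short causal convolution shown in Figure 3 is omitted, and names such as `d_state` and `expand` are placeholder conventions, not the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedSSMBlock(nn.Module):
    """Sketch of a unified gated SSM block (illustrative, not the paper's code)."""

    def __init__(self, d_model: int, d_state: int = 16, expand: int = 2):
        super().__init__()
        d_inner = expand * d_model
        # A single projection produces both the SSM branch and the gate branch,
        # merging the roles of the H3-style block and the separate MLP block.
        self.in_proj = nn.Linear(d_model, 2 * d_inner)
        # Input-dependent (selective) SSM parameters.
        self.to_B = nn.Linear(d_inner, d_state)
        self.to_C = nn.Linear(d_inner, d_state)
        self.to_dt = nn.Linear(d_inner, d_inner)
        # Log-parameterized state matrix; negated exp keeps dynamics stable.
        self.A_log = nn.Parameter(
            torch.log(torch.arange(1, d_state + 1).float()).repeat(d_inner, 1)
        )
        self.out_proj = nn.Linear(d_inner, d_model)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, length, d_model)
        x, z = self.in_proj(u).chunk(2, dim=-1)   # SSM branch and gate branch
        x = F.silu(x)
        dt = F.softplus(self.to_dt(x))            # (batch, L, d_inner) step sizes
        A = -torch.exp(self.A_log)                # (d_inner, d_state), negative real
        Bmat = self.to_B(x)                       # (batch, L, d_state)
        Cmat = self.to_C(x)                       # (batch, L, d_state)
        h = torch.zeros(u.shape[0], x.shape[-1], A.shape[-1], device=u.device)
        ys = []
        for t in range(u.shape[1]):               # naive recurrent scan for clarity
            dA = torch.exp(dt[:, t, :, None] * A)               # discretized transition
            dB = dt[:, t, :, None] * Bmat[:, t, None, :]        # discretized input matrix
            h = dA * h + dB * x[:, t, :, None]                  # state update
            ys.append((h * Cmat[:, t, None, :]).sum(-1))        # readout
        y = torch.stack(ys, dim=1)
        # The multiplicative gate replaces the separate MLP block.
        return self.out_proj(y * F.silu(z))

# Example: a batch of 2 sequences of length 64 with model width 128.
block = SimplifiedSSMBlock(d_model=128)
out = block(torch.randn(2, 64, 128))   # -> (2, 64, 128)
```

The design point is visible in the last line of `forward`: the gate `y * silu(z)` takes over the role the MLP block played in H3-style architectures, so a single block type can be stacked homogeneously.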

This paper is available on arXiv under a CC BY 4.0 DEED license.

