Efficient Guided Generation for Large Language Models: Abstract and Intro
by @textmodels


Too Long; Didn't Read

Researchers propose a finite-state machine framework for text generation, offering precise control and improved performance.

Authors:

(1) Brandon T. Willard, Normal Computing;

(2) Rémi Louf, Normal Computing.

Abstract

In this article we show how the problem of neural text generation can be constructively reformulated in terms of transitions between the states of a finite-state machine. This framework leads to an efficient approach to guiding text generation with regular expressions and context-free grammars by allowing the construction of an index over a language model’s vocabulary. The approach is model agnostic, allows one to enforce domain-specific knowledge and constraints, and enables the construction of reliable interfaces by guaranteeing the structure of the generated text. It adds little overhead to the token sequence generation process and significantly outperforms existing solutions. An implementation is provided in the open source Python library Outlines [Louf and Willard].

1. Introduction

We are concerned with the problem of generating sequences of tokens from a large language model (LLM) [Vaswani et al., 2017, Radford et al., 2019] that conform to regular expressions or context-free grammars (CFGs). This kind of guided LLM generation is used to make LLM output usable under rigid formatting requirements that are either hard or costly to capture through fine-tuning alone [Beurer-Kellner et al., 2023, Scholak et al., 2021, Poesia et al., 2022a, Rabinovich et al., 2017, Weng, 2021, Dong et al., 2023, Poesia et al., 2022b, Geng et al., 2023, Wang et al., 2023]. Such features have recently been generalized in prompting libraries and interfaces [Microsoft, 2023, Beurer-Kellner et al., 2023, Rickard, 2023a,b], but their applicability can be limited by their scaling costs.


Most implementations of guided generation bias the score values used to determine the probabilities of the tokens in an LLM’s vocabulary. A common and sufficient approach involves repeatedly evaluating the entire vocabulary in order to determine which tokens are valid, according to the constraints and previously sampled tokens, and setting the probabilities of invalid tokens to zero. This approach entails a fixed O(N) cost for each token generated, where N is the size of the LLM’s vocabulary.
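To make that cost concrete, here is a minimal sketch of vocabulary-wide masking; it is not the paper's implementation, and the function and variable names are illustrative. Every candidate token is checked against the regular expression before sampling, so each step scans all N entries. The prefix test assumes the third-party `regex` package, which supports partial (prefix) matching.

```python
import numpy as np
import regex  # third-party package; supports partial (prefix) matching


def mask_invalid_tokens(logits, vocabulary, generated_text, pattern):
    """Set the logit of every invalid token to -inf so its probability is zero.

    A token is valid only if appending it to the text generated so far can
    still be completed into a string matching `pattern`. Because every one of
    the N vocabulary entries is checked, the cost is O(N) per generated token.
    """
    masked = np.array(logits, dtype=float)
    for i, token in enumerate(vocabulary):
        candidate = generated_text + token
        # partial=True accepts strings that are a prefix of a possible match
        if regex.fullmatch(pattern, candidate, partial=True) is None:
            masked[i] = -np.inf
    return masked


vocabulary = ["1", "12", "a", "-", "3.5"]
logits = np.zeros(len(vocabulary))
# With "4" generated so far and the pattern [0-9]+, only digit tokens survive.
print(mask_invalid_tokens(logits, vocabulary, "4", r"[0-9]+"))
```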


We propose an approach that uses the finite-state machine (FSM) formulation of regular expressions to both arbitrarily start and stop guided generation and to allow the construction of an index with which the set of nonzero-probability tokens can be obtained efficiently at each step. The result is an algorithm that costs O(1) on average.
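The following self-contained sketch illustrates the indexing idea under simplifying assumptions: the FSM is written out as an explicit transition table for the pattern [0-9]+ rather than compiled from a regular expression, and the vocabulary is tiny. The index is built once, before generation; each decoding step then reduces to a dictionary lookup keyed on the current FSM state.

```python
from collections import defaultdict

# Explicit FSM for the illustrative pattern [0-9]+ : state 0 is initial,
# state 1 is accepting, and any digit moves either state to state 1.
DIGITS = "0123456789"
transitions = {0: {c: 1 for c in DIGITS}, 1: {c: 1 for c in DIGITS}}
initial_state, final_states = 0, {1}


def walk(state, token):
    """Feed a token's characters through the FSM; return the end state,
    or None if some character has no outgoing transition."""
    for char in token:
        if char not in transitions.get(state, {}):
            return None
        state = transitions[state][char]
    return state


def build_index(vocabulary):
    """Precompute, for every FSM state, which tokens can be consumed from it
    and which state each one leads to. This is done once, before generation."""
    index = defaultdict(dict)
    for state in transitions:
        for token in vocabulary:
            next_state = walk(state, token)
            if next_state is not None:
                index[state][token] = next_state
    return index


vocabulary = ["1", "12", "a", "3b", "07"]
index = build_index(vocabulary)
# During generation, the set of allowed tokens for the current state is a
# single lookup, O(1) on average, instead of a scan over the whole vocabulary.
print(index[0])  # {'1': 1, '12': 1, '07': 1}
```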


For the regular expression case, our approach shares the most similarity with Kuchnik et al. [2023], which uses a transducer formulation to obtain FSMs defined over a language model’s vocabulary; those FSMs contain much of the same information and scaling benefits as the indices described here. Our approach does not require the complete transducer abstraction and can be used to more easily extend existing, efficient regular expression libraries without modifying the underlying automata and their implementations.


More importantly, our indexing approach can also be extended to CFGs and LALR(1) parsers to allow for efficient guided generation according to popular data formats and programming languages (e.g., JSON, Python, and SQL). The transition to parsing is made by way of augmentations to traditional LALR(1) parser components and operations, making it, again, an approach that can be used to extend existing parser implementations.
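As a rough illustration of the parsing case, the sketch below uses the lark library's LALR(1) parser in interactive mode; the toy grammar, vocabulary, and helper are hypothetical and stand in for the augmented parser components and vocabulary index described in the paper. The parser reports which terminal symbols may appear next, and the unindexed reference check simply tries to extend the generated text with each vocabulary token.

```python
from lark import Lark

# Hypothetical toy grammar and vocabulary, not taken from the paper.
parser = Lark(r"""
    start: "(" NUMBER ")"
    NUMBER: /[0-9]+/
""", parser="lalr")

vocabulary = ["(", ")", "42", "foo"]

# The LALR(1) parser can report which terminal symbols are acceptable after
# the text generated so far; the indexing approach precomputes how the
# vocabulary relates to this state information.
state = parser.parse_interactive("(")
state.exhaust_lexer()
print(state.accepts())  # terminals valid after "(", here only NUMBER


def allowed_tokens(partial_program):
    """Unindexed reference check: a vocabulary token is allowed when the
    extended text still lexes and parses as a valid prefix of the grammar."""
    allowed = []
    for token in vocabulary:
        try:
            candidate = parser.parse_interactive(partial_program + token)
            candidate.exhaust_lexer()
        except Exception:  # lexing or parsing error: token is invalid here
            continue
        allowed.append(token)
    return allowed


print(allowed_tokens("("))  # ['42']
```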


This paper is available on arXiv under a CC 4.0 license.