
Analyzing constrained LLM through PDFA-learning: Analyzing large language models


Authors:

(1) M. Carrasco, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(2) F. Mayr, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(3) S. Yovine, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(4) J. Kidd, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay;

(5) M. Iturbide, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay;

(6) J. da Silva, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay;

(7) A. Garat, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay.

Table of Links

Abstract and 1 Introduction

2 Language models

3 Learning algorithm

4 Analyzing large language models

5 Conclusions, Acknowledgements, and References

A. Proof of Proposition 2.1

B. Proof of Proposition 2.2

C. Proof of Proposition 2.3

D. Proof of Proposition 2.4

4 Analyzing large language models

Guiding generation. Guiding an LLM to generate strings of interest consists of synchronizing it with an automaton that defines the set of symbols that may be drawn at each step of the generation process, possibly constrained further by a sampling strategy. To illustrate how the synchronization works, consider the language model given by the PDFA L in Fig. 4 (0-probabilities are omitted). The guide G is a weighted automaton that defines a mask at each state: a weight of 1 for a symbol means the symbol is allowed, and a weight of 0 that it is not. L × G is a weighted automaton whose underlying structure is the product automaton, and whose weights are obtained by multiplying the next-symbol distribution of the state of L with the mask weights of the state of G. To obtain the PDFA B, we apply the sampling strategy samptop2, which restricts sampling at each state to the two most likely symbols.
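To make the construction concrete, here is a minimal Python sketch, not the authors' implementation: the alphabet, the dictionaries, and the helper names (product_weights, the toy distribution and mask) are all illustrative, and samptop2 is assumed to be top-2 sampling, i.e., keeping the two highest-weight symbols and renormalizing.

```python
SYMBOLS = ["a", "b", "$"]  # toy alphabet; "$" stands in for an end-of-string symbol

def product_weights(l_dist, g_mask):
    """Weights of a state of L x G: pointwise product of the next-symbol
    distribution of L's state with the 0/1 mask of G's state."""
    return {s: l_dist[s] * g_mask[s] for s in SYMBOLS}

def samptop2(weights):
    """Assumed top-2 sampling strategy: keep the two highest-weight
    symbols and renormalize them into the next-symbol distribution of B."""
    top2 = sorted(weights, key=weights.get, reverse=True)[:2]
    z = sum(weights[s] for s in top2)  # assumes the mask allows at least one symbol
    return {s: (weights[s] / z if s in top2 else 0.0) for s in SYMBOLS}

# One state of L assigns probabilities (0.5, 0.3, 0.2); the guide forbids "b".
l_dist = {"a": 0.5, "b": 0.3, "$": 0.2}
g_mask = {"a": 1, "b": 0, "$": 1}
print(samptop2(product_weights(l_dist, g_mask)))
# -> {'a': 0.714..., 'b': 0.0, '$': 0.285...}
```

In a full implementation the product would also track the pairs of states of L and G and their transitions; this sketch only shows how the per-state weights and the sampling strategy compose.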



Table 1: Results obtained with two tokenizer instances for GPT2
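As a hedged sketch of what "two tokenizer instances" can mean in the Hugging Face transformers library (see footnote [2]): the same GPT2 checkpoint can be tokenized with two distinct implementations. Which two instances the table actually compares is a detail of the original paper, so the pairing below (the Python "slow" and the Rust-backed "fast" tokenizer) is only an assumption.

```python
from transformers import GPT2Tokenizer, GPT2TokenizerFast

# Two distinct tokenizer instances for the same GPT2 checkpoint
# (assumption: the "slow" Python and "fast" Rust-backed implementations).
slow = GPT2Tokenizer.from_pretrained("gpt2")
fast = GPT2TokenizerFast.from_pretrained("gpt2")

text = "Analyzing constrained LLMs through PDFA-learning"
print(slow.tokenize(text))
print(fast.tokenize(text))
```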



This paper is available on arXiv under the CC BY-SA 4.0 DEED (Attribution-ShareAlike 4.0 International) license.


[2] https://huggingface.co/docs/transformers/main_classes/tokenizer


