
Analyzing constrained LLM through PDFA-learning: Conclusions, Acknowledgements, and References


Authors:

(1) M. Carrasco, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(2) F. Mayr, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(3) S. Yovine, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(4) J. Kidd, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay;

(5) M. Iturbide, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay;

(6) J. da Silva, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay;

(7) A. Garat, Facultad de Ingeniería, Universidad ORT Uruguay, Montevideo, Uruguay.

Table of Links

Abstract and 1 Introduction

2 Language models

3 Learning algorithm

4 Analyzing large language models

5 Conclusions, Acknowledgements, and References

A. Proof of Proposition 2.1

B. Proof of Proposition 2.2

C. Proof of Proposition 2.3

D. Proof of Proposition 2.4

5 Conclusions

This work was motivated by the need to understand LLMs when their operation is controlled by external artifacts, such as grammars, so as to generate text following a specific format. An important question that arises in this context is how to deal with the 0-probabilities that appear when the output is restricted. As a starting point, we revised the congruence (1) to make the construction of the quotient less dependent on P, by expressing it in terms of the output of the language model. The first consequence of this operational view is that it allows generalizing the congruence to handle equivalences on distributions. In addition, it led to a variant of the QNT active-learning algorithm that learns PDFA efficiently by avoiding, as much as possible, checks for 0-probability transitions.
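To make the role of 0-probabilities concrete, here is a minimal Python sketch of a distribution-level equivalence under grammar masking. It is illustrative only: the toy alphabet, the masking step, and the tolerance-based check are assumptions made for exposition, not the paper's implementation.

import numpy as np

ALPHABET = ["0", "1", "$"]  # toy alphabet; "$" is the end-of-string symbol

def constrained_dist(probs, mask):
    """Zero out the symbols forbidden by the guiding grammar and
    renormalize; this masking step is where 0-probabilities appear."""
    probs = np.where(mask, probs, 0.0)
    total = probs.sum()
    return probs / total if total > 0 else probs

def equivalent(d1, d2, tol=0.05):
    """Equivalence on distributions rather than strict equality: two
    prefixes fall in the same class of the quotient when their
    constrained next-symbol distributions are within `tol` in the
    L-infinity norm (tol = 0 recovers plain equality)."""
    return np.max(np.abs(np.asarray(d1) - np.asarray(d2))) <= tol

# Two prefixes whose raw next-symbol distributions differ slightly
# become equivalent once the grammar forbids "$" and we renormalize.
mask = np.array([s != "$" for s in ALPHABET])
d_u = constrained_dist(np.array([0.60, 0.30, 0.10]), mask)
d_v = constrained_dist(np.array([0.58, 0.32, 0.10]), mask)
print(equivalent(d_u, d_v))  # True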


This is essential to making the approach computationally feasible, since it reduces the number of queries to the LLM. The experimental results[3] support the viability of our approach for analyzing and validating statistical properties of LLMs, such as bias in text generation. They also provide evidence that the distributions produced by a guided LLM can be well approximated by a learnt PDFA. This opens the door to making these analyses less dependent on sampling, by studying properties of the PDFA instead.

Figure 6: Distributions of floats and the lengths of their representing strings (token sampling).
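As an illustration of that last point, once a PDFA has been learnt, a quantity like the string-length distribution of Figure 6 can be computed exactly from the transition probabilities rather than estimated by sampling. Below is a sketch over a toy two-state PDFA; the automaton, its encoding, and the function names are invented for the example and are not the paper's code.

# Toy PDFA: `trans[q]` maps each symbol to (probability, next state);
# `stop[q]` is the probability of emitting the end-of-string symbol
# in state q. Both the automaton and its encoding are illustrative.
trans = {
    0: {"0": (0.5, 1), "1": (0.3, 0)},
    1: {"0": (0.6, 1), "1": (0.3, 1)},
}
stop = {0: 0.2, 1: 0.1}

def length_distribution(trans, stop, initial=0, max_len=20):
    """Exact P(length = n) for n = 0..max_len, obtained by
    propagating probability mass over states (no sampling)."""
    mass = {initial: 1.0}  # probability of currently being in each state
    dist = []
    for _ in range(max_len + 1):
        dist.append(sum(p * stop[q] for q, p in mass.items()))
        nxt = {}
        for q, p in mass.items():
            for prob, q2 in trans[q].values():
                nxt[q2] = nxt.get(q2, 0.0) + p * prob
        mass = nxt
    return dist

print([round(p, 4) for p in length_distribution(trans, stop, max_len=5)])

The design point this reflects is that the expensive model is queried only during learning; downstream statistical analyses then run on the automaton alone.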


Acknowledgements

Research reported in this article has been partially funded by ANII - Agencia Nacional de Investigación e Innovación under grants IA_1_2022_1_173516, FMV_1_2023_1_175864, POS_NAC_2023_1_178663, and POS_FMV_2023_1_1012218.

References

[1] R. C. Carrasco and J. Oncina. Learning deterministic regular grammars from stochastic samples in polynomial time. RAIRO - Theoretical Informatics and Applications, 33(1):1–19, 1999.


[2] L. Du, L. Torroba Hennigen, T. Pimentel, C. Meister, J. Eisner, and R. Cotterell. A measure-theoretic characterization of tight language models. In 61st ACL, pages 9744–9770, July 2023.


[3] J. E. Hopcroft and R. M. Karp. A linear algorithm for testing equivalence of finite automata. Technical Report 71-114, Cornell University, 1971.


[4] I. Khmelnitsky, D. Neider, R. Roy, X. Xie, B. Barbot, B. Bollig, A. Finkel, S. Haddad, M. Leucker, and L. Ye. Property-directed verification and robustness certification of recurrent neural networks. In ATVA, pages 364–380, Cham, 2021. Springer International Publishing.


[5] M. Kuchnik, V. Smith, and G. Amvrosiadis. Validating large language models with ReLM. In MLSys, 2023.


[6] F. Mayr, S. Yovine, M. Carrasco, F. Pan, and F. Vilensky. A congruence-based approach to active automata learning from neural language models. In PMLR, volume 217, pages 250–264, 2023.


[7] F. Mayr, S. Yovine, and R. Visca. Property checking with interpretable error characterization for recurrent neural networks. MAKE, 3(1):205–227, 2021.


[8] E. Muškardin, M. Tappler, and B. K. Aichernig. Testing-based black-box extraction of simple models from RNNs and transformers. In PMLR, volume 217, pages 291–294, 10–13 Jul 2023.


[9] C. Nicaud. Random deterministic automata. In MFCS’14, pages 5–23. LNCS 8634, 2014.


[10] P. C. Shields. The ergodic theory of discrete sample paths. AMS, 1996.


[11] E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R. C. Carrasco. Probabilistic finite-state machines - Part I. IEEE PAMI, 27(7):1013–1025, 2005.


[12] Q. Wang, K. Zhang, A. G. Ororbia, II, X. Xing, X. Liu, and C. L. Giles. An empirical evaluation of rule extraction from recurrent neural networks. Neural Comput., 30(9):2568–2591, 2018.


[13] G. Weiss, Y. Goldberg, and E. Yahav. Extracting automata from recurrent neural networks using queries and counterexamples. In PMLR, volume 80, 10–15 Jul 2018.


[14] B. T. Willard and R. Louf. Efficient guided generation for LLMs. Preprint arXiv:2307.09702, 2023.


This paper is available on arXiv under a CC BY-SA 4.0 DEED (Attribution-ShareAlike 4.0 International) license.


[3] https://github.com/neuralchecker/analyzing_constrained_LLM_through_PDFA_learning
