This story draft by @escholar has not been reviewed by an editor, YET.

Analyzing constrained LLM through PDFA-learning: Language models

EScholar: Electronic Academic Papers for Scholars HackerNoon profile picture
0-item

Authors:

(1) M. Carrasco, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(2) F. Mayr, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(3) S. Yovine, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);

(4) J. Kidd, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay;

(5) M. Iturbide, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay;

(6) J. da Silva, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay;

(7) A. Garat, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay.

Table of Links

Abstract and 1 Introduction

2 Language models

3 Learning algorithm

4 Analyzing large language models

5 Conclusions. Acknowledgements, and References

A. Proof of Proposition 2.1

B. Proof of Proposition 2.2

C. Proof of Proposition 2.3

D. Proof of Proposition 2.4

2 Language models

Let Σ be a finite set of symbols, Σ ∗ the set of finite strings, λ ∈ Σ ∗ the empty string, and Σ$ ≜ Σ ∪ {$}, where $ P ̸∈ Σ is a special symbol used to denote termination. The probability simplex over Σ$ is ∆(Σ$) ≜ {ρ : Σ → R+ | σ∈Σ ρ(σ) = 1}. The support of ρ ∈ ∆(Σ$) is supp(ρ) ≜ {σ ∈ Σ | ρ(σ) > 0}. A language model is a total function L : Σ∗ → ∆(Σ$).


Language models can be expressed in different ways, e.g., RNN, Transformers, and PDFA. Following [6], we define a PDFA A over Σ as a tuple (Q, qin, π, τ ), where Q is a finite set of states, qin ∈ Q is the initial state, π : Q → ∆(Σ$), and τ : Q × Σ → Q. Both π and τ are total functions. We define τ ∗ and π ∗ as follows: τ ∗ (q, λ) ≜ q and τ ∗ (q, σu) ≜ τ ∗ (τ (q, σ), u), and π ∗ (q, u) ≜ π(τ ∗ (q, u)). We omit the state if it is qin and write τ ∗ (u) and π ∗ (u).


A defines the language model such that A(u) ≜ π ∗ (u). Fig. 1 gives examples of PDFA. The number below q is the probability of termination π(q)($), and the one associated with an outgoing transition labeled σ corresponds to π(q)(σ).




Congruences P is used in [1, 11] to define the equivalence relation ≡ in Σ ∗ which is a congruence with respect concatenating a symbol:






This paper is available on arxiv under CC BY-SA 4.0 by Deed (Attribution-Sharealike 4.0 International) license.


L O A D I N G
. . . comments & more!

About Author

EScholar: Electronic Academic Papers for Scholars HackerNoon profile picture
EScholar: Electronic Academic Papers for Scholars@escholar
We publish the best academic work (that's too often lost to peer reviews & the TA's desk) to the global tech community

Topics

Around The Web...

Trending Topics

blockchaincryptocurrencyhackernoon-top-storyprogrammingsoftware-developmenttechnologystartuphackernoon-booksBitcoinbooks