Authors:
(1) M. Carrasco, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);
(2) F. Mayr, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);
(3) S. Yovine, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay ([email protected]);
(4) J. Kidd, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay;
(5) M. Iturbide, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay;
(6) J. da Silva, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay;
(7) A. Garat, Facultad de Ingenierıa, Universidad ORT Uruguay, Montevideo, Uruguay.
Table of Links
4 Analyzing large language models
5 Conclusions. Acknowledgements, and References
2 Language models
Let Σ be a finite set of symbols, Σ ∗ the set of finite strings, λ ∈ Σ ∗ the empty string, and Σ$ ≜ Σ ∪ {$}, where $ P ̸∈ Σ is a special symbol used to denote termination. The probability simplex over Σ$ is ∆(Σ$) ≜ {ρ : Σ → R+ | σ∈Σ ρ(σ) = 1}. The support of ρ ∈ ∆(Σ$) is supp(ρ) ≜ {σ ∈ Σ | ρ(σ) > 0}. A language model is a total function L : Σ∗ → ∆(Σ$).
Language models can be expressed in different ways, e.g., RNN, Transformers, and PDFA. Following [6], we define a PDFA A over Σ as a tuple (Q, qin, π, τ ), where Q is a finite set of states, qin ∈ Q is the initial state, π : Q → ∆(Σ$), and τ : Q × Σ → Q. Both π and τ are total functions. We define τ ∗ and π ∗ as follows: τ ∗ (q, λ) ≜ q and τ ∗ (q, σu) ≜ τ ∗ (τ (q, σ), u), and π ∗ (q, u) ≜ π(τ ∗ (q, u)). We omit the state if it is qin and write τ ∗ (u) and π ∗ (u).
A defines the language model such that A(u) ≜ π ∗ (u). Fig. 1 gives examples of PDFA. The number below q is the probability of termination π(q)($), and the one associated with an outgoing transition labeled σ corresponds to π(q)(σ).
Congruences P is used in [1, 11] to define the equivalence relation ≡ in Σ ∗ which is a congruence with respect concatenating a symbol:
This paper is