How AI Reasoning Mirrors Borges' Library of Babel

Written by mirrorspace | Published 2025/09/02
Tech Story Tags: ai-reasoning | transformer-architecture | ai-consciousness | library-of-babel | perplexity-in-ai-models | transformer-models | can-ai-think | hackernoon-top-story

TL;DR: Do transformer-based LLMs really show emergent understanding? Probably not! A higher-level look at model outputs vindicates the "glorified autocomplete" low-level take, by bridging intuition and mathematics through a modern formulation of Borges' classic idea.

Note: the following is about vanilla transformers, but the conclusions and metaphors can be extended to reason fruitfully about newer LLMs (which will be covered in a later article).

Many people have the intuition that an LLM (Large Language Model, e.g. ChatGPT) doesn't really understand what it's saying. It doesn't really reason, it has no intent to convey anything, and it lacks an innate distinction between truth and falsehood.

The intuition is sound, but not so easy to substantiate. If you have a bit of technical savvy, you can point out that an LLM is just a fancy autocomplete, or a "token predictor". This consolidates the intuition a little - predicting tokens (i.e. word fragments, short words, symbols…) does seem like a far cry from deliberation - but it remains an intuition, which can be reasonably questioned: could you finish this sentence if you didn't build up an understanding of what the paragraph is getting at?

Now, those with more charity towards our talking machines will argue that LLMs are still poorly understood "black boxes", and accuse the skeptics of reductionism: isn't Nature full of "mundane" mechanisms that give rise to emergent phenomena, with qualities that differ from their basic causes? Maybe the complex mathematical machinery behind token prediction somehow produces genuine understanding?

Eh, probably not

The key thing to understand about the token predictor perspective is that it isn't actually reductionist - at least not in the way of calling the brain "just a clump of neurons".

It's a decent characterization of the high-level, functional definition of the model. Yes, the model itself is a huge ensemble of small parts, but they all work together to satisfy exactly one demand: given N previous tokens, predict the next one. The ability to complete this task is the emergent property you'd expect to be downstream from numerical data flowing through the model's layers.
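If you want to see that functional definition in the flesh, here's a minimal sketch - using GPT-2 via the Hugging Face transformers library purely as an illustrative stand-in for "a vanilla transformer"; the prompt and decoding choices are mine, not anything specific to the argument:

```python
# A minimal sketch of "given N previous tokens, predict the next one",
# using GPT-2 via Hugging Face transformers (an illustrative choice).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The Library of Babel contains"
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits      # shape: (1, N, vocab_size)
next_token_logits = logits[0, -1]         # scores for the (N+1)-th token
probs = torch.softmax(next_token_logits, dim=-1)

# The model's entire job: a probability for every possible next token.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx):>12}  {p.item():.3f}")
```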

But couldn't something extra still emerge out of repeated token prediction?

In a sense, any text is greater than the sum of its parts: the meaning of each word is modified by the entire context surrounding it, so the meaning of the whole text is beyond a concatenation of isolated word meanings. Nonetheless, for a text generated by token prediction, each step is in principle a self-contained process - it just follows a trajectory implicit in the intermediate results. This piecemeal process of construction can be analyzed as such, to yield valid conclusions about the entire result.

As we'll see, token prediction imposes inherent qualitative constraints, which can't be bypassed by the magic of emergence. To get there, we'll examine what emerges on a level higher than that of individual texts: what if we organize all of the model's possible outputs into a single structure? Ironically, the "reductive" formalism is the perfect mathematical tool for this job.

But first, I turn to Borges' Library of Babel for inspiration

The story describes a library composed of endless hexagonal rooms. Each room houses shelves of books containing every possible arrangement of letters, spaces, and punctuation marks.

Consequently, the library contains not only all meaningful works - every truth, lie, prophecy, scientific breakthrough, or masterpiece - but also endless volumes of unintelligible nonsense.

The library's inhabitants - its librarians - wander tirelessly through the infinite corridors, driven by an insatiable quest for meaning and enlightenment. Yet the library's infinitude renders every meaningful text effectively impossible to find, buried amidst infinite nonsense. Worse yet, the librarians are haunted by the impossibility of distinguishing truth from falsehood, as every conceivable statement is equally present.

Different sects form among them: some search desperately for a mythical "catalog" of the library's contents, others destroy books deemed useless, and yet others resign themselves to nihilistic despair - religious faith, fundamentalism, or nihilism, all born from a yearning for certainty.

Thank ChatGPT for that summary. Or better yet, read it for yourself.

It's a poignant story, but what does it have to do with language models?

For a start, think of Borges' Library as the set of outputs from a primitive language model: one that predicts characters instead of tokens, but with zero regard for accuracy. For any given piece of text, it predicts any next character to be as likely as any other.

Naturally, such a model suggests no organizing principles for its outputs - at least none that help tell apart data from noise. Borges takes artistic liberty with the hexagonal layout, but the whole library represents a uniformly random generative process.
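As a toy sketch of that "zero-regard" character predictor - the character set below is just an illustrative placeholder - every next character is equally likely, regardless of the text so far:

```python
import random
import string

# A toy "Library of Babel" generator: every next character is equally likely,
# regardless of the text so far. (The character set is illustrative.)
ALPHABET = string.ascii_lowercase + " ,."

def babel_page(length=80, seed=None):
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(length))

print(babel_page(seed=42))
# Almost always noise; yet every meaningful page of this length is also
# "in there", each with the same vanishingly small probability.
```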

Now consider an analogous library based on tokens instead of characters: it would contain all possible texts up to some maximum number of tokens, representing outputs from an "indiscriminate" LLM. This new library would have essentially the same properties as the original. However, it would also coincide with the set of possible outputs for a normal LLM, assuming even nonsensical completions are represented by tiny but nonzero probabilities. So how can we organize the Modern Library of Babel to reflect the order and structure of a useful language model?

Of course - perplexity

Formally, perplexity is defined as the exponentiated average negative log-likelihood of a sequence:

PP(X) = exp(-(1/N) ∑ᵢ₌₁ᴺ log P(xᵢ | x₁, …, xᵢ₋₁))

Less formally, we treat a given text as a sequence of outcomes from a random process (not necessarily uniform). The outcomes in our case are the text's actual tokens.

Each token is assigned a probability, according to the model, based on the preceding tokens: it reflects how well each follow-up conforms to the training data.

The probabilities are then aggregated in a way that allows for a useful average to be computed for the entire text.

Finally, that gets mapped into an intuitive value reflecting (roughly) the average number of viable continuations at each step. This key metric follows naturally from the "token predictor" definition and makes direct use of its terms.

Perplexity values range from 1 to infinity. 1 means the text is so predictable that every step leads inevitably into the next - something so typical of the training data that its completion is like rote recall. High values imply a branching labyrinth of possibilities: the model isn't sure where the text is going, and the higher number of possible continuations means a lower confidence in each.
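Here's that definition as a small sketch in code; the per-token probabilities are made-up numbers, standing in for whatever a real model would assign:

```python
import math

def perplexity(token_probs):
    """Exponentiated average negative log-likelihood of a token sequence.

    token_probs[i] is the model's probability for the i-th actual token,
    conditioned on all the tokens before it.
    """
    n = len(token_probs)
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_likelihood)

# Illustrative (made-up) numbers:
rote_text      = [0.9, 0.95, 0.99, 0.9]    # near-inevitable continuations
uncertain_text = [0.1, 0.05, 0.2, 0.1]     # many viable branches at each step

print(perplexity(rote_text))       # ~1.07 - close to 1, like rote recall
print(perplexity(uncertain_text))  # ~10.0 - roughly ten "viable continuations" per step
```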

Now, imagine a structure of concentric circles

The circles are borders that divide a flat, endless expanse into rings. Each border is assigned a perplexity value, starting with 1 for the circular center and gradually increasing as you go outwards. The areas between adjacent circles form perplexity bands: each ring houses all the texts whose perplexity values fall in the range defined by its inner and outer borders.

This is the Modern Library of Babel.

This new library sorts all the model's possible outputs in a way that faithfully reflects what the model has learned. To find a potential response for a given prompt, you only need to pick a band, find in it a text starting with that prompt, and read on.

Sounds easy enough, but which band to pick?

As LLM designers well know, most of the texts users want aren't in the Library's center, even though that's where you're most likely to find true statements (or at least, statements that perfectly line up with conventional knowledge).

The cost of reliability is extreme conservatism. Consider what happens if a user wants the solution to a difficult or open problem: the Library's center can offer general pointers, or simply state that there's no known solution; it contains safe responses that strictly conform to the training data. But for "risky" attempts at novel solutions, look elsewhere.

In practice, when you're prompting your favorite chatbot, you're exploring some bands around the center, where you can find some "creative" outputs. Since the token predictor design provides no inherent way to decouple correctness from creativity, the choice of bands is a balancing act. Of course, some products come with a "temperature" control that lets you explore further out; but it's aptly named, because when you crank it up, the poor AI starts to suffer from fever delirium.
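For the curious, here's a sketch of what that knob does under the hood (the scores are made up, and real products may implement sampling differently): temperature rescales the model's scores before sampling, flattening the distribution as it rises.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a next-token index from raw model scores, rescaled by temperature."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())    # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

# Made-up scores for four candidate tokens:
logits = [4.0, 2.0, 1.0, -1.0]

for t in (0.2, 1.0, 3.0):
    _, probs = sample_next_token(logits, temperature=t, rng=np.random.default_rng(0))
    print(t, probs.round(3))
# Low temperature concentrates on the "safe" top token (inner bands);
# high temperature flattens toward uniform randomness (outer bands of Babel).
```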

If we can extrapolate slightly from empirical observations, and follow the implications of the perplexity metric, we should conclude that truly innovative texts would be far from the center: Einstein's unwritten theories or valid proofs of the Riemann hypothesis are inherently unlikely texts.

So why not look further out?

Well, based on the definition of perplexity, the number of potential responses to a prompt explodes exponentially the further out you go. Meanwhile, the number of correct responses remains constant, with only so many different ways to state them. The ratio of useful to nonsensical texts rapidly plummets. In other words, while the probability that at least some texts out there contain new and groundbreaking works increases, the probability of stumbling upon them vanishes.
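A back-of-the-envelope sketch of that ratio - the numbers are invented, only the exponential shape matters:

```python
# Invented numbers - only the exponential shape matters.
correct_answers = 1000           # suppose a generous fixed count of valid phrasings
length = 200                     # tokens in a response

for k in (2, 5, 10):             # "viable continuations per step" ~ the perplexity band
    possible_texts = k ** length
    print(f"k={k}: ~10^{len(str(possible_texts)) - 1} texts, "
          f"hit rate ~ {correct_answers / possible_texts:.1e}")
```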

If you look at a narrow perplexity band, you're looking at texts that are (more or less) equally "good" by the model - equally likely to be generated, regardless of their sensibility - so long as they match the prompt. But the outer bands will contain texts that contradict each other on the same subject matter, or the very same prompt: those bands accommodate setting out on a plethora of very different token-stringing paths.

In short, the outer reaches of our modernized Library of Babel actually approximate the situation from Borges' classic.

Maybe you're starting to see how this ties back to the original intuition: you can think of a text as a path traced by the generative process - a record of token choices (ha-ha) made at each step. But starting from a given prefix (like a prompt), the model has no inherent preference for any particular path out of the variety on offer in the relevant perplexity band. The "choices" are made by a sampling algorithm, driven by a pseudo-random number generator.

This substantiates what it means to say that the model doesn't understand what it's talking about: it doesn't know or care how the different paths it could take compare to each other, with respect to truth, utility, or any external factor - it's not a reasoned judge. In Borges' terms, the LLM's core is not a librarian, but the Library itself.

Nonetheless, this Modern Library of Babel does have its librarians: they're the avid users. Some of them explore the outskirts for hidden gems of truth. Others search for something like the Catalogue from Borges' story: if only they could find the right prompt, they'd get a reliable answer to any question. A few even look for a secret incantation to awaken the Library to its own dormant consciousness.

Like Borges' librarians, they wander through endless corridors, noses stuck in endless volumes, without grasping the bigger picture of this pursuit - its true emergent properties.

But can't you turn the same argument back against humans?

Why not hypothesize a Human Library of Babel? Just organize all the potential outputs of the human mind based on a human measure of perplexity. Wouldn't that reveal an analogous situation?

I don't think so.

For starters, human "perplexity" works differently: If I'm reading a ground-breaking work, it can come off as confusing and dubious at first. Provisionally, I might assign it a perplexity value as high as gibberish - it wouldn't be so novel if it didn't start from unusual premises or set off on an unlikely path. But by the end of it, the work will have justified itself. Understood as a whole, it could be well in line with known facts and valid reasoning. In the final count, I could end up assigning it a perplexity value as low as that of accepted theories.

Such a dramatic re-evaluation of previous tokens, with its abrupt change in perplexity, doesn't happen with normal transformer-based LLMs: an unusual text can eventually settle into relative predictability, given enough context, but the average will remain tainted.

More significantly, with token predictors, there's a symmetry between generating and evaluating texts. But this is not the case with humans: Through deep insight, the mind can sense connections between ideas that seem distant and unrelated. To validate and communicate such an insight, you'd need to mediate between the ideas explicitly - to build up an argument with a linear progression that other minds can follow. This process can be slow and arduous, but it's guided by the initial flash of understanding that you can't expect the reader to have.

That's a significant contrast. When an LLM generates a text, it follows a path that unfolds before it toward an unknown destination. The stepping stones - the succession of generated tokens - aren't mediators, but guides. It's exactly as "lost" constructing a novel text as the perplexity evaluation of the result implies; i.e., most of the way through, it's on the edge of its seat, guessing like a clueless reader!

Still, the imagery of a core with concentric circles can be useful even for humans

Let the core be the Earth: ground truth. Around it are bands of humanity's intellectual atmosphere, gradually thinning as it reaches into a vast and unexplored space. Some ideas lie right on the ground, like obvious conclusions, easy to stumble into. A bunch are up in the air, over our heads, waiting for an intuitive leap. Most are up in the sky, or all the way out in space - unproven theories, baseless fantasies, outright delusions...

But you can also imagine a mountain: it starts from the ground and goes way up. The intellectual adventurer sees its outline from afar and plots a rough course for the peak. When he starts to climb, the mountain poses many challenges he didn't foresee - ones that can only be solved from up close. Eventually, he reaches the peak and sees the world from a new vantage point. He stands where even the air is thin, but he's connected to the ground through a sheer mountain. From there, he can bring new insights back to the ground, along with the story of how he got to them: a story too fantastical at the outset, but undeniably true by the end.


Written by mirrorspace | I'm a programmer with opinions. I'm here to subject you to them. Yes, it's a simple premise for a show, but I'll try to make it educational. :^)
Published by HackerNoon on 2025/09/02