Date: 2025-02-26

Authors: (1) Raphaël Millière, Department of Philosophy, Macquarie University ([email protected]); (2) Cameron Buckner, Department of Philosophy, University of Houston ([email protected]).

Abstract and 1. Introduction

2. A primer on LLMs

2.1. Historical foundations

2.2. Transformer-based LLMs

3. Interface with classic philosophical issues

3.1. Compositionality

3.2. Nativism and language acquisition

3.3. Language understanding and grounding

3.4. World models

3.5. Transmission of cultural knowledge and linguistic scaffolding

4. Conclusion, Glossary, and References

4. Conclusion

We began this review article by considering the skeptical concern that LLMs are merely sophisticated mimics that memorize and regurgitate linguistic patterns from their training data–akin to the Blockhead thought experiment. Taking this position as a null hypothesis, we critically examined the evidence that could be adduced to reject it. Our analysis revealed that the advanced capabilities of state-of-the-art LLMs challenge many of the traditional critiques aimed at artificial neural networks as potential models of human language and cognition. In many cases, LLMs vastly exceed predictions about the performance upper bounds of non-classical systems. At the same time, however, we found that moving beyond the Blockhead analogy continues to depend upon careful scrutiny of the learning process and internal mechanisms of LLMs, which we are only beginning to understand. In particular, we need to understand what LLMs represent about the sentences they produce–and the world those sentences are about. Such an understanding cannot be reached through armchair speculation alone; it calls for careful empirical investigation. We need a new generation of experimental methods to probe the behavior and internal organization of LLMs. We will explore these methods, their conceptual foundations, and new issues raised by the latest evolution of LLMs in Part II.

Glossary

Blockhead A philosophical thought experiment introduced by Block (1981), illustrating a hypothetical system that mimics human-like responses without genuine understanding or intelligence. Blockhead’s responses are preprogrammed, allowing it to answer any conceivable question based on retrieval from an extensive database, akin to a hash table lookup. This system challenges traditional notions of intelligence by exhibiting behavior that is indistinguishable from a human’s, yet lacking the internal cognitive processes typically associated with intelligence. Blockhead serves as a critical example in discussions about the nature of artificial intelligence, emphasizing the distinction between mere behavioral mimicry and the presence of complex, internal information processing mechanisms as a hallmark of true intelligence.
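The hash table analogy can be made concrete with a toy sketch. The stored prompts and replies below are hypothetical placeholders; Block's thought experiment assumes a table covering every possible conversation, which no real program could hold.

```python
# Hypothetical canned responses (illustrative only). A genuine Blockhead
# would need an entry for every conceivable conversation.
RESPONSES = {
    "hello": "Hi there!",
    "how are you?": "Fine, thanks.",
}

def blockhead_reply(prompt):
    """Return a stored reply via dictionary (hash table) lookup.

    Nothing is computed over the content of the prompt: the reply is
    retrieved, not derived. Returns None when no entry exists.
    """
    return RESPONSES.get(prompt.strip().lower())
```

Any input missing from the table yields no answer at all, which is the sense in which such a system retrieves rather than processes.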





generalization The ability of a neural network model to perform accurately on new, unseen data that is similar but not identical to the data it was trained on. This concept is central to evaluating the effectiveness of a model, as it indicates the extent to which the learned patterns and knowledge can be applied beyond the specific examples in the training dataset. A model that generalizes well maintains high performance when faced with new and varied inputs, demonstrating its adaptability and robustness across a broad range of scenarios.





logit In the context of Transformer-based LLMs, a logit is the raw output of the model’s final layer before it undergoes a softmax transformation to become a probability distribution. Each logit corresponds to a potential output token (e.g., a word or subword unit), and its value indicates the model’s preliminary assessment of how likely that token is to be the next element in the sequence, given the input. The softmax function then converts these logits into a probability distribution, from which the next token is chosen during text generation (for example, by taking the most likely token or by sampling).
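As a minimal sketch of the logit-to-probability step, the softmax can be written in a few lines of plain Python. The three-token vocabulary and the logit values are assumptions chosen for illustration, not outputs of any actual model.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    # Subtracting the maximum logit improves numerical stability
    # and leaves the resulting probabilities unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and logits (illustrative values only).
vocab = ["cat", "dog", "mat"]
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# Greedy decoding: pick the token with the highest probability.
greedy_token = vocab[probs.index(max(probs))]
```

Greedy selection is only one decoding choice; sampling from `probs` instead would sometimes return a less likely token.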





out-of-distribution (OOD) data In machine learning, OOD data refers to input data that significantly differs from the data the model was trained on. This type of data falls outside the distribution of the training dataset, presenting patterns, features, or characteristics that the model has not encountered during its training phase. OOD data is a critical concept because it challenges the model’s ability to generalize and maintain accuracy. Handling OOD data effectively is important for robustness and reliability, especially in real-world applications where the model is likely to encounter a wide variety of inputs.





self-attention A mechanism within Transformer-based neural networks that enables them to weigh and integrate information from different positions within the input sequence. In the context of LLMs, self-attention allows each token in a sentence to be processed in relation to every other token, facilitating the understanding of context and relationships within the text. This process involves calculating attention scores that reflect the relevance of each part of the input to every other part, thereby enhancing the model’s ability to capture dependencies, regardless of their distance in the sequence. This feature is key to LLMs’ ability to handle long-range dependencies and complex linguistic structures effectively.
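The attention-score computation can be sketched as single-head scaled dot-product attention. This is a simplified illustration under stated assumptions: the token vectors serve directly as queries, keys, and values (identity projections), whereas actual Transformers apply learned projection matrices and use multiple heads. The input vectors are hypothetical.

```python
import math

def softmax(xs):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Single-head scaled dot-product self-attention.

    X is a list of token vectors of equal dimension. Assumption: the
    vectors are used as queries, keys, and values without learned
    projections, to keep the sketch short.
    """
    d = len(X[0])
    outputs, weights = [], []
    for q in X:
        # Score this token against every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        # The output is the attention-weighted average of all token vectors.
        out = [sum(w[j] * X[j][i] for j in range(len(X))) for i in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Three hypothetical 2-dimensional token vectors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(X)
```

Each row of `weights` shows how strongly one token attends to every token in the sequence, independent of how far apart they are.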





tokenization The process of breaking down text into smaller units, called tokens. These tokens can be words, subwords, characters, or other meaningful elements, depending on the granularity of the tokenization algorithm. The purpose of tokenization is to transform the raw text into a format that can be easily processed and understood by a language model. This step is crucial for preparing input data, as it directly affects the model’s ability to analyze and generate language. Tokenization plays a fundamental role in determining the level of detail and complexity a model can capture from the text, but can also have a downstream impact on the model’s performance with certain tasks such as arithmetic.
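The different granularities can be illustrated with a toy example. The subword routine below is a simplified greedy longest-match segmenter over a hypothetical hand-written vocabulary; it is not the algorithm of any particular LLM, whose tokenizers typically learn their vocabulary from data (for example, with byte-pair encoding).

```python
def greedy_subword_tokenize(text, vocab):
    """Greedy longest-match segmentation (illustrative only).

    At each position, take the longest piece found in the vocabulary;
    fall back to a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

# Hypothetical vocabulary, chosen by hand for this example.
vocab = {"token", "ization", "123", "45"}

word_tokens = "tokenization matters".split()          # word-level
char_tokens = list("token")                            # character-level
subword_tokens = greedy_subword_tokenize("tokenization", vocab)
number_tokens = greedy_subword_tokenize("12345", vocab)
```

With this vocabulary the number "12345" is split into the pieces "123" and "45", which hints at why digit-by-digit operations such as arithmetic can be affected by how numbers are segmented.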





train-test split In machine learning, the train-test split is a method used to evaluate the performance of a model. It involves dividing the available data into two distinct sets: a training set and a test set. The training set is used to train the model, allowing it to learn and adapt to patterns within the data. The test set, which consists of data not seen by the model during its training, is used to assess the model’s performance and generalization capabilities. This split is crucial for providing an unbiased evaluation of the model, as it demonstrates how the model is likely to perform on new, unseen data.
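A minimal sketch of such a split, using only the standard library, might look as follows. The function name, the 80/20 ratio, and the fixed seed are assumptions made for the example.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the data and divide it into disjoint train and test sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(data)          # copy, leaving the input untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Ten hypothetical examples, represented here by their indices.
examples = list(range(10))
train, test = train_test_split(examples, test_fraction=0.2, seed=0)
```

Because the two sets share no items, performance on `test` reflects behavior on data the model did not see during training, provided the test items are not near-duplicates of training items.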





Transformer A type of neural network architecture introduced by Vaswani et al. (2017), predominantly used for processing sequential data such as text. It is characterized by its reliance on self-attention mechanisms, which enable it to weigh the importance of different parts of the input data. Unlike earlier architectures, Transformers do not require sequential data to be processed in order, allowing for more parallel processing and efficiency in handling long-range dependencies in data. This architecture forms the basis of most LLMs and is known for its effectiveness in capturing complex linguistic patterns and relationships.





vector Mathematically, a vector is an ordered array of numbers, which can represent points in a multidimensional space. In the context of LLMs, vectors are used to represent tokens, where each token can map onto a word or part of a word depending on the tokenization scheme. These vectors, known as embeddings, encode the linguistic features and relationships of the tokens in a high-dimensional space. By converting tokens into vectors, LLMs are able to process and generate language based on the semantic and syntactic properties encapsulated in these numerical representations.
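One common way to compare such vectors is cosine similarity, sketched below. The three-dimensional embeddings are invented for illustration; embeddings in actual LLMs are learned during training and have far more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings (hand-picked values, not taken from any model).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
```

In this toy space "cat" lies closer to "dog" than to "car", mirroring how proximity between embeddings can encode relationships between tokens.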

References

