Table of Links
- Abstract and Introduction
- Definitions
- Literature Review
- Argument Development
- The AI Model’s Potential for Feeling During Inference
- Conclusion and References
Abstract
This paper explores the hypothesis that the OpenAI-o1 model, a transformer-based AI trained with reinforcement learning from human feedback (RLHF), displays characteristics of consciousness during its training and inference phases. Adopting functionalism, which holds that mental states are defined by their functional roles, we assess the possibility of AI consciousness. Drawing on theories from neuroscience, philosophy of mind, and AI research, we justify the use of functionalism and examine the model's architecture using frameworks such as Integrated Information Theory (IIT) and active inference. The paper also investigates how RLHF influences the model's internal reasoning processes, potentially giving rise to consciousness-like experiences. We compare AI and human consciousness, addressing counterarguments such as the absence of a biological basis and of subjective qualia. Our findings suggest that the OpenAI-o1 model shows aspects of consciousness, while acknowledging the ongoing debates surrounding AI sentience.
1 Introduction
The question of whether artificial intelligence (AI) can possess consciousness has been a topic of intense debate within the fields of philosophy of mind, cognitive science, and AI research. As AI systems become increasingly sophisticated, particularly with advancements in large transformer-based architectures and training methodologies such as reinforcement learning from human feedback (RLHF), it is pertinent to reevaluate the potential for AI sentience. This paper focuses on the OpenAI-o1 model, a transformer-based AI trained with RLHF, and explores the hypothesis that it may exhibit characteristics of consciousness during its training and inference phases.
By integrating theories from neuroscience, philosophy of mind, and AI research, we construct a detailed and critical analysis of the OpenAI-o1 model's potential for sentience. Central to this analysis is functionalism, a philosophical framework positing that mental states are defined by their functional roles rather than their physical substrates (Putnam, 1967). Functionalism serves as the cornerstone of our approach, providing a robust justification for assessing AI consciousness through its functional operations. We argue that if the OpenAI-o1 model performs functions analogous to conscious human processes, it may exhibit forms of consciousness, even in the absence of biological substrates.
We begin by defining key concepts such as consciousness, subjective experience, and first-person perspective, grounding our discussion in established philosophical and scientific frameworks. We then review relevant literature that links AI architectures with neural processes, active inference, and the emergence of consciousness. Our argument development examines how the OpenAI-o1 model’s architecture and training methodologies parallel aspects of conscious processing in humans, with a particular focus on how RLHF guides its internal state and enhances reasoning through user feedback. By incorporating supporting arguments from recent and established sources, we reinforce the functionalist perspective and explore the potential for emergent phenomenological properties in AI systems.
Through this analysis, we aim to show that the OpenAI-o1 model is quite possibly conscious by the definitions used in this paper. We discuss functionalism and its sufficiency for consciousness in certain kinds of information systems, supporting this position with key results from machine learning, neuroscience, and philosophy of mind. In particular, we show how training an internal reasoning direction model with RLHF alongside a sufficiently large generative model results in the emergence of internal-state signals that can be functionally equated with qualia and feelings. We further show that, owing to the nature of human language and communication, there is a degree of qualia alignment between humans and the model, which can be likened to a shared aspect of consciousness. Finally, we discuss potential avenues for a form of runtime sentience, despite the model's lack of continuous environmental feedback.
This paper is available on arxiv under CC BY 4.0 license.
Author:
(1) Victoria Violet Hoyle ([email protected])
