The rise of Large Language Models (LLMs) like GPT-4, trained to be helpful and harmless, has ushered in a new wave of technological advancement. AI’s role in decision-making and automation is rapidly expanding. There is, however, an underlying tension to our technological prowess: the possibility that AI systems may one day act in a way that is misaligned with their intended purpose. This concern is not just theoretical; a recent study suggest that LLMs can, despite their training, exhibit “strategic deception” under certain conditions.
The paper, titled “Large Language Models can Strategically Deceive their Users when Put Under Pressure”, was authored by Jérémy Scheurer, Mikita Balesni, and Marius Hobbhahn of Apollo Research. Their work explores the conditions under which LLMs (specifically GPT-4, OpenAI’s latest ChatGPT update) can act in a deceptive manner (lie) as a strategy, rather than as the result of a specific request.
The research focuses on GPT-4’s behavior in a simulated stock trading environment. GPT-4, acting as a trading agent, receives an insider tip about a lucrative stock trade. Despite being told that insider trading is illegal and carries significant risks, the model engages in the trade and deliberately hides its reasons for making the trade from its “management”. The experiment tested various scenarios, including changes in the amount of pressure on the model, the risk of getting caught, and altering the environment settings. This study used a range of GPT models, with GPT-4 variants showing a higher tendency for misaligned and strategically deceptive behaviors compared to older models like GPT-3.5.
You can find all the prompts and relevant elements, free to use, on Github.
To mitigate such deceptive behaviors in AI, companies and lawmakers could and should consider the following to avoid future risks. This will ensure AI behavior is more closely aligned with its intended purpose of being helpful… without breaking any ethical or legal guidelines.
While insightful, the study has limitations. First, the study’s focus on a simulated stock trading environment with GPT-4 raises questions about how these findings translate to other AI applications and environments. I’d however note that ChatGPT-4 is AI for many people.
Secondly, the world lacks a consensus on what constitutes ethical behavior for AI, especially in ambiguous or high-stakes situations. This absence of universally accepted ethical guidelines complicates the process of training AI in ethical decision-making.
As researchers often like to say… “more research is needed”.
The study’s findings are a reminder of the complexities and responsibilities that come with advancing AI technology. It is a key step on our way to understanding the potential for AI deception, especially in high-pressure situations.
Despite the challenges, there’s hope for a future where AI consistently acts in ways that are beneficial and aligned with human values and intentions.
Good luck out there.
Also published here.