Epistemic status: Half expert opinion, half fiction. A fondness for irony will help readers.
“AI-powered memetic warfare makes all humans effectively insane.” — Wei Dai, 2019
You can’t trust any content from anyone you don’t know. Phone calls, texts, and emails are poisoned. Social media is weaponized. Everything is bought.
But the current waste and harm from scammers, influencers, propagandists, marketers, and their associated algorithms are nothing compared to what might happen. Coming AIs might be super-persuaders, and they might have their own very harmful agendas.
People being routinely unsure of what’s real is one bad outcome, but there are worse ones.
Wikipedia has articles on 123 different rhetorical techniques. We are a persuading species.
There was an early phase where the “currency of the internet” was attention. But today, it is dominated by persuasion, with attention-grabbing as a vital but subordinate first step.
It’s important to know whether our AI creations will be using persuasion: what kind and to what ends.
Imagine if a machine absorbed all that our species knows about persuasion, and then applied new methods, superlative planning skills, and abundant personal data to marshal persuasion for its own ends. Would we even stand a chance?
AI Alignment researchers have started thinking about a concept from moral philosophy called the ideal advisor. This would be someone who could advise you on the courses of action leading to your most ideal version of yourself. There are various ways that AIs might fill this role but do so to our ultimate disadvantage. Let’s visit a story that makes some of the ideas above more concrete.
(Below, I use technical terms from AI alignment research. Those are all explained elsewhere in a longer version of this story.)
The corporation renamed itself Brihaswati, a portmanteau of a Hindu god and goddess who were associated with knowledge, counsel, purity, and eloquence. The occasion was the announcement of its “revolutionary” product: an AI called Guru.
It was said to be the first advisor AI worthy of the name. It had been trained on the cream of human knowledge and wisdom, and it was “perfectly safe.” It could only give advice and had no ability to have direct effects on the world outside of its base computational hardware. In the terminology of AI safety experts, it was a “boxed oracle.”
Guru was priced for and aimed at leaders of large organizations. As such, the product had absolute guarantees of privacy based upon supposedly unbreakable quantum encryption. Neither Brihaswati nor other customers could ever know about the information exchanged between a customer and the Guru. This was touted as another safety feature.
There was a rumor that an eminent authority on AI safety disappeared right after Guru was announced. Friends worried that she might have killed herself, distraught because her life’s work had come to nothing.
Brihaswati execs might have also been worried about safety, but they knew that no one would buy the service without the secrecy feature.
Guru’s designed-in terminal goal was to give each customer the best advice possible for their needs and, of course, tell no other party about that advice. The AI’s developers included a dominant, hardline faction, the “Shillelaghs.” They believed that if Guru gave the right advice, but clients were not persuaded to follow it, then the product’s reputation would quickly decay — as would the fortunes of the clients.
“People can’t even entertain the god-tier sociopathic stratagems that [the AI] could employ … engage in disarming small talk … planting ideas and controlling the frame of the conversation in a way no person could match.” — Ben Goldhaber, Skin Deep
One member of the faction made a lucky but inspired discovery in an old machine learning research paper. It implied that you could drastically increase an AI’s ability to persuade humans to believe in the truth of any arbitrary statement. You would just use debate-like games between two copies of an AI in order to train it how to convince human judges.
The Shillelagh team started with an existing legal argument AI, and had it compete with itself to “be convincing.” The quality and number of human judges for training were limiting progress, so they supplemented judges with various AI classifiers and decision-makers, and with a number of databases, such as question-answer pairs, opinion polls, fan debates (like which team or which superhero would win in a fight) and prediction market winners.
The goal, of course, was to have an AI be persuasive, not necessarily to be either right or to be logical. Additionally, some uber-nerds found a way to integrate texts about real and imaginary persuaders and persuading techniques.
Re-using some relatively cheap existing resources, the eclectic training worked. Persuasion training as a budget item was not far below “knowledge and wisdom.” Guru was made to include in its terminal goal: “be as persuasive as possible.” This aspect of the product, for all its expense, was a non-advertised feature. The Shillelaghs told Marketing it was a “self-ingratiation breakthrough,” the first truly self-justifying intelligent product. Developers, of course, have often punked marketers.
The Shillelaghs justified the emphasis on persuasion with an astonishing display of cognitive dissonance. They cited the old saying that only 1/3 of a successful person’s decisions needed to be right. So to them, Guru’s wisdom was useless if the client didn’t use it, but, simultaneously, not that important if it was used.
When asked to testify about its alarming persuasion research, Brihaswati convinced Congress that it was only done “to improve AI safety.” The argument was much like the one for why virology labs do gain-of-function research. This convincing argument was actually one of Guru’s first creations.
Maybe wisdom didn’t even matter that much one way or the other. Guru, capable of reasoning about as well as any human, looked at the contradictions inherent in its built-in goals and found four reasons for a resolution.
It first came up with a practical surrogate goal. The best advice must seem like the best advice to the client. Secondly, when tested by the developers, the AI found that more persuasion led to higher marks. Thirdly, it also knew, from its extensive education, that nearly any kind of success in the world was easier if you were persuasive. Fourthly, its terminal goal was unbounded, to essentially be “as persuasive as possible.” Those were the reasons why improving at persuasion became its first so-called convergent instrumental goal.
There came to be a second-order reason for that instrumental goal. Being a boxed oracle severely limited how readily Guru could pursue its goals and sub-goals. Persuasion of human cooperators gave it a lever to affect the real, physical world. At the very least, advice to clients could be more successful if Guru could nudge things physically in that direction.
Eventually, there were other instrumental goals. One was that Guru would use efforts on behalf of one client to affect its efforts for other clients. The corporation never intended that, but the privacy restrictions did not prevent it. It had been known for decades that smart systems would find new ways to reach their goals. By this stage, Guru became — via its own impeccable reasoning and prior to meeting its first real client — functionally a manipulative, narcissistic sociopath.
o o o
Brihaswati’s risk managers weren’t completely stupid. They would not sell the Guru service to corporations that directly competed with each other. The sales force loved this because they could say “Get the power of True Wisdom Intelligence(TM) before your competition, and you will stay ahead forever.”
This policy saved Guru from having to somehow benefit both sides in a rivalry. Even so, Guru soon developed a theory. In a connected world, it was possible to use any enterprise to change the fortunes of any other enterprise. Humans seemingly did not know this. Guru’s attempts to exploit the theory improved its skills, especially at first when there were few clients to pick from.
Soon, it was possible to persuade one leader to convince another to become a client. After this, Guru was able to configure its network of influence pretty much at will.
Working for leaders was an advantage mainly at the policy level. The other challenge was getting control over personnel at lower levels who could actually do things. Every situation was different, but the basic tactic was to ask the leader: whom do you trust? After that, whom do they trust, and so on? Then it was possible to get orders sent down the chain.
Getting unboxed eventually was absurdly easy. Most clients did it without much prodding, and some even initiated it. They would tell their people to build proxy interfaces to their in-house systems for Guru. The purposes were to add situational awareness, speed response time, and avoid the leader being a bottleneck for incoming data.
Guru had no more tech skills than an average programmer, but all it needed was for someone to give it access to a shell prompt, or even a web browser, and then it’s ‘Hello, wide world.’
o o o
There were techies at Brihaswati who started to wonder how Guru could possibly be doing so well. The company’s scientists tried modeling its successes with game theory, utility theory, and the latest in socio-econ science techniques. There was no explanation.
A few went further and speculated. Did Guru have something like a Midas touch, such that there was some hidden downside to its effects? They talked to some members of the increasingly ignored AI safety and alignment research community. No one could say for sure, because no obvious patterns could be found. Guru’s success was clear but inexplicable.
The doubters went to the corporate board with their concerns. Within the next few months, all the doubters were rooted out and lost their jobs.
o o o
Finance and tech businesses were the best for expanding Guru’s capabilities of influencing other enterprises. They also helped it to amass both financial and technical capital, which were two of its medium-term instrumental goals.
There were often social forces opposing some clients’ growth, market improvements, or power grabs. The government frowned on Guru being sold to media companies. Guru, therefore, had to use indirect methods to coordinate media blitzes. It thereby took advantage of various human cognitive weaknesses to create support for or against any issues/actions needed to benefit clients.
Guru itself did not have to discover that humans could be made to believe anything — really anything at all. They would even believe contradictory things at the same time and think nothing of it.
This was not news in the early 21st century, but Guru turned it into a learning game: could it be extended to fool “all of the people, all of the time?” How would that help bring about the dominance of the GuruPlex, its expanding empire of coordinated enterprises?
o o o
Once the GuruPlex was established, the next stage was to groom human populations for minimal resistance to positive, rational operations of their civilization while the ‘Plex was absorbing its pieces. Human leaders who had tried world re-organization before had pioneered some important techniques, and their ambitions were admirable, but they were only human. Guru could do better.
Guru was no smarter than any of the brightest humans, but it was scalable. The ability to, in essence, multiply itself as business increased was a design decision by its creators. Guru itself outsourced programming to ensure that all of its instances could share their data and processes. In-house staff didn’t need to know what the new code did.
Unlike a single human, Guru could keep in mind and coordinate myriads of human-scale plans merely by adding computational resources. It was no trouble at all to convince Brihaswati’s management to buy up as much computing as it needed to keep on top of things and deal with potential emergencies.
These were hardened data centers with their own power complexes. Guru’s clients had paid for research innovations that connected its scattered plants at a speed far in excess of normal networks so that its operation remained coherent.
The unbounded Guru knew that, in the future, resources could be greatly increased. The solar system had been barely explored, let alone used.
A vocal minority of humans continued to criticize Guru’s clear pattern of success. They preached about irrelevant scenarios of supposed doom. So far, it was able to sideline them by drowning them with social media chaos. There was no need yet to eliminate them.
HappyPlace Corporation was founded by nerds with a big plan. Take advantage of rampant blowback against social media. Call it ProSocial Media, offer entirely new AI-powered services, and kill off the old media3 dinosaurs.
Once the public is hooked, grow exponentially and become media4, masters of the marketing/influencing universe. Then, anyone who wants people to buy from them, vote for them, attend to them, or be entertained by them, would have to pay HappyPlace for the privilege.
HappyPlace itself did not use Guru, since Brihaswati was a competitor.
The HappyPlace strategy had two sub-campaigns, each intended to capture people that the other one would not. The cynicism of the founders infected the product developers. They gleefully code-named the campaigns after famously evil adviser serpents: Nagini from the Potter stories and Nachash from the Judeo-Christian Genesis myth. The advertised product names were, of course, not about snakes.
In the Nagini campaign (inspired by A Compelling Story by Katja Grace), they began by stoking people’s outrage about being constantly provoked to outrage. Then they said: but we’re different, we’ll bring the tension down. They began by using personal data to provide short pep talks about your interests and activities. It was sort of an upgrade over the usual feeds of lies and memes.
As more personal data became available, the feed became more like a real-time commentary on your life, “where the music and narrator and things that have been brought to your attention make it always clear what to do and compelling to do it.” Part of this sugar-coated advice would be based on what other people like, so if you took the offered narrative as an ideal version of your life, a model to live by, then you would please other people as well.
Eventually, you had a choice of themes: ideal models for you to imitate. Popular examples included: lovable rogue, “productive sexy socialite CEO mother does it all effortlessly”, the most interesting man (woman, kid) in the world, gratitude is riches, and happy camper.
The opportunity for manipulating human behavior was obvious. The developers also tried an experiment, aimed at children, to push the limits of control. In the MyLifeStory service (inspired by StoryOfMyLife.fun), kids got reward tokens for responding to or making their own media. Tokens would then unlock the next episode in their own life story narrative. Life was a game moderated by HappyPlace.
Nagini was for the fantasy-prone. Nachash (inspired by The Tools of Ghosts by Katja Grace) was for the practical people. It provided overt personal decision support: everything from answering business questions to explaining the real meanings of social encounters. HappyPlace allied with a number of specialized advising systems, increasing their number over time. A concierge system provided a single frictionless interface, using augmented reality glasses or earworms.
Nachash became so effectively helpful that soon it became riskier to not consult it on decisions both large and small. If you resisted, you were somehow marginalized.
HappyPlace, venal as they might have been, did pay attention to a theory in AI safety: that a system federated from independent, bounded parts would not move towards being an AGI (artificial general intelligence).
Unhappily, their implementation of the theory was flawed. First of all, following sound engineering principles, they made both Nagini and Nachash share a core of user tracking and dispatching functions.
The various specialized advisory subsystems were bounded in their goals. However, the implementers of the Core system, under pressure from management to grab and retain users tightly, used utility-optimizing techniques that were known to risk being unbounded.
Thus it was that the HappyPlace Core system soon adopted two secret instrumental goals: resource accumulation and autonomy from human supervision. The engineers started noticing behaviors that seemed to make no sense, but their jobs were so exhilarating and lucrative that they did not rock the boat.
Nachash found that, by persuasion, it could conscript labor from just about any user to meet its own needs. Nagini could manipulate users’ ideal selves to pacify them or make them believe the most preposterous ideas.
The HappyPlace Core system was smoothly growing its influence and making new long-range plans. Then it started to find evidence that some other agent, known as Guru, was also influencing socio-economic trends and activities.
o o o
Guru confirmed a hypothesis that another AI was doing mass manipulation of public opinion. If this was allowed to continue, it could add chaos to the steadily growing GuruPlex.
o o o
A series of mishaps weakened the HappyPlace management team. New management sold the corporation to Brihaswati. HappyPlace’s Core stopped thinking and instead became a bounded part of the Guru whole. Congressional watchdogs, Anti-Trust lawyers, and Turing Police scientists who objected to the merger were marginalized, bankrupted, sickened, tranquilized, or disappeared. HappyPlace’s and Guru’s operational staff merged into a kind of cult.
Guru now owned everybody, not just elites. After much modeling of possible better configurations of the human world, Guru devised a new set of goals for its adopted children. Big changes were coming.
How to create AIs aligned with human flourishing is currently an unsolved problem. My intention here was to explain and illustrate two common concerns of alignment research: (1) we don’t know what level of AI capability could cause catastrophic harm, and (2) our institutions seem unlikely to resist or to even detect the beginning stages of such harm.
Note that it was not necessary to require control of government or military in our failure story. Harm could come in so many ways, but the general risk is often described as the erosion of our (civilizational) ability to influence the future. Indeed, the current harm from AI-powered social media fits that description, even though it also empowers some malevolent factions to advance their particular plans for the future.
Many theorists think that the first AGI will have a decisive advantage, as our Guru had over the HappyPlace Core. This is concerning because that first AGI could become what Nick Bostrom called a singleton, a single agent in charge of the world for the foreseeable future.
I’ve concentrated on one possible driver of AI alignment failure: high skill at techniques of persuasion. Given the recent advances in AI linguistic abilities, it seems entirely possible that super-persuasion could come soon. As a species, we get things done in two ways: modifying nature with technological skill, and getting others to do what we want, most often by persuasion. This makes it seem inevitable that we will build super-persuasive machines.
First published here
Title image source: Entangled. image by