There have been many kinds of books, with many kinds of meanings. This one was special because it was the first fictional story produced by artificial intelligence, the first in the sense that its contents made sense. Before it, every attempt at letting an AI write a book had produced pastiches of randomness: a couple of sentences here and there surrounded by text that made no sense. Yes, many of those works sold for ridiculous prices in New York literary circles, or were acquired by Chinese millionaires who displayed them in their 80th-floor penthouses in Beijing.
Still, they were unquestionably garbage.
This other book I’m talking about wasn’t special per se. Its plot was the typical one: a couple whose backgrounds make it socially impossible for them to be together, a synthesized Romeo and Juliet of bits; silicon love. It also had a hero embarking on an insurmountable quest, and it even had the final climax of the hero fighting an evil opponent who was larger than life.
So, yes, the plot made sense, but there was nothing special about it; no Nobel Prize for Deep Writer (DW, or Dewey, as television pundits started to call it). What made the book interesting was the challenge that began when literary aficionados on several internet forums tried to reverse-engineer the story. They wanted to find out which books had influenced Dewey. The question of how to do so caused the first schism in that quest.
If we want to know how a program works, we try to obtain its source code by whatever means possible, whether by decompiling the program or, if luck is on our side, by reading it online in its open source repository. Of course, not everyone agreed with this method. Some authors posited that, the book being a literary object, we should analyze it solely by reading its content. There’s no need to know who the author is, how they think, or what their likes and prejudices are. We should be able to grab the book, read it, and find meaning in it alone.
A second group argued that understanding the author’s creative process and life context was crucial to understanding the book itself. The author had to have prejudices, fears, and joys that led them to write what they wrote; they lived inside a society with its assumed allowances and prohibitions. In this case the author was an algorithm, so its feelings and prejudices had to be in Dewey’s source code itself. The prejudices of Dewey’s creators must also have played a part in the creation process. What did they leave encoded here and there that conditioned Dewey’s “creativity”?
Researchers dissected the source code, looking for clues about Dewey’s inner workings. This method produced the most interesting results.
First, the algorithm was trained on the material available at Project Gutenberg, then refined with texts from Google Scholar, and finally with resources from several other online archives.
To rank the texts, authors who had won literary prizes had their works bumped up, so the algorithm would favor their texts and their ways of writing over those of less literarily successful authors. The second ranking factor was how well the texts fared in Amazon sales (!). The algorithm even used a metric called Highlights Per Sentence Ratio (HPSR), which worked by simply taking into account how many highlights each book’s Kindle edition had (talk about arbitrary metrics!). The last factor was an attempt at extracting some sort of score out of Goodreads reviews.
The algorithm also had plenty of filters meant to separate the wheat from the chaff. One of them was very explicit: don’t feed the algorithm religious texts such as the Bible. This ended up being interesting, because even though the Bible wasn’t used as a source of inspiration for Dewey, the rest of the corpus was heavily influenced by Christian culture, which was not at all surprising given that it was based mostly on Western literature. To name a couple of funny instances, Dewey’s characters exclaimed “Jesus!” 257 times across the book and cried “Oh God” another 147 times, not counting euphemisms like “Gosh”. Considering that religion wasn’t meant to be found in the text at all, this was quite ironic.
Some other aspects of the book went from ridiculous to sad without many stops in between. Most characters in the book were white. The “exotic” characters had names that almost never occur in their cultures, like the Latino salesman who appeared in chapter seven, named Rodolfo Airondo Buñuelos (buñuelos are a fried specialty cooked throughout Latin America, not a family name, and let’s not even talk about the middle name Airondo).
And what can we say about the treatment of women in Dewey’s book? The main female character, Bella, was blond, fair, smiling, and always speaking in a soft voice. Had Cliché been her family name, I’d guess nobody would have noticed. Her friend existed only to accentuate Bella’s traits. The other female characters were Bella’s mother and the hero’s dead mother, killed by the villain in the book’s first scene (how typical). Finally there was Bella’s antagonist, who was in fact a self-made woman running her own business, much in the fashion of Karen Blixen in Out of Africa¹. Five women in a story with as many as thirty-three men!
Last but not least was the mysterious Appendix 0 that closed the book. It consisted of just one page with the following text:
0000010000001000000000000000100001 [… and so on].
Reverse-engineering Dewey’s algorithm with the help of its source code was a piece of cake compared to deciphering what Dewey meant by that “Appendix 0”. It was hard to tell whether it was just a program error that printed a binary number (a memory location, for example) or a code for something else.
The binary number extended over a couple of lines and contained twelve 1s, the rest being 0s. It fitted exactly in a 64-bit memory word, which lent further support to the memory-location theory.
And what was the meaning of the “First Word?=>Pair” that also appeared in the appendix?
The riddle was eventually solved by a scholar named Amir Rodríguez Monacal. As is usually the case in hindsight, the solution was quite simple.
“First Word?” was an instruction to take the first word of each paragraph in the text. The question then was what to do with those first words. The answer lay in the “=>Pair” instruction, read as “then pair”. The idea was to pair each word, in order, with the corresponding bit of the sequence. Words paired with 0s were to be discarded, while those paired with 1s were to be kept. The resulting sentence was:
The visible work left by this novelist is easily and briefly enumerated.
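For readers who think in code, Monacal’s reading of “First Word?=>Pair” amounts to applying a bit mask to a list of words. What follows is a minimal Python sketch, not Dewey’s actual mechanism, assuming the book’s paragraphs are available as a list of strings and the appendix bits as a string of '0' and '1' characters. The function name `decode_appendix` and the four toy paragraphs are hypothetical, invented for illustration; the real bit string is elided above and is not reconstructed here.

```python
def decode_appendix(paragraphs: list[str], bits: str) -> str:
    """Hypothetical helper, not taken from Dewey's source code.

    Pair the first word of each paragraph with the bit at the same
    position; keep words paired with '1' and drop those paired with '0'.
    """
    # "First Word?": the first whitespace-separated word of each paragraph.
    # Blank paragraphs are skipped (an assumption of this sketch).
    first_words = [p.split()[0] for p in paragraphs if p.strip()]
    # "=>Pair": zip pairs words and bits in order, stopping at the shorter one.
    kept = [word for word, bit in zip(first_words, bits) if bit == "1"]
    return " ".join(kept)


# Hypothetical toy input: four paragraphs and a four-bit mask.
paragraphs = [
    "The night was quiet.",
    "Nobody spoke at dinner.",
    "Visible from the road, the house burned.",
    "Work resumed the next morning.",
]
print(decode_appendix(paragraphs, "1011"))  # prints "The Visible Work"
```

Because `zip` stops at the shorter sequence, in this sketch a 64-bit mask would only ever consult the first 64 paragraphs of the book.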
¹ Later, a colleague pointed out that the version of Out of Africa fed to the algorithm was wrongly attributed to Isak Dinesen, Blixen’s male pen name. This confirms our earlier suspicion that the algorithm favored books written by male authors.
Image from the British Library Online Archive, from Heron of Alexandria, Pneumatica, De automatis.