I trained a “document classifier” to guess facts about encyclopedia articles, using:
- a new dataset from Google that links 4.7 million Wikipedia articles to 18.58 million facts from WikiData
- Facebook’s speedy text classification tool FastText
Then I ran it over articles from Tolkien Gateway, the biggest Lord of the Rings wiki, and got surprising results. Let’s dive in!
Here’s a more controversial one that clearly needed a reference:
A few months ago Google did the dirty work of wrangling the data from Wikipedia and WikiData into one place and put it in a massive zip file on the internet. They also released a paper about trying to use it to guess facts about Wikipedia articles. You can read about my initial exploration of it if you’re in the mood.
Meanwhile, Facebook have a tool called FastText that can learn associations between the phrases in a document and facts about said document.
So in the above example, it can notice that the article about Shigeru Miyamoto uses the word “he” a lot more than it uses the word “it”, and contains the phrase “was born” instead of “was constructed”. So the next time it sees an article with those features, it might guess that the article is talking about a human instead of a bowling alley.
(It also does this quicker than a lot of other methods, which is useful for people like me who don’t have access to Google’s supercomputers.)
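To make that a bit more concrete, here is a minimal sketch of how an article and its facts could be turned into the plain-text format FastText's supervised mode reads: one example per line, with each label prefixed by `__label__`. The `property=value` label scheme and the helper names `make_label` and `to_fasttext_line` are assumptions for illustration, not necessarily the preprocessing behind the results in this post.

```python
import re


def make_label(prop, value):
    """Build a FastText-style label such as __label__instance_of=human."""
    # Labels are split on whitespace, so spaces inside them become underscores.
    def clean(s):
        return re.sub(r"\s+", "_", s.strip().lower())

    return f"__label__{clean(prop)}={clean(value)}"


def to_fasttext_line(text, facts):
    """Format one article as a single line of supervised training data."""
    labels = " ".join(make_label(p, v) for p, v in facts)
    # One example per line: collapse newlines and other whitespace runs.
    body = re.sub(r"\s+", " ", text).strip().lower()
    return f"{labels} {body}"


example = to_fasttext_line(
    "Shigeru Miyamoto was born in 1952.\nHe designs video games.",
    [("instance of", "human"), ("sex or gender", "male")],
)
print(example)
```

A file full of lines like this is what you would hand to something like `fasttext.train_supervised(input=path)` in the official Python bindings. Words such as “he” and “was born” then become the cues the classifier can associate with labels like `instance_of=human`.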
Once I was getting good results guessing facts about unseen Wikipedia articles, I decided to try it out on articles from Tolkien Gateway, my go-to resource for resolving arguments about whether Isildur’s dad is named Elendil or Eärendil.
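Asking a trained model for its best guesses about an unseen article might look roughly like the sketch below. It assumes a model object whose `predict(text, k=...)` returns a pair of (labels, probabilities), which is how the FastText Python bindings behave for a single string. `top_facts` and `StubModel` are hypothetical names, and the stub's labels and numbers are placeholders so the sketch runs without a trained model. They are not real results.

```python
def top_facts(model, article_text, k=5):
    """Return the k most likely (fact, probability) pairs for one article."""
    # predict() works on a single line, so flatten newlines first.
    flat = " ".join(article_text.split())
    labels, probs = model.predict(flat, k=k)
    return [
        (label.replace("__label__", "", 1), float(p))
        for label, p in zip(labels, probs)
    ]


class StubModel:
    """Hypothetical stand-in for a trained model, with made-up scores."""

    _ranked = [
        ("__label__fact_one", 0.75),
        ("__label__fact_two", 0.25),
        ("__label__fact_three", 0.125),
    ]

    def predict(self, text, k=1):
        # Mimic the one-line-at-a-time restriction.
        if "\n" in text:
            raise ValueError("predict processes one line at a time")
        top = self._ranked[:k]
        return tuple(l for l, _ in top), tuple(p for _, p in top)


guesses = top_facts(StubModel(), "Gandalf was a wizard.\nHe carried a staff.", k=2)
print(guesses)
```

Swapping `StubModel()` for a model loaded with `fasttext.load_model(path)` should work the same way, as long as the labels were written with the `__label__` prefix at training time.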
People seem to really like Gandalf:
So let’s start there.
What did FastText say were the most likely facts about Gandalf?
On the face of it, these might seem a little silly. But keep in mind:
- Tolkien was a British man who fought in a World War, who was deeply Catholic. Maybe Gandalf is a subconscious self-insertion?
- Re. the canonisation, Gandalf is one of the Ainur, which Tolkien Gateway describes as “angelic beings”
- Gandalf totally murdered a Balrog
- If anyone’s looking for a PhD topic, “Gandalf as Goalkeeper, Sauron as Ronaldo: Post-Colonialism in Middle-Earth” is a surefire winner.
- Baroque, eh? Did you know that the Lord of the Rings is partly based on an opera?
- Legolas is 2931 years old (disputed, see here). The Roman Senate was first founded 2771 years ago. Coincidence?
- He never had children, and had a fair singing voice. Is he a eunuch? You tell me.
- The Islam thing is left as an extremely difficult exercise for the reader. I’m not touching that.
- If you accept the theory that Middle-Earth is just our world a long time ago, then the Shire is definitely a former country inside the UK
- The Shire is a bastion of natural beauty, peace and elevensies, under siege by greedy men. It would’ve been desecrated ages ago if it wasn’t defended by star player Gandalf (and decades later by King Aragorn). It definitely deserves conservation status.
The One Ring
While The Ring was obviously a participant in the most important conflict of Middle-Earth, it was on the opposite side to Gandalf and co. But can we really blame my nascent AI for being fooled? Even Boromir son of Denethor got confused in the end about which side it was playing on.
Little is known about The Ring’s career at the summer games, except that it bent all five Olympic Rings to its will.
I find it fascinating that Sam is one of the only people labelled as American in the whole legendarium (Éowyn being another). Please tell me your theories.
As for the drug trafficker thing:
- The Ring is super addictive. Remember Bilbo’s face when he sees it again at Rivendell?
- Sam helps sneak it across international borders in a dangerous manner
- He’s a gardener with extensive herb lore and a fondness for the finest weed in the Southfarthing
- My kingdom to anyone who can come up with a convincing theory for why Éowyn is a Republican while Sam is a Democrat, and why they’re the only two American mutants.
I give up.
Further study is needed.
(If you want to try reproducing my results, let me know, and I’ll put the FastText model up in a torrent for you.)