Unsupervised learning may sound like a fancy way to say “let the kids learn on their own not to touch the hot oven” but it’s actually a pattern-finding technique for mining inspiration from your data. It has nothing to do with machines running around without adult supervision, forming their own opinions about things. Let’s demystify!
This post is beginner-friendly, but assumes you’re familiar with the story so far:
- Machine learning is all about labeling things using examples.
- If you train your system by feeding it the answers you’re looking for, you’re doing supervised learning.
- To get started with supervised learning you need to know what labels you want. (Not so with unsupervised.)
- Standard jargon includes instance, feature, label, model, and algorithm.
What’s unsupervised learning?
Check out the six instances above. What’s missing? These photographs are not accompanied by labels. No worries, your brain is pretty good at unsupervised learning. Let’s try it.
Think about how you’d like to split these images into two groups. There are no wrong answers. Ready?
Clustering the data
In a live class, Googlers shout out answers like “sitting versus standing,” “can see a wooden floor versus can’t,” “cat selfie vs not cat selfie,” and so on. Let’s examine the first answer.
Unsupervised learning‘s secret labels
If you chose to define your clusters based on whether the cats are standing, what are the labels your system outputs? Machine learning is about labeling things, after all.
If you’re thinking “sitting vs standing” are the labels, think again! That’s the recipe (model) you’re using for creating your clusters. The labels in unsupervised learning are far more boring: something like “Group 1 and Group 2” or “A or B” or “0 or 1”. They simply indicate group membership, and they have no additional human-interpretable (or poetic) meaning.
All that is happening here is that the algorithm groups things by similarity. The similarity measure is specified by the choice of algorithm, but why not try as many as possible? After all, you don’t know what you’re looking for and that’s okay. Think of unsupervised learning as a sort of mathematical version of making “birds of a feather flock together.”
Like a Rorschach card, the results are there to help you dream. Don’t take whatever you see in them too seriously.
As the proud mother of these two individual cats, I’m saddened that in the 50 or so times I’ve taught this lesson, only one audience noticed: “Cat 1 versus Cat 2.” Instead it’s answers like “sitting, standing” or “wooden floor absent/present” or sometimes even “ugly cats versus pretty cats.” (Awww.)
Imagine I’m a novice data scientist getting started with unsupervised learning and (naturally!) interested in my own two cats. I won’t be able to unsee my cats when I look at these data. Because my cuddlebugs are so meaningful to me, I expect my unsupervised machine learning system to be able to recover the only thing worth caring about here. Oops!
Before this decade, computers couldn’t even hope to compete with the best
pattern finder in the world for this kind of task: the human brain. This is easy for people! So why did the thousands of Googlers who saw these unlabeled photos miss the “Cat 1 versus Cat 2” answer?
Think of unsupervised learning as a sort of mathematical version of making “birds of a feather flock together.”
Just because something’s interesting to me doesn’t mean my pattern finder will find it. Even if the pattern finder is awesome, I didn’t tell it what I’m looking for, so why would I expect my learning algorithm to deliver it? This isn’t magic! If I don’t tell it what the right answers are... I get what I get and I don’t get upset. All I can do is look at the clusters the system returns for me and see if I find them inspiring. If I don’t like 'em, I just run a different unsupervised algorithm (“Someone else in the audience, split them for me a different way”) over and over until something feels interesting.
The results are a Rorschach card to help you dream.
There’s no guarantee that anything inspiring will come out of the process, but it doesn’t hurt to try. Exploring the unknown is supposed to be a bit of an adventure, after all. Have fun with it!
In the next episode, we’ll look at a cautionary tale of what can go wrong if you forget that the labels are just an inspiration and shouldn’t be taken too seriously, let alone treated as human-interpretable. They’re just there to give you ideas about what you might like to dive into next.
If you’re feeling artsy, you might also enjoy my follow up article Using AI as a Perception-Altering Drug, which describes how to use unsupervised learning and three other popular techniques in AI art.
Summary: Unsupervised learning helps you find inspiration in data by grouping similar things together for you. There are many different ways of defining similarity, so keep trying algorithms and settings until a cool pattern catches your eye.