In a recent publication, Apple explained how they used machine learning to directly recognize people in private photos on your iPhones and iPads without having access to your images to train their algorithms.
For those of you with Apple products, you can actually search by person in the Photos app.
Indeed, using multiple machine learning-based algorithms that I will cover in this article, running privately on your device, iOS 15 accurately curates and organizes your images and videos.
It recognizes the different people and lets you search your pictures for where each person appears. If you have thousands of photos like I do, you will already have different clusters, each representing a different person.
For example, one such cluster could be all the photos your friend John appears in, so that you can name it “John” and then search for images of John in your pictures to have them appear automatically.
Watch the video to learn more!
► Read the Full article: https://www.louisbouchard.ai/how-apple-photos-recognizes-people/
►Apple, "Recognizing People in Photos Through Private On-Device Machine Learning", (2021), https://machinelearning.apple.com/research/recognizing-people-photos
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
00:00
In a recent publication, Apple explained how they used machine learning to directly recognize
00:04
people in private photos on your iPhones and iPads without having access to your images
00:09
to train their algorithms.
00:11
I personally do not have an iPhone, so I cannot test it myself, but I am looking for an iPad
00:15
to draw explanations and write math equations and stream it during calls.
00:19
If some of you guys use tablets to do that, please let me know what you think is the best
00:24
to get!
00:25
For those of you with Apple products, you can actually search by person in the
00:28
Photos app.
00:30
Indeed, using multiple machine learning-based algorithms that I will cover in this video,
00:34
running privately on your device, you are able to accurately curate and organize your
00:39
images and videos on iOS 15.
00:42
It will recognize the different people and allow you to search your pictures for where
00:45
the person appears.
00:47
If you have thousands of photos like I do, you will already have different clusters each
00:50
representing different people.
00:52
For example, one such cluster could be all the photos your friend John appears in so
00:57
that you can name it "John" and then search for images of John in your pictures to have
01:02
them appear automatically.
01:04
It can even recognize photos where the same people frequently appear, even if it doesn't
01:08
know the people individually or hasn't been directly trained on them, and use that to share
01:13
memories like the "Together" feature shown here.
01:16
This is a super cool built-in application by Apple, and the best part is that it even works
01:21
when the face is occluded or sideways, as we will see.
01:24
As I said, it seems to work really well.
01:27
It entirely runs on your device privately, and they are always improving the algorithms,
01:32
but it's even cooler to know how it works, so let's dive into it!
01:36
This task of recognizing people in your own picture is extremely challenging because of
01:40
the variability your photos will have.
01:43
Different people, different angles, different scales, different lightings, occlusions because
01:47
your friend was catching a football, or even from other cameras.
01:52
If we relied strictly on the person's face, this would be pretty incomplete
01:57
as most of our pictures taken on the spot during an event aren't perfect images with
02:02
your friends smiling in front of the camera.
02:04
When you type in John, you'd like to see these events where John won the game by catching
02:09
this ball.
02:10
To tackle this, they start by locating the faces and upper bodies of people visible in
02:15
the image using a first detection algorithm.
02:18
This algorithm was trained on many labeled human examples annotated with where the bodies
02:22
and the faces were.
02:24
Meaning that they trained a deep neural network with images sent as inputs, and the outputs
02:28
were only the cropped version of the image with either the bodies or faces of the people.
02:34
This is done by feeding many examples to the network, helping it learn where to focus
02:38
its attention using the correctly identified sections.
02:42
This way, it can iteratively learn to find this body part by itself afterward if we show
02:47
it enough examples during training.
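The detection step described above can be sketched as follows. Everything here is a toy stand-in for illustration: the `Box` detections are hard-coded, whereas Apple's actual system gets them from a trained deep detection network.

```python
# Illustrative sketch of the detection step: a detector produces bounding
# boxes for faces and upper bodies, and we crop them out for the
# downstream embedding networks. The boxes are hard-coded here; a real
# system would predict them with a trained model.
from dataclasses import dataclass

@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int
    kind: str  # "face" or "upper_body"

def crop(image, box):
    """Return the sub-image covered by the bounding box."""
    return [row[box.x:box.x + box.w] for row in image[box.y:box.y + box.h]]

# Toy 6x6 "image" where each pixel is just its index.
image = [[r * 6 + c for c in range(6)] for r in range(6)]

detections = [Box(1, 1, 2, 2, "face"), Box(0, 2, 4, 3, "upper_body")]
crops = {d.kind: crop(image, d) for d in detections}
```

Each crop is then handed to the embedding models discussed next.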
02:49
02:51
By the way, if you find this interesting, don't forget to subscribe, like the video,
02:53
and share it with your friends or colleagues, it helps a lot!
02:57
Thank you!
02:58
Then, they match the bodies and faces of each individual to have even more data about the
03:03
person in case only one of the two appears in a future image.
03:06
You can see here that both the body and face are sent into a separate model that encodes
03:12
the information, creating embeddings.
03:14
These embeddings are simply the most valuable information about the face and body of the
03:18
person.
03:19
Here, we use another network to encode the information because we want our embeddings
03:23
to be similar for the same person and different for different individuals.
03:27
This is again done with another model that will look like this, inspired by MobileNet,
03:32
which I talked about in my convolutional neural network video.
03:36
It is a lightweight convolutional neural network that can run extremely efficiently, made for
03:41
mobile devices instead of GPUs.
03:43
If you are not familiar with CNNs, I strongly invite you to watch the video I made explaining
03:48
them simply.
03:49
Basically, it takes the cropped images and compresses the information in a smaller space
03:53
focusing on the most interesting details about the individual.
03:57
This is possible because such a model was trained on a lot of images to do exactly that.
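A minimal sketch of what "similar embeddings for the same person" means in practice, using cosine similarity. The vectors below are made up for illustration; a real system would get them from the MobileNet-style encoder.

```python
# Illustrative embedding comparison: embeddings of the same person
# should be close, and embeddings of different people far apart.
# The vectors are invented for this example.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

john_photo_1 = [0.90, 0.10, 0.20]  # hypothetical embedding of John
john_photo_2 = [0.85, 0.15, 0.25]  # another photo of John: nearby vector
stranger     = [0.10, 0.90, 0.30]  # different person: distant vector

same = cosine_similarity(john_photo_1, john_photo_2)
diff = cosine_similarity(john_photo_1, stranger)
```

An encoder trained with this objective makes the clustering step that follows possible.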
04:02
Then, these embeddings are merged and saved in your phone's gallery unless they have poor
04:07
responses.
04:09
These poor responses may come from unclear faces or upper bodies and would be automatically
04:14
filtered out.
04:15
This is repeated with all your pictures to create clusters out of these embeddings.
04:20
These clusters will be the different people identified.
04:22
It will merge all similar embeddings into small groups where each group is a specific individual.
04:28
So this is the step where all the pictures where John was identified are put into a gallery.
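The clustering step can be sketched with a simple greedy scheme: an embedding joins a cluster when it is within a distance threshold of that cluster's centroid, otherwise it starts a new cluster. This is an illustrative stand-in, not Apple's actual clustering algorithm, and the threshold is invented.

```python
# Toy greedy clustering of embeddings into per-person groups.
# Threshold and data are illustrative only.
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster(embeddings, threshold):
    clusters = []  # each cluster is a list of embeddings
    for e in embeddings:
        for c in clusters:
            centroid = [sum(col) / len(c) for col in zip(*c)]
            if distance(e, centroid) < threshold:
                c.append(e)  # close enough: same person
                break
        else:
            clusters.append([e])  # no match: new person
    return clusters

# Four photo embeddings from two different "people".
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
groups = cluster(embeddings, threshold=0.5)
```

Running this groups the four embeddings into two clusters of two, one per person.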
04:33
And what's cool is that this automatically runs during nighttime when your phone charges
04:38
while you sleep and keeps on improving the more pictures you have.
04:42
So once these clusters are created, your new photos containing people are sent to the same
04:46
deep network to create a new embedding per person in the image.
04:50
This new embedding will either join a cluster if a match is found or create a new one based
04:55
on the difference between the embeddings you have in your phone and the new picture's embeddings.
05:00
Here, to find whether it is the same person or not, they focus primarily on the face.
05:04
If it's occluded or sideways, it uses the upper body coupled with what we have from
05:09
the face and takes the time of the photo into account to measure if the clothing could be
05:13
the same or different.
05:15
As you may suspect, the upper body isn't always helpful.
05:18
As they say, "We’ve carefully tuned the set of face and upper body distance thresholds
05:23
to get the most out of the upper body embedding without negatively impacting overall accuracy."
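The matching logic just described, face first, upper body as a time-gated fallback, can be sketched like this. All thresholds, field names, and the time window are illustrative assumptions, not Apple's tuned values.

```python
# Sketch of the match decision: compare face embeddings when available;
# if the face is occluded or sideways, fall back to the upper-body
# embedding, but only when the photos are close enough in time for the
# clothing to plausibly be the same. Thresholds are hypothetical.
import math

FACE_THRESHOLD = 0.4       # hypothetical max face-embedding distance
BODY_THRESHOLD = 0.6       # hypothetical max upper-body distance
MAX_TIME_GAP_HOURS = 6.0   # clothing assumed unchanged within this window

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(new, known):
    """new/known: dicts with 'face' (or None), 'body', and 'time' in hours."""
    if new.get("face") is not None and known.get("face") is not None:
        return distance(new["face"], known["face"]) < FACE_THRESHOLD
    # Face unusable: fall back to the upper body, gated on photo time.
    if abs(new["time"] - known["time"]) <= MAX_TIME_GAP_HOURS:
        return distance(new["body"], known["body"]) < BODY_THRESHOLD
    return False

known    = {"face": [0.9, 0.1], "body": [0.50, 0.50], "time": 10.0}
occluded = {"face": None,       "body": [0.45, 0.55], "time": 12.0}  # same outfit, face hidden
late     = {"face": None,       "body": [0.45, 0.55], "time": 30.0}  # too long after: rejected
```

The gist is that the weaker upper-body signal only counts when the time of the photo makes it trustworthy, mirroring the careful threshold tuning Apple describes.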
05:29
And this is how Photos groups your friends within the application without you knowing
05:34
it!
05:36
Another concern is that they want to offer the same experience for all Apple users no
05:40
matter the photographic subject’s skin color, age, or gender.
05:44
It is great that they keep on improving the generalization and working to remove these
05:48
biases from their algorithm the best they can using the broadest datasets possible and
05:53
data augmentation to add variations to the training images.
05:57
If you have an iPhone or iPad, please let me know what you think of this feature in
06:01
the Photos app and how well it works!
06:04
Thank you for watching!