Understanding What Artificial Intelligence Actually Sees

Nick Bourdakos (@bourdakos1)

Computer vision addict at IBM

Many call artificial intelligence (AI) a "black box", and it kinda is. One of the biggest problems of AI is that it's incredibly difficult to understand how the data is being interpreted.

Before we get our hands dirty and dive deeper, let's play a little game.

I'm going to show you a series of abstract images that are either in category A or B.

Do you think the following image belongs to category A or B?

Hint: There's no C.

We'll get back to this later.

Let's look at some more examples first.

Now can you tell if it belongs to A or B?

⚠️ Spoiler Alert

The answer is… A!

If you chose B, don't be embarrassed; you're not alone. When I ask a room full of engineers and developers, the split is always 50/50. So… why is the answer A…?

Because I said so.

The answer is A, there's no debating it, but if you don't agree with me, then it was my fault as the trainer.

As the trainer, I know that A is a red circle. So anything with a red circle in it is A. I also know that B is an orange circle. The rest of the image is irrelevant. It's all about trying to find a pattern across the set of images.

But it's hard.

In an AI system, I can't explain with words what makes the image A. All I can do is show you more pictures and hope it starts to click.

And you, the AI, can't tell me why you think it's B. It's up to me to blindly feed you data until you get the answer right.

Here's the same set of images, but less abstract. If I were to ask you the same question, everyone would know right away that A is an apple and B is an orange. This is almost so easy that many people think it's a trick question. We all know that the hand and background are irrelevant, because we're humans and grew up learning these things, but for an AI it's not a given. It sees images more abstractly and doesn't know what you want it to focus on.

A Miscommunication

Let's take a look at another toy scenario that shows how we might accidentally communicate the wrong signals to the AI system.

We have a few samples of oak trees. (It's a bit cloudy where I live.)

Here are some palm trees. (It was really sunny on the beach.)

This next example is a palm tree, but the lighting is much closer to that of the oak trees. Which pattern should we focus on? The lighting? Or the shape of the tree? It might be difficult for the model to tell.

Confidence:
- Palm 0.75
- Oak 0.60

With this example, it might be pretty obvious that we left behind an unintended pattern for the AI to pick up. However, in reality, it's normally something much more inconspicuous.

Peeking Under the Curtain

So how can we get more insight into what the AI is focusing on?

What if we passed a rectangle over the image and recorded the changes in confidence? If the confidence drops, then that's probably an important part of the image.

Which picture makes it easier to tell that this cable is a USB?

The first image completely obscures the connector, making it nearly impossible to guess, so we can denote the region the rectangle covers as important. However, the rectangle in the second image doesn't hinder our ability to determine the cable type. We can safely mark the location as insignificant.
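To make that concrete, here's a minimal sketch of the single-rectangle test in Python. The `predict` function is an assumed stand-in for whatever trained classifier you have: it takes an image array and returns the confidence for the class of interest (e.g. "USB"). Nothing here is specific to the app from this article.

```python
import numpy as np

def occlude(image, x, y, w, h, fill=0.5):
    """Return a copy of `image` (H x W x C floats in [0, 1]) with a
    gray rectangle drawn over the region at (x, y)."""
    occluded = image.copy()
    occluded[y:y + h, x:x + w, :] = fill
    return occluded

def importance(image, predict, x, y, w, h):
    """How much does hiding this rectangle hurt the prediction?
    A large drop means the model was relying on that region."""
    baseline = predict(image)  # confidence on the unmodified image
    return baseline - predict(occlude(image, x, y, w, h))
```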

We can continue to pass the rectangle over the image to establish a heat map of importance.
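Continuing the sketch above, the full sweep is just that test repeated on a grid. Again, `predict` is an assumed stand-in for your model, and the patch size and stride are illustrative knobs that trade resolution for speed:

```python
def occlusion_heatmap(image, predict, patch=32, stride=16, fill=0.5):
    """Slide a gray square across the image and record the confidence
    drop at each position. Bright cells mark regions the model depends
    on; upsample the result and overlay it on the photo to visualize."""
    H, W = image.shape[:2]
    baseline = predict(image)
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            y, x = i * stride, j * stride
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch, :] = fill
            heat[i, j] = baseline - predict(occluded)
    return heat
```

A coarse pass (big patch, big stride) is usually enough to spot gross misfocus like the examples that follow.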

We can see that the model's focus is on the tip of the connector, which is great. It's looking where we want it to.

Let's look at a model that wasn't trained well.

Confidence:
- USB 0.76

The model correctly predicted that the cable was a USB with a confidence of 0.76. We might say that's acceptable, especially since the photo was taken from far away and isn't great quality.

However, upon closer inspection, the model seems to be focusing on the wrong area, not on the ends of the cable as we would expect.

What does this tell us?

The model appears to rely too heavily on the wire and fingers. To improve accuracy and clear up the confusion, we can include more examples of wire and hands in a negative training set.

We don't need to train on piles and piles of generic data until our model starts performing better. We can tactfully use this information as an aid in retraining the model, saving us time and money.
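As a hedged sketch of how you might act on this: if you know roughly where the model *should* be looking (the connector tip), you can score how much of the heat map from above falls outside that region and flag those photos for targeted retraining. The helper name, box coordinates, and any threshold you'd apply are purely illustrative, not part of the original tool.

```python
def misfocus_score(heat, expected_box):
    """Fraction of total importance that falls outside the region we
    expect the model to rely on. Scores near 1.0 suggest the model is
    keying on background patterns (wire, fingers) instead."""
    r0, c0, r1, c1 = expected_box        # heat-map row/col coordinates
    positive = np.clip(heat, 0.0, None)  # keep only harmful-to-hide cells
    total = max(positive.sum(), 1e-8)    # avoid dividing by zero
    inside = positive[r0:r1, c0:c1].sum()
    return 1.0 - inside / total
```

Images that score high are the ones worth pairing with extra negative examples, rather than blindly growing the whole training set.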

Using the Tool

Wow! This is great, but I don't want to put in the effort to actually implement this.

Good news! You can find the fully functional iOS app on my GitHub 😘

Final Thoughts

Creating your own model is easy, but that doesn't mean the work stops there. The hardest part of machine learning is always producing good data.

We can use basic guidelines, such as keeping pose and lighting similar and using a consistent mix of stock and natural photos across our training images, to gain a foothold in our quest toward a good model. After that, we are left using tools and our intuition to try to gain insight into the thought process of the AI.

Thanks for reading! If you have any questions, feel free to reach out at bourdakos1@gmail.com, connect with me on LinkedIn, or follow me on Medium and Twitter.

If you found this article helpful, it would mean a lot if you gave it some applause 👏 and shared it to help others find it! And feel free to leave a comment below.
