Pierre Karpov

@pierrekarpov

Can I Snooze: building an image recognition system to sleep a little longer

“raccoon lying on tree branch with black bucket on its back” by Successfully Canadian on Unsplash

TL;DR

This is a story of the journey I went through. I wanted to design a machine learning system that would tell me whether I can snooze in the morning and still arrive on time for work.

I do get a bit technical in this article, so I added links for some of the more advanced terms.

If you want to learn how to build image recognition systems, there are better articles out there. But if you want a couple of good chuckles, and to see how easy it is to get something that resembles machine learning up and running, then this article is for you.

Context and Problem definition

Just after completing Professor Ng’s Machine Learning course, I was looking for challenging problems to put those newly learnt algorithms to good use.

I came up with a lot of fun and interesting ideas. But ultimately, none of those were actual problems. Yes, space exploration sounds fun. But let’s be real, I’m only considering this for the sake of working on a complex project. Not because I have this burning desire to find new clusters of stars.

Then one morning, I once again made the ill-advised decision to hit snooze on my alarm clock. Because of traffic, I arrived gross and sweaty at work, somewhat on time.

There’s nothing like a morning at work after a good run

It was a yucky way to start the day, but it made me realize that this is what I should work on. There’s not much I actually complain about. But if I had the superpower to assert with confidence whether to hit that tempting snooze button, life would truly be wonderful!

The outline of the problem was pretty clear. In the end, we want a system that tells us ahead of time whether the route is clear and sleeping another 10 minutes won’t hurt. Luckily for me, it just happens that I live in Singapore, where we have access to a ton of public data, including traffic camera images.

Designing a system pipeline

With all this in mind, I designed the following pipeline for my machine learning system:

  1. Get traffic camera images
  2. Build tool to classify image subsections
  3. Classify a lot of examples car/no car
  4. Crop and resize selections out of original pictures
  5. Build own deep Neural Networks
  6. Train Neural Network to detect cars
  7. Detect the car sections in actual traffic pictures
  8. Count how many cars in pictures
  9. Gather images for the times and locations of my commute specifically
  10. Build time series with how many cars, at what time, and where
  11. Estimate whether there will be traffic tomorrow
  12. Snooze peacefully 😴😴😴

And there began my data science journey. And boy oh boy, what a journey!

1. Getting traffic camera images

This was surprisingly one of the easiest parts. Data.gov.sg did a good job with their APIs. With little to no experience with the json, csv, and requests Python libraries, you can quickly and neatly download a lot of traffic camera images.

I decided to download data from all around the country and at all times. Even though in the end we will only need to detect cars in front of my apartment complex at around 9:00 AM. I did that in order to train a more versatile Image Recognition System that can then easily be applied to other places and times.
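A minimal sketch of that download step. The endpoint URL, the `date_time` parameter, and the response fields (`items`, `cameras`, `camera_id`, `timestamp`, `image`) are assumptions about the Data.gov.sg traffic images API and should be checked against its current documentation. It uses the standard library's urllib instead of requests so it runs without extra packages.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the current Data.gov.sg documentation.
TRAFFIC_IMAGES_URL = "https://api.data.gov.sg/v1/transport/traffic-images"


def extract_cameras(payload):
    """Flatten a response into (camera_id, timestamp, image_url) tuples."""
    rows = []
    for item in payload.get("items", []):
        for camera in item.get("cameras", []):
            rows.append((camera["camera_id"], camera["timestamp"], camera["image"]))
    return rows


def fetch_cameras(date_time=None):
    """Call the API (optionally for a past moment) and return the camera rows."""
    url = TRAFFIC_IMAGES_URL
    if date_time is not None:
        # Assumed query parameter for requesting historical snapshots.
        url += "?date_time=" + urllib.parse.quote(date_time)
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_cameras(json.load(response))


def download_image(image_url, destination):
    """Save one camera snapshot to a local file."""
    urllib.request.urlretrieve(image_url, destination)
```

Calling `fetch_cameras()` in a loop over many timestamps, then `download_image` on each row, would give the "all around the country and at all times" data set described above.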

Now that we have a bunch of traffic camera images, let’s work on detecting cars.

2. Classifying subsections of an image

Getting traffic images is great, but unfortunately, we can’t just shove them into a Neural Network and expect it to do a decent job at counting cars. (Well maybe we could, but it is probably not the most elegant or accurate way of dealing with the problem.)

So I started going through each image, trying to keep track of the coordinates of each car so I could later crop them out to train our car detection Neural Network. I very quickly realized that this was getting out of hand. So I wrote a script to make my life easier.

This is one of the reasons why I fell in love with programming in the first place: we have the power to make our work tools better. We use programs to program, so we can program a program, so that it would program better and we don’t have to program as much!

I used the Tkinter library to make a program where I can make image selections with only a couple of clicks and key strokes.

We wrote a script to classify cars within a picture, now it is time to use it.

3. Classifying loads of examples car/no car

Labelling traffic camera images

Even if it was dull, repetitive work, I managed to label almost 200 car selections and more than 1500 non-car selections across 30-ish images in under an hour. I made the executive decision that this was enough data to move on to the next step (I was bored).

4. Cropping and resizing selections

This part is also pretty straightforward. The script I wrote for part 3 stores the relevant information for each image selection in a neat csv file. I only have to keep track of whether the selection is a car or not, the coordinates of its top left corner, and its size. Knowing all this, I used the Pillow library (I know, it is very fitting for our little project) to crop out the car and non-car selections.
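Here is a rough sketch of that cropping step. The csv column names (`image`, `label`, `x`, `y`, `width`, `height`) and the sample rows are hypothetical, since the real file layout is not shown. The only Pillow calls used are `Image.open`, `Image.crop` (which takes a left, upper, right, lower box), and `Image.resize`.

```python
import csv
import io

PATCH_SIZE = (32, 32)  # assumed patch size, matching the 32px by 32px inputs used in step 6


def selection_to_box(row):
    """Turn a csv row (top left corner + size) into Pillow's (left, upper, right, lower) box."""
    x, y = int(row["x"]), int(row["y"])
    return (x, y, x + int(row["width"]), y + int(row["height"]))


def crop_selection(image_path, row, size=PATCH_SIZE):
    """Crop one labelled selection and resize it to a fixed patch size."""
    from PIL import Image  # Pillow; imported here so the box maths runs without it

    with Image.open(image_path) as image:
        return image.crop(selection_to_box(row)).resize(size)


# Hypothetical file contents, for illustration only.
SAMPLE_CSV = """image,label,x,y,width,height
cam_6703.jpg,car,120,80,40,30
cam_6703.jpg,not_car,0,0,40,30
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
boxes = [selection_to_box(row) for row in rows]
```

Storing the corner and size rather than the cropped pixels keeps the csv tiny and lets the patch size be changed later without relabelling anything.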

5. Writing code to build, train, and test my own deep Neural Networks

This is it, we’re building our own Neural Network lads! Oh boy was I in for a treat. I decided to build it from the ground up, so I would get a deeper grasp of what is going on under the hood. The concepts behind neural networks are not that hard to get (except maybe the derivations for the update functions). But implementing all of it is a whole other story.

For starters, it really helps if you’re not an idiot. I implemented the different components and all the logic looked fine to me. However, none of my node weights were updating properly. After a couple of hours of debugging, and a couple of grams of hair missing from my scalp, I found out I had been using Python 2 this whole time, where 1 / 2 = 0, and not Python 3, where 1 / 2 = 0.5. So all my coefficients would at some point end up being 0 (and turns out 0 + a = a), so the network would not update itself at all.
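A small sketch of that pitfall, assuming a learning rate written as the fraction 1 / 2 (the actual expression that caused the bug is not shown in the article). In Python 3, `//` reproduces what `/` did to two integers in Python 2.

```python
def update_weight(weight, gradient, numerator=1, denominator=2, python2_style=False):
    """One gradient-descent step with a learning rate written as a fraction."""
    if python2_style:
        # Floor division: what `1 / 2` meant for two ints in Python 2.
        learning_rate = numerator // denominator
    else:
        # True division: the Python 3 default.
        learning_rate = numerator / denominator
    return weight - learning_rate * gradient


stuck = update_weight(0.8, 0.2, python2_style=True)   # learning rate is 0, weight never moves
moved = update_weight(0.8, 0.2, python2_style=False)  # learning rate is 0.5, weight changes
```

On Python 2 itself, adding `from __future__ import division` at the top of the file, or writing `1.0 / 2`, gives the Python 3 behaviour.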

I also spent a lot of time trying various hyperparameters for the network as I found that it wasn’t performing well enough. I was baffled: I was training my Neural Network (3 layers of 5 units each) on 9 whole training examples, and somehow, the algorithm wasn’t doing a good job at guessing the 10th example. It was essentially a 50/50 for the network to get it right. Was I lied to this whole time? Are Neural Networks not the high-performing-state-of-the-art-predictive-classifier I was made to believe they were? Luckily, I had the presence of mind to realize that you can update your Neural Network 100,000 times, but maybe you need a bit more than 9 data points to get an algorithm somewhat close to being useful.

So I used an actual data set, namely, the iris data set from scikit-learn. Why this one in particular? Well you see, we used this particular data set in one of my favorite class assignments in university (we had to build an ID3 decision tree; if you’re looking for a fun, math-focused problem, go check it out).

The algorithm was now doing better (better than our previous 50/50 algorithm, yay!). However, it was still acting a bit weird. No matter how many times the algorithm was running, or how fast it was performing updates, I would always end up with either a 60% accuracy or a 40% accuracy. I felt really discouraged. Even if it was doing better, how could such an algorithm ever be able to tell cars apart from road signs, if it can’t even tell dumb flowers apart??? I was on the verge of quitting programming and following Edmund Blackadder’s footsteps.

For fun though, I decided to see at least how well Neural Networks from standard libraries would do. So I ran the data set against scikit-learn’s Neural Network classifier. I was astonished. Not only were the results far from the 99% accuracy that I was expecting, but the results were also almost always the same as the results I got when using my implementation. So no matter how ugly and inefficient our code is, we must have gotten at least a couple of things right. Hurray!

Our subpar 60% or 40% was in fact not due to our poor implementation, but most likely due to our data set which is still too small. But oh well, we finally have a tool to design and train deep Neural Networks. Good enough by my books, gotta keep moving!

6. Training Neural Network on car images

How exciting, we finally get to use the Neural Network we spent an unnecessary amount of time designing! Errrr wrong, try again next time! Sadly, our algorithm can’t handle too many inputs and logical units. While it was still on par with scikit-learn’s algorithm for networks with 5 inputs and 7-ish logical units, it completely blew up when trying to handle images. (For reference, I am using 32px by 32px images with 3 color channels as input, so each input has 32 * 32 * 3 = 3072 features, which is a lot more than 5.)

Goodbye custom Neural Network :’(

To be more precise, it was actually my custom activation functions (sigmoid and ReLU) and their derivatives that went boom and tried dividing stuff by 0 (I do not recommend trying to divide by 0).
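The article does not show which expression blew up, so the following is only a sketch of one common culprit: with thousands of inputs, the weighted sums fed into the sigmoid can get very large, and a naive `1 / (1 + exp(-x))` then overflows. Branching on the sign of `x` keeps the exponential's argument at or below zero.

```python
import math


def sigmoid(x):
    """Logistic function, arranged so math.exp never receives a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0 here, so this underflows towards 0 instead of overflowing
    return e / (1.0 + e)


def sigmoid_derivative(x):
    """Derivative of the logistic function, written in terms of its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def relu_derivative(x):
    """Slope of ReLU; undefined at exactly 0, so 0.0 is used there by convention."""
    return 1.0 if x > 0 else 0.0
```

Neither function divides by anything that can reach zero: both sigmoid denominators are at least 1.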

It was heartbreaking, but we got a problem to fix folks: I still need to know whether I can snooze or not! So I gave in, and used a library’s implementation. Only a fool would change a winning team, so I went with the good ol’ scikit-learn Neural Network Classifier.

The algorithm was doing quite well actually, right out of the box I got a consistent 88% accuracy over my test set. I then performed hyperparameter fine tuning. I looked for the best values for my network size, my learning rate, and the number of iterations to go through. I managed to increase accuracy up to 91%, so I saved the Neural Network model and called it a day.

We ended up training a Neural Network with 90%+ accuracy, let’s see how it holds up against raw traffic images.

7. Detecting car sections in pictures

It’s great that our algorithm is working fine on pre-cropped car/no car images. Now we need to find those cars in traffic camera images. I chose to go with the sliding window technique. Essentially, I look at the top left corner and see if it is a car or not; if it is, I keep track of that car. Then I slide to the right, and when I reach the end of the line, I go back to the left and shift down.
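A minimal sketch of that sliding window loop. The stride of 16 pixels and the `is_car` callback (standing in for the trained classifier) are assumptions for illustration; the window size of 32 matches the patch size used for training. The `range` bounds are chosen so that every window stays fully inside the image.

```python
def sliding_windows(image_width, image_height, window=32, stride=16):
    """Yield the (left, top) corner of every window that fits fully inside the image."""
    for top in range(0, image_height - window + 1, stride):
        for left in range(0, image_width - window + 1, stride):
            yield left, top


def detect_cars(image_width, image_height, is_car, window=32, stride=16):
    """Return the corners of the windows that the classifier flags as cars.

    `is_car(left, top, window)` is a hypothetical callback that crops the
    window out of the image and asks the trained model for a prediction.
    """
    return [
        (left, top)
        for left, top in sliding_windows(image_width, image_height, window, stride)
        if is_car(left, top, window)
    ]
```

A smaller stride gives more overlapping detections per car at the cost of more classifier calls.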

There is no doubt in my mind that this approach is one of the most gruesome out there. But pigs don’t get fat on clear water. In that case, we’re the pig and we need some fatty porridge. Maybe in the future we’ll be more sophisticated and we’ll use crystal clear feature detection methods. But not now. I want results and I want them fast!

The results were… interesting. It was doing OK, but it made a lot of consistent errors that I could foresee would impair our traffic detection algorithm.

First run on an actual traffic image

As you can see, it does detect some cars pretty well, but also the arrows indicating the roads’ destinations. And more importantly, it detected a lot of cars on the right and the bottom…

Turns out, I was trying to detect cars outside the picture on the right and at the bottom. I was doing one slide too many in both directions. This tweet felt painfully relevant at that time:

All jokes aside, after fixing that issue and refining the parameters, we got something that wasn’t half bad, and I was happy.

Labels and lane dividers are still an issue, but it’s good enough to get something to work.

Now that we detect cars pretty well, it’s high time we count how many there are in a picture.

8. Counting cars

So our sliding window technique paired with our car recognition system works pretty well. But now, how are we supposed to know how many cars there are? We can’t just count how many times our car algorithm detects a car. As you can see from the previous image, one car can be detected multiple times, and some non-car sections are detected, but only once, and probably should be ignored.

This sort of looked like a clustering problem, so of course, having just learnt the k-means technique, I just dove into implementing it for our problem. One thing though, this technique actually requires you to know how many clusters there are (in our case, how many cars there are). Huh?? But that’s what we’re trying to find out. But it’s all good, we’ll run k-means for all numbers between 1 and 30, and see which one is the most correct. It actually did work, surprisingly. But god was it slow.

Then, one day while I was grocery shopping (that’s not relevant to the story at all, but I’m rolling with it), I realized I was overcomplicating things for no good reason. How would a 3 year old solve this, I asked myself. Well, they would probably color any blob that is connected, then do that for all the blobs in the picture, and count how many times they colored a blob.

My outsourced team, counting cars for me while I take a nap

So I did just that, a 3-year-old’s coloring algorithm and ran it against our beloved traffic camera images. It definitely was ugly and inaccurate, but here’s the thing: we don’t need to know how many cars there are. What we need, is the number of cars detected to go up significantly when the actual number of cars is high, and vice versa. Luckily for me, my algorithm was doing just that. Yay me!
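The coloring idea is essentially a flood fill over connected cells. Below is a sketch under a few assumptions: detections have already been marked on a coarse grid of 0s and 1s, cells count as connected when they touch horizontally or vertically, and blobs smaller than `min_cells` are dropped as one-off false detections. The grid and the threshold are made up for illustration and are not taken from the original code.

```python
from collections import deque


def count_blobs(grid, min_cells=2):
    """Count 4-connected groups of truthy cells, ignoring groups smaller than min_cells."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # "Color in" everything connected to this cell, counting cells as we go.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                cr, cc = queue.popleft()
                size += 1
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if size >= min_cells:
                blobs += 1
    return blobs


# Made-up detection grid: two real blobs and one isolated cell.
detections = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
car_count = count_blobs(detections)
```

Each cell is visited once, so this runs in time proportional to the grid size, with no need to guess the number of cars up front.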

We detected 13 cars. Top left is the image with the detected cars, bottom left is the neighborhood heights, top right is the normalized neighborhoods with one of them colored in, and bottom right is the same as top right but with the colored neighborhood removed (so we don’t count it again).
We detected 20 cars here. While there probably aren’t 20 cars, it’s a higher number than what we detected on the previous image, and there is significantly more traffic.

We can finally estimate the traffic flow, all that’s left for us to do is to build traffic historical data, so we can extrapolate for future days.

9. Gathering images in front of my apartment at 9AM

Ezpz, that’s just step 1, but instead of looking at any and all pictures the government allows me to see, I need to only look at the ones coming from camera 6703 around 9AM, because ultimately that’s all I care about. Boom, onto the next step!
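A sketch of that filter. Camera 6703 comes from the text; the 8:45 to 9:15 window is an assumed reading of "around 9AM", and the ISO 8601 timestamp format is an assumption about what the API returns.

```python
from datetime import datetime, time

CAMERA_ID = "6703"                   # the camera named in the article
WINDOW = (time(8, 45), time(9, 15))  # assumed meaning of "around 9AM"


def is_commute_snapshot(camera_id, timestamp, camera=CAMERA_ID, window=WINDOW):
    """True when a snapshot comes from the chosen camera inside the time window."""
    # The wall-clock time in the timestamp's own offset is what matters here.
    taken_at = datetime.fromisoformat(timestamp).time()
    return camera_id == camera and window[0] <= taken_at <= window[1]
```

Applied to each (camera id, timestamp) pair from step 1, this keeps only the snapshots worth counting cars on.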

10. Building traffic time series

Ezpz squared! We just get those car counts and timestamps we got from last step and store them in some file somewhere. Damn we’re really dashing through those stages aren’t we?

11. Extrapolating from our time series

So let’s recap, we have the “number of cars” for the past 20-ish days. Now let’s guess what it’s gonna be tomorrow! Better yet, let’s guess for the next 5 days!

At first I wanted to use an LSTM algorithm as it is the state of the art and it does well against seasonal trends, and that might be a thing for traffic. Upon further reading, I learnt that LSTM actually relies on a recurrent Neural Network architecture. Having only 20 data points, I did not want to repeat our step 5 disaster and waste countless hours for nothing. So in the end, I went with the ARIMA method, whose seasonal variant can also handle seasonal trends, but which uses a more statistical approach that works with fewer data points.

We got something like this:

Blue is actual data, red is prediction

Not too shabby, huh? Considering we trained the model on 15 examples to guess the next 5. You know what, I ship it!
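The actual ARIMA code and data are not shown, so here is a dependency-free stand-in rather than the real thing: an AR(1) model, which is the ARIMA(1, 0, 0) special case with a constant, fitted by ordinary least squares. It follows the same split of 15 days for training and 5 days for forecasting. The car counts are made up purely for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    previous, current = series[:-1], series[1:]
    mean_prev = sum(previous) / len(previous)
    mean_curr = sum(current) / len(current)
    variance = sum((p - mean_prev) ** 2 for p in previous)
    if variance == 0:
        return mean_curr, 0.0  # flat history: always predict the mean
    covariance = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(previous, current))
    phi = covariance / variance
    return mean_curr - phi * mean_prev, phi


def forecast(series, steps):
    """Roll the fitted AR(1) model forward by `steps` days."""
    c, phi = fit_ar1(series)
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Made-up daily car counts, for illustration only (not the article's data).
counts = [13, 20, 15, 18, 12, 14, 21, 16, 19, 13, 15, 22, 17, 18, 14, 16, 20, 15, 19, 13]
train, test = counts[:15], counts[15:]
predicted = forecast(train, steps=len(test))
```

Comparing `predicted` against `test` gives the blue versus red picture from the plot. A full ARIMA implementation from a statistics library would add differencing and moving-average terms on top of this.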

Hurray! We gathered enough data to predict the future! Nostradamus would be so proud.

12. Snoozing peacefully 😴😴😴

I now check every night if I will snooze tomorrow. Never have I ever been late for work since! And my project and I lived happily ever after, the end.

Errrr wrong again, better luck next time!

Even though we’re estimating the number of cars pretty well, and predicting tomorrow’s traffic somewhat decently, there are still improvements to make. Of course, we have plenty of places to look for optimization. But the bigger problem is that our base hypothesis is not accurate enough. Yes, traffic does play a big role in determining the length of my commute, but the commute itself plays a huge role too. You see, I take the bus then the train to go to work. Missing a train only adds another 2 minutes to my morning travels. But if I get unlucky and three buses pass by me just before I can get to them, I might be looking at another 20 minutes on top of my 40 minute journey.

Where to go from here

As I just mentioned, it would be interesting to make use of the real time bus arrival updates and see how much it would improve our current system.

Of course we can always improve our car recognition Neural Network by feeding it more labelled data. We can also improve our simple sliding window and neighborhood counting techniques.

Ultimately, checking the traffic the day before isn’t the most practical anyway. So ideally, we have a script that runs on its own while we sleep, and counts cars in real time. Based on that, it rings the alarm sooner or later.

Parting thoughts

Wow that was a long story, I’m glad you made it this far.

All in all, it has been a blast building this entire system from scratch. And I was surprised at how easy it ended up being. This is mainly because I forced myself to work on it in small increments, but every day. Once you’ve figured out how to break big problems into smaller ones, the sky is the limit.

So, be ambitious! Go and solve complex things! As long as you enjoy doing it, all you need is to keep at it and you’ll have something to be proud of in the end!

You can find all the code here. The code isn’t really plug and play, but I’m more than happy to answer any questions you may have.

Cheers!
