Building an insanely fast image classifier on Android with MobileNets in TensorFlow

Written by harvitronix | Published 2017/07/21

Part two of a two-part series: It’s like hot dog not hot dog, but for roads

In part 1, Creating Insanely Fast Image Classifiers with MobileNet in TensorFlow, we covered how to retrain a MobileNet on a new dataset. Specifically, we trained a classifier to detect Road or Not Road at more than 400 frames per second on a laptop.

MobileNets are made for — wait for it — mobile devices. So, let’s move our road not road model to an Android app so we can see it in action.

Goals and Plan

Let’s set some constraints so we have something specific to shoot for. We’ll attempt to:

  • Retrain a MobileNet on a very small amount of purpose-built data
  • Achieve 95% classification accuracy on a held-out test set
  • Use less than 5% of a $300 device’s CPU while running inference

To do that, we’ll follow these steps:

  1. Generate a new training dataset
  2. Train several MobileNet configurations to find the smallest net that will hit our accuracy target
  3. Get benchmarks by running Inception V3 on Android
  4. Update the TensorFlow Android example app to use our MobileNet model
  5. Try it in the wild
  6. Tune it to get below 5% CPU usage

Building the Dataset

In the previous post, we were classifying road/not road generally, so we pulled images from several sources. Now we’re going to drill down on the problem a bit more. If you recall, the purpose of this project is user privacy: At Coastline, we’re building driving safety features for mobile devices that make use of the camera. So when someone turns on the app, we want to validate that what the camera sees is a road. If it isn’t, we’ll disable recording.

So to build our training set, I’m going to walk around doing everyday things while recording video: Around my house, outside my car, inside my car fiddling with the radio, petting the cats, etc. This will be our “not road” training data.

Some “not road” examples.

For our “road” data, I’m going to sample randomly from the Coastline driving dataset, which is from a camera facing out the front of a car.

“Road” examples. Notice each image has a mount in view. We need to be careful that the network doesn’t simply learn to recognize that object. We’ll take care of that through data augmentation.

With 3,000 examples from each set, we’re ready to train.
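
If you want to build a similar dataset yourself, the frame extraction is straightforward. Here’s a rough sketch using OpenCV and Python 3; the video file names and the road/not_road folder names are placeholders (retrain.py just needs one sub-folder of images per label), and the sampling rate is whatever gets you to roughly 10fps.

# extract_frames.py -- sketch: sample frames from a video into a labeled folder.
# Assumes OpenCV (cv2) is installed; paths and folder names are illustrative.
import os
import cv2

def extract_frames(video_path, out_dir, every_n_frames=3):
    """Write every Nth frame of video_path into out_dir as JPEGs."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if count % every_n_frames == 0:
            cv2.imwrite(os.path.join(out_dir, "frame_%06d.jpg" % saved), frame)
            saved += 1
        count += 1
    cap.release()
    return saved

if __name__ == "__main__":
    # e.g. an ~30fps source video sampled down to roughly 10fps
    extract_frames("walking_around.mp4", "data/not_road", every_n_frames=3)
    extract_frames("driving.mp4", "data/road", every_n_frames=3)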

Training MobileNet on our custom dataset

The next step is to see what sort of accuracy we can get from the different MobileNet configurations. We’ll start by training the widest one: MobileNet 1.0 @ 128. And because we’re going to put this on a mobile device, we’ll use quantized weights, which will reduce the model’s memory footprint even further.

For the details on how to retrain MobileNet on your own data, be sure to check out my previous post.

From the root TensorFlow folder, we’ll run:

python tensorflow/examples/image_retraining/retrain.py \
  --image_dir ~/ml/blogs/road-not-road/data/ \
  --learning_rate=

After 1,000 training steps, we achieve 99.7% accuracy on our held-out set. Wow! Apparently MobileNets are pretty good at classifying roads.

Here are a couple of images it misclassified:

Left: A “not road” image classified as road. I’d say that’s an acceptable failure. It’s clearly a road, just not the type of road we’re looking for. Right: A “road” image classified as “not road”. I think this is because there were no bridges in the training set. Could be fixed with more data.

Now let’s do the same thing, but with the smallest MobileNet: 0.25 @ 128, quantized. After 1,000 training steps, we get to 92.6%. Doesn’t satisfy our accuracy target.

How about something a little wider, say 0.5?

95.0%! And the final model is just 1.6 MB. Looks like our money shot, if just barely. (It should be noted that this entire model is trained on just 10 minutes of video captured at 10fps. There is a lot of room for improvement by piling on the data.)

Let’s give it a quick try to make sure it’s working as expected:

python tensorflow/examples/label_image/label_image.py \
  --graph=/tmp/output_graph.pb \
  --labels=/tmp/output_labels.txt \
  --image=/home/harvitronix/ml/blogs/road-not-road/test-image.jpg \
  --input_layer=input \
  --output_layer=final_result \
  --input_mean=128 \
  --input_std=128 \
  --input_width=128 \
  --input_height=128

Road: 0.99023 confidence. Looks good!

And since our headline has the words “insanely fast” in it, how fast can we run this on our laptop’s NVIDIA GeForce 960M GPU? It runs through 1,000 images in just 3.36 seconds. That’s 297.6 frames per second!
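
If you’d like to reproduce a measurement like that on your own machine, one way is to load the frozen graph with the TensorFlow 1.x Python API and loop inference over a preprocessed image. A minimal sketch (the paths, the warm-up run, and the 1,000-iteration loop are mine, not the exact script behind the numbers above):

# benchmark.py -- sketch: time repeated inference on the retrained MobileNet.
# The graph path and the input/final_result layer names follow the
# label_image.py flags above; everything else is illustrative.
import time
import tensorflow as tf  # written against the TF 1.x API used here

GRAPH = "/tmp/output_graph.pb"
IMAGE = "test-image.jpg"
RUNS = 1000

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

    # Preprocess once in the same graph: decode, resize to 128x128, then
    # normalize with mean 128 / std 128, matching the label_image.py flags.
    img = tf.image.decode_jpeg(tf.read_file(IMAGE), channels=3)
    img = tf.image.resize_images(img, [128, 128])
    img = (img - 128.0) / 128.0
    batch_op = tf.expand_dims(img, 0)

with tf.Session(graph=graph) as sess:
    batch = sess.run(batch_op)
    input_t = graph.get_tensor_by_name("input:0")
    output_t = graph.get_tensor_by_name("final_result:0")

    sess.run(output_t, feed_dict={input_t: batch})  # warm-up run
    start = time.time()
    for _ in range(RUNS):
        sess.run(output_t, feed_dict={input_t: batch})
    elapsed = time.time() - start
    print("%d runs in %.2fs -> %.1f fps" % (RUNS, elapsed, RUNS / elapsed))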

Using our MobileNet model in an Android app

Now that we have a model that’s tiny, fast and accurate enough for our use case, let’s load it up in an Android app so we can test it in the real world.

Don’t have a model trained yet? Download the model I trained on the data described above. It contains both the .pb and label files. Extract and follow the instructions below.
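
Either way, it’s worth a quick sanity check that the graph’s layer names are what the app will expect before wiring it in. A tiny sketch, assuming TensorFlow 1.x and that you’ve extracted output_graph.pb into your working directory:

# inspect_graph.py -- sketch: confirm the layer names in output_graph.pb.
import tensorflow as tf  # TF 1.x API

graph_def = tf.GraphDef()
with tf.gfile.GFile("output_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

names = [n.name for n in graph_def.node]
print("has 'input' op:", "input" in names)
print("has 'final_result' op:", "final_result" in names)

Those two names are exactly what we’ll point the Android app at in a moment.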

Sticking with our theme of using tools provided by TensorFlow (how awesome is that, btw?), we’ll make use of the Android example project to get this thing running in no time flat.

Getting and building the project

If you haven’t already, go ahead and clone the TensorFlow repo:

git clone https://github.com/tensorflow/tensorflow.git --depth 1

You’ll find an Android project ready-made for doing this kind of task in the tensorflow/examples/android folder. Open the folder in Android Studio, build it, load the APK on your phone, and you’ve got an image classifier that uses the Inception V3 model trained on ImageNet, ready to tell your cat apart from a platypus.

If you have trouble building the app, be sure to take a look at the instructions in the TensorFlow Android ReadMe. My biggest challenge was the NDK version. Downgrading to r12b solved my problems.

Inception speed benchmark

Since we’ve already got Inception running on the app, let’s take some measurements so we can compare it against our MobileNet.

The Inception model that comes with the example project is 53.9 MB. That’s versus our MobileNet’s 1.6 MB! It runs at about 240 ms per inference, or roughly 4 frames per second (fps), and it uses about 40% of the CPU.

Inception V3 running at 4fps.

Let’s try it at 1fps:

Inception V3 running at 1fps.

Still up over 35%. Let’s hope our MobileNet can do better than that, or we’re not going to get anywhere near our goal of max 5% usage.

Switching to MobileNet

Now let’s make a couple minor changes to the Android project to use our custom MobileNet model.

First, copy your model and labels into the project’s assets folder. Mine were at /tmp/output_graph.pb and /tmp/output_labels.txt.

Next, open up ClassifierActivity, which can be found in:

tensorflow/examples/android/src/org/tensorflow/demo/ClassifierActivity.java

You’ll want to update the constants at the top of the file to define the settings for our new model. It looks like this when you first open it:

private static final int INPUT_SIZE = 224;
private static final int IMAGE_MEAN = 117;
private static final float IMAGE_STD = 1;
private static final String INPUT_NAME = "input";
private static final String OUTPUT_NAME = "output";

private static final String MODEL_FILE =
    "file:///android_asset/tensorflow_inception_graph.pb";
private static final String LABEL_FILE =
    "file:///android_asset/imagenet_comp_graph_label_strings.txt";

Change it to:

private static final int INPUT_SIZE = 128;
private static final int IMAGE_MEAN = 128;
private static final float IMAGE_STD = 128;
private static final String INPUT_NAME = "input";
private static final String OUTPUT_NAME = "final_result";

private static final String MODEL_FILE =
    "file:///android_asset/output_graph.pb";
private static final String LABEL_FILE =
    "file:///android_asset/output_labels.txt";

These values match our model: a 128×128 input, the same input mean and standard deviation we passed to label_image.py, and the final_result output node that retrain.py creates.

Hit run to build the project and load the APK on your device, and you’ve got your very own road / not road classifier!

The Results

Here’s a video of my road / not road app in action. I made a few tweaks to the UI to make it easier to see what’s going on:

So how fast is it, and what about CPU usage?

On my Xiaomi Mi5, this thing flies. It does inference in about 55 ms. That’s 18 frames per second, on a $300 phone!

It is quite CPU intensive, though, using 25–30% when running at full throttle. That makes sense, since we’re running it as fast as it will go.

MobileNet CPU monitor, running at 18fps.

We want to get down to under 5%, which we’ll do by reducing the frequency at which inference runs; our use case doesn’t need to classify continuously to achieve the privacy objective discussed above. Changing it to classify once every 18 frames (roughly once a second) brings the average usage down to about 5.5%!

MobileNet running at 1fps.

So our MobileNet model is about 1/30th the size of Inception (1.6 MB vs. 53.9 MB). It runs more than 4x faster per frame (roughly 55 ms vs. 240 ms). And it uses far less of the CPU.

It’s safe to say MobileNets and I are going to get along just fine.

Enjoyed this post? Help others find it by hitting the little ❤. Thanks for reading!

