Shrink the model size and reduce the computational resources needed to do the inference calculations
So you are interested in running a machine learning model on your phone? Here is a quick guide on how you could do so, and some of the challenges you would face along the way.
Google’s Inception model is quite large by mobile standards: about 90 MB. One reason it is that big is that it stores its weights as 32-bit floating-point numbers. That precision is essential during training, where the weights receive many tiny updates. Once training is done, however, switching to 8-bit fixed-point values typically does not have a large impact on accuracy, and it cuts the storage per weight from 4 bytes to 1.
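To make the size argument concrete, here is a minimal Python sketch of linear 8-bit quantization that uses a tensor’s own minimum and maximum. It is similar in spirit to TensorFlow’s quantize_weights transform but is not its actual implementation; the helper names and sample weights are hypothetical. Each weight shrinks from 4 bytes to 1, and the round-trip error stays within half a quantization step.

```python
import struct

def quantize_uint8(weights):
    """Map floats linearly onto the codes 0..255 using the tensor's min and max."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # float width of one 8-bit step
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale

def dequantize_uint8(codes, lo, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [lo + c * scale for c in codes]

weights = [-0.51, -0.2, 0.0, 0.013, 0.37, 0.9]  # hypothetical layer weights
codes, lo, scale = quantize_uint8(weights)
restored = dequantize_uint8(codes, lo, scale)

float32_bytes = len(struct.pack(f"{len(weights)}f", *weights))  # 4 bytes per weight
uint8_bytes = len(codes)                                         # 1 byte per weight
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(float32_bytes, uint8_bytes)          # 24 6
print(max_error <= scale / 2 + 1e-12)      # True
```

The wider the range between the smallest and largest weight, the coarser each step becomes, which is why this per-tensor scheme works best when weights are clustered rather than dominated by a few outliers.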
That is where quantization comes in: we quantize the model to shrink its size and reduce the computational resources needed for inference. Moving calculations over to 8-bit arithmetic helps models run faster and use less power.
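The speed and power gains come from doing the multiply-adds on small integers and converting back to float only once at the end. The sketch below illustrates that idea with a symmetric 8-bit scheme (codes in -127..127, one float scale per tensor); the function names and values are hypothetical. Production kernels such as gemmlowp use an asymmetric uint8 scheme with zero points instead, so treat this as an illustration of the principle rather than what TensorFlow runs on the phone.

```python
def quantize_symmetric(values, bits=8):
    """Symmetric quantization: integer codes in [-127, 127] plus one float scale."""
    qmax = 2 ** (bits - 1) - 1                         # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0  # guard against all-zero input
    return [round(v / scale) for v in values], scale

def int8_dot(a, b):
    """Dot product computed on 8-bit codes with a wide integer accumulator."""
    qa, sa = quantize_symmetric(a)
    qb, sb = quantize_symmetric(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer multiply-adds only (int32 on real hardware)
    return acc * sa * sb                      # a single float rescale at the end

activations = [0.25, -1.0, 0.5, 0.75]  # hypothetical inputs
weights = [0.1, 0.4, -0.3, 0.2]        # hypothetical weights
exact = sum(x * y for x, y in zip(activations, weights))
approx = int8_dot(activations, weights)
print(exact, approx)
```

The result is close to, but not exactly, the float dot product; that small gap is the accuracy cost the paragraph above refers to.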
In Lester’s post, he described how he retrained Google’s Inception V3 model so that we could classify Trade Me listing images. In this post I will cover how I ran Lester’s retrained model directly on iOS and Android phones.
OK, with the retrained Inception model in hand, the first thing I did was inspect it in TensorBoard, both to get a feel for how complex it is and to find the names of its input and output layers. I did that by running the following commands:
python tensorflow/python/tools/import_pb_to_tensorboard.py --model_dir tmp/tensorflow_inception_graph.pb --log_dir tmp/
tensorboard --logdir=tmp/
Then open TensorBoard in your browser (http://localhost:6006, or http://mac.local:6006 on my machine) to browse the model’s layers.
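What you look for in TensorBoard can also be expressed as a simple rule: inputs are usually Placeholder nodes, and outputs are nodes that nothing else consumes. The sketch below applies that rule to a hypothetical, heavily simplified stand-in for a GraphDef (a dict of node name to op type and input names); it is not TensorFlow’s API. If you prefer a command-line check, the Graph Transform Tool also ships a summarize_graph utility that reports likely inputs and outputs.

```python
# Hypothetical, simplified stand-in for a GraphDef: node name -> (op type, input names).
GRAPH = {
    "input_1": ("Placeholder", []),
    "conv/weights": ("Const", []),
    "conv": ("Conv2D", ["input_1", "conv/weights"]),
    "relu": ("Relu", ["conv"]),
    "check": ("CheckNumerics", ["relu"]),
    "output_node0": ("Softmax", ["check"]),
}

def guess_inputs_and_outputs(graph):
    """Inputs: Placeholder nodes. Outputs: nodes that no other node consumes."""
    # Input references may look like "name:1" (output index) or "^name" (control dependency).
    consumed = {src.split(":")[0].lstrip("^")
                for _, inputs in graph.values() for src in inputs}
    inputs = sorted(n for n, (op, _) in graph.items() if op == "Placeholder")
    outputs = sorted(n for n in graph if n not in consumed)
    return inputs, outputs

print(guess_inputs_and_outputs(GRAPH))  # (['input_1'], ['output_node0'])
```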
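After that, stripping and quantizing the model took a single run of TensorFlow’s Graph Transform Tool (transform_graph) with the following flags. Besides quantize_weights, the transforms remove nodes that inference does not need and fold constants and batch-normalization ops into the surrounding weights. Note that quantize_weights stores the weights as 8-bit values and converts them back to float when the graph runs, so it mainly shrinks the file; doing the arithmetic itself in 8-bit requires the quantize_nodes transform.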
--inputs="input_1" --in_graph=tmp/tensorflow_inception_graph.pb \
--outputs="output_node0" --out_graph=tmp/quantized_graph.pb \
--transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,299,299,3") remove_nodes(op=Identity, op=CheckNumerics) fold_constants(ignore_errors=true) fold_batch_norms fold_old_batch_norms quantize_weights strip_unused_nodes sort_by_execution_order'
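The fold_batch_norms and fold_old_batch_norms transforms merge batch-normalization arithmetic into the weights of the preceding layer, so inference runs one op where the training graph had several. The sketch below shows only the underlying algebra for a single output unit of a dense layer, with hypothetical values; it is not the Graph Transform Tool’s implementation, which rewrites nodes such as Conv2D and MatMul inside a GraphDef.

```python
import math

def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-3):
    """Rewrite y = gamma * (x.w + b - mean) / sqrt(var + eps) + beta as y = x.w' + b'.

    One output unit of a dense layer; a conv layer applies the same rule per output channel.
    """
    factor = gamma / math.sqrt(var + eps)
    folded_w = [w * factor for w in weights]
    folded_b = (bias - mean) * factor + beta
    return folded_w, folded_b

x = [0.2, -1.0, 0.5]                       # hypothetical input
w, b = [0.3, 0.1, -0.7], 0.05              # hypothetical layer weights and bias
gamma, beta, mean, var = 1.5, 0.2, -0.1, 0.8  # hypothetical batch-norm statistics

# Original graph: linear op followed by a separate batch-norm op.
pre = sum(xi * wi for xi, wi in zip(x, w)) + b
unfolded = gamma * (pre - mean) / math.sqrt(var + 1e-3) + beta

# Folded graph: a single linear op with rewritten constants.
fw, fb = fold_batch_norm(w, b, gamma, beta, mean, var)
folded = sum(xi * wi for xi, wi in zip(x, fw)) + fb

print(math.isclose(unfolded, folded))  # True
```

Because gamma, beta, mean and var are constants at inference time, the fold costs nothing in accuracy; it only removes ops.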
Now that we have the quantized graph, running it on iOS is a matter of replacing the graph in https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/ios/camera/data with ours. Do not forget to add the labels text file too.
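The labels file matters because the model only outputs a list of scores; the app maps each score back to a label by position. A minimal sketch, assuming the labels file holds one label per line in the same order as the output layer’s classes (the function name, labels, scores, k and threshold below are all hypothetical):

```python
def top_predictions(scores, labels_text, k=3, threshold=0.1):
    """Pair output scores with labels (one per line, same order) and keep the best k."""
    labels = [line.strip() for line in labels_text.splitlines() if line.strip()]
    if len(labels) != len(scores):
        raise ValueError(f"{len(labels)} labels but {len(scores)} scores")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [(label, score) for label, score in ranked[:k] if score >= threshold]

labels_text = "couch\nbicycle\nlawnmower\nwashing machine\n"  # hypothetical labels file
scores = [0.05, 0.62, 0.25, 0.08]                            # hypothetical softmax output
print(top_predictions(scores, labels_text))
# [('bicycle', 0.62), ('lawnmower', 0.25)]
```

The length check catches the most common mistake: shipping a labels file from a different training run than the graph.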
The last change is to update the constants in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/ios/camera/CameraExampleViewController.mm#L38 so they match the input your model expects and the names of its input and output layers.
In our case they were:
const int wanted_input_width = 299;
const int wanted_input_height = 299;
const int wanted_input_channels = 3;
const float input_mean = 0.0f;
const float input_std = 255.0f;
const std::string input_layer_name = "input_1";
const std::string output_layer_name = "output_node0";
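To show what those constants control, here is a minimal sketch of preparing one frame: sample the source image down to 299 by 299 with nearest-neighbour lookups, then normalise each channel value as (value - input_mean) / input_std, which, with a mean of 0 and a std of 255, maps pixel bytes from 0 to 255 into 0.0 to 1.0. It assumes, as the parameter names suggest, that the example normalises this way, and it assumes a tightly packed row-major RGB buffer; real camera frames on iOS use their own layout, so this is an illustration rather than a port of that code.

```python
WANTED_WIDTH, WANTED_HEIGHT, WANTED_CHANNELS = 299, 299, 3
INPUT_MEAN, INPUT_STD = 0.0, 255.0

def preprocess(pixels, width, height):
    """Resize a packed RGB image (flat list of 0-255 values, row-major) by
    nearest-neighbour sampling and normalise each value as (value - mean) / std."""
    out = []
    for y in range(WANTED_HEIGHT):
        src_y = y * height // WANTED_HEIGHT
        for x in range(WANTED_WIDTH):
            src_x = x * width // WANTED_WIDTH
            base = (src_y * width + src_x) * WANTED_CHANNELS
            for c in range(WANTED_CHANNELS):
                out.append((pixels[base + c] - INPUT_MEAN) / INPUT_STD)
    return out

# Hypothetical 2x2 source image: red, green / blue, white.
image = [255, 0, 0,   0, 255, 0,
         0, 0, 255,   255, 255, 255]
tensor = preprocess(image, 2, 2)
print(len(tensor))   # 268203 (299 * 299 * 3)
print(tensor[:3])    # [1.0, 0.0, 0.0]
```

If the mean and std in the app do not match what the model saw during training, the app will still run but the predictions will quietly degrade, which makes this a worthwhile thing to double-check.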
Running it on Android is very similar: add your quantized graph and labels to https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android/assets, then update the constants in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/ClassifierActivity.java#L61 to match the input size, mean, standard deviation and layer names your model expects. Since the model is the same, these are the same values as on iOS.
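One Android-specific detail: Bitmap.getPixels returns each pixel as a packed ARGB int, so the red, green and blue bytes have to be shifted out before normalising. A minimal sketch, assuming the same mean of 0 and std of 255 as on iOS; IMAGE_MEAN, IMAGE_STD and argb_to_floats are hypothetical names, not necessarily the ones used in ClassifierActivity.java.

```python
IMAGE_MEAN, IMAGE_STD = 0.0, 255.0  # assumed to mirror the iOS input_mean / input_std

def argb_to_floats(argb_pixels):
    """Unpack packed 0xAARRGGBB ints into normalised R, G, B floats, dropping alpha."""
    out = []
    for p in argb_pixels:
        for shift in (16, 8, 0):  # red, green, blue bytes
            out.append((((p >> shift) & 0xFF) - IMAGE_MEAN) / IMAGE_STD)
    return out

pixels = [0xFFFF0000, 0xFF00FF80]  # opaque red, then opaque green with some blue
print(argb_to_floats(pixels))
```

Java ints are signed, so an opaque pixel arrives as a negative number (0xFFFF0000 is -65536); masking with 0xFF after the shift still extracts the right byte, and argb_to_floats([-65536]) gives the same result as argb_to_floats([0xFFFF0000]).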
In an upcoming post I will cover how to embed the quantized graph from the last step into your existing iOS and Android apps.
Would you be interested in learning more about this? https://leanpub.com/ml-mobile