Machine Learning for Android Developers with the Mobile Vision API — Part 3 — Text Detection

by Moyinoluwa Adeyemi, November 21st, 2016

In case you missed them, here are the prequels to this article about the Mobile Vision API. The first post (https://hackernoon.com/machine-learning-for-android-developers-with-the-mobile-vision-api-part-1-face-detection-e7e24a3e472f#.lludc6dvq) was on the Face Detection API, while the second (https://hackernoon.com/machine-learning-for-android-developers-with-the-mobile-vision-api-part-2-barcode-detection-61e84c858518#.ln7czcra0) was on the Barcode Detection API.

Text Detection API

According to the overview, the Text Detection API detects text in images and videos and breaks that text down into blocks (paragraphs or columns), lines (sets of words on the same vertical axis) and words (sets of alphanumeric characters on the same vertical axis). The API recognizes text in various Latin-based languages.
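
To make the block, line and word hierarchy a bit more concrete, here is a minimal plain-Java sketch that mimics it with ordinary strings. It does not use the Mobile Vision classes at all. TextHierarchy and its methods are hypothetical names made up for illustration, and the sketch assumes blocks are separated by a blank line and words by whitespace, which is a simplification of what the API actually does with an image.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the block > line > word hierarchy.
// This is NOT the Mobile Vision API; it only mimics the structure with strings.
final class TextHierarchy {

    private TextHierarchy() {}

    // Assumption: a block is a run of text separated from the next by a blank line.
    static List<String> blocks(String text) {
        return nonEmpty(text.trim().split("\\n\\s*\\n"));
    }

    // A line is one row of text inside a block.
    static List<String> lines(String block) {
        return nonEmpty(block.split("\\n"));
    }

    // A word is a run of non-whitespace characters inside a line.
    static List<String> words(String line) {
        return nonEmpty(line.trim().split("\\s+"));
    }

    // Trims each part and drops the empty ones.
    private static List<String> nonEmpty(String[] parts) {
        List<String> result = new ArrayList<>();
        for (String part : parts) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

For example, passing a two-paragraph string to TextHierarchy.blocks gives two blocks, and each block can then be split further with lines and words.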

Potential applications

I’ll describe what’s possible with this API before I go ahead to explain how to use it.

Imagine you were invited to attend the Firebase Dev Summit in Berlin and you got all excited, but you didn’t know a word of German. The only foreign language you understand (apart from English) is some Spanish you picked up on Duolingo two years ago. Tough, huh? How would you communicate if all the signs were in German? Typing the text into Google Translate ALL THE TIME would be out of the question, because German words have a reputation for being unusually long. A better option would be some way to detect any text you want translated and receive the translation in your preferred language immediately. The Text Detection API would handle detecting the text, while the Google Translate API would handle the translation.
Another possibility is converting a very large amount of text (e.g. from a book) into digital format. The traditional method would be to scan each page, which might damage the book. With the Text Detection API, all you’d need is a device with a camera to point at the text, and maybe some API that uploads the recognized text to a server.

Getting started

Here, we are going to use the Text Detection API to detect text in a default image preloaded in an app. I initially wanted to take this a step further by translating that text into a specified language, as described in the scenario above, but I left that part out when I discovered that the Google Translate API is billed per usage.

Here we go (again)…

Create a new project in Android Studio.
Add the Google Play Services SDK dependency for the Mobile Vision API to your app-level build.gradle file. As of the time of writing this article, the latest version is 9.8.0. Import only the specific module you need (play-services-vision); you are bound to hit the 65k method limit if you import the whole SDK.

compile 'com.google.android.gms:play-services-vision:9.8.0'

To have the vision dependencies installed automatically for text detection, add a meta-data element to the application element of the manifest file, with the name com.google.android.gms.vision.DEPENDENCIES and the value ocr. This ensures the appropriate libraries are downloaded in time for first-time users of the app.

We’ll create a very simple layout consisting of a Button, an ImageView and a TextView. The ImageView loads a cute image of a cat with some German text from the drawable folder. The button starts the processing of the image while the TextView displays whatever text is detected from the image.
Just like with the Face Detection and Barcode Detection APIs, the image has to be converted into a Bitmap to be processed.

Bitmap textBitmap = BitmapFactory.decodeResource(getResources(), R.drawable.cute_cat_image);

The Text Recognizer is initialized to process the image already present in the ImageView.

TextRecognizer textRecognizer = new TextRecognizer.Builder(this).build();

Next, we need to check if the text recognizer is operational already. There’s always the possibility that it won’t work the first time because a library needs to be downloaded to the device and it might not have been completed in time for use.

if (!textRecognizer.isOperational()) {
    new AlertDialog.Builder(this)
            .setMessage("Text recognizer could not be set up on your device :(")
            .show();
    return;
}
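
Since the recognizer can become operational a little later, once the library download finishes, one alternative to giving up straight away is to check again a few times. Here is a rough plain-Java sketch of that idea. ReadinessCheck, Probe and waitUntilReady are hypothetical names, not part of the Mobile Vision API. In the app, the probe would simply wrap textRecognizer.isOperational(), and because the helper sleeps between attempts it is assumed to run on a background thread rather than the main thread.

```java
// Hypothetical helper, not part of the Mobile Vision API.
final class ReadinessCheck {

    // Stand-in for "is the detector ready yet?", e.g. textRecognizer.isOperational().
    interface Probe {
        boolean isReady();
    }

    private ReadinessCheck() {}

    // Asks the probe up to maxAttempts times, sleeping delayMillis between attempts.
    // Returns true as soon as the probe reports ready, false if the attempts run out
    // or the waiting thread is interrupted.
    static boolean waitUntilReady(Probe probe, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (probe.isReady()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

If the helper returns false, showing the dialog as above is still a sensible fallback.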

We then create a frame using the text bitmap and call the text recognizer.

Frame frame = new Frame.Builder().setBitmap(textBitmap).build();
SparseArray<TextBlock> text = textRecognizer.detect(frame);

Our results are now contained in the SparseArray. We loop over it, collect the text of each block and display the result in the TextView.

for (int i = 0; i < text.size(); i++) {
    TextBlock textBlock = text.valueAt(i);
    if (textBlock != null && textBlock.getValue() != null) {
        detectedText += textBlock.getValue();
    }
}
detectedTextView.setText(detectedText);
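
One thing to note about the loop above is that it appends each block directly after the previous one, so the text of neighbouring blocks runs together. Here is a small plain-Java sketch of joining the block values with a newline in between. DetectedTextJoiner is a hypothetical helper name, and it assumes you have already collected each textBlock.getValue() into a list of strings.

```java
import java.util.List;

// Hypothetical helper: joins detected block values, one block per line.
final class DetectedTextJoiner {

    private DetectedTextJoiner() {}

    // Skips null or blank values and separates the rest with a newline.
    static String join(List<String> blockValues) {
        StringBuilder joined = new StringBuilder();
        for (String value : blockValues) {
            if (value == null || value.trim().isEmpty()) {
                continue;
            }
            if (joined.length() > 0) {
                joined.append('\n');
            }
            joined.append(value.trim());
        }
        return joined.toString();
    }
}
```

The returned string can be passed to detectedTextView.setText(...) in place of the concatenated one.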