Using the Google Optical Character Recognition API

by Filestack, June 17th, 2022

Too Long; Didn't Read

Every company is searching for a competitive advantage when conducting its business processes. Machine learning allows computers to grow and evolve to best meet the needs of a particular business model. OCR, and the Google OCR API in particular, is one of the most recent advancements in machine learning. It uses a simple REST call to recognize and obtain text from images for additional processing or storage. In this article, we will discuss the Google OCR API. We will also look at what the Filestack OCR API offers.


Every company is searching for a competitive advantage when conducting its business processes, whether marketing, collecting data to analyze sales, or order fulfillment.


So they tend to adopt various technologies to carry out these tasks more efficiently. This high dependency on technology has created a demand for smarter and more powerful computers than ever before.


This need has led to the emergence of machine learning. Machine learning allows computers to grow and evolve to best meet the needs of a particular business model.


Machine learning covers many techniques, such as clustering and evolutionary computation. Optical character recognition (OCR), and the Google OCR API in particular, is one of its most widely used applications.


The OCR API is a valuable computer vision tool. It uses a simple REST call to recognize and obtain text from images for additional processing or storage. In this article, we will discuss the Google OCR API.

What is the Google OCR API?

The Google OCR API is a subset of the Google Cloud Vision API. We can use the Google OCR API to extract text from JPEG, GIF, PNG, and TIFF images. A number of Google products, including Gmail and Google Drive, use this OCR technology.


However, you can also use it as an API to produce text from images inside your NLP-powered automated applications. In a nutshell, you can utilize Google OCR to build optical character recognition applications.


This API is a good option both for individuals on a limited budget and for large-scale applications, because it is economical, powerful, and widely available.

What should you consider when working with the Google OCR API?

Here are some facts you need to consider when using the Google OCR API.

  • Google OCR client libraries are available for many programming languages, including JavaScript, Go, and Python.
  • You can use OCR in a wide range of languages besides English.
  • OCR is just one of the many features of the Google Vision API, which also include face detection, explicit content detection, landmark detection, and image labeling.
  • Google OCR is not expensive unless you use it on a large scale.
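
Because OCR is one feature among several, each request names the features it wants. The sketch below assumes the feature `type` strings used by the Vision REST API. The `FEATURE_TYPES` table and the `features_for` helper are hypothetical names introduced only for illustration.

```python
# Feature `type` strings as used by the Vision REST API (assumed here).
# The dictionary keys are hypothetical labels chosen for this example.
FEATURE_TYPES = {
    "ocr": "TEXT_DETECTION",
    "dense_ocr": "DOCUMENT_TEXT_DETECTION",
    "faces": "FACE_DETECTION",
    "landmarks": "LANDMARK_DETECTION",
    "labels": "LABEL_DETECTION",
    "explicit_content": "SAFE_SEARCH_DETECTION",
}


def features_for(*names):
    """Return the `features` list of a request for the given labels."""
    return [{"type": FEATURE_TYPES[name]} for name in names]


# Ask for OCR and image labels in the same request.
print(features_for("ocr", "labels"))
```

Keeping the mapping in one table makes it easy to see which features a request will be billed for.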

How does the Google OCR API work?

How Google OCR API works

Source – https://cloud.google.com/functions/docs/tutorials/ocr


Optical character recognition examines a still picture or frames from a movie to find shapes representing characters and punctuation. Once the OCR has detected these patterns, artificial intelligence is employed to “read” them in the same way a human would by considering the context, such as the surrounding words.


OCR suites must learn various languages, since reliable OCR requires contextual natural language processing. In practice, an OCR engine analyzes a given image and converts the text in it into a machine-readable format that can be stored.

How to use the Google OCR API?

The concept of the OCR API is simple and straightforward.


  1. You send an image to the Google Cloud Vision API, either from a remote location or from your local storage.
  2. The image is processed remotely on Google Cloud according to the function you call.
  3. The output of the invoked function is returned as JSON.
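
These three steps can be sketched without any client library. The example below only builds the JSON body of a text-detection request and reads the text out of a parsed response; it does not send anything over the network. The helper names `build_request` and `first_text` are hypothetical, and the field names (`requests`, `image.content`, `image.source.imageUri`, `features`, `responses`, `textAnnotations`) are assumed to follow the Vision REST API's `images:annotate` format.

```python
import base64
import json


def build_request(image_bytes=None, image_uri=None):
    """Build an images:annotate request body for a single image."""
    if (image_bytes is None) == (image_uri is None):
        raise ValueError("pass exactly one of image_bytes or image_uri")
    if image_bytes is not None:
        # Local images travel inside the request as base64 text.
        image = {"content": base64.b64encode(image_bytes).decode("ascii")}
    else:
        # Remote images are referenced by their address.
        image = {"source": {"imageUri": image_uri}}
    return {"requests": [{"image": image,
                          "features": [{"type": "TEXT_DETECTION"}]}]}


def first_text(response_json):
    """Return the full detected text from a parsed response, or ''."""
    annotations = response_json["responses"][0].get("textAnnotations", [])
    # The first annotation holds the whole text; later ones are single words.
    return annotations[0]["description"] if annotations else ""


# The URI below is a placeholder, not a real image.
body = build_request(image_uri="https://example.com/sign.png")
print(json.dumps(body, indent=2))
```

Separating request building from response parsing keeps both halves easy to test before any credentials are involved.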

Setting up the Google OCR API

You must first set up the Google Cloud Console and complete several authentication steps before you can use any Google Vision API service. Below is a step-by-step guide to setting up the Vision API service.


  1. Create a Project in the Google Cloud Console

  2. Enable Billing

  3. Create a Service Account

  4. Set the ‘GOOGLE_APPLICATION_CREDENTIALS’ environment variable to the path of the service account’s JSON key file. The command differs between Mac/Linux and Windows.
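
Whichever operating system you use, the result of step 4 can be checked from Python before a client is created. This is a minimal sketch: `credentials_path` is a hypothetical helper, and it only confirms that the variable is set and points to an existing file, not that the key itself is valid.

```python
import os


def credentials_path(environ=None):
    """Return the key-file path from the environment, or raise a clear error."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS points to a missing file: " + path)
    return path


try:
    print("Using key file:", credentials_path())
except RuntimeError as err:
    print("Setup incomplete:", err)
```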


The Google OCR API supports many programming languages, including Java, Python, Node.js, and Google’s own Go. Below is a basic text-detection function written in Python.


def detect_text(path):
    """Detects text in the file."""
    from google.cloud import vision
    import io

    client = vision.ImageAnnotatorClient()

    # Read the local image as raw bytes.
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    # Send the image to the Vision API for text detection.
    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')

    for text in texts:
        print('\n"{}"'.format(text.description))

        # Corner coordinates of the box around the detected text.
        vertices = (['({},{})'.format(vertex.x, vertex.y)
                    for vertex in text.bounding_poly.vertices])

        print('bounds: {}'.format(','.join(vertices)))


Simply put, the function calls “text_detection”, reads the text annotations from the response, and prints each piece of text with its bounding box. The same approach works for dense text, such as scanned documents, with “document_text_detection”. Images can also be read from a remote location by setting ‘image.source.image_uri = uri’, where uri is the address of the image.
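
The remote-image and dense-text variants can be sketched as two small helpers. Both names, `remote_image` and `dense_text`, are hypothetical. The sketch assumes the `google-cloud-vision` package is installed and that the response exposes `error.message` and `full_text_annotation.text`. The client is passed in as an argument so that the parsing logic can be tried with a stand-in object before any credentials exist.

```python
def remote_image(uri):
    """Build a Vision image that points at a remote URI instead of local bytes."""
    from google.cloud import vision  # requires the google-cloud-vision package

    image = vision.Image()
    image.source.image_uri = uri
    return image


def dense_text(client, image):
    """Run dense-text OCR and return the recognized text as one string."""
    response = client.document_text_detection(image=image)
    if response.error.message:
        # Surface API-side failures instead of returning empty text.
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


# Usage (needs credentials; the URI is a placeholder):
# from google.cloud import vision
# client = vision.ImageAnnotatorClient()
# print(dense_text(client, remote_image("gs://your-bucket/scan.png")))
```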

Why do companies use OCR APIs today?

Here are a few examples of how five main industries utilize OCR APIs.


Healthcare Industry – The OCR API saves time by automatically transcribing clinical paperwork, past medical history, recommended drugs, and other information. AI-based OCR technologies can also be used to filter and convert prescription slips, laboratory notebooks, and clinical test datasets into digital file formats for safe health record management.


Financial Institutions – OCR technology is useful in retail and supply chain businesses for retrieving commodities, prices, and company information from expenses, invoices, and receivables. It is reported to recognize invoice layouts and extract the relevant fields with about 95% accuracy.


Banking Industry – OCR APIs can process cheques, card swipers, financial information, KYC paperwork, and other documents. Banks use OCR APIs to analyze financial data, check account balances, and verify fund transfers.


Legal System – OCR APIs can be used to transcribe affidavits, judgments, filings, and other documents, making them easier to search.


Supply Chain Industry – OCR APIs can help with processing shipment details, receipts, and customer orders. These APIs let you collect key-value pairs, check tax rates and balances, and cut back-office costs by up to 50%.

What does Filestack offer?

There are some cons to using the Google OCR API. It can be difficult to learn, and it can be difficult to get support from a company as large as Google. There are OCR API solutions available on the market that offer greater productivity and ease of use. The Filestack OCR API is one of the best OCR APIs when it comes to efficiency. It can assist you in interpreting, extracting, and organizing data. You can learn more about it in the Filestack documentation.


Moreover, it reduces data extraction errors and improves the efficiency of data collection. The Filestack OCR API has SDKs that support JavaScript, Ruby, PHP, Python, Swift, and Android. Apart from photos, the API also works on tax documents, cards, IDs, and bills.


Furthermore, Filestack’s OCR API allows you to convert image attributes character by character into customized identification codes, eliminating the need for human data processing. You can find different packages available for you from this page.