Building a FastAPI OCR Microservice

Written by nuralem | Published 2022/12/18

TL;DR: An OCR microservice is built from small, independent services (web apps) that interact using protocols such as SOAP, REST, GraphQL, and RPC. Our OCR service consists of nine microservices in an orchestration design, with one main microservice that communicates with the others. I will show you how to create each service and make them work together using Python, FastAPI, Tesseract, Redis, Celery, and Docker.

Sometimes you need to set up an OCR microservice that accepts images and returns text. In this article, I will explain the basic ideas behind creating your own OCR service for free, using Python, FastAPI, Tesseract, Redis, Celery, and Docker.

What is microservice architecture?

This is a variation of server architecture based on small, independent services (web apps) that interact with each other using protocols such as SOAP, REST, GraphQL, and RPC. In our service, the microservices will communicate in the REST architectural style through one main microservice node (fastapi_service), but keep in mind that this is not always the best option.

Our microservice architecture.

In our OCR service, we will have nine microservices in an orchestration design, where one main microservice communicates with the others. I will show you how to create each service and make them work together.

FastAPI endpoints.

The FastAPI service is the entry point to our OCR service: all external communication goes through it. It is built with Python + FastAPI + Uvicorn and exposes three endpoints: POST /api/v1/ocr/img, GET /api/v1/ocr/status/{task_id}, and GET /api/v1/ocr/text/{task_id}.

We will use the POST /api/v1/ocr/img endpoint to upload images; it starts a task and returns the task id. With GET /api/v1/ocr/status/{task_id} we will get the status of our task (OCR is a fairly heavy process and takes some time to execute), and after receiving the success status we will call the GET /api/v1/ocr/text/{task_id} endpoint to see the final results.
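To make the flow concrete, here is a minimal client-side sketch using the requests library. The host and port, and the JSON field names (img_body_base64, task_id, task_status, text), are assumptions about how the request and response models could look once everything is wired up.

import base64
import time
import requests

BASE = "http://localhost:8001/api/v1/ocr"  # assumed host/port of the FastAPI service

# Encode a local image as a base64 string and submit it to start an OCR task.
with open("sample.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

task_id = requests.post(f"{BASE}/img", json={"img_body_base64": img_b64}).json()["task_id"]

# Poll the status endpoint until the task is no longer pending.
while requests.get(f"{BASE}/status/{task_id}").json()["task_status"] == "PENDING":
    time.sleep(1)

# Fetch the final recognized text.
print(requests.get(f"{BASE}/text/{task_id}").json()["text"])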

First, let’s create a project folder and put this file structure:

Each service gets its own folder. I use VS Code to write the code and the pipenv package to create virtual environments. For local testing outside Docker, I run the services in these virtual environments.
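For reference, creating and entering the virtual environment for one service with pipenv might look like this (a sketch; the exact package list depends on the service):

cd fastapi_service
pipenv install fastapi "uvicorn[standard]"
pipenv shell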

Let’s look at main.py.

from fastapi import FastAPI
from app.routers import ocr

app = FastAPI()

app.include_router(ocr.router)

@app.get("/")
async def root():
    return {"message": "Hello Hackernoon.com!"}

Here we define only one root GET endpoint, which returns a simple JSON message. We also include the OCR router, which holds the list of endpoints that belong to OCR. It is good practice not to put all endpoints in one file, because it becomes overloaded and hard to understand. Try to divide your code into small, logically independent pieces and connect them with a few short lines in one main file. Let's have a look at the OCR router that we included in the main FastAPI app.

from fastapi import APIRouter
from model import ImageBase64

# Full prefix so the paths match the /api/v1/ocr/... endpoints used later.
router = APIRouter(
    prefix="/api/v1/ocr",
    tags=["ocr"],
)

@router.get("/status")
async def get_status():
    # Placeholder: will return the Celery task status later.
    return {"message": "ok"}

@router.get("/text")
async def get_text():
    # Placeholder: will return the recognized text later.
    return {"message": "ok"}

@router.post("/img")
async def create_item(img: ImageBase64):
    # Placeholder: will start the OCR task and return its id later.
    return {"message": "ok"}

In the ocr.py file we have a router that contains three endpoints: GET /api/v1/ocr/status, GET /api/v1/ocr/text, and POST /api/v1/ocr/img. We also import the data model for the POST endpoint. I have defined some simple placeholder logic so we can test the endpoints. We can start our service from cmd.exe:

uvicorn app.main:app --reload

Uvicorn is an ASGI web server implementation for Python. It runs our main.py app, and --reload means that if we change some code inside the files, uvicorn will automatically restart with the new code.

Basic testing.

To test our endpoints, we will use the Thunder Client extension in VS Code. First, we check the GET http://127.0.0.1:8000 and GET http://127.0.0.1:8000/api/v1/ocr/status endpoints.

Both are working fine; next we need to write some real logic. We will receive the image as a base64 string and return a generated task_id. Using this task_id we will call GET /ocr/status and receive the OCR processing status; there will be three types of status: pending, success, and error. After receiving the success status, we will get the text from the GET /ocr/text endpoint using our task_id.

To store the task status and monitor it, we will use Redis. To execute tasks in parallel, we will use Celery; roughly how the endpoints will talk to Celery is sketched below. After that, let's install Docker and write some code.
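The endpoints will dispatch a Celery task backed by Redis and query its state with AsyncResult, roughly like this (the broker URLs, the request model, and the field names are assumptions for illustration; the actual code in the repository may differ):

import os
from celery import Celery
from celery.result import AsyncResult
from fastapi import APIRouter
from pydantic import BaseModel

# Celery app pointing at the Redis broker/backend (URLs are assumptions).
celery_app = Celery(
    "tasks",
    broker=os.environ.get("CELERY_BROKER_URL", "redis://redis:6379/0"),
    backend=os.environ.get("CELERY_RESULT_BACKEND", "redis://redis:6379/0"),
)

router = APIRouter(prefix="/api/v1/ocr", tags=["ocr"])

class ImageBase64(BaseModel):
    img_body_base64: str  # the image encoded as a base64 string

@router.post("/img")
async def create_ocr_task(img: ImageBase64):
    # Send the image to the "create_task" Celery task and return its id.
    task = celery_app.send_task("create_task", args=[img.img_body_base64])
    return {"task_id": task.id}

@router.get("/status/{task_id}")
async def get_status(task_id: str):
    # Ask the result backend for the current state: PENDING, SUCCESS, or FAILURE.
    result = AsyncResult(task_id, app=celery_app)
    return {"task_id": task_id, "task_status": result.status}

@router.get("/text/{task_id}")
async def get_text(task_id: str):
    # Return the recognized text once the task has finished successfully.
    result = AsyncResult(task_id, app=celery_app)
    return {"task_id": task_id, "text": result.result if result.successful() else None}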

Installing Docker.

Go to https://www.docker.com/ and install Docker Desktop. If you are on Windows, you also need to install WSL and get a Linux system image. Detailed instructions are described at https://docs.docker.com/desktop/install/windows-install/.

After a successful installation, run Docker Desktop and you will see this window.

Setting Up FastAPI container.

Now we need to create a Dockerfile that contains all the Docker instructions required to run our FastAPI service inside a container.

# Base image with Python 3.11.
FROM python:3.11
# All further commands run inside /app.
WORKDIR /app
RUN apt-get update && apt-get install -y && apt-get clean
RUN pip install --upgrade pip
# Install the Python dependencies and drop the pip cache to keep the image small.
COPY ./requirements.txt .
RUN pip install -r requirements.txt && rm -rf /root/.cache/pip
# Copy the application code into the container.
COPY . .

In short, we create a working directory named /app, copy requirements.txt into it, and install all packages from that file on top of Python 3.11. At the end we remove pip's cached data and copy all our files into the working directory inside the container. With that, our entry-point container is ready. Now we need one more file: docker-compose.yml. Briefly, it is a simple file that describes how to build, deploy, and run all containers together with a single command.

version: '3.8'
services:
  web:
    build: ./fastapi_service
    ports:
      - 8001:8000
    command: uvicorn app.main:app --host 0.0.0.0 --reload

Now, we have an updated project structure.

We are ready to containerize our first FastAPI service. To do this, we use docker-compose, which creates and starts all containers.

Open cmd.exe in the folder that contains the docker-compose.yml file, run the docker-compose up --build command, and have a look.

Now our FastAPI service is running inside Docker. Note that we mapped it to port 8001 via the docker-compose file.

Building everything.

First, clone https://github.com/abizovnuralem/ocr. You will see this project structure.
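Roughly, the layout looks like this (a sketch based on the description below; the exact folder and file names in the repository may differ slightly):

ocr/
├── docker-compose.yml
├── fastapi_service/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/
│       ├── main.py
│       ├── tasks.py
│       └── routers/
├── img_prepro/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/          (main.py, tasks.py, routers/)
└── tesseract/
    ├── Dockerfile
    ├── requirements.txt
    └── app/          (main.py, tasks.py, routers/)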

I have separated each app into its own folder with its own Celery and Redis instances; I decided to use three instances of Redis and Celery to keep each microservice independent. Each app has its own Dockerfile, where we set up the base system and install all required packages from requirements.txt. Each app also contains a main.py file (its entry point) and a tasks.py file, where Celery tasks are executed. The routers folder contains the endpoints through which the containers communicate over REST.
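Given that design, the docker-compose entries for a single app might look roughly like the sketch below (the service names, environment variables, and image tags are assumptions, not the exact file from the repository). The img_prepro and tesseract apps follow the same web + worker + redis pattern, which is how we end up with nine containers in total.

  fastapi_service:
    build: ./fastapi_service
    ports:
      - 8001:8000
    command: uvicorn app.main:app --host 0.0.0.0
    environment:
      - CELERY_BROKER_URL=redis://fastapi_redis:6379/0
      - CELERY_RESULT_BACKEND=redis://fastapi_redis:6379/0
    depends_on:
      - fastapi_redis

  fastapi_worker:
    build: ./fastapi_service
    command: celery -A app.tasks worker --loglevel=info
    environment:
      - CELERY_BROKER_URL=redis://fastapi_redis:6379/0
      - CELERY_RESULT_BACKEND=redis://fastapi_redis:6379/0
    depends_on:
      - fastapi_redis

  fastapi_redis:
    image: redis:7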

FastAPI service.

The main logic of the whole service lives in the fastapi_service/app/tasks.py file. It orchestrates the entire process: it receives images, runs the preprocessing step, and starts the Tesseract recognition.

import os
import time
import requests
import json
from routers.ocr.model import PreProsImgResponse
from celery import Celery

app = Celery('tasks', broker=os.environ.get("CELERY_BROKER_URL"))

# Poll a status endpoint until the task succeeds, fails, or we give up after ~60 attempts.
def check_until_done(url):
    attempts = 0
    while True:
        response = requests.get(url)
        if response.status_code == 200 and response.json()['task_status'] == "PENDING" and attempts < 60:
            time.sleep(1)
            attempts+=1

        elif response.status_code == 200 and response.json()['task_status'] == "SUCCESS":
            return True
        else:
            return False

# Ask the img_prepro service to binarize the image and wait for the result.
def convert_img_to_bin(img):
    response = requests.post(url = "http://img_prepro:8000/api/v1/img_prep/img", json={"img_body_base64": img})
    task = response.json()
    if check_until_done("http://img_prepro:8000/api/v1/img_prep/status" + f"/{task['task_id']}"):
        url = "http://img_prepro:8000/api/v1/img_prep/img" + f"/{task['task_id']}"
        response = requests.get(url)
        return response.json()['img']
    raise Exception("Sorry, something went wrong") 

# Ask the tesseract service to recognize the preprocessed image and wait for the text.
def get_ocr_text(img):
    response = requests.post(url = "http://tesseract:8000/api/v1/tesseract/img", json={"img_body_base64": img})
    task = response.json()
    if check_until_done("http://tesseract:8000/api/v1/tesseract/status" + f"/{task['task_id']}"):
        url = "http://tesseract:8000/api/v1/tesseract/text" + f"/{task['task_id']}"
        response = requests.get(url)
        return response.json()['text']
    raise Exception("Sorry, something went wrong") 


@app.task(name="create_task")
def create_task(img: str):
    try:
        bin_img = convert_img_to_bin(img)
        text = get_ocr_text(bin_img)
        return text
    except Exception as e:
        print(e)
        return {"text": "error"}

We use only two lines of code to execute everything in our service:

bin_img = convert_img_to_bin(img)

text = get_ocr_text(bin_img)

The first call sends the image to the img_prepro microservice over REST, which does the preprocessing work that helps Tesseract recognize text more accurately and faster. The second call starts the Tesseract engine inside tesseract_service over REST and returns the final result.
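For reference, the Celery tasks inside those two services might look roughly like this minimal sketch, assuming Pillow and pytesseract (both tasks are shown in one file for brevity; in the project each lives in its own service, and the broker URL and task names are assumptions):

import base64
import io

import pytesseract
from celery import Celery
from PIL import Image

app = Celery('tasks', broker="redis://redis:6379/0")  # broker URL is an assumption

@app.task(name="preprocess_task")
def preprocess_task(img_b64: str) -> str:
    # img_prepro: decode the image, convert it to grayscale and binarize it
    # so that Tesseract gets a clean black-and-white input.
    img = Image.open(io.BytesIO(base64.b64decode(img_b64))).convert("L")
    binarized = img.point(lambda p: 255 if p > 127 else 0)
    buf = io.BytesIO()
    binarized.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

@app.task(name="tesseract_task")
def tesseract_task(img_b64: str) -> str:
    # tesseract_service: decode the preprocessed image and run OCR on it.
    img = Image.open(io.BytesIO(base64.b64decode(img_b64)))
    return pytesseract.image_to_string(img)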

Testing.

To perform testing, we will use this image.

First, we need to convert it to a base64 string. We can use https://codebeautify.org/image-to-base64-converter to get the image string, and then with the help of the POST http://localhost:8001/api/v1/ocr/img endpoint we will get the task_id.
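If you prefer to stay in Python, the same base64 string can be produced locally (a small sketch; the file name is just an example):

import base64

# Read the test image and print it as a base64 string for the POST /api/v1/ocr/img body.
with open("test.png", "rb") as f:
    print(base64.b64encode(f.read()).decode())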

Using GET http://localhost:8001/api/v1/ocr/status/2591ec33-11d2-4dec-8cf4-cea15e05517e

we monitor the task execution status. After receiving the SUCCESS status, we get the text from

GET http://localhost:8001/api/v1/ocr/text/2591ec33-11d2-4dec-8cf4-cea15e05517e

Conclusion

We have created nine microservices in one OCR service and set up all the containers to run and communicate with each other. Some things we still need to do in the near future:

  1. Logging;
  2. Testing;
  3. Monitoring;
  4. Improving OCR recognition.

The main idea of this article was to show you how to create a simple microservice architecture that performs basic OCR. Thanks for your attention!

