
Creating a Wrapper for Tesseract is Several Times Faster Than PyTesseract

by Nuralem Abizov · 6m · October 31st, 2022

Too Long; Didn't Read

Text recognition technology is called OCR (Optical Character Recognition), and Tesseract is one of the most popular free, open-source OCR engines. Tesseract uses one core to recognize an image. In average cases that is enough, but "heavy" documents with many sheets are processed very slowly. The basic idea is to use Python's built-in multiprocessing features to split a document into separate pages and run multiple Tesseract engine instances that recognize the pages in parallel.
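The parallel-pages idea can be sketched roughly as follows. This is not the author's wrapper, only a minimal illustration under several assumptions: the document has already been split into one image file per page, the `tesseract` command-line tool is installed and on the `PATH`, and the helper names `ocr_page` and `ocr_document` are hypothetical. It also swaps in a thread pool rather than `multiprocessing` workers: each call starts its own `tesseract` OS process, so the pages are still recognized on separate cores.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List, Sequence


def ocr_page(image_path: Path, cmd: Sequence[str] = ("tesseract",)) -> str:
    """Run one OCR process on a single page image and return its text.

    Hypothetical helper: `cmd` is the OCR executable (assumed to be the
    tesseract CLI) and can be overridden, e.g. with a full path.
    """
    # "tesseract <image> stdout" writes the recognized text to standard output.
    result = subprocess.run(
        [*cmd, str(image_path), "stdout"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the OCR process fails
    )
    return result.stdout


def ocr_document(
    page_images: Iterable[Path],
    workers: int = 4,
    cmd: Sequence[str] = ("tesseract",),
) -> List[str]:
    """Recognize all pages concurrently and return the text in page order."""
    pages = list(page_images)
    if not pages:
        return []
    # Each worker thread only waits on its own OCR subprocess, so up to
    # `workers` OCR processes run at the same time.
    with ThreadPoolExecutor(max_workers=min(workers, len(pages))) as pool:
        # map() preserves input order, so results line up with the pages.
        return list(pool.map(lambda page: ocr_page(page, cmd), pages))
```

With page images named, say, `page_1.png` and `page_2.png` (placeholder file names), `ocr_document([Path("page_1.png"), Path("page_2.png")])` would return one string of recognized text per page. A reasonable starting value for `workers` is the number of CPU cores, though the best setting depends on the machine and the documents.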

Nuralem Abizov

@nuralem

Software engineer

STORY’S CREDIBILITY

Original Reporting

This story contains new, firsthand information uncovered by the writer.

Code License

The code in this story is for educational purposes. The readers are solely responsible for whatever they build with it.
