Entering data and moving it from one place to another is a time-consuming, repetitive task. One employee can easily spend up to three hours a day just moving data around. In addition to eating up workers’ time, manual data handling is prone to errors, which can lead to revenue losses. A report by Dun & Bradstreet investigating the past and future of data revealed that one in five businesses loses money due to incomplete data.
Optical character recognition (OCR) technology can help businesses solve these issues. OCR algorithms can transform paper-based documents into editable, searchable text. They can also extract information from files and enter it into the corresponding fields in a company’s IT systems.
So, how does OCR work? How can this technology help you achieve business goals? And should you contact an artificial intelligence solutions provider to help you build and set up OCR software?
Optical character recognition is a technology that converts typed or handwritten text and printed images containing text into a machine-readable digital data format. OCR algorithms help turn large amounts of paper documents into digital files, facilitating text storage, processing, and searching.
OCR systems consist of hardware and software. The hardware part can be an optical scanner or a similar device that converts paper documents into a digital format. The software part is the OCR algorithm itself.
It is hard for computers to recognize characters because of the many fonts and the variations in how a single letter can be written. Handwritten letters complicate matters even further. Nevertheless, optical character recognition algorithms take on this challenge. Every OCR solution operates in four main steps:
The process involves using an optical scanner to capture a digital copy of the paper document. The document has to be properly aligned and sized.
The goal of this phase is to make the input file usable by the OCR algorithm by removing noise and background clutter. Pre-processing includes the following steps:
Layout analysis: identifying captions, columns, and graphs as blocks
De-skew: rotating the digital document so that text lines are horizontal, in case it wasn’t properly aligned during scanning
Image refinement: smoothing the edges, removing dust particles, increasing the contrast between text and background
Text detection: some algorithms detect separate words and divide them into letters, while others work with the text directly without splitting it into characters.
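To make the image refinement and text detection steps more concrete, here is a minimal sketch in plain Python of two classic techniques: Otsu's global thresholding, which separates dark text from a light background, and a horizontal projection profile, which finds text lines as runs of rows containing ink. These are common textbook methods, not necessarily what any particular OCR engine uses. The image is assumed to be a grayscale list of pixel rows with values 0-255, and the function names are illustrative; real systems would use an image library such as OpenCV.

```python
from typing import List, Tuple


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level (0-255) that best separates dark text from a light background."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    sum_dark = weight_dark = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]              # pixels with value <= t
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark  # pixels with value > t
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Otsu's criterion: maximize the between-class variance
        between_var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Mark dark (text) pixels as 1 and light (background) pixels as 0."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


def find_text_lines(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each run of rows that contain ink."""
    profile = [sum(row) for row in binary]  # horizontal projection: ink pixels per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink and start is None:
            start = y
        elif not ink and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines


page = [
    [250, 250, 250, 250],
    [20, 30, 240, 25],
    [15, 245, 35, 250],
    [250, 250, 250, 250],
    [40, 30, 20, 245],
    [250, 250, 250, 250],
]
print(find_text_lines(binarize(page)))  # [(1, 2), (4, 4)]
```

A single global threshold works for evenly lit scans. Unevenly lit photos usually need adaptive (local) thresholding instead.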
During this phase, optical character recognition algorithms apply different techniques to recognize letters and numbers. There are two main approaches:
After detecting characters, the program converts them to American Standard Code for Information Interchange (ASCII) codes to facilitate further processing.
The output can be basic, like a character string or a file. More advanced OCR solutions can retain the original page structure and create a PDF file with searchable text. Even though no tool so far guarantees 100% accuracy across different input files, some optical character recognition algorithms can achieve an impressive accuracy of 99.8% on familiar texts. Handwritten input significantly lowers accuracy. It's also important to understand that with poor training or unfamiliar texts, the error rate can be as high as 20%. Hence, users need to constantly monitor, proofread, and correct OCR algorithms’ output, especially when a new type of document enters the pipeline.
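Accuracy figures like these are commonly expressed as a character error rate (CER): the edit distance between the OCR output and a proofread reference, divided by the length of the reference. A minimal sketch, assuming you keep proofread copies of a sample of processed documents; the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, proofread_text: str) -> float:
    """Edit distance normalized by the length of the proofread reference."""
    if not proofread_text:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_text, proofread_text) / len(proofread_text)


# Four look-alike misreads (I/l, l/1, 0/O twice) in an 18-character reference
print(round(character_error_rate("lnvoice tota1: 5OO", "Invoice total: 500"), 3))  # 0.222
```

Under this metric, 99.8% accuracy roughly corresponds to a CER of 0.002, and a 20% error rate to a CER of 0.2.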
The post-processing phase can also involve natural language processing (NLP) and other AI techniques for data verification. AI can not only correct the text but also catch mistakes in calculations. Suppose that while processing an invoice, an OCR algorithm reads the total sum as $500. AI can verify this by adding up all the expenses; if they don’t amount to $500, it can notify a human employee to review this particular case.
If you want to improve the algorithm’s quality, you can experiment with open-source OCR libraries, such as Tesseract, that use their own dictionary for character segmentation. Another approach is to create a specialized glossary of terms recurring in your domain. Reviewers’ corrections can also serve as input for the next training session of the optical character recognition algorithm.
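The invoice check described above comes down to simple arithmetic. A minimal sketch, assuming the OCR step has already extracted the line-item amounts and the printed total as plain numeric strings (currency symbols stripped); the function name `needs_review` and the one-cent tolerance are illustrative assumptions. `Decimal` is used instead of floats to avoid rounding errors with currency.

```python
from decimal import Decimal, InvalidOperation
from typing import List


def needs_review(line_items: List[str], ocr_total: str, tolerance: str = "0.01") -> bool:
    """Flag an invoice when the OCR'd total disagrees with the sum of its OCR'd line items."""
    try:
        computed = sum((Decimal(item) for item in line_items), Decimal("0"))
        printed = Decimal(ocr_total)
    except InvalidOperation:
        return True  # an amount could not be parsed at all (e.g. "5OO"), so a human should look
    return abs(computed - printed) > Decimal(tolerance)


print(needs_review(["120.00", "250.00", "80.00"], "500.00"))  # True: the items add up to 450.00
print(needs_review(["120.00", "300.00", "80.00"], "500.00"))  # False
```

A mismatch does not say which field was misread, only that the invoice should not be processed without a human check.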
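A domain glossary can also be applied as a post-processing step on the recognized words. A minimal sketch using the fuzzy matching in Python's standard `difflib` module; this is a generic post-OCR correction technique, not Tesseract's own dictionary mechanism, and the function name, glossary, and 0.8 similarity cutoff are illustrative assumptions that would need tuning on real data.

```python
import difflib
from typing import List


def correct_with_glossary(words: List[str], glossary: List[str], cutoff: float = 0.8) -> List[str]:
    """Replace each OCR'd word with its closest glossary term if the two are similar enough."""
    lookup = {term.lower(): term for term in glossary}
    corrected = []
    for word in words:
        match = difflib.get_close_matches(word.lower(), lookup.keys(), n=1, cutoff=cutoff)
        corrected.append(lookup[match[0]] if match else word)  # keep the word if nothing is close
    return corrected


glossary = ["metformin", "amoxicillin", "ibuprofen"]
print(correct_with_glossary(["Take", "arnoxicillin", "daily"], glossary))
# ['Take', 'amoxicillin', 'daily']
```

A cutoff that is too low will "correct" legitimate words into glossary terms, so corrections are worth logging for the reviewers.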
Here is what optical character recognition solutions can do for you:
If you are thinking about incorporating OCR features into your IT systems, you’ve got several options to choose from.
There are several open-source OCR algorithms that businesses can adapt to their needs. These solutions are easier to customize as their source code is universally accessible. However, there is no central authority. Developers of open-source solutions don’t assume responsibility and don’t offer further support. Hence, the code’s quality can be questionable. This option is more suitable for companies with strong IT departments capable of fixing any malfunction. Alternatively, you can reach out to machine learning consultants who can customize and retrain this software for you.
Here are some commonly used open-source OCR solutions:
The Tesseract open-source engine is one of the most popular OCR tools and is considered among the most accurate free ones. It was developed by Hewlett-Packard between 1985 and 1994. From 2006 onward, the project was managed and further developed by Google. Tesseract is written in C++, and wrappers are available for Java, Python, Swift, Ruby, R, and a few other common programming languages.
Tesseract is operated from the command line and doesn’t have a graphical user interface. However, there are several GUI options that you can deploy to make this solution user-friendly. One example is gImageReader. This interface is developed using Python and supports different image formats, including PNG, GIF, and PNM.
Early Tesseract releases (version 2 and earlier) didn’t offer page layout analysis or output formatting, and their command line interface required all images to be submitted in TIFF format. Additionally, Tesseract is not optimized for GPU, and its batch processing support is limited.
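Newer Tesseract releases (3.05 and later) can write word-level results, including bounding boxes and confidence scores, as tab-separated values, for example with `tesseract scan.png out tsv`. A minimal sketch of consuming that output in Python: it keeps words above a confidence cutoff and regroups them into text lines. The function name and the 60% cutoff are illustrative assumptions, and the sample rows are hand-written, not real engine output.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def words_by_line(tsv_text: str, min_conf: float = 60.0) -> List[str]:
    """Group confidently recognized words from Tesseract TSV output into text lines."""
    lines: Dict[Tuple[int, int, int, int], List[str]] = defaultdict(list)
    # QUOTE_NONE: recognized text may contain quote characters that must not be interpreted
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["level"] != "5":  # level 5 rows are individual words; others are page/block/line rows
            continue
        text = (row["text"] or "").strip()
        if not text or float(row["conf"]) < min_conf:
            continue
        key = (int(row["page_num"]), int(row["block_num"]), int(row["par_num"]), int(row["line_num"]))
        lines[key].append(text)
    return [" ".join(words) for _, words in sorted(lines.items())]


sample_tsv = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t600\t200\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t10\t80\t20\t96.5\tInvoice\n"
    "5\t1\t1\t1\t1\t2\t95\t10\t40\t20\t91.2\t#1024\n"
    "5\t1\t1\t1\t2\t1\t10\t40\t60\t20\t35.0\tTotaI\n"
    "5\t1\t1\t1\t2\t2\t75\t40\t50\t20\t89.9\t$500\n"
)
print(words_by_line(sample_tsv))  # ['Invoice #1024', '$500']
```

Dropping low-confidence words silently loses data; in practice you might route them to a reviewer instead.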
OCRopus was originally written in Python and now has a separate C++ version. It is supported by Google and was used as an OCR engine for the Google ReCaptcha algorithm.
OCRopus has three main features:
Jaided AI, an optical character recognition company, built the EasyOCR package in Python using the PyTorch deep learning library. It supports over 80 languages, including Cyrillic-script languages, Chinese, and Arabic, and the list keeps growing. The project’s roadmap includes plans to add configurable options for recognizing handwritten text.
Software as a service (SaaS) solutions allow you to benefit from high-quality algorithms and receive full vendor support. Depending on the selected platform, you might be able to retrain the OCR algorithm on your dataset and even further adapt it to your unique needs.
Amazon Textract is a machine learning-based service that extracts printed and handwritten text from scanned documents. It can work with unstructured data and with formatted text, such as forms and tables. The service doesn’t need any extra configuration steps or templates. It is secure and compliant with data protection regulations, such as HIPAA and GDPR.
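Textract returns its results as a list of `Blocks` (such as `PAGE`, `LINE`, and `WORD`), each with a `BlockType`, a `Confidence` score and, for text blocks, a `Text` field. A minimal sketch of routing low-confidence lines to a human reviewer, assuming the response dictionary comes from boto3's `detect_document_text` call (for example, `boto3.client("textract").detect_document_text(Document={"Bytes": image_bytes})`). The function name and the 90% cutoff are illustrative assumptions, and the sample response is hand-written and heavily trimmed.

```python
from typing import Any, Dict, List, Tuple


def extract_lines(response: Dict[str, Any], min_confidence: float = 90.0) -> Tuple[List[str], List[str]]:
    """Split Textract LINE blocks into accepted lines and low-confidence lines for human review."""
    accepted: List[str] = []
    for_review: List[str] = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":  # skip PAGE, WORD, and other block types
            continue
        target = accepted if block.get("Confidence", 0.0) >= min_confidence else for_review
        target.append(block.get("Text", ""))
    return accepted, for_review


sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Text": "Patient: Jane Doe", "Confidence": 99.2},
        {"BlockType": "WORD", "Text": "Patient:", "Confidence": 99.4},
        {"BlockType": "LINE", "Text": "Dosage: 5O mg", "Confidence": 71.8},
    ]
}
accepted, for_review = extract_lines(sample_response)
print(accepted)    # ['Patient: Jane Doe']
print(for_review)  # ['Dosage: 5O mg']
```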
Amazon Textract offers four APIs that customers can use and pay for accordingly:
Google offers Vision API, which can extract printed and handwritten text from documents and images. It contains two features for optical character recognition:
Both features allow users to process the first 1,000 units per month for free. After that, you pay $1.50 per 1,000 units, and this price decreases as your monthly volume grows.
Microsoft offers OCR as part of its general-purpose Computer Vision API rather than as a stand-alone feature. So, you pay for the whole package, which, in addition to optical character recognition, includes identification of celebrities, landmarks, and brands, as well as general object detection. This API costs $1 per 1,000 transactions for the first million transactions. After that, the price drops to $0.65 per 1,000 transactions and keeps declining as you submit more content.
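Tiered per-unit pricing like this is easy to misjudge, so here is a worked calculation. A minimal sketch that encodes only the prices quoted above (vendors change them, so check current price lists); the further volume discounts both vendors offer are not modeled, so above those unstated thresholds the results are upper bounds. The constant and function names are illustrative.

```python
from typing import List, Optional, Tuple

# Each tier: (units covered by the tier, or None for "all remaining units", USD per 1,000 units).
# Only the tiers quoted in this article are modeled.
GOOGLE_VISION_OCR: List[Tuple[Optional[int], float]] = [(1_000, 0.0), (None, 1.50)]
MICROSOFT_COMPUTER_VISION: List[Tuple[Optional[int], float]] = [(1_000_000, 1.00), (None, 0.65)]


def monthly_cost(units: int, tiers: List[Tuple[Optional[int], float]]) -> float:
    """Bill units tier by tier, filling each tier before moving on to the next."""
    cost, remaining = 0.0, units
    for size, price_per_1000 in tiers:
        billed = remaining if size is None else min(remaining, size)
        cost += billed / 1_000 * price_per_1000
        remaining -= billed
        if remaining == 0:
            break
    return round(cost, 2)


print(monthly_cost(50_000, GOOGLE_VISION_OCR))             # 73.5: 1,000 free + 49 x $1.50
print(monthly_cost(1_500_000, MICROSOFT_COMPUTER_VISION))  # 1325.0: $1,000 + 500 x $0.65
```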
Optical character recognition algorithms are gaining traction in different industries. Below are some of the most prominent OCR applications.
Banking institutions handle large volumes of paper-based documents in their workflows, including checks, customer records, loan applications, and bank statements. Adopting OCR algorithms allows employees to store and access all these documents digitally and prevents paperwork loss and damage.
Check handling
One example of OCR in this sector is using banking apps to deposit paper-based checks digitally. These solutions deploy optical character recognition algorithms to identify relevant fields in checks and perform operations accordingly without the need for an employee to transfer all this data manually. Additionally, such apps can perform signature validation against the existing database and clear the check immediately.
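One cheap way to sanity-check an OCR'd check field is the checksum built into nine-digit US ABA routing numbers: multiplying the digits by the weights 3, 7, 1, 3, 7, 1, 3, 7, 1 must give a sum divisible by 10. A minimal sketch, assuming the routing number has already been read from the check's MICR line; the function name is illustrative. Because each weight is coprime to 10, any single misread digit (such as a 2 read as a 7) changes the checksum and is caught, although the check says nothing about fraud.

```python
def is_valid_routing_number(routing: str) -> bool:
    """Validate a 9-digit US ABA routing number against its built-in checksum."""
    if len(routing) != 9 or not all(c in "0123456789" for c in routing):
        return False  # wrong length or a non-digit, e.g. the letter O misread for 0
    weights = (3, 7, 1) * 3
    total = sum(int(d) * w for d, w in zip(routing, weights))
    return total % 10 == 0


print(is_valid_routing_number("021000021"))  # True
print(is_valid_routing_number("021000071"))  # False: one digit misread as 7
```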
Customer onboarding
Instead of having an employee verify clients’ identity manually, OCR-powered solutions can extract and validate all relevant information from the person’s passport and other ID documents. This allows for instant verification and improves the customer experience.
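Passports and many ID cards carry a machine-readable zone (MRZ) whose key fields are protected by check digits defined in ICAO Doc 9303. This gives an OCR pipeline a simple way to catch misread characters before accepting the data. A minimal sketch, assuming the MRZ has already been located and read as text; the function names are illustrative, and the example values come from the ICAO specimen passport.

```python
def mrz_check_digit(field: str) -> int:
    """ICAO 9303 check digit: character values weighted 7, 3, 1 (repeating), summed modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


def field_is_consistent(field: str, check_char: str) -> bool:
    """True if the OCR'd field matches the OCR'd check digit printed after it."""
    return len(check_char) == 1 and check_char in "0123456789" and mrz_check_digit(field) == int(check_char)


print(mrz_check_digit("L898902C3"))  # 6 (document number)
print(mrz_check_digit("740812"))     # 2 (date of birth)
print(field_is_consistent("L898902C8", "6"))  # False: last character misread
```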
Client information updating
Instead of having to visit or call a bank, clients can use OCR to scan their documents and update their information automatically. For example, Alfa-Bank collaborated with Smart Engines to enhance its banking app with optical character recognition capabilities. With this feature, customers hold their ID documents in front of their smartphone camera, confirm the extracted data, and update their information in the banking system.
Similar to the banking sector, healthcare organizations accumulate many paper documents, such as X-ray scans, test results, treatment plans, and so on. OCR algorithms help digitize these files to prevent loss of physical documents and reduce efforts wasted on handling paper files manually. Additionally, some OCR solutions that recognize handwritten text can process patient enrollment papers and prescriptions.
Medical claims system
There are software vendors who specialize in OCR-enabled medical claim processing. One such company is OCR Solutions. It developed a product that can scan, verify, and correctly route medical claims for further handling. This program is trained and configured to work with common formats, such as Dental Claim Forms and CMS-1500, among others.
Fax
Many medical facilities still rely on fax. Optical character recognition solutions can convert incoming faxes into an accessible, digitally stored format.
Invoicing
OCR-powered solutions help healthcare organizations digitize invoices and file them correctly. One OCR example comes from San Francisco-based Nanonets, which offers an OCR-powered solution specializing in invoice processing. The company claims its software will reduce invoice data entry time from three minutes per invoice to just 30 seconds.
Optical character recognition algorithms enable retail employees to save time on processing purchase orders, invoices, packing lists, and other documents. These solutions can also extract serial numbers from products’ barcodes and enable customers to scan their vouchers and extract serial codes.
ID scanning
Store employees may need to scan personal information for many reasons, such as age verification, filling in information for customer loyalty, and more. OCR vendors capitalize on this opportunity. For instance, OCR Solutions, based in Florida, developed idMax, an OCR-powered software that can scan ID documents, extract relevant fields, and populate the retailer’s database with corresponding information. idMax can be installed locally or accessed through the cloud.
If you decide to deploy OCR algorithms to improve your operations, there are several aspects you need to consider:
Input material: make sure all input files are suitable for the OCR algorithm. For example, the files need to be free of damage that could interfere with the algorithm’s ability to recognize their content, the contrast has to be high enough, the pages need to be properly aligned, and so on. Some algorithms have powerful pre-processing capabilities and can resolve some of these issues for you. If this is not the case, it may be worth investing in a high-quality scanner and ensuring proper page alignment.
Training dataset: if you decide to train or retrain optical character recognition algorithms, you need to make sure the data you plan to use faithfully represents your input material and contains enough correct annotations. If your training dataset is too small, or does not contain adequate annotations, the algorithm will not produce desired results.
Also, during training, you need to pay special attention to similar characters and symbols. For example, the digits 2 and 7 can look rather similar, especially if the algorithm is expected to work with handwritten text. Data scientists need to cover such distinctions in the training data. Another example is using OCR algorithms to detect and capture car license plates. You need to make sure your algorithm doesn’t mistake a custom sticker with text on the back of a car for a license plate.
Handwritten text: handwriting brings numerous additional OCR challenges. Writing styles vary widely among people, and even an individual’s handwriting can be inconsistent. Gathering a reliable, representative training dataset is difficult, as you need to account for all these different styles. Cursive handwriting is particularly challenging to process. Also, while printed text comes in straight lines, handwritten lines often vary in rotation, which complicates matters even more.
Scaling: if the number of users or the number of requests per time slot increases, the system can become overloaded, especially if you are using an open-source solution and relying on your own computing power. With commercial OCR products that run in the cloud, you can arrange and pay for more capacity.
OCR algorithm’s performance monitoring: after deployment, the algorithm’s performance might start degrading due to different factors. One example is a shift in distribution between the training data and the actual production data. This occurs when the model starts working on data it wasn’t prepared for, such as different fonts or characters with unusual slants. These changes will affect the model’s output over time, so you need to detect such issues and retrain the model accordingly to maintain its initial accuracy level.
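A practical way to find out which look-alike characters need more training examples is to count the model’s mistakes per (true character, predicted character) pair on a labeled validation set. A minimal sketch, assuming ground-truth and predicted characters are already aligned one-to-one, as when a character classifier is evaluated on isolated character images; the function name and sample pairs are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_confusions(pairs: Iterable[Tuple[str, str]], k: int = 5) -> List[Tuple[Tuple[str, str], int]]:
    """Count (true, predicted) mistakes and return the k most frequent confusions."""
    mistakes = Counter((truth, pred) for truth, pred in pairs if truth != pred)
    return mistakes.most_common(k)


# (ground truth, model prediction) for each validation character
pairs = [("2", "7"), ("2", "2"), ("7", "7"), ("2", "7"), ("O", "0"), ("7", "1")]
print(top_confusions(pairs, k=1))  # [(('2', '7'), 2)]
```

The most frequent pairs point to where additional or augmented training samples are likely to pay off.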
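Ground-truth labels are rarely available in production, so the per-word confidence scores that most engines report (Tesseract and Textract both use a 0-100 scale) can serve as a cheap first signal of drift. A minimal sketch, assuming you log those scores for a baseline period and for recent documents; the names and thresholds are illustrative placeholders to tune on your own data. An alert should trigger proofreading of a sample, not automatic retraining.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class DriftReport:
    baseline_mean: float
    recent_mean: float
    recent_low_share: float  # share of recent words below the low-confidence threshold
    alert: bool


def check_confidence_drift(baseline: Sequence[float], recent: Sequence[float],
                           max_drop: float = 5.0, low_conf: float = 60.0,
                           max_low_share: float = 0.10) -> DriftReport:
    """Compare recent word confidences with a baseline period and flag a significant drop."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")
    base_mean, recent_mean = mean(baseline), mean(recent)
    low_share = sum(c < low_conf for c in recent) / len(recent)
    alert = (base_mean - recent_mean) > max_drop or low_share > max_low_share
    return DriftReport(base_mean, recent_mean, low_share, alert)


print(check_confidence_drift([95, 96, 94, 97], [93, 94, 95, 96]).alert)  # False
print(check_confidence_drift([95, 96, 94, 97], [70, 55, 80, 50]).alert)  # True
```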
Optical character recognition algorithms have the potential to speed up your business processes. However, there are associated challenges to consider. The selected algorithm is likely to need retraining, and it’s a tedious task to properly annotate a large dataset. You also need to think about potential scaling as your business expands. Adopting an open-source solution seems tempting price-wise, but it comes with disadvantages, such as a lack of support and updates, which can open security loopholes. Commercial solutions are more reliable in this regard but can be costly and hard to customize.
If you are unsure how to proceed and which OCR solution is the best fit for your business, don’t hesitate to reach out. At ITRex, we will be happy to conduct a thorough evaluation of your business needs to determine the best OCR option. We can also help you retrain the selected solution and integrate it into your system, or build a custom OCR algorithm if needed.