It's not enough to create a high-quality, useful mobile app that meets the needs of users in a particular niche. Every developer faces the constant work of shipping new versions, and our iScanner team is no exception.
iScanner's neural network training is largely built around OCR (optical character recognition). To improve, the OCR algorithm needs data, and a lot of it: data that it processes while learning to recognize new variants of text. Since iScanner is a scanning and document management platform, this is one of our main tasks.
Imagine you have the task of recognizing handwritten text. There are hundreds of variations of handwriting in a multitude of languages. No matter how advanced typing technology becomes, people still write by hand: they take notes at work meetings, jot down roadmaps, and even write their daily tasks in a notebook because they are used to doing so. But then one day, the moment comes when this text needs to be urgently digitized. And that’s when iScanner comes to the rescue by digitizing your notes and saving important documents and notes on your phone.
But how does this happen on the technological level? And how does our team train the OCR algorithm to recognize different types of handwriting?
As we said earlier, the OCR algorithm needs data. And for extremely complex tasks like handwriting recognition, it needs lots and lots of data. The more handwriting samples you provide to the algorithm, the smarter it gets. If we think about how many variations of handwritten characters there are in just one language, the whole task starts to seem impossible.
Why is handwriting recognition difficult? The core problem is the sheer range of handwriting variants. On top of that, the way characters are connected when writing by hand becomes a critical variable for recognition. This significantly complicates the task and limits the ability to generate synthetic datasets from a small number of characters, so it is difficult for programmers to provide enough examples of what each character might look like. Some characters also look very similar, making it hard for computers to tell them apart: half the population writes "1" like a "7", and the other half writes "7" like a "4"! Notes are often written on the fly and are almost always illegible, and some words may be left unfinished, so that even the author cannot make them out without context.
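One common way to stretch a small set of character samples is geometric augmentation: distorting each glyph slightly so the model sees many plausible variants of it. As the paragraph above notes, this only goes so far for connected handwriting, but for isolated characters it helps. Here is a minimal, purely illustrative sketch (not iScanner's actual pipeline) that shears a tiny binary glyph bitmap to mimic handwriting slant:

```python
import random

def shear_glyph(bitmap, slant):
    """Shear a binary glyph bitmap horizontally to mimic handwriting slant.

    bitmap: list of rows (lists of 0/1 pixels).
    slant: horizontal shift, in pixels, applied to the top row; rows
    below shift proportionally less, so the glyph leans to one side.
    A crude stand-in for the affine augmentations OCR pipelines use.
    """
    h = len(bitmap)
    out = [[0] * (len(bitmap[0]) + abs(slant)) for _ in range(h)]
    for y, row in enumerate(bitmap):
        # rows nearer the top shift further, producing a slanted glyph
        shift = round(slant * (h - 1 - y) / max(h - 1, 1))
        if slant < 0:
            shift = abs(slant) + shift  # keep column indices non-negative
        for x, px in enumerate(row):
            out[y][x + shift] = px
    return out

def augment(bitmap, n_variants=5, max_slant=3, seed=0):
    """Produce several randomly slanted variants of one glyph."""
    rng = random.Random(seed)
    return [shear_glyph(bitmap, rng.randint(-max_slant, max_slant))
            for _ in range(n_variants)]
```

For example, a vertical stroke sheared with `slant=2` comes out leaning like a handwritten "/". Real systems layer many such distortions (rotation, elastic warping, thickness changes), but each new writer's connected script still adds variation that no augmentation fully covers.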
The same character can be written in many different ways, and you also have to account for the slant of the text, which depends on which hand the writer uses, as well as for sloppy and illegible handwriting. And beyond the uniqueness of each person's characters, there is the added problem of how those characters are joined together.
Another tricky problem is that a combination of two characters can often look like a third character: "rn", for example, is very similar to "m". On top of that, the original documents can be of poor quality, as paper deteriorates quickly.
In the end, though, recognition accuracy depends on the richness of the dataset used to train the algorithms. Our team understood that we needed to collect as many variants of handwritten text as possible. How could we achieve this? By asking the app's users for help!
I want to note that we have always paid close attention to user experience. This time, we asked active iScanner users to help us make the app better by sending us their handwritten notes. We announced the Handwriting Challenge right in the app and on the iScanner website, and it has already produced results: users have sent us study notes, book excerpts, notepad pages, and more.
That said, this kind of user participation does not always go smoothly, and developers need to take this into account before launching such challenges. In our case, despite the detailed instructions, users did not always scan handwritten text and often sent us scans of typed text instead.
Another difficulty was that users were not always careful when selecting words, so the resulting user dataset required additional verification and processing. Based on our experience of working with users to improve a new feature, here is what I can recommend: the technique works, but keep the difficulties listed above in mind and don't expect your users to follow instructions perfectly. Even the most experienced iScanner users are not developers who know all the ins and outs of training a neural network.
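The verification step mentioned above can be partly automated with simple heuristics before any human review. The following is a minimal sketch under assumed names (`Submission`, `clean`); the filtering rules are illustrative guesses at the kind of checks a crowdsourced dataset needs, not our real pipeline:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Submission:
    """One user-submitted sample: an image reference plus its label."""
    image_id: str
    transcription: str

def clean(submissions):
    """Filter and dedupe user submissions before they enter training.

    Illustrative heuristics: drop empty or whitespace-only labels,
    drop labels with no alphabetic characters (likely mis-selected),
    and keep only the first copy of any duplicate submission.
    """
    seen = set()
    kept = []
    for sub in submissions:
        text = sub.transcription.strip()
        if not text or not any(c.isalpha() for c in text):
            continue
        key = (sub.image_id, text.lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append(Submission(sub.image_id, text))
    return kept
```

Automated filters like these catch the bulk of careless submissions cheaply; the remaining samples still go through human spot checks, since no heuristic can tell a typed scan from a handwritten one on its own.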
We have been collecting, and continue to collect, data to train iScanner's own neural network and take the OCR feature to the next level. And frankly, we could hardly have implemented this feature or achieved better handwriting recognition results without our users and their interest in improving the app.
With this example of user interaction, we want to emphasize how important it is to use user feedback to further train AI. Don't underestimate users' contribution to the app, but don't overestimate it either: don't expect them to do everything smoothly and correctly. Beyond the obvious practical benefits, a challenge like this also gives the community of active iScanner users a chance to feel involved and valued.