1st-hand & in-depth info about Alibaba's tech innovation in AI, Big Data, & Computer Engineering
Smart AI applications can help businesses audit complex international trade documents, significantly improving work efficiency and risk control.
Documents are a crucial part of international trade, and they are notoriously complex. Trading staff routinely handle large volumes of these documents, often as scanned images, and their ability to extract key information and control risks is make-or-break. With Smart Audit, a technology that can extract information even from crumpled documents, Alibaba is showing how AI can improve efficiency and remove pain points in international trade.
International trade involves complex procedures, especially in B2B trade. To control risks, multiple documents are cross-validated and document-based risk control strategies are applied at every step. Examples include cross-validating corporate information and bank accounts, and auditing the risks associated with letters of credit, bills of lading (B/Ls), insurance policies, packing lists, invoices, and customs declaration forms.
Consider the letter of credit. Each of its clauses must be audited, and all supporting documents must be consistent with one another and conform to the terms of the letter of credit. This work is typically carried out by highly skilled, experienced staff.
This lengthy and risky audit process is the main driver behind Alibaba’s Smart Audit technology. Using machine learning and artificial intelligence, Smart Audit processes documents more efficiently, at lower cost, and with less risk.
The benefits are numerous. Smart Audit serves more SMEs in international trade by providing order decision reports; reports on terms, credit, and trade risks; and solutions for preparing and auditing documents. It also uses AI to reduce costs and risks, improve efficiency and customer experience, and drive the optimization and upgrade of core e-business operations.
The overall technical solution is abstracted into four parts: image processing, natural language processing, domain knowledge graph, and unified technical architecture.
· Image processing service
Document, card, and facial recognition
· Natural language processing
Text classification (robust to noisy text)
Word correction and segmentation
Text parsing, key-value (KV) categorization
· Domain knowledge graph
Domain knowledge: Terms, Port, Cargo description
Expert strategies: Terms and conditions, Financing
Map of risks: Nation, Bank, Region, Enterprise
· Unified technical architecture
Leveraging, Innovation, Scale-out
Data storage, Decision-making engine, Model service
Monitoring and warning
When image quality is good, existing image recognition technologies can achieve high accuracy. Unfortunately, this is not always the case: real-world document images are usually more complex, and applying existing technologies directly works well in fewer than 50% of cases. Moreover, common recognition technologies are weak at comprehension. For instance, even if OCR recognizes the characters in a distorted image, it cannot organize them into the correct semantics, nor can it judge which parts of the image are useful.
Therefore, the bigger challenge for image processing, in addition to leveraging Alibaba’s recognition technologies, is how to properly pre-process and post-process images to suit specific needs.
Because document formats vary and more than half of the documents are images, even the best OCR solutions cannot keep the recognition error rate below 10%. A robust text classification model is therefore needed. Moreover, even a low character error rate does not make the output usable directly, without human review, unless it is combined with domain-specific optimization and word segmentation.
This makes word correction and segmentation based on domain knowledge critically important. A parsing engine then parses the content and reconstructs key-value relationships; combined with the text-based domain knowledge graph and risk control strategies, this completes semantic understanding and smart audit.
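The key-value reconstruction step can be illustrated with a minimal Python sketch. It assumes OCR output arrives as text lines of the form `label: value` and that a list of known field labels is available; `KNOWN_LABELS`, `parse_key_values`, and the 0.8 similarity cutoff are hypothetical, and the standard-library `difflib` fuzzy matcher stands in for whatever matching the production parsing engine actually uses.

```python
import difflib
import re

# Hypothetical label list; in practice these would come from the domain knowledge graph.
KNOWN_LABELS = ["PORT OF LOADING", "PORT OF DISCHARGE", "INVOICE NO", "AMOUNT", "DATE OF EXPIRY"]


def parse_key_values(lines, labels=KNOWN_LABELS, cutoff=0.8):
    """Rebuild key-value pairs from OCR lines, snapping noisy keys to known labels."""
    result = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no key-value separator: leave the line to other parsers
        # Normalize whitespace and case before fuzzy matching.
        noisy_key = re.sub(r"\s+", " ", key).strip().upper()
        match = difflib.get_close_matches(noisy_key, labels, n=1, cutoff=cutoff)
        if match:
            result[match[0]] = value.strip()
    return result


ocr_lines = ["P0RT OF L0ADING: SHANGHAI", "lnvoice  No : 123", "SAID TO CONTAIN 20 CARTONS"]
print(parse_key_values(ocr_lines))  # {'PORT OF LOADING': 'SHANGHAI', 'INVOICE NO': '123'}
```

Snapping noisy keys to a closed label vocabulary is one simple way for domain knowledge to absorb residual OCR errors; the values themselves would still need their own correction.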
The knowledge graph built here accumulates knowledge from three areas: domain knowledge, expert strategies, and a risk map. Domain knowledge includes terms, abbreviations, and port information commonly used in international trade. Expert strategies include term strategies, conflict strategies, financing strategies, and audit opinions. The risk map covers nations, banks, regions, and enterprises.
The domain knowledge graph is the foundation of Smart Audit. All higher-level processing steps are designed to work with it to achieve smart audit and risk control in a real sense.
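To make the three knowledge areas concrete, here is a toy triple store. It is only an illustration, not Smart Audit's graph: the class and relation names are hypothetical, the abbreviations are standard trade terms, the CIF rule is a simplified expert strategy (under CIF the seller arranges cargo insurance), and `EXAMPLE BANK LTD` is a made-up risk-map entry.

```python
from collections import defaultdict


class TradeKnowledgeGraph:
    """Minimal triple store: (subject, relation) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject: str, relation: str) -> set:
        # .get avoids inserting empty entries into the defaultdict on lookup
        return set(self._edges.get((subject, relation), ()))


kg = TradeKnowledgeGraph()
# Domain knowledge: standard trade abbreviations.
kg.add("B/L", "abbreviation_of", "bill of lading")
kg.add("L/C", "abbreviation_of", "letter of credit")
kg.add("CIF", "abbreviation_of", "cost, insurance and freight")
# Expert strategy (simplified rule): under CIF the seller must supply an insurance document.
kg.add("CIF", "requires_document", "insurance policy")
# Risk map: hypothetical entity and rating, for illustration only.
kg.add("EXAMPLE BANK LTD", "risk_level", "high")


def required_documents(graph, incoterm):
    return graph.objects(incoterm, "requires_document")


def high_risk(graph, entities):
    return [e for e in entities if "high" in graph.objects(e, "risk_level")]


print(required_documents(kg, "CIF"), high_risk(kg, ["EXAMPLE BANK LTD", "ANOTHER BANK"]))
```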
The unified technical architecture is abstracted from the existing approaches. First, all service interfaces are gathered under a unified task engine. The next step is to make the most of Alibaba’s existing technologies and platforms, such as Leiyin (OCR technologies), Alibaba Cloud (certificate and face recognition), MTEE (real-time decision engine), and PAI (model training and deployment platform), to name a few.
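A unified task engine can be pictured as a registry that runs service steps in order over a shared context. The sketch below assumes nothing about the real interfaces of Leiyin, MTEE, or PAI; `TaskEngine` and the lambda steps are hypothetical placeholders for calls to those services.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]


class TaskEngine:
    """Runs registered processing steps in order over a shared context."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Step]] = []

    def register(self, name: str, step: Step) -> "TaskEngine":
        self._steps.append((name, step))
        return self  # allow chaining

    def run(self, request: Context) -> Context:
        context: Context = {**request, "trace": []}
        for name, step in self._steps:
            context.update(step(context))   # each step returns only the keys it adds
            context["trace"].append(name)   # record which services ran, for monitoring
        return context


# Placeholder steps: stand-ins for calls to OCR, parsing, and decision services.
engine = (
    TaskEngine()
    .register("ocr", lambda ctx: {"text": "AMOUNT: USD 1000"})
    .register("parse", lambda ctx: {"fields": dict([ctx["text"].split(": ", 1)])})
    .register("decide", lambda ctx: {"risk": "pass" if "AMOUNT" in ctx["fields"] else "manual"})
)
result = engine.run({"image": b""})
print(result["fields"], result["risk"], result["trace"])
```

Keeping every service behind the same step signature is what lets new capabilities be added, or swapped, without touching the callers.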
Once these technologies and platforms are incorporated, efforts are focused on digging deeper into the algorithms and models to work out creative solutions for specific problems.
The algorithms and models described below mainly concern image processing and natural language processing: blur detection, unwarping, and word correction and segmentation.
Blur detection, also called image quality assessment, has to be fast and lightweight. Its result decides what happens next: if image quality is good, the document proceeds to smart processing; otherwise the user is prompted to re-upload it, or it is routed to manual processing.
Many traditional solutions can handle specific types of blur. One example is the Laplacian method, which applies a second-derivative (Laplacian) filter to the image, computes the variance of the response, and judges the image blurred if that variance falls below a threshold.
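The variance-of-Laplacian test fits in a few lines of NumPy. This sketch uses the 4-neighbour Laplacian kernel; the threshold of 100 is a hypothetical starting value that would need tuning on real document scans. With OpenCV installed, `cv2.Laplacian(gray, cv2.CV_64F).var()` computes essentially the same statistic (border handling differs slightly).

```python
import numpy as np


def laplacian_variance(gray) -> float:
    """Variance of the 4-neighbour Laplacian (second-derivative) response."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])  # computed on interior pixels only
    return float(lap.var())


def is_blurred(gray, threshold: float = 100.0) -> bool:
    """Flag an image as blurred when its Laplacian variance is below the threshold."""
    return laplacian_variance(gray) < threshold


sharp = (np.indices((64, 64)).sum(axis=0) % 2) * 255.0  # checkerboard: strong edges
flat = np.full((64, 64), 128.0)                          # no detail at all
print(is_blurred(sharp), is_blurred(flat))  # False True
```

A single global threshold like this is easy to compute but hard to tune across document types, lighting, and content density.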
Nevertheless, traditional solutions are weak at feature extraction and representation. To overcome this limitation, this article presents an improved MobileNetV2 network structure. Because blur detection depends on subtle differences in image detail, a sample set is first generated using random slicing and HSV color space screening, and the samples are then divided into positive and negative samples according to their OCR recognition rate.
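The sample-generation step might look like the following sketch. The article does not say which HSV criteria or OCR threshold were used, so everything here is an assumption: crops are screened to keep those with a moderate share of dark, low-saturation pixels (likely printed text rather than blank paper or colored stamps), and a crop is labeled positive when a separately computed OCR recognition rate reaches a hypothetical 0.9.

```python
import numpy as np


def random_slices(img: np.ndarray, size: int, n: int, rng: np.random.Generator):
    """Yield n random size x size crops from an H x W x 3 RGB image."""
    h, w = img.shape[:2]
    for _ in range(n):
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        yield img[y:y + size, x:x + size]


def value_and_saturation(patch: np.ndarray):
    """HSV value (V) and saturation (S) channels, both in [0, 1]."""
    rgb = patch.astype(float) / 255.0
    v = rgb.max(axis=2)
    s = (v - rgb.min(axis=2)) / np.maximum(v, 1e-12)  # S = 0 for black pixels
    return v, s


def looks_like_text(patch, dark_v=0.5, max_s=0.3, lo=0.02, hi=0.5) -> bool:
    """Hypothetical screen: keep crops with a moderate share of dark, unsaturated 'ink'."""
    v, s = value_and_saturation(patch)
    ink_fraction = ((v < dark_v) & (s < max_s)).mean()
    return bool(lo <= ink_fraction <= hi)


def label_from_ocr(ocr_rate: float, threshold: float = 0.9) -> int:
    """1 = recognizable (positive), 0 = blurred (negative); the threshold is an assumption."""
    return int(ocr_rate >= threshold)


rng = np.random.default_rng(0)
page = np.full((200, 300, 3), 255, dtype=np.uint8)
page[50:60, 20:280] = 0  # a dark band standing in for a line of text
crops = [c for c in random_slices(page, 32, 50, rng) if looks_like_text(c)]
print(len(crops))
```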
The original MobileNetV2 network contains 17 bottleneck layers, each of which expands its channels internally. The resulting model is large and difficult to converge during training. It was therefore clipped into a new, shallower and narrower architecture with fewer parameters, containing only two convolution layers, two pooling layers, two bottleneck layers, and one fully connected layer.
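Why clipping shrinks the model so much can be seen by counting weights. This sketch counts parameters (ignoring batch normalization, with no bias in the convolutions) for a standard convolution, a MobileNetV2 inverted-residual bottleneck (1×1 expansion, 3×3 depthwise, 1×1 projection), and a fully connected layer. The layer widths in the example are hypothetical, not the configuration Alibaba used.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def bottleneck_params(c_in: int, c_out: int, t: int) -> int:
    """Weights of a MobileNetV2 inverted-residual block with expansion factor t."""
    hidden = t * c_in
    return (c_in * hidden       # 1x1 expansion convolution
            + 9 * hidden        # 3x3 depthwise convolution (one filter per channel)
            + hidden * c_out)   # 1x1 linear projection


def fc_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weights + biases


def float32_megabytes(n_params: int) -> float:
    return n_params * 4 / 1e6  # 4 bytes per float32 weight


# Hypothetical clipped network: conv, conv, (pooling layers have no weights), two bottlenecks, FC.
total = (conv_params(3, 16, 3) + conv_params(16, 32, 3)
         + bottleneck_params(32, 32, t=6) + bottleneck_params(32, 64, t=6)
         + fc_params(64, 2))
print(total, round(float32_megabytes(total), 3))  # 39346 0.157
```

Because each bottleneck's cost grows with the square of its channel width, removing deep, wide blocks cuts the parameter count far faster than it cuts the network's depth.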
In the latest round of testing, the accuracy of this algorithm was about 93.4%. The improved model is only about 2 MB, compared with around 26 MB for a model trained with the original MobileNetV2 architecture.
Document images can be distorted in many ways, such as rotation, folding, and curling. These problems hurt not only OCR recognition but, even more severely, semantic understanding, so image unwarping is crucial for reaching the stage of automated audit. Many traditional methods can solve specific warping problems. For instance, the Hough transform can detect straight lines and the image can then be rotated back by the correct angle.
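As an illustration of the traditional approach, the sketch below estimates the skew of a binarized page with a small Hough transform written in NumPy: every foreground pixel votes for the lines (θ, ρ) passing through it, and the angle whose accumulator holds the largest single cell wins. Restricting θ to 45 to 135 degrees (near-horizontal lines) and the 0.5-degree step are assumptions suited to text lines.

```python
import numpy as np


def estimate_skew_hough(binary: np.ndarray, step_deg: float = 0.5) -> float:
    """Estimate the dominant near-horizontal line angle (degrees) with a Hough vote.

    Angles are in array coordinates (x = column, y = row, rows grow downward).
    """
    ys, xs = np.nonzero(binary)
    if xs.size == 0:
        return 0.0
    best_votes, best_theta = -1, 90.0
    # A line at angle a has its normal at theta = a + 90 degrees; search near-horizontal lines.
    for theta in np.arange(45.0, 135.0, step_deg):
        t = np.deg2rad(theta)
        rho = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(int)
        votes = np.bincount(rho - rho.min()).max()  # largest accumulator cell for this theta
        if votes > best_votes:
            best_votes, best_theta = votes, theta
    return float(best_theta - 90.0)


# Synthetic page: one line of "ink" tilted by 5 degrees.
xs = np.arange(200)
ys = np.round(10 + xs * np.tan(np.deg2rad(5.0))).astype(int)
img = np.zeros((50, 200), dtype=bool)
img[ys, xs] = True
print(estimate_skew_hough(img))
```

On the synthetic line the estimate should be close to 5 degrees. The page could then be rotated by the opposite angle, for example with `scipy.ndimage.rotate`, after checking that library's sign convention, since array coordinates have y pointing down.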
In recent years, deep learning models such as FCN, STN, and U-Net have also been used to correct image distortion. Building on ideas from semantic segmentation, Alibaba has proposed a new unwarping algorithm that improves on existing methods.
First, data synthesis is used to create training samples. Different deformation functions are used to simulate multiple deformation types, such as images with fold lines and images with curled surfaces, and changing the deformation size simulates different levels of deformation. Interpolation and image inpainting methods are then applied to fill in the missing pixels of the simulated images.
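To illustrate how folds and curls can be synthesized, the sketch below displaces pixels along a line with a weight that depends on their distance to it, similar to the weighting described in the DocUNet paper (α/(d+α) for folds, 1 − d^α for curls, with d normalized for curls). It is not Alibaba's generator: the function names and parameters are hypothetical, backward mapping with a displacement defined on the output grid is an approximation, and a constant fill stands in for the interpolation and inpainting step (a real pipeline might pass the returned mask to an inpainting routine such as OpenCV's `cv2.inpaint`).

```python
import numpy as np


def deformation_field(h, w, point, direction, strength, alpha, kind="fold"):
    """Per-pixel displacement (dx, dy) that is largest near a line through `point`."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    px, py = point
    # Perpendicular distance from every pixel to the line through `point` along `d`.
    dist = np.abs((xs - px) * d[1] - (ys - py) * d[0])
    if kind == "fold":
        weight = alpha / (dist + alpha)  # sharp crease: effect decays quickly
    elif kind == "curl":
        dist = dist / max(dist.max(), 1e-9)
        weight = 1.0 - dist ** alpha     # smooth bend spread across the page
    else:
        raise ValueError(f"unknown deformation kind: {kind}")
    return weight * strength * d[0], weight * strength * d[1]


def warp_nearest(img, disp_x, disp_y, fill=255):
    """Backward-map each output pixel (nearest neighbour); return image and missing mask."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.round(xs - disp_x).astype(int)
    src_y = np.round(ys - disp_y).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, fill)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out, ~valid  # True where no source pixel exists (to be inpainted)


page = np.full((60, 80), 255, dtype=np.uint8)
page[::6, :] = 0  # horizontal "text lines"
dx, dy = deformation_field(60, 80, point=(40, 30), direction=(0.2, 1.0), strength=4.0, alpha=8.0)
warped, missing = warp_nearest(page, dx, dy)
print(warped.shape, int(missing.sum()))
```

Because the displacement field is known exactly, each synthetic image comes with its ground-truth unwarping target for free.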
Existing stacked U-Net-based methods can produce cracks, distorted text lines, and badly deformed characters, among other issues. The Alibaba team improved the network structure using dilated convolution and proposed a new unwarping method that adjusts the loss function and smooths the prediction.
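Dilated convolution widens each layer's view without adding weights or downsampling, which matters for unwarping because the correct displacement at a pixel depends on context far away from it. The 1-D NumPy sketch below shows the mechanism and the standard receptive-field formula for stacked stride-1 layers; it illustrates the general technique, not Alibaba's network.

```python
import numpy as np


def dilated_conv1d(x, kernel, dilation: int) -> np.ndarray:
    """'Valid' 1-D dilated convolution (cross-correlation, as deep-learning layers compute it)."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(kernel, dtype=float)
    n_out = len(x) - (len(k) - 1) * dilation
    if n_out <= 0:
        raise ValueError("input is shorter than the dilated kernel span")
    out = np.zeros(n_out)
    for j, weight in enumerate(k):
        # tap j reads the input `dilation` samples further along than tap j - 1
        out += weight * x[j * dilation: j * dilation + n_out]
    return out


def receptive_field(kernel_sizes, dilations) -> int:
    """Receptive field of stacked stride-1 convolutions."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


print(dilated_conv1d(np.arange(10), [1, 1, 1], dilation=2))
print(receptive_field([3, 3, 3], [1, 1, 1]), receptive_field([3, 3, 3], [1, 2, 4]))  # 7 15
```

Three 3-tap layers see 7 samples with ordinary convolution but 15 with dilations 1, 2, 4, at the same parameter cost.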
The MS-SSIM (Multi-Scale Structural Similarity) metric is used to evaluate unwarping algorithms. Alibaba’s algorithm achieves an MS-SSIM of 0.693, a clear improvement over the 0.490 of the previous state-of-the-art method.
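The structure of MS-SSIM can be shown with a simplified NumPy sketch: contrast-structure terms are multiplied across five scales (each a 2×2 average-pooled copy of the last), and luminance enters only at the coarsest scale, using the published scale weights. To stay short it uses whole-image statistics instead of the 11×11 Gaussian window of the reference definition, so its numbers will not match standard implementations or the scores quoted in the text.

```python
import numpy as np

# Scale weights from Wang, Simoncelli, and Bovik (2003), finest scale first.
WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)


def _luminance_cs(x, y, c1, c2):
    """Luminance and contrast-structure terms from whole-image statistics."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    cs = (2 * cov + c2) / (x.var() + y.var() + c2)
    return lum, max(cs, 0.0)  # clip negative correlation so fractional powers stay real


def _downsample(img):
    """2x2 average pooling (odd trailing row/column dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def ms_ssim_global(x, y, data_range: float = 255.0) -> float:
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    score = 1.0
    for scale, weight in enumerate(WEIGHTS):
        lum, cs = _luminance_cs(x, y, c1, c2)
        score *= cs ** weight
        if scale == len(WEIGHTS) - 1:
            score *= lum ** weight  # luminance only enters at the coarsest scale
        else:
            x, y = _downsample(x), _downsample(y)
    return float(score)


rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64)).astype(float)
noisy = np.clip(reference + rng.normal(0, 30, size=reference.shape), 0, 255)
print(ms_ssim_global(reference, reference), ms_ssim_global(reference, noisy))
```

For unwarping, `x` would be the unwarped output and `y` the flat scan of the same page; a score of 1 means a structurally identical image.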
An HMM that combines word correction and segmentation directly is inefficient, because long texts produce a very large search space at prediction time. Alibaba therefore approaches the problem from a new angle: word segmentation is treated as a special case of word correction, with the blank (space) counted as a valid character, and correction is treated as a translation task in which a wrong character sequence is translated into a correct one. In this way, word correction and segmentation are abstracted into a single sequence-to-sequence process.
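The reframing becomes concrete once the blank is an ordinary token: a missing space is then just one more character difference between source and target. This sketch builds character-level source/target id sequences for a sequence-to-sequence model; `BLANK`, `UNK`, and the helper names are hypothetical, and the model itself is out of scope.

```python
from typing import Dict, Iterable, List, Tuple

BLANK = "<blank>"  # hypothetical token for the space character
UNK = "<unk>"      # hypothetical token for characters never seen in training


def tokenize(text: str) -> List[str]:
    """Character-level tokens; spaces become BLANK tokens instead of being dropped."""
    return [BLANK if ch == " " else ch for ch in text]


def build_vocab(texts: Iterable[str]) -> Dict[str, int]:
    symbols = sorted({tok for text in texts for tok in tokenize(text)})
    return {sym: idx for idx, sym in enumerate([UNK] + symbols)}


def make_pair(ocr_text: str, gold_text: str,
              vocab: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Source/target id sequences for a sequence-to-sequence correction model."""
    def encode(text: str) -> List[int]:
        return [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)]
    return encode(ocr_text), encode(gold_text)


vocab = build_vocab(["BILL OF LADING"])
src, tgt = make_pair("BILL OFLADING", "BILL OF LADING", vocab)
print(len(src), len(tgt))  # 13 14
```

The segmentation error here differs from the target by a single inserted `BLANK`, exactly like a missing letter, so one model can learn both corrections.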
Data synthesis (adding, deleting, and modifying characters according to a probabilistic transfer matrix) and transfer learning are used to train the target model. Without correction, the error rate (edit distance) between the OCR output and the ground truth is 15.91% (2.91% if blanks are ignored). Applying the new correction and segmentation model reduces the error rate to 2.24% and raises word accuracy to 93.56%.
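Both ingredients of this paragraph can be sketched in plain Python: sampling noisy training text from a character transfer table, and scoring output with an edit-distance error rate, with or without blanks. The transfer table and its probabilities are made up for illustration, and normalizing by ground-truth length is an assumption, since the article does not state its normalization.

```python
import random

# Hypothetical transfer table: clean character -> [(OCR output, probability), ...].
# "" deletes the character; a two-character output inserts an extra character.
TRANSFER = {
    "O": [("O", 0.90), ("0", 0.07), ("", 0.03)],
    "I": [("I", 0.90), ("1", 0.06), ("l", 0.04)],
    "M": [("M", 0.95), ("RN", 0.05)],
    " ": [(" ", 0.90), ("", 0.10)],  # dropped blank = segmentation error
}


def corrupt(text: str, rng: random.Random) -> str:
    """Sample a noisy 'OCR' string from clean text using the transfer table."""
    out = []
    for ch in text:
        outputs, probs = zip(*TRANSFER.get(ch, [(ch, 1.0)]))
        out.append(rng.choices(outputs, weights=probs, k=1)[0])
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def error_rate(pred: str, gold: str, ignore_blank: bool = False) -> float:
    """Edit distance normalized by ground-truth length (an assumed normalization)."""
    if ignore_blank:
        pred, gold = pred.replace(" ", ""), gold.replace(" ", "")
    return edit_distance(pred, gold) / max(len(gold), 1)


rng = random.Random(0)
gold = "BILL OF LADING NO 1001"
noisy = corrupt(gold, rng)
print(noisy, error_rate(noisy, gold), error_rate(noisy, gold, ignore_blank=True))
```

Comparing the two error rates separates segmentation errors from character errors, which is how a gap like 15.91% versus 2.91% can be read.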
Wherever Smart Audit has been applied, working efficiency has improved by at least 50%, and costs and risks have been greatly reduced. This section describes two applications in real scenarios.
In the first application, the user uploads a photographed or scanned letter of credit (LC). After a series of image processing and natural language processing steps, Smart Audit reviews each term, marks risk information, and returns a review and decision report.
In the second application, the user uploads a photographed or scanned document (such as an insurance policy, B/L, or customs declaration form). Smart Audit parses and verifies each field, color-codes the information (purple for consistent, yellow for suspicious, and red for missing information), and returns a verification and suggestion report.
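The color-coded verification can be sketched as a field-by-field comparison against reference values, for example those taken from the letter of credit. The `Mark` enum, `verify_fields`, and the case- and whitespace-insensitive comparison are assumptions; a production system would presumably also apply domain knowledge and fuzzy matching before calling a field suspicious.

```python
from enum import Enum
from typing import Dict, Iterable


class Mark(Enum):
    CONSISTENT = "purple"
    SUSPICIOUS = "yellow"
    MISSING = "red"


def normalize(value) -> str:
    """Case- and whitespace-insensitive form used for comparison (a simplification)."""
    return " ".join(str(value).upper().split())


def verify_fields(document: Dict[str, str], reference: Dict[str, str],
                  required: Iterable[str]) -> Dict[str, Mark]:
    marks = {}
    for field in required:
        value = document.get(field)
        if value is None or not normalize(value):
            marks[field] = Mark.MISSING
        elif normalize(value) == normalize(reference.get(field, "")):
            marks[field] = Mark.CONSISTENT
        else:
            # Includes fields the reference does not define: flag for a human to check.
            marks[field] = Mark.SUSPICIOUS
    return marks


bill_of_lading = {"PORT OF LOADING": "shanghai", "AMOUNT": "USD 1,000"}
letter_of_credit = {"PORT OF LOADING": "SHANGHAI", "AMOUNT": "USD 1,200", "INVOICE NO": "A1"}
marks = verify_fields(bill_of_lading, letter_of_credit, ["PORT OF LOADING", "AMOUNT", "INVOICE NO"])
print({field: mark.value for field, mark in marks.items()})
```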
This article has summarized the business background and technical solution of Smart Audit, described some of the innovative algorithms and models it implements, and presented several applications. As a new approach to international trade, Smart Audit uses machine learning and artificial intelligence to provide risk and decision reports as well as end-to-end solutions. It is also driving the adoption of other technologies (such as blockchain) to better serve more SMEs in international trade.