Too Long; Didn't Read
Amazon Textract is a service that automatically extracts text and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify the contents of fields in forms and tables. In this post, I show how we can use Amazon Textract to extract text from scanned PDF files. With a few lines of code, a .pdf file in an S3 bucket is sent to Textract for analysis. Once the analysis is complete, another Lambda function is triggered to fetch the getDocumentAnalysis response. We then iterate over the blocks in the JSON response and save the detected text to S3.
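The last step, iterating over the blocks and saving the detected text, could look roughly like the sketch below. It is written in Python against the boto3 method names `get_document_analysis` and `put_object`, which is an assumption on my part since the summary only names the `getDocumentAnalysis` call. The helper names (`iter_blocks`, `detected_lines`, `save_text`), the choice to keep only `LINE` blocks, and the bucket and key arguments are hypothetical and only there for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List


def iter_blocks(textract: Any, job_id: str) -> Iterator[Dict[str, Any]]:
    """Yield every block of a finished analysis job, following NextToken pages."""
    kwargs: Dict[str, Any] = {"JobId": job_id}
    while True:
        response = textract.get_document_analysis(**kwargs)
        # Assumption: we only want results from jobs that finished successfully.
        if response.get("JobStatus") != "SUCCEEDED":
            raise RuntimeError(f"job {job_id} is {response.get('JobStatus')}")
        yield from response.get("Blocks", [])
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


def detected_lines(blocks: Iterable[Dict[str, Any]]) -> List[str]:
    """Keep the text of LINE blocks; WORD blocks would repeat the same text."""
    return [
        block["Text"]
        for block in blocks
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def save_text(s3: Any, textract: Any, job_id: str, bucket: str, key: str) -> str:
    """Join the detected lines and write them to S3 as a UTF-8 text object."""
    text = "\n".join(detected_lines(iter_blocks(textract, job_id)))
    s3.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
    return text
```

Inside a Lambda handler, the two clients would typically come from `boto3.client("textract")` and `boto3.client("s3")`, and the job ID from the completion notification. Passing the clients in as arguments keeps the functions easy to exercise with stand-in objects.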