
Mastering Full-Text Searches on PDF Documents with Foxit PDF SDK

by Foxit Software, October 5th, 2023

Too Long; Didn't Read

In this article, we will show you how to integrate Foxit PDF SDK with your system to perform accurate full-text search on a PDF document.

Occasionally, Ctrl-F isn’t quite sufficient. When dealing with an extensive array of documents, the task of locating particular phrases or identifying sources within metadata can become quite daunting. Fortunately, there’s a solution in the form of full-text search, a feature that scans an entire collection and furnishes comprehensive results.


When working with PDFs, the Foxit PDF SDK can serve as your go-to tool for executing full-text searches. In this article, we’ll guide you through the process of integrating Foxit into your system to conduct precise full-text searches on PDF documents.

Why Does Full-Text Search on PDFs Often Fail?

PDF files are designed to preserve the original formatting of a document, including layouts, fonts, and graphics, so they look the same on any computer or device with PDF support. To achieve this, a PDF stores text as positioned drawing instructions rather than as a continuous stream of words. For a full-text search to yield accurate results, the text must therefore be extracted from the PDF files as a preliminary step.
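
To illustrate why extraction comes first, here is a minimal sketch of a full-text index built over text that has already been extracted. The `extractedPages` records, the function names, and the `docId#page` location format are all assumptions made for this example; they are not part of the Foxit SDK.

```javascript
// Split text into lowercase words (letters and digits only).
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
}

// Build an inverted index: word -> Set of "docId#page" locations.
// `pages` is a hypothetical array of { docId, page, text } records that a
// text-extraction step would produce.
function buildIndex(pages) {
  const index = new Map();
  for (const { docId, page, text } of pages) {
    for (const word of tokenize(text)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(`${docId}#${page}`);
    }
  }
  return index;
}

// Return the sorted locations that contain every word of the query.
function search(index, query) {
  const words = tokenize(query);
  if (words.length === 0) return [];
  const [first, ...rest] = words.map((w) => index.get(w) || new Set());
  return [...first].filter((loc) => rest.every((s) => s.has(loc))).sort();
}

// Made-up sample data standing in for extracted PDF text.
const extractedPages = [
  { docId: "guide.pdf", page: 1, text: "Full-text search scans every document." },
  { docId: "guide.pdf", page: 2, text: "Metadata can also be searched." },
  { docId: "notes.pdf", page: 1, text: "Search the full collection at once." },
];
const index = buildIndex(extractedPages);
console.log(search(index, "full search"));
```

The sketch matches whole words only, so a query for "search" does not match "searched"; a production search engine would usually add stemming or prefix matching on top of this.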

What is Foxit?

Foxit is a versatile software suite offering a wide range of PDF solutions. It empowers users to perform tasks such as creating, editing, signing, merging, annotating, protecting, and scanning PDF files. In addition to these capabilities, Foxit incorporates user-friendly collaboration features that facilitate form-filling and information sharing with peers and collaborators. Notably, Foxit excels in rendering PDF files swiftly, even with large documents, all while consuming minimal system memory resources.


Furthermore, Foxit caters to developers by providing software development kits (SDKs) and plug-ins that can be seamlessly integrated into various applications. Foxit’s software is compatible with a variety of platforms, ensuring accessibility across different operating systems. To explore further details about Foxit and its offerings, you can visit their official website.

How Does Foxit SDK Let You Search Text in a PDF?

The challenge of locating text within a PDF lies in the way the PDF format organizes text and objects. The Foxit SDK addresses this challenge by recording the characteristics of these objects, including their location, size, and rotation angle on the page. This makes it simpler to search for specific words or content within your document, and the SDK lets you customize the search behavior to suit your needs.

This functionality applies to all text contained within the PDF, regardless of the document’s encoding type or language. To expedite the search process, the software employs SQLite to analyze the document, resulting in rapid response times.
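
As a rough illustration of why keeping each text object's position matters, the sketch below searches a list of text runs and returns where each hit sits on the page, which is the information a viewer needs to highlight it. The `runs` data shape and the `findHits` function are hypothetical and do not reflect Foxit's internal data model.

```javascript
// Hypothetical extracted text runs: page number, the run's text, and its
// bounding box in page coordinates.
const runs = [
  { page: 1, text: "Mastering full-text search", rect: { x: 72, y: 700, width: 310, height: 18 } },
  { page: 1, text: "with positioned text runs", rect: { x: 72, y: 676, width: 290, height: 18 } },
  { page: 2, text: "Search results keep their position", rect: { x: 72, y: 700, width: 350, height: 18 } },
];

// Find every run containing the term (case-insensitive) and report its page,
// bounding box, and the character offset of the first match inside the run.
// Limitation: a phrase split across two runs is not matched by this sketch.
function findHits(textRuns, term) {
  const needle = term.toLowerCase();
  if (needle === "") return [];
  return textRuns
    .map((run) => ({ run, offset: run.text.toLowerCase().indexOf(needle) }))
    .filter(({ offset }) => offset !== -1)
    .map(({ run, offset }) => ({ page: run.page, rect: run.rect, offset }));
}

console.log(findHits(runs, "search"));
```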


Prerequisites

To get started, first, you’ll need to have the following:


Building the Example App

The first thing you need to do is download the SDK in Zip format and extract it. Your folder structure should look like this.


After extracting the app, you’ll find a package.json file that contains all the packages used.

Install the packages with the following command:

`npm install`

After installing the packages, the next step is to start the local server:

`npm start`

To access the server, use the following address:



Setting up a New JavaScript Web App with Foxit

We will be using the Foxit PDF SDK to build a web app that has a full PDF viewer feature.

Follow the instructions below to get started:

  • Create a new folder for the project
  • From the SDK you downloaded earlier, copy the lib, server, and external folders and the package.json file into the new folder you created. (Only copy the external folder if you want to use font resources.)
  • Add a PDF file to the new folder as well (this is for test purposes).
  • Lastly, create an index.html file in the new folder.


Now, this is what your file structure should look like:


newFolder
+-- lib (copied from the Foxit_PDF_SDK)
+-- server (copied from the Foxit_PDF_SDK)
+-- package.json (copied from the Foxit_PDF_SDK)
+-- index.html (you created this file)
+-- yourOwn.pdf (sample PDF you added to the folder)
+-- external (optional folder from Foxit_PDF_SDK for font resources)


Now, open your code editor and add the following code snippet in your index.html.


<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Foxit Web SDK Practice</title>
  <style>
    .fv__ui-tab-nav li span {
      color: #636363;
    }
    .flex-row {
      display: flex;
      flex-direction: row;
    }
  </style>
</head>
<body>
</body>
</html>


Let’s import the styles from the lib folder we copied by adding the following line to the <head> tag.


<link rel="stylesheet" href="./lib/PDFViewCtrl.css">


Also, import the script library from the lib folder.


<script src="./lib/PDFViewCtrl.full.js" charset="utf-8"></script>


Add a <div> element inside the <body> tag; this will be the web viewer container.


<div id="pdf-viewer"></div>


Initialize the PDF viewer before the closing body tag.


<script>
const licenseSN = "Your license SN";
const licenseKey = "Your license Key";
const PDFViewer = PDFViewCtrl.PDFViewer;
const pdfViewer = new PDFViewer({
libPath: './lib', // the library path of Web SDK.
jr: {
licenseSN: licenseSN,
licenseKey: licenseKey,
}
});
pdfViewer.init('#pdf-viewer'); // the div (id="pdf-viewer")
</script>
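
One easy mistake at this step is leaving the placeholder license strings in place. The helper below is a small sketch that builds the same options object as the snippet above and fails fast if the license values were never replaced. The `libPath` and `jr` keys come from the snippet above; the `buildViewerOptions` name and the placeholder check are illustrative additions, not part of the SDK.

```javascript
// Build the options object passed to the viewer constructor shown above.
// Throws if a license value is missing or still set to the article's placeholder.
function buildViewerOptions({ licenseSN, licenseKey, libPath = "./lib" }) {
  const placeholders = ["Your license SN", "Your license Key"];
  for (const [name, value] of Object.entries({ licenseSN, licenseKey })) {
    const unusable =
      typeof value !== "string" || value.trim() === "" || placeholders.includes(value);
    if (unusable) {
      throw new Error(`${name} is missing or still set to its placeholder value`);
    }
  }
  return { libPath, jr: { licenseSN, licenseKey } };
}

// Usage in the page, once real license values are available:
// const pdfViewer = new PDFViewCtrl.PDFViewer(buildViewerOptions({ licenseSN, licenseKey }));
```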


You can get the trial license key and license SN from the license-key.js file in the examples folder of the SDK. Next, fetch the PDF document and open it in the viewer:


fetch('./JavaScript-for-Kids.pdf').then((res) => {
  // modify the path to get your pdf
  res.arrayBuffer().then((buffer) => {
    pdfViewer.openPDFByFile(buffer);
  });
});
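
The snippet above does not handle a missing file or a server error. A slightly more defensive variant, sketched below, checks the HTTP status before reading the body. The `loadPdfBuffer` name and its injectable `fetchImpl` parameter are assumptions made for this example; `res.ok`, `res.status`, and `res.arrayBuffer()` are standard Fetch API members.

```javascript
// Fetch a PDF and resolve to its ArrayBuffer, rejecting on HTTP errors.
// `fetchImpl` defaults to the global fetch and can be swapped out for testing.
async function loadPdfBuffer(url, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Failed to load ${url}: HTTP ${res.status}`);
  }
  return res.arrayBuffer();
}

// Usage in the page (pdfViewer is the instance created earlier):
// loadPdfBuffer('./JavaScript-for-Kids.pdf')
//   .then((buffer) => pdfViewer.openPDFByFile(buffer))
//   .catch((err) => console.error(err));
```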


These are the key settings we need to set up Foxit. The complete HTML file should look like this:


<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Foxit Web SDK Practice</title>
<link rel="stylesheet" href="./lib/PDFViewCtrl.css">
<script src="./lib/PDFViewCtrl.full.js" charset="utf-8"></script>
<style>
.fv__ui-tab-nav li span {
color: #636363;
}
.flex-row {
display: flex;
flex-direction: row;
}
</style>
</head>
<body>
<div id="pdf-viewer"></div>
<script>
const licenseSN = "Your license SN";
const licenseKey = "Your license Key";
const PDFViewer = PDFViewCtrl.PDFViewer;
const pdfViewer = new PDFViewer({
libPath: './lib', // the library path of Web SDK.
jr: {
licenseSN: licenseSN,
licenseKey: licenseKey,
}
});
pdfViewer.init('#pdf-viewer'); // the div (id="pdf-viewer")
fetch('./JavaScript-for-Kids.pdf').then((res) => {
// modify the path to get your pdf
res.arrayBuffer().then(function (buffer) {
pdfViewer.openPDFByFile(buffer);
})
})
</script>
</body>
</html>


Integrating the Complete Web Viewer Package

We just finished setting up the Basic package. Let’s move to the complete web-view package.


First, import the styles:


<link rel="stylesheet" href="./lib/UIExtension.css"> 

Next, import the script:


<script src="./lib/UIExtension.full.js" charset="utf-8"></script>


In the body tag, add a div tag:


<div id="pdf-ui"></div>


Initialize the Complete package extension:

const pdfui = new UIExtension.PDFUI({
  viewerOptions: {
    libPath: './lib', // the library path of web sdk.
    jr: {
      licenseSN: licenseSN,
      licenseKey: licenseKey
    }
  },
  renderTo: '#pdf-ui' // the div (id="pdf-ui").
});


Finally, add the code to launch the PDF file:


fetch('./JavaScript-for-Kids.pdf').then((res) => {
  // modify the path to get your pdf
  res.arrayBuffer().then((buffer) => {
    pdfui.openPDFByFile(buffer);
  });
});


How to Allow Users to Search a PDF with the Web SDK

If the complete web viewer package is integrated into your app, users can open the sidebar and click the search icon to search the document.



However, you can also implement custom controls for your web app; the SDK exposes many customization options that you can adapt to your project.
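
As a rough sketch of what a custom search control might do behind its input box, the code below drains matches from a "searcher" object into a list that the UI could render. It assumes a searcher exposing an async `findNext()` that resolves to the next match, or to `null` when none remain. That interface, along with `collectMatches` and `createArraySearcher`, is hypothetical; check the Foxit API reference for the actual search methods and their return types before wiring this to the SDK.

```javascript
// Collect up to `limit` matches from a searcher object.
// ASSUMPTION: searcher.findNext() resolves to a match, or null when exhausted.
async function collectMatches(searcher, limit = 50) {
  const matches = [];
  while (matches.length < limit) {
    const match = await searcher.findNext();
    if (match == null) break;
    matches.push(match);
  }
  return matches;
}

// Stand-in searcher over an array of page texts, used only to demo the loop.
function createArraySearcher(pageTexts, term) {
  const needle = term.toLowerCase();
  let page = 0; // index of the page currently being scanned
  let from = 0; // character offset to resume scanning from
  return {
    async findNext() {
      if (needle === "") return null; // an empty term never matches
      while (page < pageTexts.length) {
        const at = pageTexts[page].toLowerCase().indexOf(needle, from);
        if (at !== -1) {
          from = at + needle.length;
          return { pageIndex: page, offset: at };
        }
        page += 1;
        from = 0;
      }
      return null;
    },
  };
}

// Demo with made-up page text.
const demoPages = ["PDF search", "no match here", "Search and search again"];
collectMatches(createArraySearcher(demoPages, "search")).then((m) => console.log(m));
```

Capping the number of collected matches keeps a custom results list responsive on large documents; the remaining matches can be fetched on demand as the user pages through the results.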

Conclusion

The PDF format stands as a fundamental document format, particularly for sharing and collaborative purposes. In this post, we’ve walked through the straightforward integration and execution of a full-text search on any PDF file, courtesy of the Foxit SDK. With this tool at your disposal, you should be well-equipped to conduct both rapid and advanced searches within your PDF libraries.


For further insights and details, I encourage you to refer to the Foxit documentation. It can provide comprehensive guidance and additional resources to help you make the most of this powerful tool.

