Generating PDFs in Javascript for fun and profit!

by jason.harrop, April 4th, 2019

Up until recently, creating complex or elegant PDFs in Javascript has been challenging.

Here I’m going to show you step-by-step the path of least resistance to beautiful PDFs. Spoiler: recently made possible by docx to PDF conversion in Javascript :-)

What follows is some of what I will cover in my upcoming talk at the PDF Association conference in Seattle in June.

From 1000 feet, here are your three main alternatives:

  • The first is to create the PDF directly, using pdfkit, jsPDF, or the higher-level pdfmake. pdfkit is like iText in the Java world. pdfmake, based on pdfkit, has its own format for representing rich text, which it converts to PDF.
  • The second is to create HTML, then convert that to PDF, these days probably using puppeteer.
  • The third is to create a docx, then convert that to PDF.

Put another way, you can either create the PDF directly, or use HTML or docx as an intermediate format.

Since it’s now easy to convert docx to PDF in Javascript, the docx approach is the path of least resistance, particularly for business documents (proposals, invoices, contracts etc.).

For one thing, often the content will already be in Word document format, making your job easy.

More importantly, it’s worth thinking up front about ongoing maintenance (changes to content and formatting). Is that something that you as a developer want to be doing, or is it better to enable the business to do this themselves? If it’s a Word document, then business users can update the document without troubling you.

Creating a docx in Javascript has been easy for some years, but until recently, converting it to PDF from Javascript has been the sticking point. Happily, this is now doable without invoking some SaaS API, using LibreOffice, or anything like that.

With docx.js you can programmatically build up your Word document (much like pdfkit and jsPDF allow you to build up a PDF). But this probably isn’t a great idea, because for the final PDF to come out looking right, any feature you care to use (merged cells in a table, say, or a watermark) has to be supported in both the create-docx and docx-to-pdf steps.

What we want is an easy way to create a docx, and then the confidence that our docx will be converted cleanly to PDF.

For this, a “templating” approach is the answer: basically, you create a docx template with the layout you want (in Microsoft Word, LibreOffice, Google Docs, Native Documents or whatever), then use the template engine to replace “variables”.
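
To make the idea concrete, here is a minimal sketch of variable replacement on a plain string. It is not how docxtemplater is implemented (a real engine works on the XML inside the docx); the `fillTemplate` helper and the field names are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper: replace {name} placeholders in a string with values
// from a data object. Placeholders with no matching key are left untouched.
function fillTemplate(template, data) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : match
  );
}

const line = fillTemplate('Invoice {number} for {customer}', {
  number: 'INV-001',
  customer: 'Acme Pty Ltd',
});
console.log(line); // Invoice INV-001 for Acme Pty Ltd
```

Leaving unknown placeholders in place, rather than blanking them, makes a missing data field easy to spot in the output.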

Step 1: populate docx template

Here we’ll use docxtemplater, in node.js.

Say you want a PDF invoice. Since part of the point of using a Word template is that it is easy for business users to make it pretty, let’s start with one of the invoice templates designed by Microsoft and included in Word.

invoice-template.docx

You can see I’ve added some variables (represented with curly braces, as required by docxtemplater).

You can click the image to see the docx in our Word File Editor. Click invoice-template.docx to download/use it with the code which follows.

Being a Javascript library, docxtemplater ingests data in JSON format:

https://medium.com/media/b02df26c06104f47a0253b7fc8c28576/href

Notice the Items array. The table row repeats for each of the Items. You can see docxtemplater’s markup for a repeat/loop at the start and end of that table row.
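
As an illustration of what that loop markup does, the sketch below expands a `{#Items}...{/Items}` section once per array element. It operates on a plain string rather than a docx, and apart from the `Items` array name, the field names (`description`, `amount`) and the `expandLoop` helper are assumptions made for this example.

```javascript
// Hypothetical data: only the Items array name comes from the article.
const invoice = {
  Items: [
    { description: 'Consulting', amount: '1000.00' },
    { description: 'Travel', amount: '250.00' },
  ],
};

// Expand {#name}...{/name} once per element of data[name], filling the
// {field} placeholders inside the section from that element.
function expandLoop(template, data) {
  return template.replace(/\{#(\w+)\}([\s\S]*?)\{\/\1\}/g, (match, name, body) => {
    const rows = Array.isArray(data[name]) ? data[name] : [];
    return rows
      .map((row) => body.replace(/\{(\w+)\}/g, (m, key) => (key in row ? String(row[key]) : m)))
      .join('');
  });
}

const rows = expandLoop('{#Items}{description}: {amount}\n{/Items}', invoice);
console.log(rows);
// Consulting: 1000.00
// Travel: 250.00
```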

For demo purposes here we’ll provide that inline in our Javascript:

https://medium.com/media/bc3bd251f4d8f1eb95acafefbf16babf/href

To try it, install docxtemplater as per its instructions:

npm install docxtemplater
npm install jszip@2

Then it’s just:

node invoice-template-docx.js

And you get a populated invoice instance:

invoice-instance.docx

Notice the table row has been repeated, and all variables replaced.

If you run the code yourself, you can verify the results by opening invoice-instance.docx in your favourite docx editor, or in ours: click here then drag/drop your docx.

Step 2: convert the docx to PDF

So far so good. Now we just need to convert the populated invoice instance to PDF.

For that, we’ll use docx-wasm, a node module we at Native Documents released earlier this year. Our bread and butter at Native Documents is the web-based document editing/viewing component we used above to display invoice-template.docx, and this node module generates PDF output using that Word compatible page layout code. Put another way, the page layout reproduces what Word does so closely that it can also be used for high quality PDF output.

First, install it:

npm install @nativedocuments/docx-wasm

Converting the docx in the node.js buffer object to PDF is then just:

https://medium.com/media/1d832a1ec32482812480793b99b62985/href
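
In case the embedded snippet does not load, here is a hedged sketch of that conversion. Only `docx.init` and the two credential names appear in this article; the `engine`, `load`, `exportPDF` and `close` calls, and the `ENVIRONMENT` and `LAZY_INIT` options, are assumptions to verify against the module's README. `readCredentials` and `docxToPdf` are hypothetical helper names.

```javascript
// Hypothetical helper: collect the credential pair from an environment
// object, failing early if either value is missing.
function readCredentials(env) {
  const { ND_DEV_ID, ND_DEV_SECRET } = env;
  if (!ND_DEV_ID || !ND_DEV_SECRET) {
    throw new Error('ND_DEV_ID and ND_DEV_SECRET must both be set');
  }
  return { ND_DEV_ID, ND_DEV_SECRET };
}

// Sketch only: convert a docx held in a Buffer to PDF bytes, given an
// already-initialised docx-wasm module. The engine/load/exportPDF/close
// call names are assumptions, not confirmed by the article.
async function docxToPdf(docx, docxBuffer) {
  const api = await docx.engine();
  try {
    await api.load(docxBuffer);
    const arrayBuffer = await api.exportPDF();
    return Buffer.from(arrayBuffer);
  } finally {
    await api.close(); // release the engine even if the export fails
  }
}

// Possible usage (option names are assumptions as well):
// const docx = require('@nativedocuments/docx-wasm');
// await docx.init({ ...readCredentials(process.env), ENVIRONMENT: 'NODE', LAZY_INIT: true });
// const pdf = await docxToPdf(docx, fs.readFileSync('invoice-instance.docx'));
```

Passing the module in as a parameter keeps initialisation a once-per-process step and lets the function be exercised with a stub.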

You’ll need an ND_DEV_ID, ND_DEV_SECRET pair to use this module. You can get free-tier keys at https://developers.nativedocuments.com/

Copy these into the docx.init call (or alternatively, you can set these as environment vars).

I haven’t posted the PDF here, since it just looks the same as the invoice-instance docx.

Putting it all together

Here is Javascript which combines step 1 and step 2.

https://medium.com/media/1dab9e63c1e1a3f7154f4b41dfc23ac6/href

To try it, download invoice-template.docx then:

node docx-template-to-pdf.js

Deployment Options

A nice way to run this is on AWS Lambda. With Lambda, you get easy scalability, and you aren’t paying for servers when you aren’t using them. More on this in my upcoming talk at the PDF Association conference in Seattle in June! In the meantime, docx-to-pdf-on-AWS-Lambda shows you how to do the docx to PDF part on Lambda. Adding the docx templating piece is straightforward.
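
As a rough sketch of the shape such a function could take, assume it sits behind API Gateway's Lambda proxy integration and receives the docx base64-encoded in the request body. This is not taken from the docx-to-pdf-on-AWS-Lambda sample; `makeHandler` and the injected `convert` function are hypothetical, with `convert` standing in for the docx to PDF step.

```javascript
// Build a Lambda handler around a conversion function.
// `convert` is a hypothetical async function: docx Buffer in, PDF Buffer out.
function makeHandler(convert) {
  return async function handler(event) {
    if (!event.body) {
      return { statusCode: 400, body: 'Expected a base64-encoded docx in the request body' };
    }
    const docxBuffer = Buffer.from(event.body, 'base64');
    const pdfBuffer = await convert(docxBuffer);
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/pdf' },
      isBase64Encoded: true, // tells API Gateway to decode the body to binary
      body: pdfBuffer.toString('base64'),
    };
  };
}

// In a deployed function: exports.handler = makeHandler(yourConvertFunction);
```

Injecting the converter keeps the handler independent of any particular conversion library and easy to test locally with a stub.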

It’s also now possible to convert docx to PDF client-side, in-browser, reducing server loads, and opening the way to offline operation. docx-wasm-client-side shows you how to do the docx to PDF part client-side.