DocRaptor Vs. WeasyPrint: Python PDF Generation Tools Showdown

by Tyler Hawkins, January 26th, 2021
Too Long; Didn't Read

A PDF export showdown. Who will win: DocRaptor or WeasyPrint?

I recently published an article comparing HTML-to-PDF export libraries. In it, I explored options like the native browser print functionality, open-source libraries jsPDF and pdfmake, and the paid service DocRaptor. Here's a quick recap of my findings:

If you want the simplest solution and don’t need a professional-looking document, the native browser print functionality should be just fine. If you need more control over the PDF output, then you’ll want to use a library.
jsPDF shines when it comes to single-page content generated based on HTML shown in the UI. pdfmake works best when generating PDF content from data rather than from HTML. DocRaptor is the most powerful of them all with its simple API and its beautiful PDF output. But again, unlike the others, it is a paid service. However, if your business depends on elegant, professional document generation, DocRaptor is well worth the cost.

In the comment section for my article on Dev.to, one person suggested I take a look at Paged.js and WeasyPrint as additional alternatives to consider. (This person is Andreas Zettl by the way, and he has an awesome demo site full of Print CSS examples.)

So today we'll explore the relative strengths and weaknesses of DocRaptor and WeasyPrint.

WeasyPrint Overview

Let's start with WeasyPrint, an open-source library developed by Kozea and supported by CourtBouillon. For starters, it's free, which is a plus. It's licensed under the BSD 3-Clause License, a relatively permissive and straightforward license. WeasyPrint allows you to generate content as either a PDF or a PNG, which should adequately cover most use cases. It's built for Python 3.6+, which is great if you're a Python developer. If Python is not your forte or not part of your company's tech stack, then this may be a non-starter for you.

One of the biggest caveats to be aware of is that WeasyPrint does not support JavaScript-generated content! So when using this library, you'll need to be exporting content that is generated server-side. If you are relying on dynamically generated content or charts and tables powered by JavaScript, this library is not for you.
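Since WeasyPrint only renders the markup it is handed, one workable pattern is to build the final HTML in Python first and pass that string along. Here's a minimal sketch of that idea, not something from the WeasyPrint docs verbatim: `build_report_html` and `write_report_pdf` are hypothetical helper names, and the sample rows are made up. It assumes WeasyPrint is already installed and relies on its `HTML(string=...)` constructor and `write_pdf` method.

```python
from html import escape


def build_report_html(title, rows):
    """Render the whole table server-side so no JavaScript is needed later."""
    body_rows = "\n".join(
        f"<tr><td>{escape(str(name))}</td><td>{escape(str(value))}</td></tr>"
        for name, value in rows
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset='utf-8'><title>{escape(title)}</title></head>\n"
        f"<body><h1>{escape(title)}</h1>\n"
        f"<table>\n{body_rows}\n</table></body></html>"
    )


def write_report_pdf(title, rows, target):
    """Hand the finished HTML string to WeasyPrint and save the PDF."""
    # Imported lazily so the HTML builder works even without WeasyPrint.
    from weasyprint import HTML

    HTML(string=build_report_html(title, rows)).write_pdf(target)
```

Calling `write_report_pdf("Quarterly totals", [("Q1", 120), ("Q2", 95)], "report.pdf")` should then produce a PDF whose table was fully generated in Python.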

Installing WeasyPrint

Getting up and running with WeasyPrint is fairly easy. They provide installation instructions on their website, but I use pyenv to install and manage Python rather than Homebrew, so my installation steps looked more like this:

Installing pyenv and Python:

# install pyenv using Homebrew
brew install pyenv

# install Python 3.7.3 using pyenv
pyenv install 3.7.3

# specify that I'd like to use version 3.7.3 when I use Python
pyenv global 3.7.3

# quick sanity check
pyenv version

# add `pyenv init` to my shell to enable shims and autocompletion
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.zshrc

Installing WeasyPrint and running it against the WeasyPrint website:

pip install WeasyPrint

weasyprint https://weasyprint.org/ weasyprint.pdf

As you can see, the simplest way to use WeasyPrint from your terminal is to run the weasyprint command with two arguments: the URL input and the filename output. This creates a file called weasyprint.pdf in the directory from which you run the command. Here's a screenshot of the PDF output when viewed in the Preview app on a Mac:

Sample PDF output from WeasyPrint

Looks great! WeasyPrint also has a full page of examples you can check out, showcasing reports, invoices, and even event tickets complete with a barcode.
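The same URL-to-PDF conversion can also be done from Python code rather than the terminal. The sketch below is my own wrapper, so treat `url_to_pdf` as a hypothetical name; the only WeasyPrint pieces it assumes are the `HTML` class (given a `url`) and its `write_pdf` method.

```python
def url_to_pdf(url, target):
    """Rough Python equivalent of running `weasyprint <url> <target>`."""
    # Imported lazily so this module still loads where WeasyPrint is missing.
    from weasyprint import HTML

    # HTML(url=...) fetches and parses the page; write_pdf renders and saves it.
    HTML(url=url).write_pdf(target)
    return target
```

For example, `url_to_pdf("https://weasyprint.org/", "weasyprint.pdf")` should write the same kind of file as the terminal command used earlier.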

DocRaptor Overview

Now let's consider DocRaptor. DocRaptor is closed-source and is available through a paid license subscription (although you can generate test documents for free). It uses the PrinceXML HTML-to-PDF engine and is the only API powered by this technology.

Unlike WeasyPrint's Python-only usage, DocRaptor has SDKs for PHP, Python, Node, Ruby, Java, .NET, and JavaScript/jQuery. It can also be used directly via an HTTP request, so you can generate a PDF right from your terminal using cURL. This is great news if you're someone like me who doesn't have Python in their arsenal.

DocRaptor can export content as a PDF, XLS, or XLSX document. This can come in handy if your content is meant to be a table compatible with Excel. For the time being though, we'll just look at PDFs since that's something both WeasyPrint and DocRaptor support.

One relative strength of DocRaptor compared to WeasyPrint is that it can wait for JavaScript on the page to be executed, so it's perfect for use with dynamically generated content and charting libraries.

Getting Started with DocRaptor

DocRaptor has guides for each of their SDKs that are well worth reading when first trying out their service. Since we ran the WeasyPrint example from the command line, let's also run DocRaptor in our terminal by using cURL to make an HTTP request. DocRaptor is API-based, so there's no need to download or install anything.

Here's their example you can try:

curl http://YOUR_API_KEY_HERE@docraptor.com/docs \
  --fail --silent --show-error \
  --header "Content-Type:application/json" \
  --data '{"test": true,
           "document_url": "http://docraptor.com/examples/invoice.html",
           "type": "pdf" }' > docraptor.pdf

And here's the output after running that code snippet in your terminal:

Sample PDF output from DocRaptor

Voila: a nice and simple invoice. DocRaptor's example here isn't as complex as WeasyPrint's was, so let's try generating a PDF from one of DocRaptor's more advanced examples.

curl http://YOUR_API_KEY_HERE@docraptor.com/docs \
  --fail --silent --show-error \
  --header "Content-Type:application/json" \
  --data '{"test": true,
           "document_url": "https://docraptor.com/samples/cookbook.html",
           "type": "pdf" }' > docraptor_cookbook.pdf

Here's the output for this cookbook recipe PDF:

Sample PDF output from DocRaptor using their Cookbook Recipe example

Pretty neat! Just like WeasyPrint, DocRaptor can handle complex designs and full-bleed layouts that extend to the very edge of the page. One important callout here is that DocRaptor supports footnotes, as seen in this example. WeasyPrint, on the other hand, has not yet fully implemented the CSS paged media specifications, so it can't handle footnote generation.

You can view more DocRaptor examples on their site including a financial statement, a brochure, an invoice, and an e-book.
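If you'd rather stay in Python without installing an SDK, the cURL calls above can be approximated with the standard library. This is only a sketch under a few assumptions: the endpoint is taken to be `https://docraptor.com/docs`, the API key is sent as the HTTP basic-auth username, and `build_docraptor_request` and `save_pdf` are hypothetical helper names rather than anything from DocRaptor's own SDK.

```python
import base64
import json
import urllib.request

# Assumption: the endpoint the cURL examples post to.
DOCRAPTOR_URL = "https://docraptor.com/docs"


def build_docraptor_request(api_key, document_url, test=True):
    """Build (but do not send) a POST request asking DocRaptor for a PDF."""
    payload = {"test": test, "document_url": document_url, "type": "pdf"}
    # Basic auth with the API key as username and an empty password.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        DOCRAPTOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def save_pdf(request, target):
    """Send the request and write the response body to a file."""
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        out.write(response.read())
```

Passing the request for `http://docraptor.com/examples/invoice.html` to `save_pdf` with a target of `docraptor.pdf` mirrors the first cURL example. Keeping the request builder separate from the network call makes it easy to inspect what will be sent before spending an API call.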

JavaScript Execution

So far we've seen the powers and similarities of both DocRaptor and WeasyPrint. But one core difference we touched on above is that WeasyPrint does not wait for JavaScript to execute before generating the PDF. This is crucial for applications built with a framework like React. By default, React apps contain only a root container div in the HTML, and then JavaScript runs to inject the React components onto the page.

So if you try to generate a PDF from the command line for an app built with React, you won't get the actual app content! Instead, you'll likely see the content of the noscript tag, which typically contains a message stating something like "You need to enable JavaScript to run this app."

This is also the case for applications that rely on charting libraries like Google Charts, HighCharts, or Chart.js. Without the JavaScript running, no chart is created.

As an example, consider this simple web page I've put together. It contains a page header, a paragraph included in the HTML source code, and a paragraph inserted into the DOM by JavaScript. You can find the code on GitHub. Here's what the page looks like:

DocRaptor JS demo web page

Now, let's use WeasyPrint to generate a PDF from the web page by running the following command in the terminal:

weasyprint http://tylerhawkins.info/docraptor-js-demo/ weasyprint_js_demo.pdf

Here's the output:

JS demo PDF output from WeasyPrint

Oh no! Where's the second paragraph? It's not there, because the JavaScript was never executed.
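To make the reason concrete, here's a small sketch using only Python's standard library. The `PAGE` markup is a hypothetical stand-in loosely modelled on my demo page, not its actual source, and `StaticText` is an illustrative helper. It collects the text that exists in the markup itself, which is roughly what any renderer that never runs scripts has to work with.

```python
from html.parser import HTMLParser

# Hypothetical page: one paragraph in the markup, one that only exists
# after the script has run in a browser.
PAGE = """
<html><body>
  <h1>JS demo</h1>
  <p>This paragraph is in the HTML source.</p>
  <script>
    var p = document.createElement("p");
    p.textContent = "This paragraph is inserted by JavaScript.";
    document.body.appendChild(p);
  </script>
</body></html>
"""


class StaticText(HTMLParser):
    """Collect the text present in the markup, skipping script bodies."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_script:
            self.chunks.append(text)


def static_text(html):
    """Return the visible text chunks found without executing any script."""
    parser = StaticText()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(static_text(PAGE))
```

The header and the first paragraph are collected, but the second paragraph never shows up because it only exists as a string inside the script.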

Now let's try again, but this time with DocRaptor. In order to have JavaScript run on the page, we must provide DocRaptor with the "javascript": true argument in our options object. Here's the code:

curl http://YOUR_API_KEY_HERE@docraptor.com/docs \
  --fail --silent --show-error \
  --header "Content-Type:application/json" \
  --data '{"test": true,
           "javascript": true,
           "document_url": "http://tylerhawkins.info/docraptor-js-demo/",
           "type": "pdf" }' > docraptor_js_demo.pdf

And the output:

JS demo PDF output from DocRaptor

Tada! The JavaScript has been successfully executed, leading to the insertion of the second paragraph.
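If you're assembling that options object in Python instead of typing JSON by hand, a small helper keeps the JavaScript flag explicit. This is a sketch only: `docraptor_options` is a hypothetical name, and the keys are simply the ones used in the cURL examples in this article.

```python
import json


def docraptor_options(document_url, javascript=False, test=True):
    """Return the JSON body from the cURL examples as a Python dict."""
    options = {"test": test, "document_url": document_url, "type": "pdf"}
    if javascript:
        # Only included when requested, as in the JavaScript cURL example.
        options["javascript"] = True
    return options


demo_options = docraptor_options(
    "http://tylerhawkins.info/docraptor-js-demo/", javascript=True
)
print(json.dumps(demo_options, indent=2))
```

The printed JSON can be used as the request body in place of the hand-written `--data` string.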

Conclusion

So, which should you use, WeasyPrint or DocRaptor? It depends on your use case. 

If your app contains static content that doesn't rely on JavaScript, if Python is part of your tech stack, or if you need PNG image output, then WeasyPrint is an excellent choice. It's open source, free, and flexible enough to handle visually complex output.

If you need to use a programming language other than Python, or you rely on the execution of JavaScript to render the content you need exported, DocRaptor is the right choice.

Table of Comparisons

As an added bonus, here's a comparison table for a quick summary of these two tools:

DocRaptor vs. WeasyPrint comparison table

Happy coding!

Also published at https://dzone.com/articles/docraptor-vs-weasyprint-a-pdf-export-showdown