Running a screenshot service on a $5-per-month VPS
My hobby project, npmcharts, is a single-page app that shows the download trends of various npm packages. If you went looking into which headless Chrome library to use, you’d see a graph like this:
However, when any of those pages is shared to Facebook, Twitter, or Slack, the preview image that shows up is always the same — the screenshot I took comparing frontend frameworks and uploaded as the site’s sole Open Graph image two years ago. Tsk tsk.
Everything was hosted on my little $5 Digital Ocean droplet. One of the reasons I’d put this feature off for so long was that I was worried the droplet wouldn’t be able to handle the load of running a snapshotting service. I’d have to try to be efficient.
Starting out simple
Let’s start out simple and first get something that works:
Fairly straightforward, but launching a browser instance for each request and closing it afterward seems wasteful.
The average time to return a screenshot is currently ~3.5 seconds on my MacBook, and it’ll only take longer on the D.O. droplet. Let’s reduce and reuse.
Note: All timings given are measured against local servers running on my MacBook. They only account for the execution speed of the function that returns an image on a 2015 MBP; network transfer speeds are not included!
Pooling would allow us to keep a handful of browser instances open and reuse them for each screenshot request. Puppeteer doesn’t come with a built-in pooling solution, but there are a few generic libraries available that we can build on.
The first pool:
But wait, we shouldn’t need to create pages and set viewport for each screenshot either. Let’s pool pages instead of browsers:
Let’s update our `getChartImage` function to ask for pages from the pool:
The average time of subsequent screenshots is now down to ~1.58 seconds! (For those curious, when just pooling browsers without pages, the average time was ~1.87 seconds)
Update: Michael J. Ryan from echojs keenly pointed out that you can go further and pool the pages against a single browser: “Start time of browser instances will be reduced. Chrome creates a separate management/runtime process for each tab/page!” Here’s a link to an even more efficient `getBrowserPool` gist.
Take advantage of “single-paged-ness”
One of the advantages of SPAs is that browsers don’t have to reload all resources and re-parse all the scripts upon navigation. However, by calling `page.goto` each time, we were unnecessarily triggering full page reloads when navigating within the same app.
The solution for this varies depending on the framework and routing library the app uses, but the basic idea is fairly simple and translatable —
- On the frontend, expose the routing function in the global context (i.e. attach it to `window`) so that route navigation can be triggered by Puppeteer.
- Also on the frontend, make a flag available to let puppeteer know when the route transition is complete.
- Puppeteer would flip the flag to `false`, call the routing function, and poll the flag’s value until the frontend flips it to `true` (signifying that the route change is complete).
If your app uses React and React-Router v4, you could use `withRouter` somewhere in the app to ask for the `history` object, then stick that into the `window`. e.g.
In my case, with Vue 1.0 and vue-router 0.7, I added a line to the root component:
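Something along these lines (Vue 1.x’s `ready` hook and the `window.router` name are my guesses for the elided line):

```javascript
// Root component: expose vue-router's router instance globally so
// Puppeteer can trigger route changes from page.evaluate.
const App = {
  ready() {
    window.router = this.$router; // $router is injected by vue-router
  },
};

module.exports = App;
```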
After the route transition has completed (data loading and rendering is done), the frontend would flip the flag to signal back that it’s ready to have its screenshot taken:
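For example, something like this (the flag name is illustrative), called at the end of the route component’s data-loading logic:

```javascript
// Call this once route data has loaded and the chart has rendered,
// so Puppeteer knows it's safe to take the screenshot.
function signalRouteComplete() {
  window.__ROUTE_COMPLETE__ = true;
}
```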
Let’s update `getChartImage` to use those hooks:
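A sketch, with `page` coming from the page pool and `window.router.go` / `window.__ROUTE_COMPLETE__` standing in for the hooks described above (both names illustrative):

```javascript
// Navigate inside the already-loaded SPA instead of a full page.goto,
// then wait for the frontend to flip the ready flag back.
async function getChartImage(page, packages) {
  const route = `/compare/${packages.join(',')}`;
  await page.evaluate((path) => {
    window.__ROUTE_COMPLETE__ = false; // reset before navigating
    window.router.go(path); // in-app route change, no reload
  }, route);
  // Poll until the app signals the transition (and render) finished.
  await page.waitForFunction('window.__ROUTE_COMPLETE__ === true');
  return page.screenshot({ type: 'png' });
}
```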
A screenshot now only takes ~860ms! We’ve managed to shave the time down to less than a quarter of the initial implementation.
There’s one more thing we can do —
We’d save more resources if we didn’t have to generate these images every time they’re accessed.
When a request for a screenshot comes in, we want to —
- Check if a screenshot for that resource already exists.
- If it does, check if it’s stale and needs to be updated.
- If it exists and is not stale, directly return that file.
- If it doesn’t exist or needs to be updated, create a new snapshot.
And that’s it! Subsequent requests within a certain time period take only ~0.3ms. The next step would be to save and serve the images from a CDN instead of the local filesystem, but I think this is good enough for now :) Digital Ocean’s droplet comes with 25 gigs of SSD and 1TB of transfer; I’ll save that for when I need it.
And please come check out my site npmcharts.com for all your npm package comparison needs! Here’s one comparing webpack, browserify, rollup, and parcel: