Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.
Headless Chrome was announced with quite a bit of fanfare. Its reveal caused maintainers to step down, new packages to come out, and even new startups to launch (full disclosure: that last link is my own initiative). If you regularly visit Hacker News, then none of this should be a shock to you.
That said, not everything goes perfectly with major changes like these. PhantomJS, the headless browser that was recently deprecated in light of the aforementioned announcement, did have its issues. But it also had a lot of maturity and history behind its API, and the fact that it shipped with its own out-of-the-box library made it trivial to get going. In contrast, there are currently two predominant headless Chrome libraries, signaling an already fractured ecosystem (though hopefully not for long). Only one comes bundled with a version of Chrome guaranteed to work with it, and that comes at the cost of flexibility. Still, all of the major open-source headless libraries are recommending the move to headless Chrome (see here and here).
Typically I tend to watch events such as these from a distance, but as fate would have it I’ve been swallowed up by the likes of GitHub and Google.
At this point you’re probably wondering who the hell I am. I don’t work for Google or graph.cool, or really anyone with a significant stake in the headless browser game. I do have something that others, including those listed, might not: a plethora of experience and frustration felt on the user side. I personally have fought many times with PhantomJS’s missing JavaScript APIs, flaky tests that had to be executed through it, and debugged my way to hell and back deep inside of it. I’ve been in the trenches for nearly half a decade and counting.
So, how did I end up in the middle of all this fanfare? Well, about 4 months ago I was writing a side project that desperately needed a web driver of some kind. I was, as one does, working on a scraper for webpages with prices, titles, and descriptions of products that users could submit via a URL input. Armed with the cheerio package, I was going to “disrupt” the business of gift registries. All was going swimmingly until I ran into Target.com. As you may not know, Target runs what’s famously known as a single-page application, meaning they serve nearly no content in their HTML and rely on JavaScript to perform all of the site’s operations. This meant that no cheerio was big enough to save me from what was coming.
Target with JavaScript disabled :(
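For the curious, the cheerio side of things looked roughly like the sketch below. It’s a simplified, hypothetical example (the selectors are made up; every retailer marks these things up differently), and it works great right up until the HTML comes back nearly empty:

const cheerio = require('cheerio');

// Given the raw HTML of a product page, pull out the basics.
function scrapeProduct(html) {
  const $ = cheerio.load(html);
  return {
    title: $('h1.product-title').first().text().trim(),
    price: $('.price').first().text().trim(),
    description: $('meta[name="description"]').attr('content'),
  };
}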
OK. Great. If I wanted to handle Target in my site, and I did, then I’d have to execute a JavaScript runtime in order to fully “load” their page. Having been burned numerous times by PhantomJS and its sluggishness, I opted to use Chrome instead. At this point in time headless Chrome had only just been announced, and of course there were no fantastic libraries available for it like puppeteer or chromeless. So what’s an engineer to do? Well, change gears from working on your app and fix the problem yourself, naturally! What eventually came of this was a package called Navalia. I was somewhat successful in this endeavor, even claiming #3 on GitHub’s trending list for TypeScript for a few days.
My 15 minutes of fame
Look ma! #3!
This then caught the attention of the fine folks at graph.cool, who reached out to see if I’d care to help them with a project called chromeless since I was gaining some traction. After some internal moral dilemma I decided to join forces with graph.cool’s chromeless project and begin the process of deprecating Navalia (it’s OK, it lived a long life in JavaScript years). I strongly feel that we should band together to make one amazing project versus three or four mediocre ones. Of course, as JavaScript would have it, Google then came out with their puppeteer project. With an API almost identical to chromeless and Navalia, we now had a new contender in the headless library arms race.
This gets us to where we are now: two libraries and a few ways to execute them in cloud infrastructure. Let’s take a closer look at both libraries and their distinguishing factors.
Chromeless, for those not familiar with it, is not only a rich API for driving headless Chrome, but also comes with a prescription for how to execute headless work in a production/CI environment. Their take is fascinating: instead of running and managing the binary in your own infrastructure, just do it in AWS Lambda.
Diagram for how to execute chromeless locally and in AWS
Up until this point, there was no great solution for how to set up a headless browser in a hosted environment (read: Linux). Chromeless was able to do all of this in part because of this package’s efforts (more on that later).
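Switching between a local Chrome and the Lambda-backed proxy is just a constructor option. Here’s a rough sketch based on chromeless’s documented remote option (the endpoint URL and API key are placeholders you get back from deploying the proxy service):

const { Chromeless } = require('chromeless');

// Passing a remote endpoint ships the work off to the AWS Lambda proxy
// instead of driving a local Chrome.
const chromeless = new Chromeless({
  remote: {
    endpointUrl: 'https://XXXXXXXXXX.execute-api.eu-west-1.amazonaws.com/dev',
    apiKey: 'your-api-key-here',
  },
});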
In order to facilitate talking with Chrome, chromeless utilizes chrome-remote-interface. This project abstracts away the pain of crafting your own web-socket client, establishing the connection, and implementing all of the necessary protocols in order to pilot Chrome successfully. It’s important to highlight that web sockets are the only interface into Chrome, and at some level your stack will have to open a web socket connection.
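To give a sense of what that looks like, here’s a minimal sketch using chrome-remote-interface directly (it assumes Chrome is already running with --remote-debugging-port=9222):

const CDP = require('chrome-remote-interface');

CDP(async (client) => {
  const { Page } = client;
  try {
    // Enable Page events, navigate, and wait for the load event to fire.
    await Page.enable();
    await Page.navigate({ url: 'https://example.com' });
    await Page.loadEventFired();
    console.log('page loaded');
  } finally {
    await client.close();
  }
});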
The big drawback to using chrome-remote-interface is that both it and Chrome still ship breaking changes somewhat frequently, so your Chrome binary must line up exactly with the protocol version you’re using. This is because the remote protocol doesn’t support mixed versions, as noted in their FAQ here.
chromeless does, in my opinion, have a really elegant API.
const screenshot = await chromeless
  .goto('https://www.google.com')
  .type('chromeless', 'input[name="q"]')
  .press(13)
  .wait('#resultStats')
  .screenshot()
These fancy chaining operations are brought to you via Promise chaining: a way of waiting until a then-able is called (which async/await does for you) and then executing the collection of operations. This API makes it pair up quite nicely with lambdas, as the whole collection of operations is sent in a single lambda call and executed at once, versus doing each operation individually (which would make for a much chattier workflow). The subject of how to implement this batched operation is quite fascinating, and the source can be found here.
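To illustrate the idea, here’s a stripped-down sketch of a then-able, batched command queue. This is not chromeless’s actual source, and runBatch is a hypothetical function that ships the whole list of commands off in a single call (to a lambda, for instance):

class Queue {
  constructor(runBatch) {
    this.runBatch = runBatch;
    this.commands = [];
  }
  goto(url) {
    this.commands.push({ type: 'goto', url });
    return this;
  }
  screenshot() {
    this.commands.push({ type: 'screenshot' });
    return this;
  }
  // Implementing then() makes the chain await-able; awaiting it is
  // what finally triggers the single batched round-trip.
  then(resolve, reject) {
    return this.runBatch(this.commands).then(resolve, reject);
  }
}

// const result = await new Queue(runBatch).goto('https://example.com').screenshot();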
To summarize into bullet-points for all you skimmers out there:
Pros
Cons
As I mentioned earlier, Google released their library puppeteer shortly after chromeless. Though the APIs are fairly similar, there’s actually quite a bit of difference in implementation, which can change which library you choose based on your requirements.
Puppeteer’s API
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();
Because of the lack of chained operations, puppeteer is a harder package to run in serverless-type environments. Of course, their own playground gets around this by sending the script directly to the server for execution.
Puppeteer’s playground
Because of this API design it’s not as great a candidate for the AWS Lambda approach, as the browser instance needs to “hang around” for a while before being collected. You could take inspiration from how Google does this and upload your script to be executed by AWS; however, there are still message limits you’d have to overcome.
Puppeteer takes a different approach from chromeless in that it doesn’t use the chrome-remote-interface package. Instead, it implements the protocol itself: starting the web socket client and marshaling messages and their responses. This might seem like reinventing the wheel, since chrome-remote-interface takes care of all this. However, in contrast to chromeless, puppeteer comes with its own Chrome binary which is guaranteed to work with the library you’ve installed. This makes it a bullet-proof choice if you need a locked-down package where the API is guaranteed to work. The cost here can be non-trivial, though, as you’re dependent on the package maintainers keeping both their protocol and their API up to date. Since this is backed by Google there’s a good chance that this will always be the case, though Google has been known to abandon projects for any reason.
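To give a feel for what “implementing the protocol itself” means, here’s a bare-bones sketch of piloting Chrome over its raw DevTools web socket. It assumes Chrome was launched with --remote-debugging-port=9222 and that the ws package is installed; puppeteer’s real implementation is, of course, far more involved:

const http = require('http');
const WebSocket = require('ws');

// Ask Chrome for its open targets and grab the first page's web socket URL.
http.get('http://localhost:9222/json', (res) => {
  let body = '';
  res.on('data', (chunk) => (body += chunk));
  res.on('end', () => {
    const { webSocketDebuggerUrl } = JSON.parse(body)[0];
    const ws = new WebSocket(webSocketDebuggerUrl);

    ws.on('open', () => {
      // Every protocol message is JSON with an id, method, and params;
      // responses come back tagged with the same id.
      ws.send(JSON.stringify({
        id: 1,
        method: 'Page.navigate',
        params: { url: 'https://example.com' },
      }));
    });
    ws.on('message', (message) => console.log('<-', message.toString()));
  });
});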
For our skimming friends, here’s the skinny on puppeteer:
Pros
Cons
I hope this has given you some guidance on library decisions, because things are about to get a lot more complicated when we go to ship our code into a production or continuous-integration environment 😢
Before we even begin to talk about getting headless Chrome on AWS Lambda, let’s first review their restrictions found here (I’ve filtered them down to the constraints that apply to Chrome):
To get Chrome running under these types of constraints you’ll first have to do the following, or rely on someone else to do it: swap Chrome’s use of /dev/shm over to /tmp, which means a change inside the Chrome codebase 😨. And there are still a few more steps that I won’t waste time elaborating on here, especially since they’re well documented in the following places:
Now, this might seem incredibly convoluted and painful to set up, and to a degree it is. However, lambdas are incredibly cheap in the free tier and can scale horizontally quite nicely. With that kind of cost and scale you can execute an insane number of functional tests in a matter of seconds.
In Graphcool’s case this decreased test durations from ~20min to a few seconds.
If scale and planning are things you’re not certain of, then the AWS approach is a great one. You can just as easily run 1 invocation as 1,000 without much fuss or change.
Of course, there are always two sides to the story when it comes to technology this opaque. The first major pain point is the need to maintain the Chrome binary yourself, as you’ll have to piece it together to run in lambdas. This might change in the future if more folks begin to use lambdas for non-standard things like headless Chrome. Since lambdas are also quite limited in their storage and execution time, issues like missing fonts and long-running workflows are non-starters. Even accounting for those drawbacks, lambdas can still be quite quirky, as seen here, here, and here.
Not forgetting our skimmers, here are the bullets y’all are craving:
Pros:
Cons:
If AWS doesn’t fit your needs due to its drawbacks, then running Chrome in a Docker container might just be up your alley. It’s relatively straightforward, as there are numerous Dockerfiles out there to get you started; even puppeteer has a Dockerfile to get you going. This generally frees you from all of lambda’s limitations, as your scripts can run at their leisure and consume as much as they want.
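One practical note if you go this route: Chrome’s sandbox usually can’t run inside a stock container, so most Dockerfiles (puppeteer’s included) either set up a dedicated user or simply launch with the sandbox disabled. A rough sketch of the latter, with the usual caveat that disabling the sandbox is a security trade-off:

const puppeteer = require('puppeteer');

(async () => {
  // --no-sandbox is the common workaround inside containers; prefer a
  // properly configured container user when you can.
  const browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();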
Example dockerfile install of Chrome
ebidel/try-puppeteer — Run Puppeteer code in the cloud (github.com)
The nice thing about running your own Docker container is that you’re free to use whatever hosting provider you like (provided they allow or use Docker) and can scale to the load you need. Of course, you lose out on the other perks that lambdas provide, namely auto-scaling, which is a tricky thing to do with standard cloud providers since you’ll have to load-balance not only HTTP requests but web socket connections as well. The Docker approach is also perilous in that you’ll still run into missing fonts and other drawbacks.
I want emojis 🔥
Never realized there were so many square box emojis…
The other major caveat to the Docker approach is ensuring scripts run in clean isolation. Even though Chrome has a way of creating incognito profiles, few libraries (including puppeteer) support it. This roughly means that you’ll have to write some handlers around Chrome in this container to ensure new Targets get a clean context (read: man-in-the-middle all web socket messages). Yikes!
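For reference, the protocol itself does expose what’s needed here. Below is a sketch of creating an isolated context through chrome-remote-interface; depending on your Chrome version these Target calls may need to go over the browser-level web socket rather than a page’s, and your library still has to route its work to the new target:

const CDP = require('chrome-remote-interface');

CDP(async (client) => {
  const { Target } = client;
  // An incognito-style context: cookies, cache, and storage are not
  // shared with other contexts.
  const { browserContextId } = await Target.createBrowserContext();
  const { targetId } = await Target.createTarget({
    url: 'about:blank',
    browserContextId,
  });
  console.log('fresh target', targetId, 'in context', browserContextId);
  await client.close();
});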
Pros:
Cons:
Now, before I get too much into browserless, I want to fully disclose that I’m the creator of it. As you might have noticed, I like to find problems that no one has thought about, and I saw plenty of them around headless Chrome when it came to service providers. End disclosure.
Looking back on my little gift registry app, neither Docker nor AWS lambdas satisfied my requirements, as I needed features they just didn’t have. Emojis were a must-have, clean isolation was a must since anyone could be using the service, and I didn’t want to spend all my time maintaining Chrome in a cloud provider. This is what birthed browserless into the world.
Browserless really sits on top of the Docker way of doing things, but offers some other features as well. It watches Chrome and reboots it when it becomes sluggish, has good support for a variety of languages and emojis, and works with just about any library out there:
🎉 Emojis!
Remember that puppeteer picture? This is what it should have looked like.
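Because everything ultimately speaks the same web socket protocol, pointing an existing library at a remote instance is typically a one-line change. Here’s a sketch with puppeteer, where the endpoint URL is a placeholder for wherever your remote Chrome (browserless or otherwise) lives:

const puppeteer = require('puppeteer');

(async () => {
  // Attach to a remote Chrome over its web socket endpoint instead of
  // launching one locally.
  const browser = await puppeteer.connect({
    browserWSEndpoint: 'ws://your-browserless-host:3000',
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'remote.png' });
  browser.disconnect();
})();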
This isn’t to say that it doesn’t have its drawbacks. For one, it likely costs a bit more than running a Docker container yourself. If you’re in the tight spot of “I have no idea how much scale I need,” then it’s likely not for you either. There are still challenges debugging browser jobs in remote locations, but those are generally shared among all providers.
Pros:
Cons:
Even though I somewhat stumbled into this part of web development, I’m extremely excited by the fruits of this labor thus far and look forward to the road ahead. I think there’s still a great deal of knowledge that we’ll have to spread in order to keep best practices up to date and moving forward. To that end, I’m excited to announce that I’ll be putting together a website that captures best practices, cool ideas and recipes for these new libraries, and all the updates in the headless arena. Keep your eye out for its reveal soon.
Finally, I welcome your thoughts, feedback, and comments on any of the above. Let me know if I’m gravely mistaken or if there’s a concern you have with headless browsers that hasn’t been met. Until then, I’ll see you on the internet!