Every owner of a laptop or phone knows what a web browser is and uses one every day. Even this article was written and read in a web browser. However, not everybody knows that browsers can be run in headless mode and used for tasks other than simply surfing the internet. In this article, you will learn what headless browsers are, what their use cases are, and how to implement a serverless service that runs headless Chrome in AWS Lambda.

What are Headless Browsers?

The general definition of a headless browser is simple: it is a browser without a user interface. Such a browser retains all the normal functionality, but since there is no need to render content to a real screen, it consumes less memory, doesn’t require a GPU, performs better, and can be controlled programmatically. These features allow developers to use the browser for implementing web crawlers, running UI tests, taking screenshots, tracking web page performance, automating website interactions, and more. The rest of this article focuses on running headless Chrome on AWS Lambda.

How to Try Running a Headless Chrome?

Before learning how to run headless Chrome on AWS Lambda, let’s look more closely at what headless Chrome can do. If you have Google Chrome installed on your computer, you can try running it in headless mode right now. For example, it is possible to convert any web page into a PDF file with a single command. Open the command line, navigate to the folder where you’d like to save the PDF file, and execute the following command:

$ chrome --headless --disable-gpu --print-to-pdf https://techmagic.co

Chrome will start without any user interface, silently load TechMagic’s website, render it in memory instead of on a real screen, and create the output.pdf file on the disk within a few seconds.
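The same conversion can also be scripted. The following is a minimal Node.js sketch, assuming Chrome is installed and that you know the path to its executable. The `buildPdfArgs` and `printToPdf` helper names and the `CHROME_PATH` environment variable are hypothetical; they only illustrate how the flags from the command above are passed to the browser process.

```javascript
const { spawn } = require("child_process");

// Build the same flags as in the command above.
// Chrome also accepts --print-to-pdf=<file> to choose the output file.
function buildPdfArgs(url, outputPath) {
  const pdfFlag = outputPath ? `--print-to-pdf=${outputPath}` : "--print-to-pdf";
  return ["--headless", "--disable-gpu", pdfFlag, url];
}

// Start Chrome as a child process and resolve once it exits successfully.
function printToPdf(chromePath, url, outputPath) {
  return new Promise((resolve, reject) => {
    const child = spawn(chromePath, buildPdfArgs(url, outputPath), { stdio: "inherit" });
    child.on("error", reject); // e.g. the executable was not found
    child.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`Chrome exited with code ${code}`))
    );
  });
}

// Hypothetical usage (CHROME_PATH must point to your Chrome executable):
// printToPdf(process.env.CHROME_PATH, "https://techmagic.co", "techmagic.pdf")
//   .then(() => console.log("PDF created"));
```

Passing the executable path explicitly avoids relying on a “chrome” command being available in the shell.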
If the PDF does not appear and the console reports that the command is not found, you need to tell your system that the “chrome” command stands for the Chrome executable located in its installation folder. If you are a Windows user, you may try following the path setting instructions to resolve this issue. If you are using a Mac, this can be done with the following command:

alias chrome="/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome"

Now, try to rerun the initial command. If for some reason it still doesn’t work, don’t worry: after reading the sections below, you will be able to run a much cooler example on AWS!

Complex Scenarios with Headless Chrome

Having the ability to create a PDF or an image of a website with a single command is cool, but what about more complex scenarios? Is it possible to give the browser instructions to perform on the page? Definitely yes!

Chrome DevTools Protocol

Chrome DevTools Protocol, or CDP, is a way of communicating with a running browser instance by sending commands over a network connection to a port opened by that instance. CDP allows debugging and fully controlling the browser programmatically, meaning that it is possible to write a script that performs various tasks in the browser. For example, you may develop an application that runs a browser, navigates to a particular page, clicks buttons, fills in forms, dumps contents, takes pictures, and much more.

Of course, it is possible to write such applications from scratch, including managing the network connection, messaging, and strictly following the protocol on your own. However, using dedicated libraries will save you time and prevent mistakes in the implementation.

Puppeteer

One of the great libraries exposing a high-level API for communicating with Chrome or Chromium is Puppeteer.
It was originally written for Node.js and has since been ported to Python and .NET, and there are some alternatives for other programming languages. Puppeteer allows you to write elegant and readable code and concentrate on the desired functionality and scenarios rather than on proper network communication. It is also well suited to building web scrapers and automating headless Chrome in serverless environments. In the practical part of this article, you will explore running headless Chrome via Puppeteer and coding an app with Node.js.

Test Project: Screenshot Service

The best way to understand things is to try them yourself. The test project is a web-based app that responds with a picture of how a particular website looks at a particular screen size. The application expects two parameters to be sent as part of the user's request: the URL of the website to be captured and the screen resolution. For example, a request to:

https://example.com/capture?url=https://techmagic.co&screen=1280,800

will return a screenshot of how the TechMagic website looks at a 1280 x 800 screen resolution. Hosting such an app on AWS Lambda using a serverless approach would be an ideal option from both performance and cost perspectives.

Why AWS Lambda?

AWS Lambda is a serverless computing service provided by Amazon Web Services; it is incredibly cost-effective and scalable. The main concept of AWS Lambda functions is running code in response to various events such as HTTP requests, changes in file storage, messages from other AWS services, emails, and other things happening in the application. In turn, you are billed only for the time taken to execute your function and you never pay for idle time. Code runs in virtual stateless containers, where one container processes only one event at a time; if there are 20 simultaneous incoming events, AWS will immediately create 20 containers to handle the spike and will shut them down after the load decreases.
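To give a feel for this event-driven model, here is a minimal sketch of a Node.js function handling an HTTP event. It assumes the event is delivered through an HTTP trigger that exposes query parameters as `queryStringParameters`; the `hello` function name and the greeting text are hypothetical.

```javascript
// A Lambda function is an exported async function that receives the
// triggering event and returns a response object.
const hello = async (event) => {
  // Query parameters may be missing entirely, so fall back to an empty object.
  const params = (event && event.queryStringParameters) || {};
  const name = params.name || "world";

  return {
    statusCode: 200,
    headers: { "Content-Type": "text/plain" },
    body: `Hello, ${name}!`
  };
};

module.exports = { hello };
```

Because the function keeps no state between calls, each container can handle its event independently of the others.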
The service is fully managed by Amazon, so developers don’t need to worry about infrastructure and may concentrate on the code and application logic instead. The benefits and limitations of AWS Lambda are not the topic of this article, but the few facts above already make AWS Lambda an ideal option for our test project!

Implementing a Service

Before diving into the code, you will need to have Node.js installed on your machine. You will also need to install Serverless Framework, a superior command-line tool for deploying and managing applications based on AWS Lambda. Finally, you should have an AWS account and you should configure Serverless Framework with your AWS credentials; it will not be possible to deploy the application without doing this. The whole preparation process may take a while, but the final result will definitely be worth it.

Create Project and Install Dependencies

Open a terminal, navigate to the directory where you’d like to keep the project files, and run the following command:

$ serverless create --template aws-nodejs --path screenshot-service

It will create a “screenshot-service” folder with some initial files inside. Then navigate into this folder and run the commands for initializing a Node.js project and installing Puppeteer.

$ npm init

Simply confirm all the prompts and then install the package:

$ npm install puppeteer-core

Note that you install the “puppeteer-core” module instead of “puppeteer”. The reason is that you do not need the browser itself, which is bundled with the “puppeteer” module; you only want the communication functionality. You may ask, “How then will the browser get into Lambda?” It is a good question, and you will find the answer in the section below.

Serverless Configuration

Open the serverless.yml file, which is the main configuration file for your application and may consist of dozens of properties describing the future service and the resources it requires.
Here are all the settings required for the test project, so you can simply replace the content of your serverless.yml file with these:

service: screenshot-service
frameworkVersion: '2'
provider:
  name: aws
  runtime: nodejs12.x
  region: eu-west-1
functions:
  capture:
    handler: handler.capture # refers to function capture in handler.js
    events:
      - http: # trigger function via http request
          path: capture
          method: get
    memorySize: 1536 # RAM size for the function
    timeout: 15
    layers: # reference to the already existing layer with Chrome
      - arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:20

Most of the properties are quite self-descriptive, but the last line is worth a bit more attention because it is the answer to the question of how the browser gets into the Lambda environment. That is done via AWS Lambda Layers, a feature for extending Lambda environments with any necessary content such as libraries, custom runtimes, binaries (like headless Chrome), and other dependencies. It is possible either to create and publish your own layers or to use publicly available layers prepared by third-party organizations, open-source enthusiasts, and communities.

Preparing a layer with a custom binary, or the binary itself, might be a tricky process, as it requires the executable files to be compiled first in an environment similar to the AWS Lambda system. So for simplicity, in this test project we will refer to an existing, already deployed layer by pasting its ARN (a unique resource identifier in AWS) into the “layers” property in the serverless.yml file. However, it is worth noting that in a real project you may also use third-party layers, but you should deploy them to your own AWS account to prevent potential failures if the author removes the published layer.

Coding the Function

Now it is time to add some JavaScript code responsible for processing requests. Open the handler.js file and replace its content with the following:

const puppeteer = require("puppeteer-core");
const chrome = require("chrome-aws-lambda");

const capture = async (event) => {
  // Both "url" and "screen" query parameters are required.
  const { queryStringParameters } = event;
  if (!queryStringParameters || !queryStringParameters.url || !queryStringParameters.screen) {
    return { statusCode: 403 };
  }

  // "screen" is expected in the form "width,height", e.g. "1280,800".
  const { url } = queryStringParameters;
  const [width, height] = queryStringParameters.screen.split(",");
  if (!width || !height) {
    return { statusCode: 403 };
  }

  // Launch the headless Chrome binary provided by the layer.
  const browser = await puppeteer.launch({
    executablePath: await chrome.executablePath,
    args: chrome.args
  });

  try {
    const page = await browser.newPage();
    await page.setViewport({
      width: Number(width),
      height: Number(height)
    });

    await page.goto(url);
    const screenshot = await page.screenshot({ encoding: "base64" });

    // Return the image embedded in a small HTML page.
    return {
      statusCode: 200,
      body: `<img src="data:image/png;base64,${screenshot}">`,
      headers: { "Content-Type": "text/html" }
    };
  } finally {
    // Always close the browser so the process is not left running.
    await browser.close();
  }
};

module.exports = { capture };

The code above is fairly self-descriptive, so even if you are not a JavaScript expert, you should be able to follow the main flow of instructions: check that the URL and screen size were defined correctly, then open the browser, open a new page, set the desired viewport size, navigate to the destination, capture a screenshot, send it, and close the browser.

Deployment

This is probably the easiest part. The only thing you need to do is run the following command:

$ serverless deploy

If your AWS credentials are configured correctly, the deployment process starts and may take a minute or two. In the end, you will see the result, including the URL of your service, ready to use!

Testing the Final Solution

Copy the URL of your function endpoint and paste it into your browser’s address bar, but don’t submit it yet: you first need to add the proper query parameters at the end of the URL, for example:

?url=https://techmagic.co&screen=800,600

Hit “Enter”, wait a few seconds, and see the resulting image! You may change the desired URL and screen size and re-run the query, but remember to use a valid URL with a protocol such as “http://” or “https://” prepended.

What’s Next?

The ability to run headless Chrome on AWS Lambda opens up a wide spectrum of useful solutions that developers, testers, and end users can build and benefit from. In this article, you learned the core principles of executing headless browsers in serverless environments, but to effectively develop a real-world app, you need to continue investigating CDP and learning libraries like Puppeteer.
You will also need a grasp of serverless concepts, AWS Lambda quotas, and Lambda layers management to be able to improve your application or resolve emerging issues. Keep in mind that practice is the best way to consolidate your knowledge.

Written by Artem Arkhipov, Web Expert at Techmagic, full-stack developer, coach and speaker. Artem is passionate about JavaScript, Cloud Computing and Serverless.

How To Implement Serverless Services and Run Chrome Headless in AWS Lambda
