Web scraping collects unstructured data from a website and extracts it into a more readable, structured format such as JSON or CSV. Organizations set guiding principles on which endpoints are permitted to be scraped.

When scraping a website for personal use, it can be stressful to change the code manually every time, as most big brand websites want people to refrain from scraping their public data. Restrictions and problems that might arise include CAPTCHAs, user agent blocking (allowed and disallowed endpoints), IP blocking, and proxy network setup.

A practical use case of web scraping is notifying users of price changes for an item on sites like Amazon and eBay.

In this article, you will learn how to use Bright Data’s Scraping Browser, with its built-in unlocking capabilities, to unlock websites at scale without being blocked.

Sandbox

Test and run the complete code in this Codesandbox.

Prerequisites

You need the following to complete this tutorial:

- Basic knowledge of JavaScript
- Node installed on your local machine. It is required to install dependencies
- A code editor - VS Code

What is Bright Data?

Bright Data is a data collection or aggregation service with a massive network of IP addresses and proxies for scraping information off a website, which gives it the resources to avoid detection by the bots that companies use to prevent data scraping.

In essence, Bright Data does the heavy lifting in the background, helped by the large datasets available on the platform, which removes the worry of being blocked or of failing to gain access to website data.

https://www.youtube.com/watch?v=AGaiVApKfmc&embedable=true

What is a headless browser?

A headless browser is a browser that operates without a graphical user interface (GUI). Modern web browsers like Google Chrome, Safari, Brave, and Mozilla Firefox all have a graphical interface for interactivity and for displaying visual content.
A headless browser, by contrast, runs in the background and is driven by scripts that developers write or by commands issued from the command line interface (CLI).

Using a headless browser for web scraping is especially useful because it lets you extract data from public websites by simulating user behavior.

Headless browsers are suitable for the following:

- Automated testing
- Web scraping

Benefits of Puppeteer

Puppeteer is a Node.js library for controlling a headless browser (Chrome or Chromium). The following are some of the benefits of using Puppeteer in web scraping:

- Crawling single-page applications (SPAs)
- Automated testing of website code
- Clicking on page elements
- Downloading data
- Generating screenshots and PDFs of pages

Installation

Create a new folder for this app, and run the command below to initialize a Node.js project.

    npm init -y

The command will initialize this project and create a package.json file containing all the dependencies and project information. The -y flag accepts all the defaults upon initialization of the app.

With the initialization complete, let’s install the nodemon dependency with this command:

    npm install -D nodemon

Nodemon is a tool that automatically restarts the node application when a file changes.

In package.json, update the scripts object with this code:

package.json

    {
      ...
      "scripts": {
        "start": "node index.js",
        "start:dev": "nodemon index.js"
      },
      ...
    }

Next, create a file, index.js, in the directory's root, which will be the entry point for writing the script.

The other package to install is puppeteer-core, the automation library without the bundled browser, used when connecting to a remote browser.

    npm install puppeteer-core

Building with Bright Data’s Scraping Browser

Create an account on Bright Data to access all its services. For this project, the focus is on the Scraping Browser functionality.

On your admin dashboard, click on Proxies and Scraping Infra.

Scroll to the bottom of the page and select the Scraping Browser. After that, click the Get started button from the proxy products listed.

On opening the tool, give the proxy a name and click the Add Proxy button, and when prompted about creating a new zone, select Yes.

The next screen should display the host, username, and password.

Now, click on the </> Check out code and integration examples button, and on the next screen, select Node.js as the language of choice for this app.

Creating environment variables

Environment variables hold secret keys and credentials that should not be shared, hosted, or pushed to GitHub, to prevent unauthorized access.

Before creating the .env file in the root of the directory, let’s install the dotenv package with this command:

    npm install dotenv

Copy-paste this code into the .env file, and replace the entire value inside the quotation marks with the values from your Access parameters tab:

.env

    UNAME="<user-name>"
    HOST="<host>"

Creating a web scraper using Puppeteer

Back in the entry point file, index.js, copy-paste this code:

index.js

    const puppeteer = require("puppeteer-core");
    require("dotenv").config(); // load the variables from .env into process.env

    const auth = process.env.UNAME;
    const host = process.env.HOST;

    async function run() {
      let browser;
      try {
        // Connect to the remote Scraping Browser over WebSocket
        browser = await puppeteer.connect({
          browserWSEndpoint: `wss://${auth}@${host}`,
        });
        const page = await browser.newPage();
        // Allow up to 2 minutes for each navigation
        page.setDefaultNavigationTimeout(2 * 60 * 1000);
        await page.goto("http://lumtest.com/myip.json");
        const html = await page.content();
        console.log(html);
      } catch (e) {
        console.error("run failed", e);
      } finally {
        // Always close the browser, even if an error occurred
        await browser?.close();
      }
    }

    // Run only when this file is executed directly
    if (require.main === module) run();

The code above does the following:

- Imports the modules puppeteer-core and dotenv
- Reads the secret variables into the auth and host variables
- Defines the asynchronous function run
- In the try block, connects to the endpoint with puppeteer.connect, passing it in the options object under the browserWSEndpoint key
- Opens a browser page programmatically, which gives access to page elements and lets the script fire events
- Since navigation is asynchronous and can be slow, setDefaultNavigationTimeout sets the navigation timeout to 2 minutes
- Navigates to the page using the goto function, and afterward gets the page's content with the page.content() method
- After scraping, the browser must be closed, which happens in the finally block

If you want to expand this project, you can take screenshots of the web pages in png or pdf format. Check out the documentation to learn more.

Conclusion

Scraping the web with Bright Data infrastructure makes the process quicker for your use case because you do not have to write your scripts from scratch; the unlocking work is already taken care of for you.

Try it today to explore the benefits of Bright Data over traditional web scraping tools, which are restricted by proxy networks and make it challenging to work with large datasets.

Resources

- Scraping Browser documentation
- Scrape at scale with Bright Data Scraping Browser
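
As a possible next step for the index.js script, which only logs the raw HTML returned by page.content(), the sketch below shows one way to turn that HTML into a structured object, in line with the goal of producing JSON stated at the start of this article. The helpers extractJson and decodeEntities are hypothetical names that are not part of the tutorial's code, and the sketch assumes the browser wraps a raw JSON response in a pre element; the sample input is made up for illustration and is not real output from lumtest.com.

```javascript
// Hypothetical helpers (not part of the original tutorial): turn the HTML
// returned by page.content() into a plain JavaScript object.

// Decode the few HTML entities a serialized DOM may introduce into JSON text.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&"); // decode "&" last to avoid double-decoding
}

function extractJson(html) {
  // Assumption: the browser wraps a raw JSON response in <pre>...</pre>.
  // If no <pre> element is found, treat the whole string as JSON.
  const match = html.match(/<pre[^>]*>([\s\S]*?)<\/pre>/i);
  const raw = match ? match[1] : html;
  return JSON.parse(decodeEntities(raw).trim()); // throws on invalid JSON
}

// Made-up sample input for illustration; not real output from lumtest.com.
const sampleHtml =
  '<html><head></head><body><pre>{"ip":"203.0.113.7","country":"US"}</pre></body></html>';

const data = extractJson(sampleHtml);
console.log(data.ip, data.country);
```

In index.js, this could be used by replacing console.log(html) with console.log(extractJson(html)). Because JSON.parse throws on invalid input, a page that does not contain JSON would be reported by the existing catch block rather than silently producing bad data.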