Data scientists, software developers, and research analysts scrape the web with automation tools such as Puppeteer and Playwright to gather information for competitive analysis, compare prices on e-commerce websites, and build apps that send email notifications when prices change, as in the travel sector.

Using Bright Data and GPT (Generative Pre-trained Transformer) to gather insights about products, whether your own or a competitor's, turns customer feedback, both negative and positive, into actions that better meet customers' needs and boost sales. As an example, we will demonstrate how GPT can turn reviews posted by users on the Udemy learning platform into helpful suggestions.

This technique serves more than just individuals; brands and companies can use it to understand what people say about their products.

Everything that you will learn in this article is for ethical purposes. That is why Bright Data Scraping Browser is used to turn websites into structured data that is meaningful to any user, without getting blocked or rate limited and without using APIs (application programming interfaces).

Let's get started!

GitHub

Find the source code in this repo. Fork and clone it to test it yourself.

Note that it contains the frontend application in React in a folder called reviews, which displays the reviews and suggestions data from Udemy and GPT, respectively, and a Node server in a folder called headless-web-scraping, which saves the scraped data in a JSON (JavaScript Object Notation) file.

Demo

For a practical demonstration of the client-side app, check it out here.

Prerequisites

Before writing a line of code, check the following requirements:

- Node.js >=16, which ships with the package manager, npm
- Knowledge of JavaScript and React
- A code editor like VS Code or any other integrated development environment (IDE)
- Basic understanding of CSS

Set up Bright Data Scraping Browser

The Scraping Browser is compatible with Puppeteer and Playwright and comes with built-in website unblocking actions.

To begin, sign up on the Bright Data website (free); it comes with a $20/GB "no commitment" plan.

Some of the great benefits of using Bright Data architecture are:

- Quick
- Flexible
- Cost-efficient

Discover how to leverage web scraping to your advantage.

After signup, go to your dashboard and click the Proxies and Scraping Infrastructure icon on the window's left pane.

Next, click on the Add button dropdown and select Scraping Browser. Give the proxy a name under the Solution name field and click the Add button to continue.

The next screen will display values for the host, username, and password used to connect to the Scraping Browser.

Let's get the project running by installing the boilerplate.

Installation

In this section, you will learn the basics of initializing and creating a new boilerplate using Node.js and Vite.

The web scraper in Node.js will handle the scripts for retrieving and storing the web data, while the UI (user interface) in React will display the info from the server and GPT.

In this project, create a folder that will hold both the frontend and backend code like this:

.
    └── Bright_data
        ├── headless-web-scraping
        └── reviews

To set up a Node.js project, first create a new directory from the terminal:

mkdir headless-web-scraping

Next, change into that directory:

cd headless-web-scraping

Initialize the project:

npm init -y

The -y flag accepts all the defaults and skips the interactive prompts, which are the questions used to fill in the package.json file.

Installing the following packages adds them as dependencies in package.json:

npm install dotenv puppeteer-core

- dotenv: This library is responsible for loading environment variables from the .env file into process.env
- puppeteer-core: An automation library that ships without the browser itself

Now, create the index.js file in the root directory and copy-paste this code:

index.js

console.log("Hello world!")

Before running this script, head to the package.json file and update the scripts section as follows:

package.json

{
  "name": "headless-web-scraping",
  ...
  "scripts": {
    "start": "node index.js"
  },
  ...
}

Run the script:

npm run start

This should return:

Hello world!

The UI folder for this app is called reviews. Run this command within the reviews directory to scaffold a new Vite React project:

npm create vite@latest ./

The ./ signifies that all the files and folders should be created within the reviews folder. Running the command also brings up prompts in the terminal. Choose the React and JavaScript options, or pick any other framework you are comfortable using.

With the setup complete, follow the instructions in the terminal to install the dependencies and start the development server:

npm install

npm run dev

Open your browser to see the UI, served by the development server on port 5173.

It is time to include Tailwind CSS, a utility-first CSS framework packed with classes that are applied directly in the JSX to build modern websites.

Check out this guide and follow the instructions on installing Tailwind CSS in a Vite project.

Creating a JavaScript Web Scraper in Node.js

Return to the Access parameters tab on your created zone and copy the host and username values.

Environment variables are essential in Node.js for keeping sensitive data like secret keys and credentials out of the source code and away from unauthorized access.

Creating Environment Variables

Copy and paste these values into the .env file created in the root folder:

.env

AUTH="<AUTH>"
HOST="<HOST>"

To load these credentials, update index.js with the following:

index.js

const puppeteer = require("puppeteer-core");
require("dotenv").config();
const fs = require("fs");

const auth = process.env.AUTH;
const host = process.env.HOST;

async function run() {
  let browser;

  try {
    browser = await puppeteer.connect({
      browserWSEndpoint: `wss://${auth}@${host}`,
    });
    const page = await browser.newPage();
    page.setDefaultNavigationTimeout(2 * 60 * 1000);
    await page.goto(
      "https://www.udemy.com/course/nodejs-express-mongodb-bootcamp/"
    );
    const reviews = await page.evaluate(() =>
      Array.from(
        document.querySelectorAll(
          ".reviews--reviews-desktop--3cOLE .review--review-container--knyTv"
        ),
        (e) => ({
          reviewerName: e.querySelector(".ud-heading-md").innerText,
          reviewerText: e.querySelector(".ud-text-md span").innerText,
          id: Math.floor(Math.random() * 100),
        })
      )
    );

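    const outputFilename = "reviews.json";

    // Await the write so that a failure is handled by the catch block below
    // instead of being thrown from inside a callback.
    await fs.promises.writeFile(outputFilename, JSON.stringify(reviews, null, 2));
    console.log("file saved");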
  } catch (e) {
    console.error("run failed", e);
  } finally {
    await browser?.close();
  }
}

// Run the scraper only when this file is executed directly.
if (require.main === module) run();

Some things to note in the code above:

- The imported modules are puppeteer-core, dotenv, and the file system module
- Within the run() function, the puppeteer.connect() method is responsible for connecting to a remote browser through a proxy server (Bright Data Scraping Browser)
- The browserWSEndpoint property is the WebSocket address where the remote browser is running. The value passed as a template literal is built from the parameters shown in the Bright Data dashboard and stored in the .env file: AUTH carries the username and password, and HOST carries the host

The other details from the code block above are standard Puppeteer code:

- Open a new page
- Set the default navigation timeout to 2 minutes
- Go to the course page on Udemy
- Inspect the HTML page using the page.evaluate() method, which loops through the matching elements in the DOM to get the reviewer name and the review text
- Use the Math.floor() method together with Math.random() to generate a random id
- Save the result in JSON format using the fs module

Run the script:

npm run start

The output is saved within the headless-web-scraping folder as reviews.json and should look like this:

reviews.json

[
  {
    "reviewerName": "Yash U.",
    "reviewerText": "This was a very intensive course covering almost all backend stuff. A huge thanks to the instructor - Jonas and also to the community. A lot of bugs and problems were already posted in the Q&A section and it helped a lot. Towards the end of the course, there were a few things that were outdated and a lot of people were disappointed in the comments but for me these things helped a lot. You learn to search and find solutions on your own and this is what is required in real world. Hence, despite these issues towards the end, I would absolutely recommend this course to anyone who wants to start learning backend development.",
    "id": 11
  },
  {
    "reviewerName": "Shyam Nath R S.",
    "reviewerText": "As always with Jonas's other courses like JS, HTML and CSS I understood"
  },
  ...
]

Using GPT

If you don't have an account, sign up and create one.

Copy one of the reviewerText values from the objects above and paste it into ChatGPT. For a walkthrough, watch the video below.

https://www.youtube.com/watch?v=fQIWyYHVnbY&embedable=true

You should get something similar to this:

The suggestions or improvements:

Creating the UI in React

React is a JavaScript library used by developers for building user interfaces with reusable components.

Now that we have the reviews and suggestions, let's create the UI to display the data.

In the reviews project, create a new folder called components in the src directory with the following files:

.
└── reviews
    └── src
        └── components
            ├── Footer.jsx
            ├── ImproveSuggestion.jsx
            ├── ReviewImprovementSuggestions.jsx
            ├── Reviews.jsx
            └── Text.jsx

Also, let's create a file called reviews.js for the responses from GPT, stored as an array of objects, in a folder named data, as shown:

src/data/reviews.js

.
└── reviews
    └── src
        └── data
            └── reviews.js

Get the entire data in this gist.

Let's update the code in the project accordingly:

Footer.jsx

const Footer = () => {
  return (
    <>
      <footer className='mt-auto'>
        <div className='mt-5 text-center text-gray-500'>
          <address>
            Built by
            <span className='text-blue-600'>
              <a href='https://twitter.com/terieyenike' target='_blank' rel='noopener noreferrer'>
                Teri
              </a>
            </span>
            &copy; 2023
          </address>
          <div>
            <p>
              Fork, clone, and star this
              <a
                href='https://github.com/Terieyenike/'
                target='_blank'
                rel='noopener noreferrer'
                className='text-blue-600'>
                <span> repo</span>
              </a>
            </p>
          </div>
          <p className='text-sm'>Bright Data ．GPT ．React ．Tailwind CSS</p>
        </div>
      </footer>
    </>
  );
};

export default Footer;

Change the values in the JSX if you so desire.

ImproveSuggestion.jsx

const ImproveSuggestion = ({ suggestion }) => {
  // Return the <li> directly so it stays a direct child of the parent <ul>.
  return <li className='mt-2'>{suggestion}</li>;
};

export default ImproveSuggestion;

ReviewImprovementSuggestions.jsx

import ImproveSuggestion from "./ImproveSuggestion";

const ReviewImprovementSuggestions = ({ suggestions }) => {
  return (
    <div>
      <h3 className='text-xl font-bold mt-3'>Improvement Suggestions:</h3>
      <ul className='list-disc'>
        {suggestions.map((suggestion, index) => (
          <ImproveSuggestion key={index} suggestion={suggestion} />
        ))}
      </ul>
    </div>
  );
};

export default ReviewImprovementSuggestions;

Reviews.jsx

import ReviewImprovementSuggestions from "./ReviewImprovementSuggestions";

const Reviews = ({ reviewerName, reviewText, improvementSuggestions }) => {
  return (
    <div className='mb-8'>
      <h3 className='text-xl font-bold'>
        <span>Reviewer name:</span>
      </h3>
      <p className='mb-3'>{reviewerName}</p>
      <h3 className='text-xl font-bold'>
        <span>Review:</span>
      </h3>
      <p>{reviewText}</p>
      {improvementSuggestions && (
        <ReviewImprovementSuggestions suggestions={improvementSuggestions} />
      )}
    </div>
  );
};

export default Reviews;

Text.jsx

const Text = () => {
  return (
    <>
      <div className='bg-emerald-800 text-slate-50 p-5 mb-10'>
        <h1 className='text-2xl font-bold md:text-4xl'>
          Using Scraping Browser and GPT for actionable product insights.
        </h1>
        <p className='text-sm mt-3 md:text-xl'>
          Extract reviews from a specific product page{" "}
          <span className='font-bold'>Udemy</span> using Bright Data, Scraping
          Browser and GPT to analyze them to offer business insights.
        </p>
      </div>
    </>
  );
};

export default Text;

Several of the components above receive their data through props passed down from a parent component. Check out the React documentation to learn more.

The React UI will still display the default boilerplate template in the browser. To show the changes made to the files in the components folder, let's update the entry point of the project, App.jsx, with this code:

src/App.jsx

import Reviews from "./components/Reviews";
import Text from "./components/Text";
import Footer from "./components/Footer";

import { reviews } from "./data/reviews";

import "./App.css";

function App() {
  return (
    <>
      <div className='flex flex-col container mx-auto max-w-6xl w-4/5 py-8 min-h-screen'>
        <Text />
        {reviews.map((review) => (
          <Reviews
            key={review.id}
            reviewerName={review.reviewerName}
            reviewText={review.reviewText}
            improvementSuggestions={review.improvementSuggestions}
          />
        ))}
        <Footer />
      </div>
    </>
  );
}

export default App;

Starting the development server will display the project like this:

Conclusion

Because it avoids website bans and works seamlessly with libraries like Puppeteer, Bright Data Scraping Browser is an excellent option for developers who need to deliver high-quality scraped data.

Scraping the web presents difficulties, as accessing a company's endpoints may result in blocking. Websites use preventive measures like CAPTCHAs and other techniques to safeguard user data.

In this article, you gained insight into inspecting a webpage's elements and extracting the necessary data using Node.js to gather user reviews from Udemy and store them in a JSON file. The project's final step was using GPT to provide insightful information and showing the outcome in a user interface.

Finally, these services and tools can help brands, companies, or individuals align their products with customer expectations. For the case study, GPT provided ways to improve the Udemy course and make it more suitable for learners. Websites are encouraged to allow comments in the form of reviews from actual product users, which can then be analyzed critically using GPT technology.

Try the Scraping Browser today!

https://www.youtube.com/watch?v=YzoLTalL6Uo&embedable=true

Resources

- Getting started with Scraping Browser
- Puppeteer documentation
- Tailwind installation for Vite project
- Scraping Browser
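
One fragile spot in the index.js scraper shown earlier is that e.querySelector(...) returns null when a class name on the page changes, so reading .innerText throws inside page.evaluate(). The following is a minimal sketch, not the author's code, of a null-safe variant. The helper name extractReview and the stub element are hypothetical and exist only so the snippet runs in plain Node.js without a browser; the two selectors are the ones used in index.js.

```javascript
// Hypothetical helper (not in the original script): reads one review card
// and falls back to an empty string when a selector matches nothing.
function extractReview(card, index) {
  const text = (selector) =>
    card.querySelector(selector)?.innerText?.trim() ?? "";

  return {
    id: index + 1, // position-based id instead of a random number
    reviewerName: text(".ud-heading-md"),
    reviewerText: text(".ud-text-md span"),
  };
}

// Stub standing in for a DOM element whose review text node is missing.
const stubCard = {
  querySelector: (selector) =>
    selector === ".ud-heading-md" ? { innerText: "  Reviewer A " } : null,
};

const extracted = extractReview(stubCard, 0);
console.log(extracted);
```

To use this idea in the real scraper, the helper would need to be defined inside the page.evaluate() callback, because that callback runs in the browser context rather than in Node.js.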

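
The scraper writes each review under the key reviewerText, while App.jsx reads reviewText and improvementSuggestions, so the scraped JSON needs a small reshaping step before it can serve as src/data/reviews.js. The sketch below assumes a hypothetical helper named toUiReviews and made-up sample data; the contents of the author's gist are not reproduced here.

```javascript
// Hypothetical helper: converts scraper output into the props App.jsx reads.
// suggestionsByIndex maps a review's position to the suggestions GPT returned.
function toUiReviews(scraped, suggestionsByIndex = {}) {
  return scraped.map((review, index) => ({
    id: index + 1, // unique within one scrape, so it is safe as a React key
    reviewerName: review.reviewerName,
    reviewText: review.reviewerText, // scraper key renamed to the UI key
    improvementSuggestions: suggestionsByIndex[index], // undefined when absent
  }));
}

// Made-up sample data in the same shape as reviews.json.
const scrapedSample = [
  { reviewerName: "Reviewer A", reviewerText: "Great pacing.", id: 11 },
  { reviewerName: "Reviewer B", reviewerText: "Some parts are outdated.", id: 11 },
];

const uiReviews = toUiReviews(scrapedSample, {
  1: ["Refresh the outdated sections."],
});
console.log(uiReviews);
```

Leaving improvementSuggestions undefined for reviews without suggestions matches the conditional rendering in Reviews.jsx, which only shows the suggestions block when the prop is truthy.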

