How do you increase productivity, especially when you want to multitask and achieve so much in so little time? We have all been in situations where a team lead asks us to get (scrape) information off the internet, especially on teams that rely on a lot of manual processes. Doing this by hand with pen and paper can lead to errors and to missing specific information from the website. This tutorial demonstrates how to automate scraping data from a website so you can use it for whatever purpose you need.

Sandbox

You can find the source code of the completed project on CodeSandbox. Fork, tweak the scripts, and run the code.

<CodeSandbox title="scrape the web" id="web-scraper-nxmv8" />

Prerequisites

A basic understanding of JavaScript is necessary to complete this project. The project is built with Node.js and Express. To follow the steps, we need the following:

Node.js and npm installed on our computer. We use npm, a package manager, to install the dependencies for our program.
A code editor of our choice.

npm is available when you install Node from the official documentation.

Installation

Initialize a new Node.js project with the following command.

npm init -y

The above command initializes our project by creating a package.json file in the root of the folder, using npm with the -y flag to accept the defaults. We will install the express package from the npm registry to help us write our scripts to run the server.

Then after the initialization, we need to install the dependencies express, cheerio, and axios.

npm install express cheerio axios

express, a fast and flexible Node.js web framework.
cheerio, a package that parses markup and provides an API for traversing/manipulating the resulting data structure. Cheerio implements a subset of core jQuery.
axios, a promise-based HTTP client for the browser and Node.js.

Creating a Server With Node.js

In our JavaScript file, app.js, we use the code below to import Express.js, create an instance of the Express application, and finally start the app as an Express server.

app.js

const express = require('express');
const app = express();

const PORT = process.env.PORT || 3000;

app.listen(PORT, () => {
  console.log(`server is running on PORT:${PORT}`);
});

Before starting our application in the command line, we need to install nodemon as a development dependency.

npm install nodemon --save-dev

Nodemon is a monitoring tool used during the development of Node.js apps. It restarts the server automatically whenever a file changes. We will configure the package.json file to use nodemon, so we can run our app without manually restarting the server.

package.json

{
  "scripts": {
    "start": "nodemon app.js"
  },
  "devDependencies": {
    "nodemon": "^2.0.15"
  }
}

Now start the app in the command line with npm start, which should output this in the command line.

server is running on PORT:3000

Express.js is suitable for routing, as we will see later on in the tutorial.

Creating the Scraper

With the server setup complete, we will implement the web scraper itself. In the same app.js file, we import the axios package to send HTTP requests. Here, a single GET request to the news website returns the page's HTML.

app.js

const express = require('express');
const axios = require('axios');

const app = express();

const PORT = process.env.PORT || 3000;

const website = 'https://news.sky.com';

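// Await the request inside an async function so try...catch can handle a failed request
async function scrape() {
  try {
    const response = await axios(website);
    const html = response.data;
    console.log(html);
  } catch (error) {
    console.log(error.message);
  }
}

scrape();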

app.listen(PORT, () => {
  console.log(`server is running on PORT:${PORT}`);
});

In the code snippet above, we pass the website URL to axios, which returns a promise. Once the promise resolves, we read the page's HTML from the response and print it to the command line.

Scraping the Data

To scrape the data from the news website, update the app.js file with the following. The cheerio package makes this possible.

app.js

const express = require('express');
const axios = require('axios');
const cheerio = require('cheerio');

const app = express();

const PORT = process.env.PORT || 3000;

const website = 'https://news.sky.com';

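// Holds the scraped headlines so the route below can return them
const content = [];

async function scrape() {
  try {
    // Await the request so a failed request lands in the catch block
    const res = await axios(website);
    const $ = cheerio.load(res.data);

    $('.sdc-site-tile__headline').each(function () {
      const title = $(this).text();
      const url = $(this).find('a').attr('href');

      content.push({ title, url });
    });
  } catch (error) {
    console.log(error.message);
  }
}

scrape();

// Register the route once, outside the loop over headlines
app.get('/', (req, res) => {
  res.json(content);
});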

app.listen(PORT, () => {
  console.log(`server is running on PORT:${PORT}`);
});

Let's go through the code above.

The cheerio package lets us read and traverse the elements in the page's HTML. We target only the specific elements we need to scrape.

To parse the HTML, we call cheerio.load(data), which parses all the HTML on the page, and save the result in a variable, $.

To find the elements on the website that hold a title, we inspect the page and copy the class name of the h3 tag.

For each headline, we grab the text using text() and the link to the headline, which we find in the href attribute.

To return all the scraped data as JSON, we create an empty array in a variable, content. We push each saved title and url into this array as an object using the push method, and we send the scraped data to the client with a GET route, app.get, on the / endpoint.

Finally, we execute the block of code within the try...catch statement. The catch block runs if an exception is thrown.

With the process completed for scraping a website, we now have the scraped data in JSON format.

Summary

Now that you've seen how to create a web scraper with Node.js using the Express.js framework, you can try this with any website of your choice and save time while getting accurate data.

This post explored scraping a website with a method you can replicate for as many website URLs as you need.

Clone and fork the completed source code here.

Further Reading

Basic routing with the Express framework

What Can You Do Next?

To experiment with what we built, you can fetch the data from the server and call it in your frontend application.
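As a starting point for the frontend idea above, here is a minimal sketch of requesting the JSON from the server and turning it into an HTML list. It assumes the Express app from this tutorial is running locally on port 3000. The names ENDPOINT, escapeHtml, renderHeadlines, and loadHeadlines are hypothetical helpers introduced only for illustration and are not part of the original project.

```javascript
// Assumption: the tutorial's Express app is running locally on port 3000.
const ENDPOINT = 'http://localhost:3000/';

// Escape text before placing it inside HTML (hypothetical helper).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn the scraped items ({ title, url }) into an HTML list (hypothetical helper).
function renderHeadlines(items) {
  const rows = items.map(({ title, url }) => {
    // Fall back to '#' when a headline has no link
    const href = escapeHtml(url || '#');
    return `<li><a href="${href}">${escapeHtml(title.trim())}</a></li>`;
  });
  return `<ul>${rows.join('')}</ul>`;
}

// Request the JSON from the server and return the rendered markup.
// fetchImpl can be swapped out, so the function can be tried without a live server.
async function loadHeadlines(fetchImpl = fetch, endpoint = ENDPOINT) {
  const response = await fetchImpl(endpoint);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const items = await response.json();
  return renderHeadlines(items);
}
```

In a browser page, you could call loadHeadlines() and assign the resolved string to the innerHTML of an element of your choice. If the page is served from a different origin than the Express server, the browser will block the request unless the server sends CORS headers, so that is worth checking first.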


