A web scraper is a tool that allows us to select and transform a website’s unstructured data into a structured database. So where would a web scraper come in handy? I have listed my favorite use cases to get you excited about launching your own!

- Scrape real estate listings: businesses are using web scraping to gather properties that are already listed.
- Scrape products and product reviews from retailer or manufacturer websites to show on your site, or to provide spec and price comparisons.
- Scrape news websites to apply custom analysis and curation (manual or automatic) and provide better-targeted news to your audience.
- Gather email addresses for lead generation.

You can read about other handy use cases for a web scraper here.

Now let’s get started! As a simple example, we’ll scrape the front page of Hacker News to fetch the titles of its links.

If you’re not familiar with Standard Library yet, you’re in for a treat! Standard Library is an API development and publishing platform that can help you build and ship code in record time using its in-browser API editor, Code on Standard Library.

Step One: Sign in to Code on Standard Library

The first step is to head over to https://code.stdlib.com/ and create a free account. Code on Standard Library is an online API editor built by the team at Standard Library: an embeddable development environment for quickly building APIs, webhooks, and workflow automation tasks.

In the bottom left corner, click (sign in). If you already have a Standard Library account, click Already Registered and sign in using your credentials. To create a new account, claim a namespace (this is your username) in the modal that pops up, then input your e-mail and choose a password.

After you create your account, a different modal will appear listing the subscription plans. A free account is all you need to get started, but you can read more about Standard Library’s pricing packages here. Once you click Subscribe + Earn Credits, you should see a confirmation message pop up. Click Continue to return to the landing page.

Step Two: Select the Web Scraper Sourcecode

Select the API from sourcecode button. Standard Library sourcecodes are designed to streamline the creation of different types of projects. They provide defaults for things like boilerplate code and directory setup, so you can get right to the development and implementation of more complex functionality.

You should see a list of published sourcecodes. Scroll down and select @nemo/web-scraper. Make sure to enter your desired name for your API and hit Okay (or press Enter).

You will then see your endpoint’s code under functions/__main__.js.

On the right side you will notice a parameters box. In the required URL parameter, type: [https://news.ycombinator.com/](https://news.ycombinator.com/)

In the queries parameter, type: [[".storylink", "text"]]

Select the green “Run” button. Within seconds you should have a list of link titles from the front page of Hacker News under the Results section of Code on Standard Library. You will also notice a documentation portal: copy and paste the Documentation URL into a new tab in your browser to see your API’s information on Standard Library.

How It Works 🤓

The web scraper makes a simple GET request to a URL, runs a series of queries on the resulting page, and returns the results to you. It uses cheerio, a powerful DOM (Document Object Model) processor, which lets us use CSS selectors to grab data from the page! CSS selectors are patterns used to select the element(s) you want to organize.

How to Query Using CSS Selectors

Web pages are written in markup languages such as HTML. An HTML element is one component of an HTML document or web page. Elements define the way information is displayed to the human eye in the browser: images, multimedia, text, style sheets, scripts, etc.
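To make the GET-then-query flow described under How It Works concrete, here is a minimal, dependency-free sketch in Node.js. It is not the @nemo/web-scraper source. It assumes Node 18+ (for the global fetch, though a stub can be passed in), treats each query as a [selector, attribute] pair where "text" means the element’s text and any other value names an attribute (an assumption about the queries format), and swaps cheerio for a deliberately naive regex matcher that only understands single ".class" selectors. The hypothetical hasClass helper does reflect how real CSS class selectors behave: .storylink matches any element whose class attribute contains the whitespace-separated token storylink.

```javascript
// Minimal, dependency-free sketch of the scraper's flow: GET a page, then run
// each [selector, attribute] query against the HTML.
// ASSUMPTIONS (not taken from the @nemo/web-scraper source):
// - Node.js 18+ for the global fetch (a stub can be passed instead).
// - "text" means an element's text content; any other value is an attribute name.
// - Only single ".class" selectors are supported, via a naive regex stand-in
//   for a real parser such as cheerio. No entity decoding, no nested same-name tags.

// CSS class selectors match whole, whitespace-separated tokens of the class
// attribute: ".storylink" matches class="title storylink" but not class="storylinks".
function hasClass(classAttr, className) {
  return classAttr.trim().split(/\s+/).includes(className);
}

// Hypothetical helper: run one [selector, attribute] query against raw HTML.
function runQuery(html, [selector, attribute]) {
  if (!selector.startsWith(".")) {
    throw new Error(`This sketch only supports .class selectors, got: ${selector}`);
  }
  const className = selector.slice(1);
  const openTagRe = /<([a-zA-Z][\w-]*)\b([^>]*)>/g; // opening tags only
  const results = [];
  for (const match of html.matchAll(openTagRe)) {
    const [openTag, tagName, attrs] = match;
    const classMatch = /(?:^|\s)class\s*=\s*"([^"]*)"/.exec(attrs);
    if (!classMatch || !hasClass(classMatch[1], className)) continue;

    if (attribute === "text") {
      // Take everything up to the next closing tag of the same name, then strip inner tags.
      const start = match.index + openTag.length;
      const end = html.indexOf(`</${tagName}>`, start);
      const inner = end === -1 ? html.slice(start) : html.slice(start, end);
      results.push(inner.replace(/<[^>]*>/g, "").trim());
    } else {
      // Attribute names are assumed to be plain identifiers such as href.
      const attrMatch = new RegExp(`(?:^|\\s)${attribute}\\s*=\\s*"([^"]*)"`).exec(attrs);
      if (attrMatch) results.push(attrMatch[1]);
    }
  }
  return results;
}

// Hypothetical scrape(): one GET request, then every query runs on the same HTML.
async function scrape(url, queries, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`GET ${url} failed with status ${response.status}`);
  }
  const html = await response.text();
  return queries.map((query) => runQuery(html, query));
}

// Usage with a stubbed fetch and a hand-written fragment (not live Hacker News markup).
const sampleHtml = `
  <tr><td class="title">
    <a href="https://example.com/a" class="storylink">First story</a>
    <span class="sitebit comhead"> (<a href="from?site=example.com"><span class="sitestr">example.com</span></a>)</span>
  </td></tr>
  <tr><td class="title">
    <a href="https://example.org/b" class="storylink">Second story</a>
  </td></tr>`;

const fakeFetch = async () => ({ ok: true, status: 200, text: async () => sampleHtml });

scrape("https://news.ycombinator.com/", [[".storylink", "text"], [".sitestr", "text"]], fakeFetch)
  .then((results) => console.log(JSON.stringify(results)));
// prints [["First story","Second story"],["example.com"]]
```

Each query returns its own independent list: the sample has two titles but only one site string, so pairing results by index would mislabel stories. A regex is not an HTML parser, so for real pages a proper library such as cheerio (which the scraper in this article uses) is the better choice. Hacker News may also have changed its markup since this was written, so confirm the current class names in the developer tools before relying on .storylink.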
For this example, we used the “.class” selector (class = “.storylink”) to fetch the titles of all the story links on the front page of Hacker News.

If you are wondering how to find the names of the elements that make up a website, allow me to show you! Fire up Google Chrome and type in our Hacker News URL: [https://news.ycombinator.com/](https://news.ycombinator.com/). Then right-click on the title of any article and select “Inspect.” This will open Chrome’s developer tools. Or you can use the command key (⌘) + option key (⌥) + J.

The developer tools will open on the right side of your screen. Notice that when you select the title of a link, the matching section of the page’s markup is also highlighted. The highlighted element has its “class” defined as “storylink.” And now you know how to find the names of elements on any site!

If you want to query different metadata on Hacker News, hover your cursor over it. That is how I found the .class selector “sitestr”, which holds the site domain shown next to each link: by hovering my mouse over that element on Hacker News.

That’s It, and Thank You!

Thanks for reading! I would love for you to comment here, e-mail me at Janeth [at] stdlib [dot] com, or follow Standard Library on Twitter, @StdLibHQ. Let me know if you’ve built anything exciting that you would like the Standard Library team to feature or share. I’d love to help!

Janeth Ledezma is a Developer Advocate for Standard Library and a Berkeley grad (go Bears! 🐻). When she isn’t learning Arabic or working out, you can find her riding her CBR500R. 🏍💨 Follow her journey with Standard Library on Twitter @mss_ledezma.


How to easily scrape websites for info using Standard Library and Node.js
