
[Nodejs] Web Scraping note (cheerio)

by @peterchang_82818

February 2nd, 2017



Scraping And Art

With the Cheerio module, you can use jQuery syntax while working with downloaded web data. Cheerio lets developers focus on the downloaded data rather than on parsing it.

Web scraping is already used by many businesses for real-world tasks. This is a note about web scraping with Cheerio in Node.js.

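Load HTML

var request = require('request');
var cheerio = require('cheerio');

// Download the page, then hand the HTML to cheerio.
request('http://www.google.com/', function(err, resp, html) {
  if (!err) {
    // $ now works like jQuery on the downloaded document.
    const $ = cheerio.load(html);
    console.log(html);
  }
});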

Selectors

Example html content:





<ul id="fruits">
  <li class="apple">Apple</li>
  <li class="orange">Orange</li>
  <li class="pear">Pear</li>
</ul>


$('.apple', '#fruits').text() //=> Apple


$('ul .pear').attr('class') //=> pear


$('li[class=orange]').html() //=> Orange

Traversing

.find(selector)

Gets the descendants of each element in the current set of matched elements, filtered by a selector.


$('#fruits').find('li').length //=> 3

.parent()

Gets the parent of the first selected element.


$('.pear').parent().attr('id') //=> fruits

.next()

Gets the next sibling of the first selected element.


$('.apple').next().hasClass('orange') //=> true

.prev()

Gets the previous sibling of the first selected element.


$('.orange').prev().hasClass('apple') //=> true

.siblings()

Gets the first selected element’s siblings, excluding itself.


$('.pear').siblings().length //=> 2

.children(selector)

Gets the children of the first selected element, optionally filtered by a selector.


$('#fruits').children().length //=> 3


$('#fruits').children('.pear').text() //=> Pear

Reference:

http://www.devdungeon.com/content/writing-web-scraper-nodejs

https://cheerio.js.org/

https://www.digitalocean.com/community/tutorials/how-to-use-node-js-request-and-cheerio-to-set-up-simple-web-scraping

https://firebearstudio.com/blog/node-js-best-cms-e-commerce-systems-and-open-source-projects.html
