I love SQL and it never ceases to amaze me what can be accomplished via the power of SQL syntax. In this example, we are going to create a simple wrapper that treats any Web page or HTTP endpoint as a SQL table that can be queried via a single SQL select statement. Furthermore we will demonstrate how we can takes the results of our query and export it into a variety of useful formats : (JSON,TEXT, TAB/CSV Delimited, EXCEL, etc). This example leverages one of my favourite applications, PhantomJS. _Edit description_phantomjs.org PhantomJS | PhantomJS PhantomJS is a headless WebKit scriptable with a API that enables such functionality as: JavaScript HEADLESS WEBSITE TESTING Run functional tests with frameworks such as Jasmine, QUnit, Mocha, Capybara, WebDriver, and many others. Learn more SCREEN CAPTURE Programmatically capture web contents, including SVG and Canvas. Create web site screenshots with thumbnail preview. Learn more PAGE AUTOMATION Access and manipulate webpages with the standard DOM API, or with usual libraries like jQuery. Learn more NETWORK MONITORING Monitor page loading and export as standard HAR files. Automate performance analysis using YSlow and Jenkins. Learn more In conjunction with PhantomJS, we are also going to leverage the awesome JavaScript library- from: SQL ALASQL _AlaSQL was written with pure JavaScript and does not use browser WebSQL database._alasql.org AlaSQL - javascript SQL database library — ‘à la SQL’ — is a lightweight client-side in-memory SQL database designed to work in browser and Node.js. AlaSQL was written with pure JavaScript and does not use browser WebSQL database. AlaSQL is fully functional compact sql server with JOINs, GROUPs, UNIONs, ANY, ALL, IN, subqueries and very limited transactions support. AlaSQL supports ROLLUP(), CUBE() and GROUPING SETS() functions AlaSQL works with all modern versions of browsers (Chrome, Firefox, IE, Safari), Node.js, and mobile iOS and Android. AlaSQL is fast, because it uses some optimization methods. AlaSQL The code to perform our task is as follows: To make this work, you first need to install PhantomJS. The instructions are here: _Note: For this static build, the binary is self-contained. There is no requirement to install Qt, WebKit, or any other…_phantomjs.org Download | PhantomJS Once you’ve successfully downloaded PhantomJS, create a new text file called scrapepage.js using the above code. From your command prompt, you can then start using the application by simply typing in a command like the following: phantomjs scrapepage.js " " "select nodeName,href,innerText from document where className='storylink' limit 5" https://news.ycombinator.com/news this should produce a list something like the following: or you could try something like the following: phantomjs scrapePage.js " " "select nodeName,count(nodeName) as Nodes INTO CSV() from document group by nodeName order by Nodes desc" https://news.ycombinator.com/news Notice that you can take advantage of most of the options of ALASQL, described here: _alasql - AlaSQL.js - JavaScript SQL database for browser and Node.js. Handles both traditional relational tables and…_github.com agershun/alasql You can obviously pipe the results of your query into a file as follows: phantomjs scrapepage.js "url" "sql" > somefile.json For example, let’s create an Excel spreadsheet by typing: phantomjs scrapepage.js " " "select nodeName,href,innerText INTO XLSXML({headers:true}) from document where className='storylink'" > mydata.xls https://news.ycombinator.com/news As you can see, you can pretty much takes the contents of any webpage and convert it into a useful format by executing a single command! Each row in the documents table represents a single node in the HTML Dom (Document Object Model) Tree of the web page. To see all the fields on a particular node, open up the development tools in the browser and take a peek at the nodes in the console : To access child properties on the nodes, use the syntax described here: _alasql - AlaSQL.js - JavaScript SQL database for browser and Node.js. Handles both traditional relational tables and…_github.com agershun/alasql For example, to access properties on the style object, you could specify: select nodeName,style->border as border from document To call methods, select nodeName,attributes->getNamedItem("href") as href from document This article just scratches the surface of what is possible using PhantomJS in conjunction with ALASQL. I highly recommend reading the documentation on both products to master the art of web page scrapping. Please recommend this article if you liked it!