In this article, we'll go through scraping a Wikipedia table of COVID-19 data using Puppeteer and Node. The original article that I used for this project is located here.

I have never scraped a website before. I've always seen it as a hacky thing to do. But after going through this little project, I can see the value of it. Data is hard to find, and if you can scrape a website for it, then in my opinion, by all means, do it.

Setup

Setting up this project was extremely easy. All you have to do is install Puppeteer with the command `npm install puppeteer`.

There was one confusing issue I had during setup, however. Part of the puppeteer package (the bundled Chromium browser) was not unzipped correctly when I initially installed it. I found this out while running the initial example in the original article. If you get an error that states `Failed to launch browser process`, or something similar, follow these steps:

1. Unzip `chrome-win` from `node_modules/puppeteer/.local-chromium/`.
2. Add that folder to the `win64` folder in that same `.local-chromium` folder.
3. Make sure `chrome.exe` is in this path: `node_modules/puppeteer/.local-chromium/win64-818858/chrome-win/chrome.exe`

This is for Windows specifically. Mac might be similar, but I'm not sure. Here is the link that led me to the answer. It might be a good idea to check this path no matter what, to make sure everything is functioning properly.

The code

I had to make a couple of small changes to the existing code.

First example

The first example didn't work for me. To fix the problem, I assigned the async function to a variable, then invoked that variable after the function definition. I'm not sure this is the best way to handle the issue, but hey, it works.
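Some context on why this pattern works: in a CommonJS script (one that loads modules with `require`), `await` is only valid inside an `async` function, so the Puppeteer calls need to live in one that is then invoked. One refinement worth considering, sketched here as an assumption rather than as the author's code, is a `try`/`finally` block so the browser still gets closed when a step such as `page.goto()` throws. To keep the sketch runnable without Chromium, `withBrowser` and `fakeLaunch` are hypothetical stand-ins, not Puppeteer APIs.

```javascript
// Hypothetical helper, not a Puppeteer API: launch a browser, run a task with it,
// and always close the browser afterwards, even if the task throws.
const withBrowser = async (launch, task) => {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close(); // runs after success and after failure
  }
};

// Hypothetical stand-in for puppeteer.launch(), so the sketch runs without Chromium.
const fakeLaunch = async () => ({
  closed: false,
  async close() {
    this.closed = true;
  },
});

// Same shape as the article's fix: assign the async function to a variable...
const takeScreenShot = async () => {
  let browserUsed;
  try {
    await withBrowser(fakeLaunch, async (browser) => {
      browserUsed = browser;
      // With Puppeteer, newPage(), goto() and screenshot() would go here.
      throw new Error('navigation failed'); // simulate a failing step
    });
  } catch (err) {
    console.error('Caught:', err.message);
  }
  return browserUsed.closed;
};

// ...then invoke it after the definition.
takeScreenShot().then((closed) => console.log('Browser closed:', closed));
```

Running the sketch as-is should log `Caught: navigation failed` followed by `Browser closed: true`. With the real library, you would pass `() => puppeteer.launch()` as the first argument and make the `newPage()`, `goto()` and `screenshot()` calls inside the task callback.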
Here is the code:

```javascript
const puppeteer = require('puppeteer');

// Assign the async function to a variable...
const takeScreenShot = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.stem-effect.com/');
  await page.screenshot({ path: 'output.png' });
  await browser.close();
};

// ...then invoke it after the definition.
takeScreenShot();
```

Wikipedia scraper

I also had an issue with the Wikipedia scraper code. For some reason, I was getting null values for the country names, which screwed up all of the data in the JSON file I was creating. Also, the scraper was "scraping" every table on the Wikipedia page. I didn't want that. I only wanted the first table, with the total number of cases and deaths caused by COVID-19. Here is the modified code I used:

```javascript
const puppeteer = require('puppeteer');
const fs = require('fs');

const scrape = async () => {
  const browser = await puppeteer.launch({ headless: false }); // browser initiate
  const page = await browser.newPage(); // opening a new blank page
  // navigate to url and wait for the DOMContentLoaded event
  await page.goto(
    'https://en.wikipedia.org/wiki/2019%E2%80%9320_coronavirus_pandemic_by_country_and_territory',
    { waitUntil: 'domcontentloaded' }
  );

  // Selected table by aria-label instead of div id
  const recordList = await page.$$eval(
    '[aria-label="COVID-19 pandemic by country and territory table"] table#thetable tbody tr',
    (trows) => {
      const rowList = [];
      trows.forEach((row) => {
        const record = { country: '', cases: '', death: '', recovered: '' };
        // (tr < th < a) anchor tag text contains country name
        record.country = row.querySelector('a').innerText;
        // getting textvalue of each column of a row and adding them to a list.
        const tdList = Array.from(row.querySelectorAll('td'), (column) => column.innerText);
        record.cases = tdList[0];
        record.death = tdList[1];
        record.recovered = tdList[2];
        // only keep rows that have all three data columns
        if (tdList.length >= 3) {
          rowList.push(record);
        }
      });
      return rowList;
    }
  );

  console.log(recordList);
  // Commented out screen shot here
  // await page.screenshot({ path: 'screenshots/wikipedia.png' }); //screenshot
  await browser.close();

  // Store output
  fs.writeFile('covid-19.json', JSON.stringify(recordList, null, 2), (err) => {
    if (err) {
      console.log(err);
    } else {
      console.log('Saved Successfully!');
    }
  });
};

scrape();
```

I wrote comments on the subtle changes I made, but I'll also explain them here.

First, instead of identifying the table I wanted by the `div#covid19-container` ID, I pinpointed the table with its aria-label. This was a little more precise. Originally, the code was scraping all of the tables on the page because the IDs were the same (I know, not a good practice; that's what classes are for, right?). Identifying the table via its aria-label helped ensure that I only scraped the exact table I wanted, at least in this scenario.

Second, I commented out the screenshot command. It broke the code for some reason, and I didn't see the need for it since we were just trying to create a JSON object from table data.

Lastly, after I obtained the data from the correct table, I wanted to actually use it in a chart. I created an HTML file and displayed the data using Google Charts. You can see the full project on my GitHub if you are curious. Fair warning: I got down and dirty (very hacky) putting this part together, but at the end of the day, I just wanted an easier way to consume the data that I had just mined. There could be a whole separate article on the amount of refactoring that could be done on my HTML page.

Conclusion

This project was really fun. Thank you to the author, Mohit Maithani, for putting it together. It opened my eyes to the world of web scraping and a whole new realm of possibilities! At a high level, web scraping lets you grab data from almost any page you can open in a browser. As one of my favorite YouTubers, Ben Sullins, likes to say, "When you free the data, your mind will follow."

Love y'all. Happy coding!
Also published at https://dev.to/tyry327/scraping-wikipedia-for-data-using-puppeteer-and-node-1f0l