Marc Howard

Building the first sentiment-based customer reviews platform for payments at bizpayo.com

How To Scrape Amazon, Yelp and GitHub Profiles in 30 Seconds

The most talented developers in the world can be found on GitHub. What if there was an easy, fast and free way to find, rank and recruit them? I'll show you exactly how to do this in less than a minute using free tools and a process that I've hacked together to vet top tech talent at BizPayO.
For many of the side projects I work on, collaborating with the best developers in machine learning/AI, blockchain and cryptocurrency is often a matter of checking out the most popular projects on GitHub and then helping each other with open-source work.
This is a short tutorial on how to scrape data on the top developers from GitHub and quickly export it into a spreadsheet.
Use Cases:
  • Find the most talented developers to collaborate with (e.g. the most followed JavaScript developers in San Francisco, CA)
  • Recruit top developers based on their skillset and collaboration activity
The best part is that these free tools are not limited to just GitHub. You can use them for these other use cases:
  • Finding the best products and prices on Amazon or eBay
  • Getting business contact details from YP or Yelp (e.g. building an outreach list of highly rated, successful businesses)
Let’s Begin
In this example we’ll scrape GitHub to find the names, locations and, if provided, email addresses of the most followed JavaScript developers in San Francisco.
You’ll only need two Chrome extensions, Autopagerize and Instant Data Scraper, both free in the Chrome Web Store.
Autopagerize automatically loads the next pages of a paginated website. It works in all major browsers including Firefox, Chrome, Opera and Safari.
Instant Data Scraper uses AI to detect tabular or listing-type data on web pages. Such data can be scraped into a CSV or Excel file, no coding skills required. This extension can also click on the “Next” page links or buttons and retrieve data from multiple pages into one file. Pretty sweet.
Step 1: Download the Autopagerize Chrome plugin. It appends the second, third, etc. search results pages to the bottom of the current page, creating one long page that contains all the results (or as many as you wish). It works on Google search results, GitHub, Amazon, Yelp and several other sites.
Step 2: Download the Instant Data Scraper plugin.
Step 3: Go to the URL you want to scrape. In this case we’re grabbing the top JavaScript developers in SF on GitHub. Here is the page to start on, sorted by followers: https://github.com/search?l=JavaScript&o=desc&p=7&q=stars%3A%3E1000+location%3A%22San+Francisco%22+location%3ACA+followers%3A%3E10+language%3AJavaScript&s=followers&type=Users
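If you want to change the city, language or page number without editing that long URL by hand, here is a minimal Python sketch that rebuilds it from its parts. The function name `build_search_url` and its parameters are my own for illustration, and it assumes GitHub keeps accepting the same query-string keys that appear in the URL above (`l`, `o`, `p`, `q`, `s`, `type`).

```python
from urllib.parse import urlencode

GITHUB_SEARCH = "https://github.com/search"


def build_search_url(query: str, language: str = "JavaScript", page: int = 1) -> str:
    """Return a GitHub user-search URL sorted by followers, highest first.

    The keys mirror the ones in the URL from Step 3; this helper is
    hypothetical and not part of any GitHub library.
    """
    params = {
        "l": language,     # language filter
        "o": "desc",       # sort order: descending
        "p": page,         # results page number
        "q": query,        # the search qualifiers
        "s": "followers",  # sort by follower count
        "type": "Users",   # search users rather than repositories
    }
    # urlencode escapes ':' '>' and '"' and turns spaces into '+'
    return f"{GITHUB_SEARCH}?{urlencode(params)}"


# The same qualifiers used in Step 3, in readable form
query = 'stars:>1000 location:"San Francisco" location:CA followers:>10 language:JavaScript'
print(build_search_url(query, page=7))
```

Swap the `location:` qualifiers or the `language` argument to target a different city or skillset, then open the printed URL in Chrome and continue with Step 4.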
Fun fact: The first guy’s last name is pronounced “Boss”-Stock. Pretty bad-ass huh?
Step 4: Click the Autopagerize plugin in your browser, then click Next for as many pages as you need. You’ll then have one long list of results.
That’s it! You should now have a list of all the data based on the above criteria. 
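Once the list is exported as a CSV file, you may want to keep only the developers who actually listed an email. Here is a rough Python sketch of that filter. The column names (`name`, `location`, `email`) and the sample rows are made up for illustration; the headers in your real export will likely differ, so pass the right column name in.

```python
import csv
import io


def rows_with_email(csv_text: str, email_column: str = "email") -> list[dict]:
    """Return only the rows whose email column is not blank.

    `email_column` is an assumed header name; check your own export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if (row.get(email_column) or "").strip()]


# Made-up sample data standing in for a real export
sample = (
    "name,location,email\n"
    "Dev A,San Francisco,dev.a@example.com\n"
    "Dev B,San Francisco,\n"
)

for row in rows_with_email(sample):
    print(row["name"], row["email"])
```

For a real file, read it first with `open("export.csv", newline="").read()` (the file name is a placeholder) and pass the text to `rows_with_email`.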
Again, this method can be used on other popular sites to quickly gather, extract and sort the data that you need.
If you’ve found this article helpful, please share it so that others can find it. If you have any questions, feel free to reply or reach out to me directly on Twitter @marcbegins.
Happy scraping!
