In this post, we are going to learn web scraping with Python. We will scrape Walmart, eBay, and Amazon for the price of the Microsoft Xbox One X 1TB Black Console. With the same scraper, you should be able to scrape pricing for other products on these websites by changing the URLs and class names. As you know, I like to keep things simple, so I will also be using a web scraping API to make the process more efficient.
Why this tool? It helps us scrape dynamic websites through millions of rotating residential proxies so that we don't get blocked. It also provides a CAPTCHA-clearing facility and uses headless Chrome to render dynamic websites.
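Generally, web scraping is divided into two parts: fetching the raw HTML with an HTTP request, and extracting the data you need by parsing that HTML. We will use the following libraries and tools: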
Beautiful Soup is a Python library for pulling data out of HTML and XML files.
Requests allows you to send HTTP requests very easily.
A proxy API for web scraping fetches the HTML code of the target URL.
Our setup is pretty simple. Just create a folder and install Beautiful Soup, Requests, and the lxml parser (used later to parse the HTML) with the commands below. I am assuming that you have already installed Python 3.x.
mkdir scraper
cd scraper
pip install beautifulsoup4
pip install requests
pip install lxml
Now, create a file inside that folder with any name you like. I am using scraping.py.
First, sign up for the Scrapingdog API, which provides 1000 free credits. Then import Beautiful Soup and Requests in your file, like this:
from bs4 import BeautifulSoup
import requests
We are going to scrape Xbox pricing from Walmart, eBay, and Amazon.
Now that we have all the ingredients for the scraper, we make a GET request to the target URL on each of Walmart, eBay, and Amazon to get the raw HTML. If you are not familiar with the scraping tool, I would urge you to go through its documentation.
We will use Requests to make the HTTP GET requests.
# Passing the target URL through params lets requests percent-encode its own "?" and "&" characters.
ebay = requests.get("https://api.scrapingdog.com/scrape", params={"api_key": "<Your-API-key>", "url": "https://www.ebay.com/itm/Microsoft-Xbox-One-X-1TB-Black-Console/153480514383?epid=238382386&hash=item23bc26cb4f:g:AX8AAOSwk~xcjnHL"}).text
amazon = requests.get("https://api.scrapingdog.com/scrape", params={"api_key": "<Your-API-key>", "url": "https://www.amazon.com/Microsoft-Xbox-One-Console-Wireless-Controller/dp/B07WDGB9P5/ref=sr_1_2?dchild=1&keywords=xbox&qid=1589211220&sr=8-2"}).text
walmart = requests.get("https://api.scrapingdog.com/scrape", params={"api_key": "<Your-API-key>", "url": "https://www.walmart.com/ip/Microsoft-Xbox-One-X-1TB-Console-Black-CYV-00001/276629190"}).text
This gives you the HTML code of each target URL.
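Because the target URLs carry their own query strings (for example `?epid=...&hash=...`), they need to be percent-encoded when they are passed as the `url` parameter. Otherwise the API may read `&hash=...` as one of its own parameters. Here is a minimal sketch using only the standard library. It assumes the endpoint and parameter names used in the calls above, and `build_api_url` is a hypothetical helper name, not part of any library:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_ENDPOINT = "https://api.scrapingdog.com/scrape"  # endpoint as used in the calls above


def build_api_url(api_key: str, target_url: str) -> str:
    """Return the API request URL with the target URL percent-encoded (hypothetical helper)."""
    # urlencode escapes '?', '&' and '=' inside target_url so they are not
    # mistaken for parameters of the API request itself.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


ebay_target = (
    "https://www.ebay.com/itm/Microsoft-Xbox-One-X-1TB-Black-Console/"
    "153480514383?epid=238382386&hash=item23bc26cb4f:g:AX8AAOSwk~xcjnHL"
)
request_url = build_api_url("<Your-API-key>", ebay_target)

# Decoding the query string recovers the target URL unchanged.
decoded = parse_qs(urlsplit(request_url).query)
print(decoded["url"][0] == ebay_target)
```

Passing a `params` dictionary to `requests.get` performs the same kind of encoding for you.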
Now, use BeautifulSoup to parse the HTML with the lxml parser.
soupEbay = BeautifulSoup(ebay, 'lxml')
soupAmazon = BeautifulSoup(amazon, 'lxml')
soupWalmart = BeautifulSoup(walmart, 'lxml')
The eBay price is stored in a "span" tag with class "notranslate", the Amazon price in a "span" tag with class "a-size-medium a-color-price priceBlockBuyingPriceString", and the Walmart price in a "span" tag with class "price-group". These class names reflect the page markup at the time of writing and may change.
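To see how class lookups behave before running them on real pages, here is a small sketch on made-up markup (the HTML below is an assumption for illustration, not the actual page source). It uses Python's built-in "html.parser" so it runs without lxml:

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics two of the price elements described above.
sample_html = """
<span class="notranslate">US $599.00</span>
<span class="a-size-medium a-color-price priceBlockBuyingPriceString">$318.00</span>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# A single class name matches any element that carries that class.
by_one_class = soup.find("span", {"class": "a-color-price"}).text

# A multi-class string must match the attribute value exactly, in the same order.
by_exact_string = soup.find(
    "span", {"class": "a-size-medium a-color-price priceBlockBuyingPriceString"}
).text

# The same classes in a different order do not match, so find() returns None.
reordered = soup.find(
    "span", {"class": "a-color-price a-size-medium priceBlockBuyingPriceString"}
)

print(by_one_class, by_exact_string, reordered)
```

Since the full class string is order-sensitive, matching on one distinctive class is often the more robust choice.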
Then declare an empty dictionary and an empty list, which we will use to build a JSON object of the prices.
l={}
u=list()
Then we use the soupEbay, soupAmazon, and soupWalmart variables to get the prices, passing the tags and classes mentioned above to BeautifulSoup's find function. If an element is not found, find returns None, and we record the price as None.
try:
    l["priceEbay"] = soupEbay.find("span", {"class": "notranslate"}).text.replace("US ", "")
except AttributeError:  # find() returned None: the element is not on the page
    l["priceEbay"] = None
try:
    l["priceAmazon"] = soupAmazon.find("span", {"class": "a-size-medium a-color-price priceBlockBuyingPriceString"}).text
except AttributeError:
    l["priceAmazon"] = None
# print(soupAmazon.find("div", {"class": "a-section a-spacing-small"}))
try:
    l["priceWalmart"] = soupWalmart.find("span", {"class": "price-group"}).text
except AttributeError:
    l["priceWalmart"] = None
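The three lookups above follow the same pattern, so they could be folded into one function. The following is only a sketch of that idea: `extract_price` and its parameters are hypothetical names, and the markup is made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_price(soup, class_name, strip_prefix=""):
    """Return the text of the first <span> with class_name, or None if absent."""
    tag = soup.find("span", {"class": class_name})
    if tag is None:  # element missing, e.g. the page layout changed
        return None
    text = tag.text.strip()
    # Optionally drop a prefix such as the "US " that eBay puts before the price.
    return text.replace(strip_prefix, "") if strip_prefix else text


# Made-up markup for illustration only; real pages are far larger.
demo = BeautifulSoup('<span class="notranslate">US $599.00</span>', "html.parser")

prices = {
    "priceEbay": extract_price(demo, "notranslate", strip_prefix="US "),
    "priceWalmart": extract_price(demo, "price-group"),  # not in the demo markup
}
print(prices)
```

Checking for None explicitly avoids the try/except blocks and makes it clear which failure is expected.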
Now the dictionary holds the prices from all the vendors. We just have to append it to the list to generate a JSON object.
u.append(l)
print("Xbox pricing",u)
Printing the list u shows the scraped prices. As a JSON object, the result looks like this:
{
    "Xbox pricing": [
        {
            "priceWalmart": "$367.45",
            "priceEbay": "$599.00",
            "priceAmazon": "$318.00"
        }
    ]
}
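Note that calling print on a Python list shows Python's own representation (single quotes, None instead of null). To get strict JSON, the standard json module can be used. A minimal sketch, with the sample values above filled in by hand rather than scraped:

```python
import json

# Sample values copied from the output shown above (not scraped live).
l = {"priceWalmart": "$367.45", "priceEbay": "$599.00", "priceAmazon": "$318.00"}
u = [l]

# json.dumps turns the Python structure into a JSON string with double quotes;
# a missing price stored as None would be written as null.
output = json.dumps({"Xbox pricing": u}, indent=4)
print(output)
```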
Isn't that amazing? We managed to scrape Walmart, Amazon, and eBay with just 5 minutes of setup. We have a list containing a Python dictionary with the Xbox prices. In the same way, we can scrape data from other websites while reducing the risk of getting blocked.
In this article, we saw how to scrape data using a proxy scraper and BeautifulSoup, regardless of the type of website.
And there’s the list! At this point, you should feel comfortable writing your first web scraper to gather data from any website. Here are a few additional resources that you may find helpful during your web scraping journey: