Web scraping is the automated extraction of data from websites. In this article, I will show you how to scrape links from a test e-commerce site with Python 3.
Prerequisites
If you haven't done so already, install beautifulsoup4 and requests.
pip install beautifulsoup4
pip install requests
Start Scraping!
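import requests
from bs4 import BeautifulSoup
# Download the page
result = requests.get("https://www.webscraper.io/test-sites/e-commerce/allinone")
# Parse the HTML; naming the parser explicitly avoids a bs4 warning
soup = BeautifulSoup(result.content, "html.parser")
# Find every <a> tag with the CSS class "title"
links = soup.find_all("a", class_="title")
# Map each link's text to its href
data = {}
for link in links:
    title = link.string
    data[title] = link.attrs['href']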
You can copy and paste the full snippet above directly into a Python shell in your terminal, your favorite text editor, or a Jupyter notebook.
To check if you did it correctly, the output for data should be something similar to:
{'MSI GL62VR 7RFX': '/test-sites/e-commerce/allinone/product/326',
'Dell Vostro 15…': '/test-sites/e-commerce/allinone/product/283',
'Dell Inspiron 17…': '/test-sites/e-commerce/allinone/product/296'}
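Notice that the hrefs in the output are relative paths, not full URLs. If you want links you can open or request directly, one option is to resolve them against the page address with urljoin from the standard library. Here is a minimal sketch; the names to_absolute, BASE_URL, and sample are my own for illustration and are not part of the snippet above.

```python
from urllib.parse import urljoin

# The page that was scraped; relative hrefs are resolved against it.
BASE_URL = "https://www.webscraper.io/test-sites/e-commerce/allinone"


def to_absolute(data, base_url=BASE_URL):
    """Return a new dict with every href resolved to a full URL (hypothetical helper)."""
    return {title: urljoin(base_url, href) for title, href in data.items()}


# Sample entry copied from the output above, so this runs without a network request.
sample = {"MSI GL62VR 7RFX": "/test-sites/e-commerce/allinone/product/326"}
absolute = to_absolute(sample)
print(absolute["MSI GL62VR 7RFX"])
```

In your own session you would pass the scraped data dictionary instead of sample. Hrefs that are already full URLs pass through urljoin unchanged.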
Web scraping can save you plenty of time when you want to quickly extract data from websites. The examples above are meant to get you started quickly. Of course, there's more to it than what I showed here (e.g. crawling, pagination, viewing the DOM, authentication, and cookies). This is only the tip of the iceberg 😉.
Thanks for reading! Originally published on The Startup.