Web Scrape with Python Using Just 9 Lines of Code


by Songtham Tung, October 2nd, 2019

Web scraping is the process of extracting data from websites. In this article, I will show you how to scrape product links from a test e-commerce site with Python 3.

Prerequisites

If you haven't done so already, install the beautifulsoup4 and requests packages:

pip install beautifulsoup4
pip install requests
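To confirm the installation worked, a quick check like the following should import both packages and print their versions. The exact version numbers will depend on what pip installed on your machine.

```python
# Sanity check: both packages should import cleanly after pip install.
import bs4
import requests

print("beautifulsoup4", bs4.__version__)
print("requests", requests.__version__)
```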

Start Scraping!

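import requests
from bs4 import BeautifulSoup
# Download the page and parse it with Python's built-in HTML parser
result = requests.get("https://www.webscraper.io/test-sites/e-commerce/allinone")
soup = BeautifulSoup(result.content, "html.parser")
# The product links on this page are <a> tags with class="title"
links = soup.find_all("a", class_="title")
# Map each product title to its (relative) link
data = {}
for link in links:
    title = link.string
    data[title] = link["href"]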

The snippet above is complete, so you can copy and paste it directly into a Python shell, your favorite text editor, or a Jupyter notebook.
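A common point of confusion is the call soup.find_all("a", "title") from the original version of this snippet. When a plain string is passed as the second positional argument (attrs), Beautiful Soup treats it as a CSS class filter, so the call matches <a> tags with class="title", not tags with a title attribute. The small offline sketch below checks this against a hand-written HTML fragment, which is an assumption standing in for the real product page:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the live page.
html = """
<div class="caption">
  <a href="/product/1" class="title">Laptop One</a>
  <a href="/product/2" class="title">Laptop Two</a>
  <a href="/cart" class="btn">Cart</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A string in the attrs position is treated as a class filter...
by_position = soup.find_all("a", "title")
# ...so it matches exactly the same tags as the explicit class_ keyword.
by_keyword = soup.find_all("a", class_="title")

# get_text(strip=True) returns a plain str, trimmed of surrounding whitespace.
data = {link.get_text(strip=True): link["href"] for link in by_keyword}
print(data)  # {'Laptop One': '/product/1', 'Laptop Two': '/product/2'}
```

The "Cart" link is skipped because its class is "btn", which is why both forms return only the two product links.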

To check that it worked, run print(data). The output should look similar to:

{'MSI GL62VR 7RFX': '/test-sites/e-commerce/allinone/product/326', 'Dell Vostro 15…': '/test-sites/e-commerce/allinone/product/283', 'Dell Inspiron 17…': '/test-sites/e-commerce/allinone/product/296'}
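The hrefs in this output are relative paths. If you need full URLs, a minimal sketch using the standard library's urllib.parse.urljoin could look like this. It assumes the page URL from the scraper as the base; in the live script, result.url would be a more precise base, since it reflects any redirects.

```python
from urllib.parse import urljoin

# Assumed base: the page the scraper fetched.
BASE_URL = "https://www.webscraper.io/test-sites/e-commerce/allinone"

# Two entries copied from the sample output above.
data = {
    "MSI GL62VR 7RFX": "/test-sites/e-commerce/allinone/product/326",
    "Dell Vostro 15…": "/test-sites/e-commerce/allinone/product/283",
}

# Root-relative hrefs (starting with "/") replace the base URL's whole path.
absolute = {title: urljoin(BASE_URL, href) for title, href in data.items()}
for title, url in absolute.items():
    print(title, url)
```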

That's it

Web scraping is great and can save you plenty of time when you need to extract data from websites quickly. The example above is meant to get you started fast. Of course, there is more to it than what I showed here (e.g., crawling, pagination, inspecting the DOM, authentication, and cookies). This is only the tip of the iceberg 😉.
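As one hedged illustration of those next steps, here is a sketch of a small fetch helper that uses a requests.Session (which keeps cookies and reuses connections between requests), sets a timeout, raises on HTTP errors, and pauses between calls. The helper name polite_get, the 10-second timeout, the 1-second delay, and the User-Agent string are all hypothetical choices, not requirements of the test site:

```python
import time

import requests


def polite_get(session: requests.Session, url: str, delay: float = 1.0) -> requests.Response:
    """Hypothetical helper: fetch url via a shared session, fail on HTTP errors, then pause."""
    # Arbitrary 10-second timeout; without one, a request can hang indefinitely.
    response = session.get(url, timeout=10)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # Pause so a crawl over many pages doesn't hammer the server.
    time.sleep(delay)
    return response


# Usage (requires network access):
# with requests.Session() as session:
#     session.headers.update({"User-Agent": "example-scraper/0.1"})  # hypothetical UA
#     page = polite_get(session, "https://www.webscraper.io/test-sites/e-commerce/allinone")
#     print(page.status_code)
```

Passing the session in, rather than creating one inside the helper, lets every page of a crawl share the same cookies and connection pool.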

Thanks for reading! Originally published on The Startup.