In this article, you're going to learn the basics of web scraping in Python, and we'll do a demo project to scrape quotes from a website.

What is web scraping?

Web scraping is extracting data from a website programmatically. Using web scraping you can extract the text in HTML tags, download images and files, and do almost anything you would do manually with copying and pasting, but in a faster way.

Should you learn web scraping?

Yes, absolutely. As a programmer, you will often need content found on other people's websites, and those websites don't always give you an API for it. That's why you need to learn web scraping.

Requirements

To follow this tutorial, you need to have the following libraries installed on your machine:

- Requests
- BeautifulSoup

Installation

You can install both libraries with the pip command, as shown below:

    $ pip install requests
    $ pip install beautifulsoup4

Basics of requests

Requests is an elegant and simple HTTP library for Python, built for human beings. It allows you to send HTTP requests (POST, GET, PUT, DELETE) to a website in an easy way.

We're going to use the requests library in our demo project to send a GET request to the website and get its HTML source code.

Basics of BeautifulSoup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with parsers that give us a way to navigate the HTML source code and extract the content we need.

To pull data from HTML or XML, we need to convert its string representation into a BeautifulSoup object, which provides many methods for navigating and manipulating it.

Let's get our hands dirty with some code

Let's use the BeautifulSoup library to extract data from a small HTML file named sample.html.
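Before working with the file, here is a minimal sketch of the string-to-object conversion described above. The HTML string is made up for illustration and is not part of the tutorial's files; `find()` and attribute access with square brackets are standard BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# A tiny HTML string standing in for a page's source (hypothetical content).
html = "<html><body><h1>Hello</h1><p class='intro'>Welcome to scraping</p></body></html>"

# Convert the string into a BeautifulSoup object using Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; .text gives the text inside it.
heading = soup.find("h1").text
intro = soup.find("p")

print(heading)
print(intro.text)
# Attributes are read like dictionary keys; class is multi-valued, so this is a list.
print(intro["class"])
```

Once the string is wrapped in a BeautifulSoup object, every lookup in the rest of the tutorial follows the same pattern: ask the object for tags, then read their text or attributes.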
sample.html

    <!DOCTYPE html>
    <html>
    <head>
        <title>Document</title>
    </head>
    <body>
        <div id='quotes'>
            <p id='normal'>Time the time before the time times you</p>
            <p id='normal'>The Future is now</p>
            <p id='special'>Be who you wanted to be when you're younger</p>
            <p id='special'>The world is a reflection of who you are</p>
        </div>
        <div>
            <p id='Languages'>Programming Languages</p>
            <ul>
                <li>Python</li>
                <li>C++</li>
                <li>Javascript</li>
                <li>Golang</li>
            </ul>
        </div>
    </body>
    </html>

Extracting all paragraphs in HTML

Let's extract all paragraphs from the sample.html shown above using BeautifulSoup:

    from bs4 import BeautifulSoup

    html = open('sample.html').read()
    soup = BeautifulSoup(html, 'html.parser')

    for paragraph in soup.find_all('p'):
        print(paragraph.text)

Output

When you run the above simple program, it will produce the following result:

    $ python app.py
    Time the time before the time times you
    The Future is now
    Be who you wanted to be when you're younger
    The world is a reflection of who you are
    Programming Languages

Code Explanation

Importing the BeautifulSoup library

    from bs4 import BeautifulSoup

Creating a BeautifulSoup object from the HTML string

    html = open('sample.html').read()
    soup = BeautifulSoup(html, 'html.parser')

The above two lines of code read sample.html and create a BeautifulSoup object ready for parsing data.

Finding all paragraphs and printing them

    for paragraph in soup.find_all('p'):
        print(paragraph.text)

We used the BeautifulSoup find_all() method to extract all the paragraphs in the HTML file. It accepts the name of an HTML tag as a parameter, parses through the HTML string to find all matching tags, and returns them.
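To make the behaviour of `find_all()` more concrete, here is a small sketch that uses an inline HTML string (a shortened, hypothetical stand-in for sample.html) instead of a file. It shows that the result behaves like a list, that a tag with no matches gives an empty result, and that the `limit` argument caps the number of matches.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for sample.html so the sketch runs without any file.
html = """
<div id='quotes'>
  <p id='normal'>The Future is now</p>
  <p id='special'>Be who you wanted to be when you're younger</p>
</div>
<ul><li>Python</li><li>Golang</li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like collection of every matching tag.
paragraphs = soup.find_all("p")
texts = [p.text for p in paragraphs]
print(len(paragraphs), texts)

# A tag name that does not occur gives an empty result, not an error.
missing = soup.find_all("table")
print(len(missing))

# limit stops the search after the given number of matches.
first_only = soup.find_all("p", limit=1)
print(first_only[0].text)
```

Because the result is list-like, you can loop over it, index into it, or check its length before doing any further processing.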
Extracting all list elements from the HTML

To extract the list elements instead of the paragraphs, we specify the li tag instead of p in the find_all() method, as shown below:

app.py

    from bs4 import BeautifulSoup

    html = open('sample.html').read()
    soup = BeautifulSoup(html, 'html.parser')

    for item in soup.find_all('li'):
        print(item.text)

Output

    $ python app.py
    Python
    C++
    Javascript
    Golang

Extracting paragraphs with a specific id

Apart from returning all tags in the HTML string, we can also check the attributes of those tags to extract only specific ones, as shown below:

Extract paragraphs with an id of normal

    from bs4 import BeautifulSoup

    html = open('sample.html').read()
    soup = BeautifulSoup(html, 'html.parser')

    for paragraph in soup.find_all('p'):
        # .get() returns None instead of raising an error when a tag has no id
        if paragraph.get('id') == 'normal':
            print(paragraph.text)

Output

    $ python app.py
    Time the time before the time times you
    The Future is now

Demo Project

So far we have seen how to extract data from an HTML file in our local directory. Now let's see how we can extract data from a website hosted in the cloud.

Quotes spider

In this project, we are going to implement a web scraper to scrape quotations from a website at a given URL. We are going to use the requests library to pull the HTML from the website and then parse that HTML using BeautifulSoup.

Website of Interest (WOI)

In our demo project, we are going to scrape the quotes from quotes.toscrape.com

Demo project source code

Not much changes in the source code of our demo project, other than the fact that this time we obtain the HTML source code from a website using the requests module instead of reading it from a file.

scraper.py

    import requests
    from bs4 import BeautifulSoup

    html = requests.get('http://quotes.toscrape.com/').text
    soup = BeautifulSoup(html, 'html.parser')

    for paragraph in soup.find_all('span'):
        # .string is None when a tag contains more than one child,
        # so this skips the spans that do not hold plain quote text
        if paragraph.string:
            print(paragraph.string)

Output

    $ python scraper.py
    "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."
    "It is our choices, Harry, that show what we truly are, far more than our abilities."
    "There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle."
    "The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid."
    "Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring."
    "Try not to become a man of success. Rather become a man of value."
    "It is better to be hated for what you are than to be loved for what you are not."
    "I have not failed. I've just found 10,000 ways that won't work."
    "A woman is like a tea bag; you never know how strong it is until it's in hot water."
    "A day without sunshine is like, you know, night."

This article was also published here.

Hope you found it interesting. Please share it with your fellow developers on Twitter and other dev communities!
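One possible extension of the demo project is to select tags by attribute instead of checking every span. The sketch below is only an illustration: the class names `quote`, `text` and `author` are assumptions about how the site marks up its page, the sample HTML and its contents are placeholders rather than real data, and `parse_quotes` is a hypothetical helper name. It runs on an inline string so it works without a network connection.

```python
from bs4 import BeautifulSoup

# Placeholder markup; the class names are assumed, not taken from the article.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">Sample quote one.</span>
  <span>by <small class="author">Author One</small></span>
</div>
<div class="quote">
  <span class="text">Sample quote two.</span>
  <span>by <small class="author">Author Two</small></span>
</div>
"""


def parse_quotes(html):
    """Return a list of (quote_text, author) pairs found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # class_ (with a trailing underscore) filters on the HTML class attribute.
    for block in soup.find_all("div", class_="quote"):
        text = block.find("span", class_="text")
        author = block.find("small", class_="author")
        # Skip blocks that lack either piece instead of failing on None.
        if text is not None and author is not None:
            results.append((text.get_text(strip=True), author.get_text(strip=True)))
    return results


quotes = parse_quotes(SAMPLE_HTML)
for text, author in quotes:
    print(f"{text} - {author}")
```

To try it against the live site, the string returned by `requests.get('http://quotes.toscrape.com/').text` could be passed to `parse_quotes` in place of `SAMPLE_HTML`, provided the page really uses those class names.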