The internet is a treasure trove of information, and sometimes, you might want to access the underlying HTML source code of a website for various purposes like data extraction, analysis, or automation. Python, with its rich ecosystem of libraries, makes web scraping a straightforward task.
In this article, we'll explore how to use Python to fetch the source code of a website.
To follow this article, you will need Python and the Requests library installed.
Before we dive into web scraping, ensure you have Python installed on your system. You can download the latest version from the official Python website.
Requests is a popular Python library for making HTTP requests to websites, web services, and APIs. It simplifies sending HTTP requests and handling HTTP responses, which makes it easier to work with web resources from code.
To install Requests, you can use pip, the Python package manager. Run the following command in your terminal:
pip install requests
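To confirm that the installation worked, a quick check like the following can help. This is a minimal sketch; the helper name is_installed is hypothetical and only serves this illustration.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("requests"):
    import requests

    # Requests exposes its version string as requests.__version__
    print("Requests version:", requests.__version__)
else:
    print("Requests is not installed; run: pip install requests")
```

If the script reports that Requests is missing, pip may have installed it for a different Python interpreter than the one running the script.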
Now that we have Requests installed, let's write a simple Python script to retrieve the source code of a website.
import requests
# Define the URL of the website you want to scrape
url = 'https://example.com'
# Send an HTTP GET request to the URL
response = requests.get(url)
# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Print the HTML source code
    print(response.text)
else:
    print('Failed to retrieve the webpage. Status code:', response.status_code)
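In this script:
- We import the Requests library with import requests.
- We define the url variable with the URL of the website we want to scrape. You can replace 'https://example.com' with the URL of your target website.
- We use requests.get(url) to send an HTTP GET request to the specified URL and store the response in the response variable.
- We check whether the request was successful (status code 200) and, if so, print the HTML source code with response.text.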
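The script above only checks for status code 200 and would raise an exception on a network failure. A slightly more defensive variant might look like the sketch below. The function name fetch_source and the 10-second timeout are assumptions made for illustration; requests.get(..., timeout=...), response.raise_for_status() and requests.exceptions.RequestException are part of the Requests API.

```python
from typing import Optional

import requests


def fetch_source(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the HTML source of url, or None if the request fails."""
    try:
        # The timeout keeps the call from hanging on an unresponsive server
        response = requests.get(url, timeout=timeout)
        # Raise an HTTPError for 4xx and 5xx status codes
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors
        print("Failed to retrieve the webpage:", error)
        return None
    return response.text


if __name__ == "__main__":
    html = fetch_source("https://example.com")
    if html is not None:
        # Print only the first 200 characters to keep the output short
        print(html[:200])
```

Returning None on failure keeps the calling code simple; raising the exception instead would be an equally reasonable design if the caller needs to know why the request failed.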
The following example shows how to use the above code to get the source code of the Google homepage:
import requests
# Define the URL of the website you want to scrape
url = 'https://google.com'
# Send an HTTP GET request to the URL
response = requests.get(url)
# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Print the HTML source code
    print(response.text)
else:
    print('Failed to retrieve the webpage. Status code:', response.status_code)
Output:
<!DOCTYPE html>
<html itemscope="" itemtype="http://schema.org/WebPage" lang="en">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<meta content="width=device-width, initial-scale=1.0" name="viewport">
<title>Google</title>
...
</head>
<body>
...
</body>
</html>
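Because the source of a real page can run to thousands of lines, it may be more convenient to save it to a file than to print it. The following is a minimal sketch; the function name save_source, the file name page_source.html and the sample HTML string are hypothetical, and in practice the text would come from response.text.

```python
from pathlib import Path


def save_source(html: str, path: str) -> int:
    """Write HTML text to a UTF-8 file and return the number of characters written."""
    return Path(path).write_text(html, encoding="utf-8")


# Stand-in for response.text so the example runs without a network connection
sample_html = "<!DOCTYPE html><html><head><title>Example</title></head><body></body></html>"
written = save_source(sample_html, "page_source.html")
print("Characters written:", written)
```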
Web scraping with Python's Requests library is a straightforward and effective way to access the source code of a website. However, it's essential to be aware of website terms of service and legal considerations when scraping websites. Always ensure that your web scraping activities are ethical and comply with the website's policies. With Requests, you have a powerful tool at your disposal to gather data, automate tasks, and explore the vast world of web content. Happy scraping!
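One practical way to respect a website's policies is to consult its robots.txt file before fetching a page. The sketch below uses urllib.robotparser from the Python standard library. The sample rules and the helper name build_parser are made up for illustration; a real site's rules would be loaded from its own robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Allowed: no rule matches this path
print(parser.can_fetch("*", "https://example.com/index.html"))  # True
# Disallowed: the path falls under /private/
print(parser.can_fetch("*", "https://example.com/private/data.html"))  # False
```

Note that robots.txt is only one signal; a site's terms of service may impose further restrictions that this check does not cover.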