How to Scrape Data from Google Maps Using Python

Written by sic | Published 2023/06/23

TL;DR: Python is a convenient tool for scraping data, with many libraries specifically designed for this purpose. Choosing the right library is an important step in building a scraper and depends on the project requirements and your programming skills. In this article, we will explore three scraping options and discuss the corresponding Google Maps scraping Python libraries.

With its extensive database of locations, business listings, reviews, and more, Google Maps is a frequent source for data retrieval. However, extracting this data manually can be a tedious and time-consuming task.

In this article, we'll look at how to use Python to scrape data from Google Maps, allowing you to gather the information you need efficiently and effectively. So, whether you're a data scientist, a business professional, or a curious individual, join us in this tutorial to learn how to use Python to scrape addresses and other information from stores on Google Maps.

Setting Up the Environment

First, download and install the Python interpreter. To do this, go to the official website, download the latest version of Python available, and run the installation file. Be sure to check the option that adds Python to your PATH during installation.

Python is a convenient tool for scraping data, with many libraries specifically designed for this purpose. Regarding scraping data from websites like Google Maps, Python provides a range of libraries and frameworks that can assist in the process. Choosing the right library is an important step in building a scraper and depends on the project requirements and programming skills.

This article covers three scraping options and discusses the following Google Maps scraping Python libraries:

  1. Scraping using the Requests and BeautifulSoup libraries. This option has its own pros and cons. It is a simple approach suitable for beginners, as it requires only basic skills. However, it cannot emulate user behavior and does not handle JavaScript rendering.

  2. Scraping using a headless browser (Selenium library). This option requires more advanced programming skills and the additional download of a webdriver. It allows you to interact with the webpage as a user would, including handling JavaScript-rendered content.

  3. Scraping using the Google Maps API library. Using a purpose-built library for scraping Google Maps offers a good solution and is the most convenient option. It resolves the issues the previous two methods faced, such as bypassing CAPTCHA challenges and avoiding the need for proxies. Moreover, you can use a no-code Google Maps data scraper, which doesn’t require programming skills.

To install all the required libraries, open the command prompt and execute the following commands:

pip install beautifulsoup4
pip install selenium
pip install sc-google-maps-api

The Requests library, which we will also be using, may already be present in your environment; if it is not, install it with pip install requests. To use the headless browser, we must also install a web driver.

The choice of WebDriver depends on the browser you want to automate. For example, if you want to automate Chrome, you need the Chrome driver. Ensure that the version of the WebDriver you download matches the version of your installed browser.

To install the WebDriver, visit its official download page and get the version matching your browser. We recommend saving it in an easily accessible location, such as the root of the C:\ drive. Avoid complex paths, as we will need to specify this location in the script later.
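As an optional sanity check, you can print the driver's version from Python and compare it with your browser's version. This is a minimal sketch, assuming the driver was saved to the root of the C:\ drive:

import subprocess

# Optional sanity check: print the driver version so you can compare it
# with your installed Chrome version. Adjust the path if you saved the
# driver elsewhere.
print(subprocess.run([r'C:\chromedriver.exe', '--version'],
                     capture_output=True, text=True).stdout)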

Inspecting Google Maps Web Pages

Before we scrape, we need to understand where and in what form the data is. To do this, let's go to Google Maps and try to find any location. After that, open DevTools (press F12, or right-click on the page and select Inspect).

If you take a closer look at this page, you will notice that the listing titles and descriptions use class names that are regenerated each time the page loads. However, each listing element also carries a role attribute of "article", making it easy to locate the data. Moreover, as we can observe, the titles have the "fontHeadlineSmall" class, while the descriptions are stored within a span tag.

Scraping Data Using Python

Let's start by looking at an example of using the Google Maps API, as it requires minimal programming skills. First, sign up at Scrape-It.Cloud to get an API key and some free credits. To get data, you can use this script:

from sc_google_maps_api import ScrapeitCloudClient

# Create a client with your personal API key
client = ScrapeitCloudClient(api_key='INSERT_YOUR_API_KEY_HERE')

# Request Google Maps results for the search query
response = client.scrape(
    params={
        "keyword": "cafe in new york",
        "country": "US",
        "domain": "com"
    }
)

print(response.text)

Now, if you save and execute this script, you will receive comprehensive data about the first 20 positions, already structured in JSON format.
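Since the response is JSON, you can parse it with the standard json module and pick out only the fields you need. This is a minimal sketch; the key names below ("scrapingResult", "locals", "title", "address") are illustrative assumptions, so print response.text first and adapt them to the actual structure:

import json

# Parse the JSON body of the API response. The keys below are assumptions
# for illustration -- inspect the real response to confirm them.
data = json.loads(response.text)
for place in data.get("scrapingResult", {}).get("locals", []):
    print(place.get("title"), "-", place.get("address"))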

Now let's make the task more challenging and try to fetch the data using the Requests and BeautifulSoup libraries. To accomplish this, we will import the required libraries, send a request to the page, and then parse the obtained response:

import requests
from bs4 import BeautifulSoup

# Pretend to be a regular browser to reduce the chance of being blocked
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}

data = requests.get('https://www.google.com/maps/search/cafe+in+new+york/', headers=headers)
soup = BeautifulSoup(data.text, "html.parser")

titles = soup.find_all('div', {'class': 'fontHeadlineSmall'})
descriptions = soup.find_all('div', {'class': 'fontBodyMedium'})
print(titles)
print(descriptions)

However, we won't obtain any data because these libraries work excellently with static web pages but are unsuitable for scraping dynamically generated pages like Google Maps.

And finally, another option is to use Selenium with a headless browser, which allows us to simulate user behavior and thus scrape dynamic data. This enables us to scrape not only static data but also dynamic content that is generated or modified through JavaScript interactions.

Let's import the necessary libraries and set the path to the web driver:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# In Selenium 4, the driver path is passed via a Service object
DRIVER_PATH = r'C:\chromedriver.exe'
driver = webdriver.Chrome(service=Service(DRIVER_PATH))

Next, we will provide the link to navigate to. It is important to note that all links in Google Maps follow a standardized structure. This enables us to enhance the code in the future by automatically generating links and extracting keywords, for example, from a file.

driver.get('https://www.google.com/maps/search/cafe+in+new+york/')
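Because the search URLs are standardized, we can already sketch such a helper. Here, maps_search_url is a hypothetical function introduced only for illustration; it is not part of the original script:

from urllib.parse import quote_plus

# Hypothetical helper: turn any search phrase into a Google Maps search URL,
# e.g. for keywords read from a file.
def maps_search_url(keyword):
    return 'https://www.google.com/maps/search/' + quote_plus(keyword) + '/'

driver.get(maps_search_url('cafe in new york'))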

Now let's create a variable to store the data in title-description pairs.

results = []

Next, we need to fetch all the elements. As we recall from the structure, all the elements have the "article" role attribute.

elems = driver.find_elements(By.XPATH, "//div[@role='article']")

Now we only need to iterate through each element in the sequence and collect the required data, which needs to be stored in the results variable:

for elem in elems:
    # Each result card keeps its title and description in these classes
    title = elem.find_element(By.CSS_SELECTOR, "div.fontHeadlineSmall")
    description = elem.find_element(By.CSS_SELECTOR, "div.fontBodyMedium")
    results.append(str(title.text) + ';' + str(description.text))

Finally, don't forget to close the driver, and for convenience, let's display the data on the screen:

driver.close()
print(results)

By executing this script, the browser will open, navigate to the Google Maps page, and perform any necessary interactions and data gathering. Upon completion, the collected results are printed to the console.

As we can see, we only retrieve data for 5 elements, which is significantly fewer than using the API. Adding scrolling functions and delays to your script will help to load new data.
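For example, here is a minimal scrolling sketch to place before the call to driver.close(). It assumes the scrollable results panel is the element with role="feed", which may change as Google updates its markup:

import time

# Scroll the results panel a few times so Google Maps lazy-loads more cards.
# The role='feed' selector is an assumption about the current page markup.
feed = driver.find_element(By.XPATH, "//div[@role='feed']")
for _ in range(5):
    driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", feed)
    time.sleep(2)  # give the page time to load the next batch of results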

Storing the Scraped Data

Previously, we displayed the retrieved data in the console. However, to make the example more complete, let's save the data obtained in the last script to a file.

We can use Python's file-handling capabilities to save the data to a file. Here's an example of how you can modify the script to write the data to a file:

with open("maps.csv", "w") as f:
    f.write("Title; Description\n")
for result in results:
    with open("maps.csv", "a", encoding="utf-8") as f:
        f.write(str(result)+"\n")

In other words, we create a file called maps.csv, overwriting any existing file, and write the column names "Title" and "Description" as the header row. Then, we iterate through the entries stored in the results variable line by line and append them to the file.

At this stage, it is usually about saving the data and performing data cleaning. For example, this may involve removing unnecessary characters, correcting data errors, or eliminating empty strings. This step is crucial because unprocessed data may not be suitable for further analysis.

Additionally, it is important to mention that data cleaning helps improve data quality and ensures the accuracy and reliability of the analysis. We can obtain meaningful insights from the data by removing inconsistencies and irrelevant information.
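As a minimal illustration, assuming the "Title;Description" format used above, a cleaning pass might strip stray whitespace and drop empty or malformed rows:

# Keep only well-formed "Title;Description" pairs, trimmed of extra whitespace.
cleaned = []
for result in results:
    parts = [part.strip() for part in result.split(';', 1)]
    if len(parts) == 2 and all(parts):
        cleaned.append(';'.join(parts))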

Conclusion

Python provides powerful tools and libraries for scraping data from Google Maps efficiently and effectively. By scraping data from Google Maps, you can gain insight into local businesses, analyze customer reviews and ratings, gather contact information, and more. Whether you're a business owner looking for competitive intelligence or an enthusiast looking for interesting patterns and trends, scraping data from Google Maps can provide you with a wealth of information.

This article explored three different scraping options using popular libraries such as Requests and BeautifulSoup, Selenium, and the Google Maps API library. Each option has its own advantages and considerations depending on project requirements and programming skills.

Furthermore, we demonstrated examples of scraping data from Google Maps using Python. We used the Google Maps API to retrieve structured data easily and quickly. We also explored scraping Google Maps with Requests and BeautifulSoup libraries, although they are unsuitable for dynamically generated pages like Google Maps. Lastly, we utilized Selenium with a headless browser to simulate user behavior and scrape dynamic content.

