LinkedIn is a great place to find leads and engage with prospects. To reach out to potential leads, though, you first need a list of people to contact, and building that list is hard because LinkedIn actively blocks web scraping tools. That is why I wrote a script that searches Google for LinkedIn user and company profiles instead.
You’ll need Python 2.7+ (Python 3 works as well) and the requests package to get started. Once you have Python installed, run the following command to install it.
pip install requests
First, we import the packages we need: random is used to pick a random user-agent, requests makes the HTTP calls, re is used to parse the LinkedIn profiles and links out of the HTML, and argparse handles the command-line arguments in the full script.
import random
import argparse
import requests
import re
We create a LinkedinScraper class that tracks and holds the data for each request. The class takes two parameters, keyword and limit. The keyword parameter specifies the search term, and the limit parameter specifies the maximum number of links to search for.
class LinkedinScraper(object):
    def __init__(self, keyword, limit):
        """
        :param keyword: a str of keyword(s) to search for
        :param limit: number of profiles to scrape
        """
        self.keyword = keyword.replace(' ', '%20')
        self.all_htmls = ""
        self.server = 'www.google.com'
        self.quantity = '100'
        self.limit = int(limit)
        self.counter = 0
The LinkedinScraper class has three main methods: search, parse_links, and parse_people.
The search method performs the requests based on the keyword. It first builds a Google query URL restricted to LinkedIn profile pages, using the keyword and limit, then makes the requests and appends all of the returned HTML to self.all_htmls.
    def search(self):
        """
        perform the search and store the HTML of each Google results page in self.all_htmls
        """
        # choose a random user agent for each request
        user_agents = [
            'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36',
            'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0) chromeframe/10.0.648.205',
            'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1500.55 Safari/537.36',
            'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6',
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.19 (KHTML, like Gecko) Ubuntu/11.10 Chromium/18.0.1025.142 Chrome/18.0.1025.142 Safari/535.19',
            'Mozilla/5.0 (Windows NT 5.1; U; de; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 Opera 11.00'
        ]
        while self.counter < self.limit:
            headers = {'User-Agent': random.choice(user_agents)}
            # site:linkedin.com/in restricts the results to LinkedIn profile pages
            url = 'http://google.com/search?num=100&start=' + str(self.counter) + '&hl=en&meta=&q=site%3Alinkedin.com/in%20' + self.keyword
            resp = requests.get(url, headers=headers)
            if "Our systems have detected unusual traffic from your computer network." in resp.text:
                print("Running into captchas")
                return
            self.all_htmls += resp.text
            self.counter += 100  # each results page returns up to 100 links
The parse_links method searches the collected HTML and uses a regex to extract all of the LinkedIn links.
    def parse_links(self):
        """
        parse the collected HTML for LinkedIn profile URLs using regex
        :return: a list of LinkedIn profile links
        """
        reg_links = re.compile(r"url=https:\/\/www\.linkedin\.com(.*?)&")
        self.temp = reg_links.findall(self.all_htmls)
        results = []
        for regex in self.temp:
            final_url = regex.replace("url=", "")
            results.append("https://www.linkedin.com" + final_url)
        return results
Similarly, the parse_people method searches the HTML for each profile's name and title.
    def parse_people(self):
        """
        parse the collected HTML for LinkedIn profile names and titles using regex
        :return: a list of names and titles
        """
        reg_people = re.compile(r'">[a-zA-Z0-9._ -]* -|\| LinkedIn')
        self.temp = reg_people.findall(self.all_htmls)
        results = []
        for iteration in self.temp:
            # strip the LinkedIn boilerplate and leftover markup from each match
            person = iteration.replace(' | LinkedIn', '')
            person = person.replace(' - LinkedIn', '')
            person = person.replace(' profiles ', '')
            person = person.replace('LinkedIn', '')
            person = person.replace('"', '')
            person = person.replace('>', '')
            person = person.strip("-")
            if person != " ":
                results.append(person)
        return results
This is an example of using the class to search for 500 profiles for the Tesla company.
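A minimal usage sketch, assuming the class above (the argparse command-line wiring from the full repository is omitted here):
if __name__ == "__main__":
    scraper = LinkedinScraper(keyword="Tesla", limit=500)
    scraper.search()                 # fetch up to 500 results, 100 per Google results page
    links = scraper.parse_links()    # e.g. https://www.linkedin.com/in/...
    people = scraper.parse_people()  # names and titles
    print(people)
    print(links)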
This is quite a simple script, but it should be a good starting point. It lacks error handling and has no way around the captchas Google serves when you make too many requests. I recommend using a Google Search API such as https://goog.io to perform unlimited searches, or the Google Search API on RapidAPI to run the search from any language.
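To make the idea concrete, here is a rough sketch of the hosted-API approach. The endpoint URL, apikey header, and response shape below are assumptions for illustration, not the documented goog.io API, so check the provider's docs for the real request format.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
query = "site:linkedin.com/in Tesla"
# hypothetical endpoint and header names; consult the provider's docs for the actual API
resp = requests.get("https://api.goog.io/v1/search/" + requests.utils.quote(query),
                    headers={"apikey": API_KEY})
for result in resp.json().get("results", []):  # assumed response field
    print(result)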
You can find the full code at https://github.com/googio/linkedin_scraper.git
This code makes requests quickly, and sending too many to Google will get your IP blocked. Please use proxies when running this script, or check out the goog.io API docs at https://goog.io/docs to perform searches without worrying about getting blocked.
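The requests library accepts a proxies mapping, so a minimal sketch of routing the search requests through a proxy (the proxy address below is just a placeholder) looks like this:
# inside search(), pass a proxies dict to requests.get; the address is a placeholder
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080",
}
resp = requests.get(url, headers=headers, proxies=proxies, timeout=10)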