In this post, we are going to scrape data from LinkedIn using Python and a Web Scraping Tool. We are going to extract the Company Name, Website, Industry, Company Size, Number of Employees, Headquarters Address, and Specialties.

Why this tool?

This tool will help us scrape dynamic websites using millions of rotating residential proxies so that we don't get blocked. It also provides a captcha-clearing facility.

Procedure

Generally, web scraping is divided into two parts:

1. Fetching data by making an HTTP request.
2. Extracting important data by parsing the HTML DOM.

Libraries & Tools

- Beautiful Soup is a Python library for pulling data out of HTML and XML files.
- Requests allows you to send HTTP requests very easily.
- Pandas provides fast, flexible, and expressive data structures.
- The Web Scraper extracts the HTML code of the target URL.

Setup

Our setup is pretty simple. Just create a folder and install Beautiful Soup, requests & pandas. To create the folder and install the libraries, type the commands below. I am assuming that you have already installed Python 3.x.

mkdir scraper
pip install beautifulsoup4
pip install requests
pip install pandas

Now, create a file inside that folder with any name you like. I am using scraping.py. First, you have to sign up for the Web Scraper; it will provide you with 1000 FREE credits. Then just import Beautiful Soup, requests & pandas in your file, like this:

from bs4 import BeautifulSoup
import requests
import pandas as pd

What we are going to scrape

We are going to scrape the "about" page of Google from LinkedIn.

Preparing the Food

Now, since we have all the ingredients to prepare the scraper, we should make a GET request to the target URL to get the raw HTML data. If you are not familiar with the scraping tool, I would urge you to go through its documentation. We will use requests to make the HTTP GET request. Since we are scraping a company page, I have set "type" to company and "linkId" to google/about/. The linkId can be found in the company's LinkedIn URL.

r = requests.get('https://api.scrapingdog.com/linkedin/?api_key=YOUR-API-KEY&type=company&linkId=google/about/').text

This will give you the raw HTML of the target URL. Please use your own Scrapingdog API key when making the above request. Now, use Beautiful Soup to parse the HTML.

soup = BeautifulSoup(r, 'html.parser')
u = list()
l = {}

If you inspect the page, you'll see that the title of the company is stored in an h1 tag with the class "org-top-card-summary__title t-24 t-black truncate". So, we'll use the soup variable to extract that text.

try:
    l["Company"] = soup.find("h1", {"class": "org-top-card-summary__title t-24 t-black truncate"}).text.replace("\n", "")
except:
    l["Company"] = None

I have replaced \n with an empty string.

Now, we will focus on extracting the Website, Industry, Company Size, Headquarters (Address), Type, and Specialties. All of these properties (except Company Size) are stored in dd tags with the class "org-page-details__definition-text t-14 t-black--light t-normal". I will again use the soup variable to extract them all.

allProp = soup.find_all("dd", {"class": "org-page-details__definition-text t-14 t-black--light t-normal"})
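A quick aside before we go field by field: soup.find returns None when a tag is missing, and reading .text on None raises an AttributeError, which is why every extraction below is wrapped in try/except. If you prefer less repetition, you could factor that pattern into a small helper. This is just a minimal sketch; the safe_text name is mine, not part of the original tutorial:

def safe_text(element):
    # soup.find returns None on a miss; .text on None raises AttributeError.
    try:
        return element.text.replace("\n", "")
    except AttributeError:
        return None

# Equivalent to the explicit try/except used earlier:
# l["Company"] = safe_text(soup.find("h1", {"class": "org-top-card-summary__title t-24 t-black truncate"}))

Note that when indexing into allProp (e.g., allProp[0]), the tutorial's bare except also guards against an IndexError when fewer dd tags are found, which this helper alone would not catch.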
Now, we'll extract the properties from the allProp list one by one.

try:
    l["website"] = allProp[0].text.replace("\n", "")
except:
    l["website"] = None

try:
    l["Industry"] = allProp[1].text.replace("\n", "")
except:
    l["Industry"] = None

try:
    l["Address"] = allProp[2].text.replace("\n", "")
except:
    l["Address"] = None

try:
    l["Type"] = allProp[3].text.replace("\n", "")
except:
    l["Type"] = None

try:
    l["Specialties"] = allProp[4].text.replace("\n", "")
except:
    l["Specialties"] = None

Now, we'll scrape the Company Size. It is stored in a dd tag with the class "org-about-company-module__company-size-definition-text t-14 t-black--light mb1 fl".

try:
    l["Company Size"] = soup.find("dd", {"class": "org-about-company-module__company-size-definition-text t-14 t-black--light mb1 fl"}).text.replace("\n", "")
except:
    l["Company Size"] = None

Now, I will push the dictionary l to the list u, and then we'll create a dataframe from u using pandas.

u.append(l)
df = pd.json_normalize(u)

Finally, we save our data to a CSV file.

df.to_csv('linkedin.csv', index=False, encoding='utf-8')

We have successfully scraped a LinkedIn company page. Similarly, you can also scrape a profile. Please read the docs before scraping a profile page.

Complete Code

from bs4 import BeautifulSoup
import requests
import pandas as pd

r = requests.get('https://api.scrapingdog.com/linkedin/?api_key=YOUR-API-KEY&type=company&linkId=google/about/').text

soup = BeautifulSoup(r, 'html.parser')
u = list()
l = {}

try:
    l["Company"] = soup.find("h1", {"class": "org-top-card-summary__title t-24 t-black truncate"}).text.replace("\n", "")
except:
    l["Company"] = None

allProp = soup.find_all("dd", {"class": "org-page-details__definition-text t-14 t-black--light t-normal"})

try:
    l["website"] = allProp[0].text.replace("\n", "")
except:
    l["website"] = None

try:
    l["Industry"] = allProp[1].text.replace("\n", "")
except:
    l["Industry"] = None

try:
    l["Company Size"] = soup.find("dd", {"class": "org-about-company-module__company-size-definition-text t-14 t-black--light mb1 fl"}).text.replace("\n", "")
except:
    l["Company Size"] = None

try:
    l["Address"] = allProp[2].text.replace("\n", "")
except:
    l["Address"] = None

try:
    l["Type"] = allProp[3].text.replace("\n", "")
except:
    l["Type"] = None

try:
    l["Specialties"] = allProp[4].text.replace("\n", "")
except:
    l["Specialties"] = None

u.append(l)
df = pd.json_normalize(u)
df.to_csv('linkedin.csv', index=False, encoding='utf-8')
print(df)

Conclusion

In this article, we learned how to scrape data from LinkedIn using a proxy scraper & Python. As I said earlier, you can scrape a profile too, but just read the docs before trying it.

Feel free to comment and ask me anything. You can follow me on Twitter. Thanks for reading, and please hit the like button! 👍

Additional Resources

And there's the list! At this point, you should feel comfortable writing your first web scraper to gather data from any website. Here are a few additional resources that you may find helpful during your web scraping journey:

- Free Proxy List
- Datacenter Proxy
- Web Scraping with Nodejs
- Scrape Google Search Results

Previously published at https://www.scrapingdog.com/blog/scrape-data-from-linkedin-using-python
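Appendix: Scraping Multiple Companies

If you want to gather several companies in one run, you can loop over linkIds, build one dictionary per company, and only then create the dataframe. This is a minimal sketch using the same endpoint and parameters as the tutorial; the second linkId is a hypothetical example, and each request consumes API credits, so check your balance before batching.

from bs4 import BeautifulSoup
import requests
import pandas as pd

API_KEY = "YOUR-API-KEY"
# "google/about/" comes from the tutorial; the second linkId is a hypothetical example.
link_ids = ["google/about/", "microsoft/about/"]

rows = []
for link_id in link_ids:
    # Same request as the single-company example above, with the linkId swapped in.
    html = requests.get("https://api.scrapingdog.com/linkedin/?api_key=" + API_KEY + "&type=company&linkId=" + link_id).text
    soup = BeautifulSoup(html, "html.parser")
    row = {"linkId": link_id}
    try:
        row["Company"] = soup.find("h1", {"class": "org-top-card-summary__title t-24 t-black truncate"}).text.replace("\n", "")
    except AttributeError:
        row["Company"] = None
    # Extract the remaining fields exactly as in the complete code above.
    rows.append(row)

df = pd.DataFrame(rows)
df.to_csv("linkedin_companies.csv", index=False, encoding="utf-8")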