As the world faces one of the worst pandemics in recent memory, I was looking at how much countries spend on their healthcare infrastructure, and I thought of visualizing the medical expenses of several countries. My search led to this article, which has data from many countries for the year 2016. I did not find any authentic source for a more recent year, so we'll continue with 2016.

The data already makes it pretty clear who spends the least and who spends the most, but I wanted to take the idea further using this table. I had been looking for a chance to practice web scraping and visualization in Python, and decided this was a great short project.

It almost certainly would have been faster to enter the data manually in Excel, but then I would not have had the invaluable opportunity to practice a few skills! Data science is about solving problems using a diverse set of tools, and web scraping and regular expressions are two areas I need some work on (not to mention that making plots is always fun). The result was a very short, but complete, project showing how we can bring together these three techniques to solve a data science problem.

Requirements

Generally, web scraping is divided into two parts:

Fetching data by making an HTTP request
Extracting important data by parsing the HTML DOM

Libraries & Tools

Beautiful Soup is a Python library for pulling data out of HTML and XML files.
Requests allows you to send HTTP requests very easily.
Web Scraper will help us scrape dynamic websites without setting up any automation browser.
Pandas is a Python package providing fast, flexible, and expressive data structures.
matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

Setup

Our setup is pretty simple. Just create a folder and install Beautiful Soup, requests, matplotlib, and pandas by typing the commands given below. I am assuming that you have already installed Python 3.x.
pip install beautifulsoup4
pip install requests
pip install matplotlib
pip install pandas
mkdir scraper

Now, create a file inside that folder with any name you like. I am using scraping.py. Then import the libraries in your file as shown below.

    import pandas as pd
    from bs4 import BeautifulSoup
    import matplotlib.pyplot as plt
    import requests

What we are going to scrape

Name of the country
Expense per capita

Web Scraping

Now that we have all the ingredients to prepare the scraper, we should make a GET request to the target URL to get the raw HTML data.

    # Replace <YOUR_API_KEY> with your own key before running.
    r = requests.get('https://api.scrapingdog.com/scrape?api_key=<YOUR_API_KEY>&url=https://data.worldbank.org/indicator/SH.XPD.CHEX.PC.CD?most_recent_value_desc=false&dynamic=true').text

This will give you the HTML of the target URL. Now, use BeautifulSoup to parse the HTML.

    soup = BeautifulSoup(r, 'html.parser')
    country = list()
    expense = list()

I have declared two empty lists to store the country names and the expense of each country per 24 hours. Each country is stored in a div with the class "item". We'll store all the item tags in a list.

    try:
        Countries = soup.find_all("div", {"class": "item"})
    except:
        Countries = None

Since the table gives us 190 entries, we run a for loop over each of them.

    # The dictionary holds references to the two lists, so it fills up as the loop appends.
    Data = {'country': country, 'expense': expense}
    for i in range(0, 190):
        # Countries[i + 1] skips the first item; the first class-less div holds the name.
        country.append(Countries[i + 1].find_all("div", {"class": None})[0].text.replace("\n", ""))
        # The third class-less div holds the yearly value: strip commas, round it, then divide by 365.
        expense.append(round(float(Countries[i + 1].find_all("div", {"class": None})[2].text.replace("\n", "").replace(',', ''))) / 365)

I have divided the expense by 365 because I want to see how much these countries spend on an everyday basis. Obviously it would have been easier to divide the published figures by 365 by hand, but then there would be no point in learning, right?

Now, this is what "Data" looks like:

{ 'country' : [ 'Central African Republic' , 'Burundi' , 'Mozambique' , 'Congo, Dem. Rep.' 
, 'Gambia, The' , 'Niger' , 'Madagascar' , 'Ethiopia' , 'Malawi' , 'Mali' , 'Eritrea' , 'Benin' , 'Chad' , 'Bangladesh' , 'Tanzania' , 'Guinea' , 'Uganda' , 'Haiti' , 'Togo' , 'Guinea-Bissau' , 'Pakistan' , 'Burkina Faso' , 'Nepal' , 'Mauritania' , 'Rwanda' , 'Senegal' , 'Papua New Guinea' , 'Lao PDR' , 'Tajikistan' , 'Zambia' , 'Afghanistan' , 'Comoros' , 'Myanmar' , 'India' , 'Cameroon' , 'Syrian Arab Republic' , 'Kenya' , 'Ghana' , "Cote d'Ivoire" , 'Liberia' , 'Djibouti' , 'Congo, Rep.' , 'Yemen, Rep.' , 'Kyrgyz Republic' , 'Cambodia' , 'Nigeria' , 'Timor-Leste' , 'Lesotho' , 'Sierra Leone' , 'Bhutan' , 'Zimbabwe' , 'Angola' , 'Sao Tome and Principe' , 'Solomon Islands' , 'Vanuatu' , 'Indonesia' , 'Vietnam' , 'Philippines' , 'Egypt, Arab Rep.' , 'Uzbekistan' , 'Mongolia' , 'Ukraine' , 'Sudan' , 'Iraq' , 'Sri Lanka' , 'Cabo Verde' , 'Moldova' , 'Morocco' , 'Fiji' , 'Kiribati' , 'Nicaragua' , 'Guyana' , 'Honduras' , 'Tonga' , 'Bolivia' , 'Gabon' , 'Eswatini' , 'Thailand' , 'Jordan' , 'Samoa' , 'Guatemala' , 'St. Vincent and the Grenadines' , 'Tunisia' , 'Algeria' , 'Kazakhstan' , 'Azerbaijan' , 'Albania' , 'Equatorial Guinea' , 'El Salvador' , 'Jamaica' , 'Belize' , 'Georgia' , 'Libya' , 'Peru' , 'Belarus' , 'Paraguay' , 'North Macedonia' , 'Colombia' , 'Suriname' , 'Armenia' , 'Malaysia' , 'Botswana' , 'Micronesia, Fed. Sts.' , 'China' , 'Namibia' , 'Dominican Republic' , 'Iran, Islamic Rep.' , 'Dominica' , 'Turkmenistan' , 'South Africa' , 'Bosnia and Herzegovina' , 'Mexico' , 'Turkey' , 'Russian Federation' , 'Romania' , 'St. Lucia' , 'Serbia' , 'Ecuador' , 'Tuvalu' , 'Grenada' , 'Montenegro' , 'Mauritius' , 'Seychelles' , 'Bulgaria' , 'Antigua and Barbuda' , 'Brunei Darussalam' , 'Oman' , 'Lebanon' , 'Poland' , 'Marshall Islands' , 'Latvia' , 'Croatia' , 'Costa Rica' , 'St. 
Kitts and Nevis' , 'Hungary' , 'Argentina' , 'Cuba' , 'Lithuania' , 'Nauru' , 'Brazil' , 'Panama' , 'Maldives' , 'Trinidad and Tobago' , 'Kuwait' , 'Bahrain' , 'Saudi Arabia' , 'Barbados' , 'Slovak Republic' , 'Estonia' , 'Chile' , 'Czech Republic' , 'United Arab Emirates' , 'Uruguay' , 'Greece' , 'Venezuela, RB' , 'Cyprus' , 'Palau' , 'Portugal' , 'Qatar' , 'Slovenia' , 'Bahamas, The' , 'Korea, Rep.' , 'Malta' , 'Spain' , 'Singapore' , 'Italy' , 'Israel' , 'Monaco' , 'San Marino' , 'New Zealand' , 'Andorra' , 'United Kingdom' , 'Finland' , 'Belgium' , 'Japan' , 'France' , 'Canada' , 'Austria' , 'Germany' , 'Netherlands' , 'Ireland' , 'Australia' , 'Iceland' , 'Denmark' , 'Sweden' , 'Luxembourg' , 'Norway' , 'Switzerland' , 'United States' , 'World' ], 'expense' : [ 0.043835616438356165 , 0.049315068493150684 , 0.052054794520547946 , 0.057534246575342465 , 0.057534246575342465 , 0.06301369863013699 , 0.06575342465753424 , 0.07671232876712329 , 0.0821917808219178 , 0.0821917808219178 , 0.0821917808219178 , 0.0821917808219178 , 0.08767123287671233 , 0.09315068493150686 , 0.09863013698630137 , 0.10136986301369863 , 0.10410958904109589 , 0.10410958904109589 , 0.10684931506849316 , 0.10684931506849316 , 0.1095890410958904 , 0.11232876712328767 , 0.1232876712328767 , 0.12876712328767123 , 0.13150684931506848 , 0.14520547945205478 , 0.1506849315068493 , 0.1506849315068493 , 0.15342465753424658 , 0.15616438356164383 , 0.15616438356164383 , 0.16164383561643836 , 0.16986301369863013 , 0.1726027397260274 , 0.17534246575342466 , 0.18082191780821918 , 0.18082191780821918 , 0.1863013698630137 , 0.1863013698630137 , 0.1863013698630137 , 0.1917808219178082 , 0.1917808219178082 , 0.19726027397260273 , 0.2 , 0.2136986301369863 , 0.21643835616438356 , 0.2191780821917808 , 0.2356164383561644 , 0.2356164383561644 , 0.2493150684931507 , 0.25753424657534246 , 0.2602739726027397 , 0.2876712328767123 , 0.29041095890410956 , 0.3013698630136986 , 0.30684931506849317 , 0.336986301369863 , 
0.35342465753424657 , 0.3589041095890411 , 0.3698630136986301 , 0.3863013698630137 , 0.3863013698630137 , 0.41643835616438357 , 0.4191780821917808 , 0.4191780821917808 , 0.43561643835616437 , 0.4684931506849315 , 0.4684931506849315 , 0.4931506849315068 , 0.5150684931506849 , 0.5150684931506849 , 0.5260273972602739 , 0.547945205479452 , 0.5561643835616439 , 0.5835616438356165 , 0.6027397260273972 , 0.6054794520547945 , 0.6082191780821918 , 0.6136986301369863 , 0.6219178082191781 , 0.6602739726027397 , 0.684931506849315 , 0.7013698630136986 , 0.7123287671232876 , 0.7178082191780822 , 0.7342465753424657 , 0.7452054794520548 , 0.7698630136986301 , 0.8054794520547945 , 0.810958904109589 , 0.8328767123287671 , 0.8438356164383561 , 0.8575342465753425 , 0.8657534246575342 , 0.8712328767123287 , 0.8958904109589041 , 0.8986301369863013 , 0.9315068493150684 , 0.9753424657534246 , 0.9835616438356164 , 0.9917808219178083 , 1.0410958904109588 , 1.0602739726027397 , 1.0904109589041096 , 1.104109589041096 , 1.1342465753424658 , 1.1369863013698631 , 1.1479452054794521 , 1.158904109589041 , 1.1726027397260275 , 1.2164383561643837 , 1.2657534246575342 , 1.284931506849315 , 1.284931506849315 , 1.3041095890410959 , 1.3424657534246576 , 1.3534246575342466 , 1.3835616438356164 , 1.389041095890411 , 1.4136986301369863 , 1.4575342465753425 , 1.515068493150685 , 1.6356164383561644 , 1.6767123287671233 , 1.7068493150684931 , 1.7287671232876711 , 1.7753424657534247 , 1.8136986301369864 , 2.2164383561643834 , 2.3315068493150686 , 2.3945205479452056 , 2.421917808219178 , 2.4356164383561643 , 2.5506849315068494 , 2.5835616438356164 , 2.6164383561643834 , 2.66027397260274 , 2.706849315068493 , 2.7726027397260276 , 2.7835616438356166 , 2.852054794520548 , 2.871232876712329 , 2.915068493150685 , 2.926027397260274 , 3.010958904109589 , 3.1424657534246574 , 3.1890410958904107 , 3.23013698630137 , 3.2465753424657535 , 3.263013698630137 , 3.621917808219178 , 3.6246575342465754 , 3.778082191780822 , 
4.13972602739726 , 4.323287671232877 , 4.476712328767123 , 4.586301369863014 , 4.934246575342466 , 5.005479452054795 , 5.024657534246575 , 5.027397260273973 , 5.6 , 6.3780821917808215 , 6.5479452054794525 , 6.745205479452054 , 7.504109589041096 , 7.772602739726027 , 8.054794520547945 , 8.254794520547945 , 10.26027397260274 , 10.506849315068493 , 10.843835616438357 , 11.27945205479452 , 11.367123287671232 , 11.597260273972603 , 11.67945205479452 , 12.213698630136987 , 12.843835616438357 , 12.915068493150685 , 12.991780821917809 , 13.038356164383561 , 13.704109589041096 , 13.873972602739727 , 15.24931506849315 , 15.646575342465754 , 17.18082191780822 , 20.487671232876714 , 26.947945205479453 , 27.041095890410958 , 2.8109589041095893 ]}

Dataframe

Before even starting to plot a graph, we have to prepare a DataFrame using pandas. If you don't know what a DataFrame is: it is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). If that definition did not click, just read this DataFrame article; it will help you a lot.

Creating one is very simple and straightforward.

    df = pd.DataFrame(Data, columns=['country', 'expense'])

Visualization

This project is indicative of data science because the majority of the time was spent collecting and formatting the data. However, now that we have a clean dataset, we get to make some plots! We can use both matplotlib and seaborn to visualize the data. If we aren't too concerned about aesthetics, we can use the built-in DataFrame plot method to quickly show results:

    # Draw the chart first, then display it.
    df.plot(kind='bar', x='country', y='expense')
    plt.show()

I know the names of the countries are pretty small, but you can download the plot and analyze it. The main thing you can see is that many countries are spending way less than a dollar per person per day, which is pretty shocking. The majority of these countries are in Asia and Africa. In my opinion, the WHO should focus more on these countries rather than on developed countries in the West.
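Since the full bar chart squeezes roughly 190 labels onto one axis, one possible way to make the names readable is to plot only a handful of rows as horizontal bars. The sketch below is an illustration under a few assumptions, not part of the original script: `plot_lowest` and `df_sample` are hypothetical names, the six rows are copied (rounded) from the Data output above, and the "World" aggregate is left out so that only countries are compared.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show() in a window
import matplotlib.pyplot as plt
import pandas as pd

# A few (country, daily expense) pairs copied from the Data output above, rounded.
sample = {
    "country": ["Central African Republic", "Burundi", "Mozambique",
                "Switzerland", "United States", "World"],
    "expense": [0.044, 0.049, 0.052, 26.948, 27.041, 2.811],
}
df_sample = pd.DataFrame(sample, columns=["country", "expense"])


def plot_lowest(df, n=3, skip=("World",)):
    """Plot the n lowest daily expenses as horizontal bars (hypothetical helper)."""
    # Drop aggregate rows such as "World" so only countries remain.
    countries_only = df[~df["country"].isin(skip)]
    # nsmallest returns the n rows with the lowest values, in ascending order.
    subset = countries_only.nsmallest(n, "expense")
    # Horizontal bars leave room for long country names on the y axis.
    ax = subset.plot(kind="barh", x="country", y="expense", legend=False)
    ax.set_xlabel("Health expense per capita per day (USD)")
    ax.figure.tight_layout()
    return subset, ax


lowest, ax = plot_lowest(df_sample, n=3)
print(lowest)
```

With the full `df` from the article, the same call would be `plot_lowest(df, n=20)` or similar; the cut-off is a matter of taste.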
This is not necessarily a publication-worthy plot, but it's a nice way to wrap up a small project.

Conclusions

The most effective way to learn technical skills is by doing. While this whole project could have been done by manually entering values into Excel, I like to take the long view and think about how the skills learned here will help in the future. The process of learning is more important than the final result, and in this project, we were able to see how to use 3 critical skills for data science:

Web Scraping: Retrieving online data
BeautifulSoup: Parsing our data to extract information
Visualization: Showcasing all our hard work

Now, get out there and start your own project, and remember: it doesn't have to be world-changing to be worthwhile.

Feel free to comment and ask me anything. You can follow me on Twitter. Thanks for reading and please hit the like button!

Additional Resources

And there's the list! At this point, you should feel comfortable writing your first web scraper to gather data from a website. Here are a few additional resources that you may find helpful during your web scraping journey:

List of web scraping proxy services
Getting Started with Python and Selenium
BeautifulSoup Documentation
Scrapingdog Documentation
Guide to Web Scraping
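As a closing note on the parsing step, the scraping loop in this article hard-codes 190 rows and would stop with an error on a row whose value is not a number. Below is a minimal sketch of a more forgiving version that can be tried offline. Everything in it is an assumption for illustration: `SAMPLE_HTML` is made-up markup that only imitates the "item" layout the loop relies on, `parse_rows` is a hypothetical helper, the two yearly values are back-computed from the Data output (daily value times 365), and the "Nowhere" row is invented to show a skipped entry. It also reads the direct child divs of each row instead of filtering on class-less divs.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML imitating the layout used in the article:
# every row is a div with class "item", and the first one is assumed to be a header.
SAMPLE_HTML = """
<div class="item"><div>Country</div><div>Most Recent Year</div><div>Most Recent Value</div></div>
<div class="item"><div>Burundi</div><div>2016</div><div>18</div></div>
<div class="item"><div>United States</div><div>2016</div><div>9,870</div></div>
<div class="item"><div>Nowhere</div><div>2016</div><div>..</div></div>
"""


def parse_rows(html):
    """Return {'country': [...], 'expense': [...]} with expense per day (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("div", class_="item")
    countries, expenses = [], []
    # rows[1:] skips the first item, like Countries[i + 1] in the article's loop.
    for row in rows[1:]:
        cells = row.find_all("div", recursive=False)  # direct child divs only
        if len(cells) < 3:
            continue  # not a data row
        name = cells[0].get_text(strip=True)
        raw = cells[2].get_text(strip=True).replace(",", "")
        try:
            yearly = float(raw)
        except ValueError:
            continue  # the value cell holds no usable number
        countries.append(name)
        # Same arithmetic as the article: round the yearly figure, then divide by 365.
        expenses.append(round(yearly) / 365)
    return {"country": countries, "expense": expenses}


data = parse_rows(SAMPLE_HTML)
print(data["country"])
```

Looping over whatever rows are present, rather than a fixed count, means the script keeps working if the table gains or loses entries.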

Visualizing Healthcare Budget using Web Scraping in Python
