paint-brush
Learn Pandas Via Usecases — Part 2by@bhavaniravi
1,666 reads
1,666 reads

Learn Pandas Via Usecases — Part 2

by Bhavani RaviFebruary 7th, 2019
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

In the last blog, I hope I have sold you the idea that Pandas is an amazing library for quick and easy data analysis and it’s much easier to use than you thought.<a href="https://hackernoon.com/python-pandas-tutorial-92018da85a33"> If you have not read my first blog about Pandas, please go through it before you move&nbsp;forward.</a>

Companies Mentioned

Mention Thumbnail
Mention Thumbnail
featured image - Learn Pandas Via Usecases — Part 2
Bhavani Ravi HackerNoon profile picture

Learn Python Pandas Via Usecases — Part 2

Use cases open up more functionalities

In the last blog, I hope I have sold you the idea that Pandas is an amazing library for quick and easy data analysis and it’s much easier to use than you thought. If you have not read my first blog about Pandas, please go through it before you move forward.

Oops !! We missed Some Data

In the last blog, we saw basic Dataframe operations using sample sales data. Let’s assume you are a manager leading a sales team, and you were all happy about the sales trajectory and the pivot representation of the data you learned to create from our last blog.

import numpy as np
df.pivot_table(index=["Country"], 
               columns=["Region"], 
               values=["Quantity"], 
               aggfunc=[np.sum])

That’s when you realize you have missed sales data of a particular quarter because it was lost in one of the spreadsheets. Now, what do you do? You already have a report ready to go. How can you incorporate the new data into the current pivot representation without major changes?

If you see, the pivot table is constructed with a single Dataframe df, somehow if we can find a way to feed our new data into the df then we can just re-run the pivot code and voila!! we will get the report again.

So here are the steps we are going to follow,

1. Load the new spreadsheet data into a new Dataframe

df2 = pd.read_csv("data/Pandas - Q4 Sales.csv")
df2.head()

2. Combine two Dataframe into a single df object,

Using concat

Pandas Concat method concatenates the contents of multiple Dataframes and creates a new Dataframe.

The axis param of the method enables you to concatenate data along rows or columns

result_df = pd.concat([df, df2], axis=0, sort=False)
# axis = 0, append along rows, 
# axis = 1, append along cols

result_df.tail() # tail is similar to head returns last 10 entries

Using append

Unlike concat , the append method adds up data to an existing dataframe instead of creating a new Dataframe. Also, you can notice that we don’t supply any axis parameter here since append method only allows adding new entries as rows.

result_df = df.append([df2],sort=False)
result_df.tail()

If you take a closer look, in both cases, the data frames that need to be combined are supplied as a python list [df1, df2]. This implies that we can combine as many Dataframes as we want

3. Re-run the pivot code

pivot = result_df.pivot_table(index=["Country"], 
                              columns=["Region"], 
                              values="Quantity")

Charts are better than tables

You have a couple of hours for your final meeting. Your presentation is concrete, your sales are good but still, something is missing. Charts. For a management person who was so used to spreadsheets charts, leaving them behind is not a good idea. But, we have a short time to go back to spreadsheets, don’t we? Worry not, Pandas comes with a built-in charting framework which lets you draw graphs of our pivot representation

Perfection

As a person who was known for your perfection something doesn’t sit well in you. One of the tabular representations that you have created has unnecessary information that doesn’t interest your management, and a couple of columns have names that are used internally in your company and will not ring any bell to the management.

Worry not, we can do it all in one shot and pretty quick. In pandas terms, we call this method chaining.

Method chaining enables you to perform a various transformation on the same data without storing the intermediate result.

  1. Explicit is better than implicit hence let’s rename “Total” to “Total Sales”

  2. We don’t need the date of purchase just the year and quarter

  3. We don’t need the requester of purchase, Salesperson, and Date of purchase. So let’s drop it.




    result_df.rename({"Total": "Total Sales"}, axis=1).assign(Quarter=result_df['Date of Purchase'].dt.quarter, Year=result_df['Date of Purchase'].dt.year) .drop(["Requester", "Sales Person", "Date of Purchase"], axis=1).head()

One Last Thing

With that, our final report looks good and guess what? Your management is not only happy about your sales this year but also excited about your new found love for Pandas, but there is just one last thing remaining, you need to send the final data as a CSV back to your management. But worry not we have pandas to do it for you.

result_df.to_csv(path_or_buf="Export_Data.csv")

An “Export_Data.csv” file would be created in your current path which you can happily send to your management as an email attachment.

As you rest back on your seat, you want to automate the pandas experiment that you just did for the future sales reports. Thankfully, you have an intern who is joining you in a couple of days. It will be a great project for him to pick it up. Something in me tells that things aren’t going to be as easy as it was for you. which we will see in the next blog “What’s wrong with Pandas?”

Image result for medium clap gif

Did the blog nudge you to deep dive into pandas?

Hold the “claps” icon and give a shout on twitter.

Follow to stay tuned on future blogs.

Preview image with title