In the last blog, I hope I have sold you the idea that Pandas is an amazing library for quick and easy data analysis and it’s much easier to use than you thought. If you have not read my first blog about Pandas, please go through it before you move forward.
In the last blog, we saw basic Dataframe operations using sample sales data. Let’s assume you are a manager leading a sales team, and you were all happy about the sales trajectory and the pivot representation of the data you learned to create from our last blog.
import numpy as np
df.pivot_table(index=["Country"],
columns=["Region"],
values=["Quantity"],
aggfunc=[np.sum])
That’s when you realize you have missed sales data of a particular quarter because it was lost in one of the spreadsheets. Now, what do you do? You already have a report ready to go. How can you incorporate the new data into the current pivot representation without major changes?
If you see, the pivot table is constructed with a single Dataframe df, somehow if we can find a way to feed our new data into the df then we can just re-run the pivot code and voila!! we will get the report again.
So here are the steps we are going to follow,
df2 = pd.read_csv("data/Pandas - Q4 Sales.csv")
df2.head()
Using concat
Pandas Concat method concatenates the contents of multiple Dataframes and creates a new Dataframe.
The axis param of the method enables you to concatenate data along rows or columns
result_df = pd.concat([df, df2], axis=0, sort=False)
# axis = 0, append along rows,
# axis = 1, append along cols
result_df.tail() # tail is similar to head returns last 10 entries
Using append
Unlike concat , the append method adds up data to an existing dataframe instead of creating a new Dataframe. Also, you can notice that we don’t supply any axis parameter here since append method only allows adding new entries as rows.
result_df = df.append([df2],sort=False)
result_df.tail()
If you take a closer look, in both cases, the data frames that need to be combined are supplied as a python list [df1, df2]. This implies that we can combine as many Dataframes as we want
pivot = result_df.pivot_table(index=["Country"],
columns=["Region"],
values="Quantity")
You have a couple of hours for your final meeting. Your presentation is concrete, your sales are good but still, something is missing. Charts. For a management person who was so used to spreadsheets charts, leaving them behind is not a good idea. But, we have a short time to go back to spreadsheets, don’t we? Worry not, Pandas comes with a built-in charting framework which lets you draw graphs of our pivot representation
As a person who was known for your perfection something doesn’t sit well in you. One of the tabular representations that you have created has unnecessary information that doesn’t interest your management, and a couple of columns have names that are used internally in your company and will not ring any bell to the management.
Worry not, we can do it all in one shot and pretty quick. In pandas terms, we call this method chaining.
Method chaining enables you to perform a various transformation on the same data without storing the intermediate result.
Explicit is better than implicit hence let’s rename “Total” to “Total Sales”
We don’t need the date of purchase just the year and quarter
We don’t need the requester of purchase, Salesperson, and Date of purchase. So let’s drop it.
result_df.rename({"Total": "Total Sales"}, axis=1).assign(Quarter=result_df['Date of Purchase'].dt.quarter, Year=result_df['Date of Purchase'].dt.year) .drop(["Requester", "Sales Person", "Date of Purchase"], axis=1).head()
With that, our final report looks good and guess what? Your management is not only happy about your sales this year but also excited about your new found love for Pandas, but there is just one last thing remaining, you need to send the final data as a CSV back to your management. But worry not we have pandas to do it for you.
result_df.to_csv(path_or_buf="Export_Data.csv")
An “Export_Data.csv” file would be created in your current path which you can happily send to your management as an email attachment.
As you rest back on your seat, you want to automate the pandas experiment that you just did for the future sales reports. Thankfully, you have an intern who is joining you in a couple of days. It will be a great project for him to pick it up. Something in me tells that things aren’t going to be as easy as it was for you. which we will see in the next blog “What’s wrong with Pandas?”
Did the blog nudge you to deep dive into pandas?
Hold the “claps” icon and give a shout on twitter.
Follow to stay tuned on future blogs.
Preview image with title