“For professions that have long relied on trawling through spreadsheets, Python is especially valuable. Citigroup, an American bank, has introduced a crash course in Python for its trainee analysts.” — The Economist
Finance professionals have long had access to VBA (Visual Basic for Applications) in Excel to build custom functionality and automate workflows. With the emergence in recent years of Google Sheets as a serious contender in the spreadsheet space, Google Apps Script now offers an additional choice.
However, I would like to draw attention to a third option, the Python programming language, which has become tremendously popular in a number of fields.
In this article, I will provide some examples of what you can accomplish with Python, starting with an overview of the language itself and why it has become so popular in such a wide variety of fields, ranging across web development, machine learning, finance, science, and education, just to name a few. The second half will then consist of a step-by-step tutorial.
The aim of me writing this is to help you decide whether Python looks intriguing enough for you to consider adding it to your financial toolbox. If you take the leap, there are many apps, courses, videos, articles, books and blog posts available to learn the language. At the end of the piece, I have listed some resources that have helped me along the way.
My introduction to programming was learning BASIC on an Oric 1 in the mid-1980s. Back then BASIC was the most common beginner’s language. Other languages that I dabbled with in the late 80s until mid-90s were Pascal and C, but I never used them in any professional capacity, and I didn’t expect to need or use programming skills. To my knowledge at the time in the late 90s, finance and programming were very different fields, when I chose to embark on a career path in finance.
Fast forward to 2012, and I was looking to pick programming back up as a hobby, so I started researching the languages available at the time. It turned out quite a bit had happened, and when I came across Python I was hooked, for many of the reasons that I will outline in the next section. Since then I have used Python for a wide range of tasks, from small scripts to larger projects, both personally and professionally. Many, but not all, have involved spreadsheets, the workbench of many a finance professional.
Here are a few examples of how well spreadsheets and Python can go together:
I work with all aspects of M&A transactions, not just the execution, but also integration. In a recent case, the PMO team decided on a hybrid program and project management approach, using waterfall planning and Gantt charts for high-level plans for each of the twelve integration workstreams, in addition to a Kanban board for tracking the hundreds of activities going on at any given time, in the first 100-day plan and beyond. The Kanban tool that was chosen, MeisterTask, has a number of statistical and reporting features, but our needs went beyond that in terms of analysis and presentation, which required a custom solution. This is the workflow that I automated using Python:
Developing the script required an upfront investment of a few hours, but now, updating the reporting pack for steering committee meetings or ad hoc analysis takes a matter of minutes. Literally, about 30 seconds to go to the right folder and run the script with a one-line command, and then a few minutes to copy-paste the output into the slide deck. With about 500 activities (cards) across twelve workstreams already about a month into execution, weekly tracking of how they move, inside a program timeline of two years, you quickly find yourself dealing with thousands, and eventually tens of thousands of data points across dozens of files. Without automation, we are talking about some very tedious tasks here.
The “time value of money” trade-off between just getting on with things, or adding more initial workload by setting up automation is a common theme in finance. I made a similar decision with the first step of this process, by exporting the data as CSV files. MeisterTask, like many modern web applications, has an API, which can be connected to your Python application, but the time spent setting it up would far outweigh the time savings for our use case here.
So, as you see, oftentimes the optimal solution is to automate certain steps of a workflow and keep others manual.
Another example is something I did out of personal interest but I want to highlight it because it contains some other interesting elements of Python’s utility:
The results here could be combined with your own personal weightings in terms of preferences, and financial limitations when looking for real estate.
These are only two examples, focused on automating spreadsheet-related work and adding features, but the opportunities with Python are almost endless. In the next section, I will outline the reasons why it has become so popular, before moving on to a step-by-step Monte Carlo simulation tutorial in Python.
The programming language Python has been around since 1990, but it is not until recent years that its popularity has exploded.
There are several reasons for this, let’s look at each in turn.
A high-level programming language is one that abstracts away many of the details of the inner workings of the computer. A good example is memory management. Lower-level programming languages require a detailed understanding of the complexities of how the computer’s memory is laid out, allocated and released, in addition to the time spent and lines of code required to handle tasks. Python abstracts away and handles many of these details automatically, leaving you to focus on what you want to accomplish.
Because Python is a high-level programming language, the code is more concise and almost entirely focused on the business logic of what you want to achieve, rather than technical implementation details. Language design choices contribute to this: as an example, Python doesn’t require the use of curly braces or semicolons to delineate functions, loops, and lines the way many other languages do, which makes it more concise and, as some argue, improves readability.
One observation that has influenced language design choices in Python is that programs are read more often than they are written. Python excels here as its code looks very close to plain English, especially if you name the different components of your script or program in a sensible manner.
“Enlightened trial and error outperforms the planning of flawless intellects.” — David Kelley
Python is ideal for prototyping and rapid, iterative development (and, yes, trial-and-error) because interactive interpreter tools such as the Python shell, IPython, and Jupyter notebooks are front and center in the Python toolchain. In these interactive environments, you can write and execute each line of code in isolation and see the results (or a helpful error message) immediately. Other languages have this too, but in most cases not to the same degree as Python.
In addition to being great for prototyping, Python is also an excellent and powerful language for large production applications. Some of the largest software companies in the world make heavy use of Python in a variety of applications and use cases.
Everything needed for basic operations is built right into the language, but in addition to that, the Python standard library has tools for working with files, media, networking, date and time information, and much more. This allows you to accomplish a wide variety of tasks without having to look for third-party packages.
For finance professionals, Pandas with its DataFrame and Series objects, and Numpy with its ndarray are the workhorses of financial analysis with Python. Combined with matplotlib and other visualization libraries, you have great tools at your disposal to assist productivity.
Python is developed under an open source license making it free also for commercial use.
What follows is a step-by-step tutorial showing how to create a simplified version of the Monte Carlo simulation described in my previous blog post, but using Python instead of the @RISK plugin for Excel.
Monte Carlo methods rely on random sampling to obtain numerical results. One such application is to draw random samples from a probability distribution representing uncertain potential future states of the world where variables or assumptions can take on a range of values.
It is helpful to do the Monte Carlo simulation on a simplified DCF valuation model instead of the more common examples you see showing valuation of options or other derivatives, since for this we don’t need any math beyond the basics of calculating the financial statements and discounting cash flows, allowing us to focus on the Python concepts and tools. Please note though that this basic tutorial model is meant to illustrate the key concepts, and is not useful as-is for any practical purposes. I also won’t touch on any of the more academic aspects of Monte Carlo simulations.
The tutorial assumes that you are familiar with the basic building blocks of programming, such as variables and functions. If not, it might be helpful to take 10 minutes to check the key concepts in for example this introduction.
I start with the same very simplified DCF valuation model used in the Monte Carlo simulation tutorial. It has some key line items from the three financial statements, and three highlighted input cells, which in the Excel version have point estimates that we now want to replace with probability distributions to start exploring potential ranges of outcomes.
“Make it work, make it right, make it fast” — Kent Beck
The intention of this tutorial is to give finance professionals new to Python an introduction not only to what a useful program might look like, but an introduction also to the iterative process you can use to develop it. It, therefore, has two parts:
The Jupyter notebook is a great tool for working with Python interactively. It is an interactive Python interpreter with cells that can contain code, Markdown text, images, or other data. For this tutorial I used the Python Quant Platform, but I can also recommend Colaboratory by Google, which is free and runs in the cloud. Once there, simply select “New Python 3 Notebook” in the “File” menu, and you are ready to go.
Having done that, the next step is to import the third-party packages we need for data manipulation and visualizations, and tell the program that we want to see charts inline in our notebook, instead of in separate windows:
A note before we start naming our first variables. As I already highlighted, readability is one of Python’s strengths. Language design goes a long way to support that, but everyone writing code is responsible for making it readable and understandable, not only for others but also for themselves. As Eagleson’s Law states, “Any code of your own that you haven’t looked at for six or more months might as well have been written by someone else.”
A good rule of thumb is to name the components of your program in such a way that you minimize the need for separate comments that explain what your program does.
With that in mind, let’s move on.
There are many ways that we can work with existing spreadsheet data in Python. We could, for example, read a sheet into a Pandas DataFrame with one line of code using the
read_excel command. If you want a tighter integration and real-time link between your spreadsheet and Python code, there are both free and commercial options available to provide that functionality.
Since the model here is very simple, and to focus us on the Python concepts, we will be recreating it from scratch in our script. At the end of the first part, I will show how you can export what we have created to a spreadsheet.
As a first step towards creating a Python representation of the financial statements, we will need a suitable data structure. There are many to choose from, some built into Python, others from various libraries, or we can create our own. For now, let’s use a Series from the Pandas library to have a look at its functionality. This input and its corresponding output is shown below:
With the first three lines we have created a data structure with an index consisting of years (each marked to show if it is Actual, Budget or Projected), a starting value (in millions of euros, as in the original DCF model), and empty (NaN, “Not a Number”) cells for the projections. The fourth line prints a representation of the data — in general, typing the name of a variable or other objects in the interactive interpreter will usually give you a sensible representation of it.
Next, we declare a variable to represent the projected annual sales growth. At this stage, it is a point estimate, the same figure as in our original DCF model. We want to first use those same inputs and confirm that our Python version performs the same and gives the same result as the Excel version, before looking at replacing point estimates with probability distributions. Using this variable, we create a loop that calculates the sales in each year of the projections based on the previous year and the growth rate. We now have projected sales, instead of NaN:
Using the same approach, we continue through the financial statements, declaring variables as we need them and performing the necessary calculations to eventually arrive at free cash flow. Once we get there we can check that what we have corresponds to what the Excel version of the DCF model says.
This gives us the free cash flows:
The one line above that perhaps needs a comment at this stage is the second
tax_payment reference. Here, we apply a small function to ensure that in scenarios where profit before tax becomes negative, we won’t then have a positive tax payment. This shows how effectively you can apply custom functions to all cells in a Pandas Series or DataFrame. The actual function applied is, of course, a simplification. A more realistic model for a larger valuation exercise would have a separate tax model that calculates actual cash taxes paid based on a number of company-specific factors.
Having arrived at projected cash flows, we can now calculate a simple terminal value and discount all cash flows back to the present to get the DCF result. The following code introduces indexing and slicing, which allows us to access one or more elements in a data structure, such as the Pandas Series object.
We access elements by writing square brackets directly after the name of the structure. Simple indexing accesses elements by their position, starting with zero, meaning that
free_cash_flow would give us the second element.
[-1] is shorthand for accessing the last element (the last year’s cash flow is used to calculate the terminal value), and using a colon gives us a slice, meaning that
[1:] gives us all elements except the first one, since we don’t want to include the historical year
2018A in our DCF valuation.
That concludes the first part of our prototype — we now have a working DCF model, albeit a very rudimentary one, in Python.
Before moving on to the actual Monte Carlo simulation, this might be a good time to mention the exporting capabilities available in the Pandas package. If you have a Pandas DataFrame object, you can write that to an Excel file with one line using the
to_excel method. There is similar functionality to export to more than a dozen other formats and destinations as well.
Now we are ready to tackle the next challenge: to replace some of the point estimate inputs with probability distributions. While the steps up to this point may have seemed somewhat cumbersome compared to building the same model in Excel, these next few lines will give you a glimpse of how powerful Python can be.
Our first step is to decide how many iterations we want to run in the simulation. Using 1,000 as a starting point strikes a balance between getting enough data points to get sensible output plots, versus having the simulation finish within a sensible time frame. Next, we generate the actual distributions. For the sake of simplicity, I generated three normal distributions here, but the NumPy library has a large number of distributions to choose from, and there are other places to look as well, including the Python standard library. After deciding which distribution to use, we need to specify the parameters required to describe their shape, such as mean and standard deviation, and the number of desired outcomes.
Here you could argue that EBITDA should not be a separate random variable independent from sales but instead correlated with sales to some degree. I would agree with this, and add that it should be driven by a solid understanding of the dynamics of the cost structure (variable, semi-variable and fixed costs) and the key cost drivers (some of which may have their own probability distributions, such as for example input commodities prices), but I leave those complexities aside here for the sake of space and clarity.
The less data you have to inform your choice of distribution and parameters, the more you will have to rely on the outcome of your various due diligence workstreams, combined with experience, to form a consensus view on ranges of likely scenarios. In this example, with cash flow projections, there will be a large subjective component, which means that visualizing the probability distributions becomes important. Here, we can get a basic visualization, showing the sales growth distribution, with only two short lines of code. This way we can quickly view any distribution to eyeball one that best reflects the team’s collective view.
Now we have all the building blocks we need to run the simulation, but they are not in a convenient format for running the simulation. Here is the same code we have worked with thus far but all gathered in one cell and rearranged into a function for convenience:
We can now run the whole simulation and plot the output distribution, which will be the discounted cash flow value of this company in each of the 1,000 iterations, with the following code. The
%time command is not Python code but a notebook shorthand that measures the time to run something (you could instead use the Python function from the standard library). It depends on the computer you run it on, but this version needs 1-2 seconds to run the 1,000 iterations and visualize the outcome.
“The lurking suspicion that something could be simplified is the world’s richest source of rewarding challenges.” — Edsger Dijkstra
Refactoring refers to the process of rewriting existing code to improve its structure without changing its functionality, and it can be one of the most fun and rewarding elements of coding. There can be several reasons to do this. It might be to:
To show what one step in that process might look like, I cleaned up the prototype that we just walked through by collecting all initial variables in one place, rather than scattered throughout as in the prototype script, and optimized its execution speed through a process called vectorization.
“Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization.” Wes McKinney
It now looks cleaner and easier to understand:
The main difference you will notice between this version and the previous one is the absence of the
for i in range(iterations) loop. Using NumPy’s array operation, this version runs in 18 milliseconds compared to the 1.35 seconds for the prototype version - roughly 75x faster.
I’m sure that further optimization is possible, since I put together both the prototype and refined version in a short time solely for the purpose of this tutorial.
This tutorial showed some of the powerful features of Python, and if you were to develop this further the opportunities are almost endless. You could for example:
I haven’t even touched upon what you could also do with the various web, data science, and machine learning applications that have contributed to Python’s success.
This article gave an introduction to the Python programming language, listed some of the reasons why it has become so popular in finance and showed how to build a small Python script. In a step-by-step tutorial, I walked through how Python can be used for iterative prototyping, interactive financial analysis, and for application code for valuation models, algorithmic trading programs and more.
For me, at the end of the day, the killer feature of Python technology is that it is simply fun to work with! If you enjoy problem-solving, building things and making workflows more efficient, then I encourage you to try it out. I would love to hear what you have done with it or would like to do with it.
As an investment banker, startup CFO, fundraising consultant, and corporate M&A director, Stefan has built financial models and conducted due diligence and analysis for projects ranging from a €6 million startup fundraising to a €7 billion LBO. Occasional freelancing allows him to tackle interesting finance challenges in areas outside the scope of his full-time job as M&A director. [click to continue…]
Originally published at www.toptal.com.
Create your free account to unlock your custom reading experience.