In Python, like most modern programming languages, the function is a primary method of abstraction and encapsulation. You’ve probably written hundreds of functions in your time as a developer. But not all functions are created equal. And writing “bad” functions directly affects the readability and maintainability of your code. So what, then, is a “bad” function and, more importantly, what makes a “good” function?

A Quick Refresher

Math is lousy with functions, though we might not remember them, so let’s think back to everyone’s favorite topic: calculus. You may remember seeing formulas like f(x) = 2x + 3. This is a function, called f, that takes an argument x and “returns” two times x plus 3. While it may not look like the functions we’re used to in Python, it is directly analogous to the following code:

    def f(x):
        return 2*x + 3

Functions have long existed in math, but have far more power in computer science. With this power, though, come various pitfalls. Let’s now discuss what makes a “good” function and the warning signs of functions that may need some refactoring.

Keys To A Good Function

What differentiates a “good” Python function from a crappy one? You’d be surprised at how many definitions of “good” one can use. For our purposes, I’ll consider a Python function “good” if it can tick off most of the items on this checklist (some are not always possible):

- Is sensibly named
- Has a single responsibility
- Includes a docstring
- Returns a value
- Is not longer than 50 lines
- Is idempotent and, if possible, pure

For many of you, this list may seem overly draconian. I promise you, though, if your functions follow these rules, your code will be so beautiful it will make unicorns weep. Below, I’ll devote a section to each of the items, then wrap things up with how they work in harmony to create “good” functions.

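As a rough preview of the checklist, here is a minimal sketch of a function that ticks most of the boxes. The function name, its parameter, and the choice of statistic are all hypothetical; they were picked only to keep the example short and are not taken from the article.

```python
def mean_absolute_deviation(numbers):
    """Return the mean absolute deviation of *numbers*.

    Raise ValueError if *numbers* is empty.
    """
    # Hypothetical example: one responsibility, a docstring, a return
    # value, a handful of lines, and no side effects or outside data.
    if not numbers:
        raise ValueError('numbers must not be empty')
    mean = sum(numbers) / len(numbers)
    return sum(abs(number - mean) for number in numbers) / len(numbers)
```

Because the result depends only on the argument, calling it twice with the same list gives the same answer, which is what makes it straightforward to test.
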
Naming

There’s a favorite saying of mine on the subject, often misattributed to Donald Knuth, but which actually came from Phil Karlton:

There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton

As silly as it sounds, naming things well is difficult. Here’s an example of a “bad” function name:

    def get_knn_from_df(df):

Now, I’ve seen bad names literally everywhere, but this example comes from Data Science (really, Machine Learning), where practitioners typically write code in Jupyter notebooks and later try to turn those various cells into a comprehensible program.

The first issue with the name of this function is its use of acronyms and abbreviations. The only reason one might abbreviate words is to save typing, but every modern editor has autocomplete, so you’ll only be typing that full name once. Abbreviations are an issue because they are often domain specific. In the code above, knn refers to “K-Nearest Neighbors”, and df refers to “DataFrame”, the ubiquitous pandas data structure. If another programmer not familiar with those acronyms is reading the code, almost nothing about the name will be comprehensible to her. Prefer full English words to abbreviations and non-universally known acronyms.

There are two other minor gripes about this function’s name. First, the word “get” is extraneous. For most well-named functions, it will be clear that something is being returned from the function, and its name will reflect that. Second, the from_df bit is also unnecessary. Either the function’s docstring or (if living on the edge) type annotation will describe the type of the parameter if it’s not already made clear by the parameter’s name.

So how might we rename this function? Simple:

    def k_nearest_neighbors(dataframe):

It is now clear even to the lay person what this function calculates, and the parameter’s name (dataframe) makes it clear what type of argument should be passed to it.

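To show how a full-word name, a docstring, and type annotations work together, here is a minimal sketch. It assumes plain coordinate tuples in place of a pandas DataFrame so that it runs with only the standard library, and the parameters target and neighbor_count are hypothetical additions that are not part of the signature shown above.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a row of a DataFrame: a 2-D point.
Point = Tuple[float, float]


def k_nearest_neighbors(points: Sequence[Point], target: Point,
                        neighbor_count: int) -> List[Point]:
    """Return the *neighbor_count* points closest to *target*."""
    # Sort by Euclidean distance to the target, then keep the closest ones.
    by_distance = sorted(points, key=lambda point: math.dist(point, target))
    return by_distance[:neighbor_count]
```

Nothing in the name needs decoding: the annotations say what goes in and what comes out, and the docstring says what the function computes.
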
Single Responsibility

Straight from “Uncle” Bob Martin, the Single Responsibility Principle applies just as much to functions as it does to classes and modules (Mr. Martin’s original targets). It states that (in our case) a function should have a single responsibility. That is, it should do one thing and only one thing. One great reason is that if every function only does one thing, there is only one reason ever to change it: if the way in which it does that thing must change. It also becomes clear when a function can be deleted: if, when making changes elsewhere, it becomes clear the function’s single responsibility is no longer needed, simply remove it.

An example will help. Here’s a function that does more than one “thing”:

    import statistics

    def calculate_and_print_stats(list_of_numbers):
        # Named total rather than sum so the built-in sum() is not shadowed.
        total = sum(list_of_numbers)
        mean = statistics.mean(list_of_numbers)
        median = statistics.median(list_of_numbers)
        mode = statistics.mode(list_of_numbers)

        print('-----------------Stats-----------------')
        print('SUM: {}'.format(total))
        print('MEAN: {}'.format(mean))
        print('MEDIAN: {}'.format(median))
        print('MODE: {}'.format(mode))

This function does two things: it calculates a set of statistics about a list of numbers and prints them to STDOUT. The function is in violation of the rule that there should be only one reason to change a function. There are two obvious reasons this function would need to change: new or different statistics might need to be calculated, or the format of the output might need to be changed. This function is better written as two separate functions: one which performs and returns the results of the calculations and another that takes those results and prints them. One dead giveaway that a function has multiple responsibilities is the word “and” in the function’s name.

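The split described above might look like the following sketch. The names calculate_statistics and print_statistics, the dict used to carry the results, and the decision to have the printing function return the text it printed are all assumptions made for illustration; the article does not prescribe them.

```python
import statistics


def calculate_statistics(numbers):
    """Return the sum, mean, median, and mode of *numbers* as a dict."""
    return {
        'sum': sum(numbers),
        'mean': statistics.mean(numbers),
        'median': statistics.median(numbers),
        'mode': statistics.mode(numbers),
    }


def print_statistics(stats):
    """Print *stats* as a small report and return the printed text."""
    lines = ['-----------------Stats-----------------']
    for name, value in stats.items():
        lines.append('{}: {}'.format(name.upper(), value))
    report = '\n'.join(lines)
    print(report)
    return report


# Usage: the calculation and the presentation are now separate steps.
print_statistics(calculate_statistics([1, 2, 2, 3]))
```

Adding a new statistic now touches only calculate_statistics, and changing the report layout touches only print_statistics.
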
This separation also allows for much easier testing of the function’s behavior, and it lets the two parts live not just in two functions in the same module but possibly in different modules altogether, if appropriate. This, too, leads to cleaner testing and easier maintenance.

Finding a function that only does two things is actually rare. Much more often, you’ll find functions that do many, many more things. Again, for readability and testability purposes, these jack-of-all-trades functions should be broken up into smaller functions that each encapsulate a single unit of work.

Docstrings

While everyone seems to be aware of PEP-8, defining the style guide for Python, far fewer seem to be aware of PEP-257, which does the same for docstrings. Rather than simply rehash the contents of PEP-257, feel free to read it at your leisure. The main takeaways, however, are:

- Every function requires a docstring
- Use proper grammar and punctuation; write in complete sentences
- Begin with a one-sentence summary of what the function does
- Use prescriptive rather than descriptive language

This is an easy one to tick off when writing functions. Just get in the habit of always writing docstrings, and try to write them before you write the code for the function. If you can’t write a clear docstring describing what the function will do, it’s a good indication you need to think more about why you’re writing the function in the first place.

Return Values

Functions can (and should) be thought of as little self-contained programs. They take some input in the form of parameters and return some result. Parameters are, of course, optional. Return values, however, are not optional, from a Python internals perspective. Even if you try to create a function that doesn’t return a value, you can’t. If a function would otherwise not return a value, the Python interpreter “forces it” to return None. Don’t believe me? Test out the following yourself:

    ❯ python3
    Python 3.7.0 (default, Jul 23 2018, 20:22:55)
    [Clang 9.1.0 (clang-902.0.39.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> def add(a, b):
    ...     print(a + b)
    ...
    >>> b = add(1, 2)
    3
    >>> b
    >>> b is None
    True

You’ll see that the value of b really is None. So, even if you write a function with no return statement, it’s still going to return something. And it should return something. After all, it’s a little program, right? How useful are programs that produce no output, including whether or not they executed correctly? But most importantly, how would you test such a program?

I’ll even go so far as to make the following statement: every function should return a useful value, even if only for testability purposes. Code that you write should be tested (that’s not up for debate). Just think of how gnarly testing the add function above would be (hint: you’d have to redirect I/O, and things go south from there quickly). Also, returning a value allows for method chaining, a concept that allows us to write code like this:

    with open('foo.txt', 'r') as input_file:
        for line in input_file:
            if line.strip().lower().endswith('cat'):
                # ... do something useful with these lines
                pass

The line if line.strip().lower().endswith('cat'): works because strip() and lower() each return a string, so the next method in the chain has something to be called on. The final call, endswith(), returns the boolean that the if statement tests.

Here are some common reasons people give when asked why a given function they wrote doesn’t return a value:

“All it does is [some I/O related thing like saving a value to a database]. I can’t return anything useful.” I disagree. The function can return True if the operation completed successfully.

“We modify one of the parameters in place, using it like a reference parameter.” Two points, here. First, do your best to avoid this practice.
For others, providing something as an argument to your function only to find that it has been changed can be surprising in the best case and downright dangerous in the worst. Instead, much like the string methods, prefer returning a new instance of the parameter with the changes applied to it. Second, even when this isn’t feasible because making a copy of some parameter is prohibitively expensive, you can still fall back to the old “return True if the operation completed successfully” suggestion.

“I need to return multiple values. There is no single value I could return that would make sense.” This is a bit of a straw-man argument, but I’ve heard it. The answer, of course, is to do exactly what the author wanted to do but didn’t know how to do: use a tuple to return more than one value.

And perhaps the most compelling argument for always returning a useful value is that callers are always free to ignore it. In short, returning a value from a function is almost certainly a good idea and very unlikely to break anything, even in existing code bases.

Function Length

I’ve said a number of times that I’m pretty dumb. I can only hold about 3 things in my head at once. If you make me read a 200-line function and ask what it does, my eyes are likely to glaze over after about 10 seconds. So keep your functions short. 50 lines is a totally arbitrary number that seemed reasonable to me. Most functions you write will (hopefully) be quite a bit shorter.

The length of a function directly affects readability and, thus, maintainability. If a function is following the Single Responsibility Principle, it is likely to be quite short. If it is pure or idempotent (discussed below), it is also likely to be short. These ideas all work in concert to produce good, clean code.

So what do you do if a function is too long? REFACTOR! Refactoring is something you probably do all the time, even if the term isn’t familiar to you. It simply means changing a program’s structure without changing its behavior.
So extracting a few lines of code from a long function and turning them into a function of their own is a type of refactoring. It also happens to be the fastest and most common way to shorten a long function in a productive way. And since you’re giving all those new functions appropriate names, the resulting code reads much more easily. I could write a whole book on refactoring (in fact it’s been done many times) and won’t go into specifics here. Just know that if you have a function that’s too long, the way to fix it is through refactoring.

Idempotency and Functional Purity

The title of this subsection may sound a bit intimidating, but the concepts are simple. An idempotent function always returns the same value given the same set of arguments, regardless of how many times it is called. The result does not depend on non-local variables, the mutability of arguments, or data from any I/O streams. (Strictly speaking, this property is usually called determinism, and idempotency more narrowly means that repeating an operation has no effect beyond the first application; I use “idempotent” in the looser sense throughout.) The following add_three(number) function is idempotent:

    def add_three(number):
        """Return *number* + 3."""
        return number + 3

No matter how many times one calls add_three(7), the answer will always be 10. Here’s a different take on the function that is not idempotent:

    def add_three():
        """Return 3 + the number entered by the user."""
        number = int(input('Enter a number: '))
        return number + 3

This admittedly contrived example is not idempotent because the return value of the function depends on I/O, namely the number entered by the user. It’s clearly not true that every call to add_three() will return the same value. If it is called twice, the user could enter 3 the first time and 7 the second, making the calls to add_three() return 6 and 10, respectively.

A real-world example of idempotency is hitting the “up” button in front of an elevator. The first time it’s pushed, the elevator is “notified” that you want to go up. Because pressing the button is idempotent, pressing it over and over again is harmless.
The result is always the same.

Why is idempotency important?

Testability and maintainability. Idempotent functions are easy to test because they are guaranteed to always return the same result when called with the same arguments. Testing is simply a matter of checking that each call to the function returns the expected value. What’s more, these tests will be fast, an important and often overlooked issue in unit testing. And refactoring when dealing with idempotent functions is a breeze. No matter how you change your code outside the function, the result of calling it with the same arguments will always be the same.

What is a “pure” function?

In functional programming, a function is considered pure if it is both idempotent and has no observable side effects. Remember, a function is idempotent if it always returns the same value for a given set of arguments. Nothing external to the function can be used to compute that value. However, that doesn’t mean the function can’t affect things like non-local variables or I/O streams. For example, if the idempotent version of add_three(number) above printed the result before returning it, it would still be considered idempotent because, while it accessed an I/O stream, that access had no bearing on the value returned from the function. The call to print() is simply a side effect: some interaction with the rest of the program or the system itself aside from returning a value.

Let’s take our add_three(number) example one step further.
We can write the following snippet of code to determine how many times add_three(number) was called:

    add_three_calls = 0

    def add_three(number):
        """Return *number* + 3."""
        global add_three_calls
        print(f'Returning {number + 3}')
        add_three_calls += 1
        return number + 3

    def num_calls():
        """Return the number of times *add_three* was called."""
        return add_three_calls

We’re now printing to the console (a side effect) and modifying a non-local variable (another side effect), but since neither of these affects the value returned by the function, it is still idempotent.

A pure function has no side effects. Not only does it not use any “outside data” to compute its value, it has no interaction with the rest of the system or program other than computing and returning said value. Thus, while our new add_three(number) definition is still idempotent, it is no longer pure.

Pure functions do not have logging statements or print() calls. They do not make use of database or internet connections. They don’t access or modify non-local variables. And they don’t call any other non-pure functions.

In short, they are incapable of what Einstein called “spooky action at a distance” (in a Computer Science setting). They don’t modify the rest of the program or system in any way. In imperative programming (the kind you’re doing when you write Python code), they are the safest functions of all. They are eminently testable and maintainable and, even more so than mere idempotent functions, testing them is guaranteed to basically be as fast as executing them. And the tests themselves are simple: there are no database connections or other external resources to mock, no setup code required, and nothing to clean up afterwards.

To be clear, idempotency and purity are aspirational, not required. That is, we’d love to only write pure or idempotent functions because of the benefits mentioned, but that isn’t always possible.
The key, though, is that we naturally begin to arrange our code to isolate side effects and external dependencies. This has the effect of making every line of code we write easier to test, even if we’re not always writing pure or idempotent functions.

Summing Up

So that’s it. The secret to writing good functions is not a secret at all. It just involves following a number of established best practices and rules of thumb. I hope you found this article helpful. Now go forth and tell your friends! Let’s all agree to just always write great code in all cases :). Or at least do our best not to put more “bad” code into the world. I’d be able to live with that…

Posted on Oct 11, 2018 by Jeff Knupp. Originally published at jeffknupp.com on October 11, 2018.