Functions are a critical component of any programming project. Declared well, they are a practical way to write readable and maintainable code. Declared poorly, they make your code hard to read and hard to maintain in the long term, even if you are the one maintaining the project, because any programmer can forget what they did.
Given the importance of functions in programming, Python certainly included, I would like to identify the most common mistakes that some Python programmers can make when declaring functions. By knowing these pitfalls, we can implement the corresponding best practices that will not only improve your code’s readability but also make it more maintainable.
"Readability counts." — The Zen of Python
1. Improper Function Names
Giving names isn’t something that bothers you only when you have a new baby or pet. You may disagree, but as a programmer, it’ll be a challenging task throughout your career. The challenge comes from the standard that we have for function names, which should be unique, informative, and consistent.
Unique
This is a very straightforward requirement. As with any Python object, we use names to identify functions. If we declare two functions with the same name in the same scope, either the IDE (Integrated Development Environment, such as PyCharm or Visual Studio Code) will warn us or the later definition silently replaces the earlier one. Consider the following example. We declared two functions named say_hello, and as you can see, when we called the function, the second one that we declared got called:
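A minimal sketch of this behavior, assuming two illustrative greeting strings (the return values are made up for this example):

```python
def say_hello():
    return "Hello, World!"


# Reusing the name rebinds say_hello to a new function object;
# the first definition is no longer reachable through this name.
def say_hello():
    return "Hello, Python!"


greeting = say_hello()
print(greeting)  # Hello, Python!
```

Python raises no error here, which is why a duplicate name can go unnoticed until the wrong function runs.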
Informative
Functions are written to perform certain defined operations, and thus their names should reflect their duties. If the names can’t explicitly inform us of these duties, we will struggle to understand other people’s programs or our own code that we wrote last month. Being informative means being specific and accurate to the intended purposes of the functions. Consider the following examples:
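As a rough illustration, the hypothetical functions below (all names are invented for this sketch) contrast vague or inaccurate names with names that state what the function actually does:

```python
import math


# Vague: the name says nothing about what is computed or returned.
def process(r):
    return math.pi * r ** 2


# Specific: the name states the operation and the parameter names the input.
def circle_area(radius):
    return math.pi * radius ** 2


# Inaccurate: the name promises a check, but the function modifies its argument.
def check_username(user):
    user["name"] = user["name"].strip().lower()


# Accurate: the name matches the side effect.
def normalize_username(user):
    user["name"] = user["name"].strip().lower()
```

The bodies in each pair are identical; only the names differ, and only the second name in each pair tells a reader what to expect.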
Consistent
Python programming encourages modularity, which means that we group related classes and functions into modules. Within modules and between modules, you want to name your functions consistently. By consistency, we mean that we use the same conventions for particular kinds of objects and operations. Consider the following trivial examples. The first three functions all perform similar operations with two numbers, and thus I use the same format: verb + underscore + numbers. In the custom class Mask, the two methods promotion_price and sales_price have a similar name structure, with the first part defining the kind of price and the second part indicating the nature of the returned value (i.e. a price expressed as a floating-point number).
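The naming pattern described above might look like the following sketch. The function bodies, the constructor, and the discount rates are assumptions made for illustration only:

```python
# Same format for similar operations: verb + underscore + numbers.
def add_numbers(num0, num1):
    return num0 + num1


def subtract_numbers(num0, num1):
    return num0 - num1


def multiply_numbers(num0, num1):
    return num0 * num1


class Mask:
    def __init__(self, brand, price):
        self.brand = brand
        self.price = price

    # Same format for similar methods: kind of price + "_price".
    def promotion_price(self):
        return self.price * 0.9  # hypothetical 10% promotion

    def sales_price(self):
        return self.price * 0.8  # hypothetical 20% sale
```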
2. Mixed Duties and Excessive Length
Another common mistake is giving a function too many mixed duties, a mistake that even senior programmers can make if they don’t refactor their programs continuously. The best practice is for each function to have only one well-defined duty that can be easily reflected by a sensible name.
Functions with mixed duties also tend to be excessively long, which makes them harder to understand and harder to debug. Let’s consider the following hypothetical example. We use the popular pandas library for processing our physiological data collected in our experiment. For each subject, we have four sessions of data in the CSV format. We can write a function called process_physio_data that includes all three data processing steps. However, because of the complexity of the data, the function will be over 100 lines of code.
Instead of writing this long function, we could write the following functions that can better show the steps involved in processing these data. As shown in the code snippet below, we create three helper functions that are respectively responsible for the three steps. Notably, each of these functions has exactly one duty. The updated process_physio_data function is thinner and clearer, and its only duty is to provide a pipeline to process the physiological data. With this refactoring of these jobs, the overall readability of the code is much improved.
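A minimal sketch of such a pipeline is shown below. The three steps (reading, cleaning, summarizing), the helper names, the "value" column, and the plausibility range are all assumptions, since the article does not spell them out. To stay dependency-free, the sketch uses the standard library's csv module in place of pandas:

```python
import csv
import io
from statistics import mean


def read_sessions(session_streams):
    """Step 1 (assumed): read every session's CSV into one list of rows."""
    rows = []
    for session_number, stream in enumerate(session_streams, start=1):
        for row in csv.DictReader(stream):
            rows.append({"session": session_number, "value": float(row["value"])})
    return rows


def clean_rows(rows):
    """Step 2 (assumed): drop readings outside a plausible range."""
    return [row for row in rows if 0 < row["value"] < 250]


def summarize_rows(rows):
    """Step 3 (assumed): compute the mean value per session."""
    values_by_session = {}
    for row in rows:
        values_by_session.setdefault(row["session"], []).append(row["value"])
    return {session: mean(values) for session, values in values_by_session.items()}


def process_physio_data(session_streams):
    """Pipeline: its only duty is to chain the three steps."""
    rows = read_sessions(session_streams)
    cleaned = clean_rows(rows)
    return summarize_rows(cleaned)


# Two tiny in-memory "files" stand in for the real session CSVs.
demo_streams = [io.StringIO("value\n60\n70\n"), io.StringIO("value\n80\n999\n")]
summary = process_physio_data(demo_streams)
print(summary)
```

Each helper can now be tested and debugged on its own, and the pipeline function reads like a table of contents for the processing steps.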
3. No Documentation
This is a common mistake from which programmers have to learn their lesson in the long term. On the surface, it seems perfectly fine that your code still runs as it’s supposed to, even if you don’t have any documentation. For example, when you’re working on a single project continuously for a few weeks, you know exactly what you’re doing with each function. However, when you need to revisit your code to update some features, how much time will you have to spend figuring out what you did? I learned the lesson the hard way and you have probably had similar experiences.
A lack of documentation can be a bigger problem in a teamwork environment where people share APIs or when you’re making open-source libraries. When we use others’ functions, especially complicated ones, we don’t know the specific operations within the functions. However, we simply need to read the documentation to know how to call the function and what the expected return value is. Can you imagine if none of the libraries or frameworks that your work relies on had any documentation?
I’m not saying that you should write extensive docstrings for your functions. I believe that if you have followed the naming standards to keep your function names unique, informative, and consistent, and each of your functions performs just one duty and has a proper length, you don’t need to write too much documentation for your functions. However, if you work as part of a big team, in either a corporate or an open-source setting, you have to follow standard documentation conventions for your own benefit and for others too.
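As one possible illustration, the hypothetical function below carries a short docstring that tells a caller how to call it and what to expect back. The Args/Returns/Raises layout follows the widely used Google docstring style; it is one convention among several, not a requirement:

```python
def calculate_bmi(weight_kg, height_m):
    """Return the body mass index for the given weight and height.

    Args:
        weight_kg: Body weight in kilograms.
        height_m: Body height in meters; must be greater than zero.

    Returns:
        The BMI as a float (weight divided by height squared).

    Raises:
        ValueError: If height_m is not positive.
    """
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2
```

The docstring is available at runtime through help(calculate_bmi) or calculate_bmi.__doc__, so callers never need to read the function body.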
I’m not going to expand on this topic, but here’s a Medium article about writing good Python docstrings.
4. Incorrect Use of Default Values
When we write functions, Python allows us to set default values for certain arguments. Many built-in functions use this feature too. Consider the example below. We can create a list object by passing the result of the range() function to list(); range() has the general syntax range(start, stop, step). When the step argument is omitted, it defaults to one. However, we can explicitly set the step argument (say, 2) in the code below:
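A short sketch of the default and explicit step values, with start and stop chosen arbitrarily for this example:

```python
# step omitted: range() counts up by 1
default_step = list(range(0, 10))
print(default_step)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# step set explicitly to 2
explicit_step = list(range(0, 10, 2))
print(explicit_step)  # [0, 2, 4, 6, 8]
```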
However, things can become tricky when we write functions involving default values with mutable data types. When we say mutable data, we mean that the Python objects can be changed after their creation, such as lists, dictionaries, and sets. To learn more about Python data mutability, you can refer to my previous article. Consider the following trivial example regarding the use of default values with mutable arguments in functions:
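A minimal sketch of the pitfall, assuming a function named append_score that takes a score and an optional list of scores (the name and signature are reconstructed for illustration):

```python
def append_score(score, scores=[]):
    # The default list is created once, when the function is defined.
    scores.append(score)
    print(scores)
    return scores


append_score(98)                    # prints [98]
append_score(92, [100, 95])         # prints [100, 95, 92]
third_result = append_score(94)     # prints [98, 94], not [94]
```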
When we’re trying to append the score 98, the expected outcome is printed because we omit the scores argument and we’re expecting the empty list to be used. When we’re trying to append the score 92 to a list of [100, 95], the outcome [100, 95, 92] is also as expected. However, when we’re trying to append the score 94, some of us may expect the outcome to be [94], but that’s not the case: the outcome is [98, 94]. Why does that happen?
It’s all because functions in Python are also first-class citizens and considered regular objects (see my previous article about functions being objects in Python). The implication is that when a function is defined, a function object is created, and the default values are evaluated once at that moment and stored on the object. Let’s see a code snippet about these concepts:
We modify the previous function, allowing it to output the memory address of the scores list. As you can see, before we call the function, we’re able to find out the default values of the function’s arguments and their memory addresses by accessing the __defaults__ attribute. After calling the function twice, the same list object with the same memory address has been updated.
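The sketch below, again assuming a function named append_score, inspects the function object's __defaults__ attribute before and after two calls. In CPython, id() returns the object's memory address:

```python
def append_score(score, scores=[]):
    print(f"scores: {scores}, id: {id(scores)}")
    scores.append(score)
    return scores


# The default list already exists before any call is made.
print(append_score.__defaults__)   # ([],)
default_id = id(append_score.__defaults__[0])

append_score(98)
append_score(94)

# The very same list object has been updated in place.
print(append_score.__defaults__)   # ([98, 94],)
same_object = id(append_score.__defaults__[0]) == default_id
print(same_object)                 # True
```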
What’s the best practice then? We should use None as the default value for the mutable data type, such that the function doesn’t instantiate the mutable object when the function is declared. When the function is called, we can create the mutable object as applicable. See the code below for additional information. Now, everything works as expected:
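A minimal sketch of the None-default pattern, using the same assumed append_score function:

```python
def append_score(score, scores=None):
    # Create a fresh list on every call that omits the argument.
    if scores is None:
        scores = []
    scores.append(score)
    return scores


first = append_score(98)
second = append_score(92, [100, 95])
third = append_score(94)
print(first, second, third)  # [98] [100, 95, 92] [94]
```

Because None is immutable, nothing carries over between calls, and each call that omits scores gets its own new list.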
5. Abuse of *args & **kwargs
Python allows us to write flexible functions by supporting variable numbers of arguments. You have probably seen *args and **kwargs somewhere in the documentation of certain libraries. In essence, *args refers to an undetermined number of positional arguments, while **kwargs refers to an undetermined number of keyword arguments.
In Python, positional arguments are arguments that are passed based on their positions, while keyword arguments are arguments that are passed based on their specified keywords. Let’s see a trivial example below:
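A small sketch of the distinction, assuming a function add_numbers with two required parameters and two parameters that have made-up default values:

```python
def add_numbers(num0, num1, num2=2, num3=3):
    return num0 + num1 + num2 + num3


# num0 and num1 are passed by position; num2 and num3 fall back to defaults.
total_defaults = add_numbers(0, 1)
print(total_defaults)  # 6

# num2 and num3 are passed by keyword, in either order.
total_keywords = add_numbers(0, 1, num2=5, num3=6)
total_swapped = add_numbers(0, 1, num3=6, num2=5)
print(total_keywords, total_swapped)  # 12 12

# add_numbers(num2=5, 0, 1) would not compile:
# SyntaxError: positional argument follows keyword argument
```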
In the function add_numbers, num0 and num1 are positional arguments, while num2 and num3 are keyword arguments. One thing to note is that you can change the order among keyword arguments, but in a call, positional arguments must always come before keyword arguments. Let’s take it a step further by looking at how *args and **kwargs work. It’s best to learn them with an example. Two things to note:
The variable number of positional arguments is handled as a tuple, and thus we can unpack it with one asterisk (learn more about tuple unpacking in this article).
The variable number of keyword arguments is handled as a dictionary, and thus we can unpack it with two asterisks (learn more about dictionary unpacking in this article).
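The two points above can be sketched with a hypothetical function show_arguments (the name and the sample values are invented for this example):

```python
def show_arguments(*args, **kwargs):
    # args arrives as a tuple, kwargs as a dictionary.
    print(type(args).__name__, args)
    print(type(kwargs).__name__, kwargs)
    return args, kwargs


args, kwargs = show_arguments(1, 2, name="John", age=30)

# One asterisk unpacks a tuple into positional arguments;
# two asterisks unpack a dictionary into keyword arguments.
numbers = (1, 2)
options = {"name": "John", "age": 30}
same_args, same_kwargs = show_arguments(*numbers, **options)
```

Both calls pass exactly the same arguments, so they return equal results.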
Although the availability of *args and **kwargs allows us to write more flexible Python functions, the abuse of them can lead to some confusion with your function. Earlier, I mentioned that we can use the pandas library for data manipulation and briefly mentioned the read_csv function, which reads a CSV file. Do you know how many arguments this function can take? Let’s see its official documentation:
If you count them, you’ll find that the total number of arguments is 49 (in the version documented at the time of writing): one positional and 48 keyword arguments. Theoretically, we can make the list shorter by doing this:
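A hypothetical stand-in for such a shortened signature is sketched below. This is not pandas code: the function read_csv_short and its return value are invented for illustration, and only two options are handled:

```python
# Hypothetical stand-in, not the pandas implementation.
def read_csv_short(filepath_or_buffer, **kwargs):
    # Every option now has to be dug out of kwargs by hand,
    # and the signature no longer tells callers which options exist.
    sep = kwargs.pop("sep", ",")
    header = kwargs.pop("header", "infer")
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    return {"path": filepath_or_buffer, "sep": sep, "header": header}


settings = read_csv_short("data.csv", sep=";")
print(settings)  # {'path': 'data.csv', 'sep': ';', 'header': 'infer'}
```

The first line is shorter, but the valid options are now hidden in the body, and a misspelled keyword is caught only because the sketch checks for leftovers explicitly.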
However, in the actual implementation of this function, we’ll still have to unpack the **kwargs and figure out how to read the CSV file correctly. Why were these seasoned Python developers willing to list all these keyword arguments? It’s all because they understand the following principle:
“Explicit is better than implicit.” — The Zen of Python
Although using **kwargs could save us some writing for the first line of our function declaration, the cost is that our code becomes less explicit. The same idea also applies to the use of *args. As mentioned above, if we work in a code-sharing environment, we always want our code to be explicit and thus easier to understand. Therefore, to keep our code explicit, we want to avoid *args and **kwargs whenever possible.
Takeaways
In this article, we reviewed five common mistakes that Python programmers can make when declaring functions. You are free to keep your own coding style, but if it overlooks these mistakes, your code can become hard to understand and hard to maintain in the long term. Therefore, whenever possible, we should avoid these mistakes to keep our code readable and thereby shareable.
Additional Reading
In this article, we mentioned several concepts that have been covered by my previous articles, such as functions being objects in Python and unpacking tuples and dictionaries. If you’re interested, please feel free to read some of them. For your convenience, the links are provided below: