5 Steps To Write Better Functions
Improve your function game by following these 5 simple guidelines.
introduction
Data Science is an expansive field that combines components from a host of different domains. While this is precisely what makes the field so alluring, it is easy to see how this complexity can be a massive barrier for people who want to get into Data Science. In many cases, Data Scientists also come from different backgrounds than people in most other Computer Science fields; a Data Scientist is far more likely to come from a domain that is not software. While it is very unlikely to get a development job with very little programming experience, a Data Science position can be a little more open-ended, depending on how code-heavy that specific role is.
Because Data Scientists are often newer to programming, and at times come from entirely different fields, one of the biggest shortcomings for many Data Scientists is their programming ability. This makes sense; these skills take even the greatest programmers years to hone! Thankfully, programming on its surface is relatively simple, and there are some simple things that can be done to drastically improve the quality of Data Science code!
№1: extraction
The first technique that can be used to improve code with very little effort is the extraction technique. This technique separates code into more logically defined pieces. This can dramatically improve a function for several reasons. Firstly, the function becomes more readable with less code in it. Secondly, if we need to alter one particular piece of logic, this becomes a lot easier if that logic is all in one place.
A good rule of thumb when it comes to creating great software is to never repeat yourself. With every saying or rule like that there will always be exceptions, but using the extraction technique we can write a lot less code and get a lot more done. This is why I have written an entire article on this technique!
The extraction technique presents itself over a series of key steps. The first of these steps is to name the individual sections of your code. This allows us to piece together our function as a sum of all of its parts. After this, we can decide which one of these parts is better served by its own function.
from math import sqrt
def remove_outliers(vec: list):
    mu = sum(vec) / len(vec)
    var = sum(pow(x - mu, 2) for x in vec) / len(vec)
    sigma = sqrt(var)
    normed = [(i - mu) / sigma for i in vec]
    offset = 0
    for e, i in enumerate(normed):
        if i > 2 or i < -2:
            del vec[e - offset]
            offset += 1
    return vec
This function’s naming is good — it describes the objective. However, what happens inside of the function is not necessarily that objective. The only portion of this function where we actually do the objective is in the for loop towards the bottom.
for e, i in enumerate(normed):
    if i > 2 or i < -2:
        del vec[e - offset]
        offset += 1
Everything that comes before this loop is spent computing other values. In some cases, we are calculating very common quantities, such as the mean or the standard deviation.
mu = sum(vec) / len(vec)
var = sum(pow(x - mu, 2) for x in vec) / len(vec)
sigma = sqrt(var)
There are multiple logical steps here, and in many cases this might indicate the need for more functions. Though this is a small function in a lone project, these types of things really help a lot more as things grow. The first step to this process is to name all of these steps.
from math import sqrt
def remove_outliers(vec: list):
    # getting the mean
    mu = sum(vec) / len(vec)
    # getting the standard deviation
    var = sum(pow(x - mu, 2) for x in vec) / len(vec)
    sigma = sqrt(var)
    # finding and removing outliers
    normed = [(i - mu) / sigma for i in vec]
    offset = 0
    for e, i in enumerate(normed):
        if i > 2 or i < -2:
            del vec[e - offset]
            offset += 1
    return vec
As we see by the labels, only the portion under the last comment is truly described by the function. Of course, to some degree we are going to need variables. However, we are doing a lot to get these variables — and while this is not the most heinous example, these are common functions which could most certainly be used in other places. That being considered, it really makes a lot of sense to extract them. For the next step, we will go through each section and determine whether or not it belongs in the function.
def remove_outliers(vec: list):
    # getting the mean
    mu = sum(vec) / len(vec)
In the case of getting the mean, we are going to just make a mean function.
def mean(vec: list):
    return sum(vec) / len(vec)
For getting the standard deviation, we have two steps.
# getting the standard deviation
var = sum(pow(x - mu, 2) for x in vec) / len(vec)
sigma = sqrt(var)
The first step is getting the variance and the second step is getting the square root of the variance. We will simply extract the variance portion, as square rooting this is rather simple.
def variance(vec: list):
    mu = mean(vec)
    return sum(pow(x - mu, 2) for x in vec) / len(vec)
We could take this a step further by also making a function that normalizes the values, converting each one to its distance from the mean in standard deviations.
def norm(vec: list):
    mu = mean(vec)
    sigma = sqrt(variance(vec))
    return [(i - mu) / sigma for i in vec]
Now let’s revise our old function!
from math import sqrt
def mean(vec: list):
    return sum(vec) / len(vec)
def variance(vec: list):
    mu = mean(vec)
    return sum(pow(x - mu, 2) for x in vec) / len(vec)
def norm(vec: list):
    # z-scores: each value's distance from the mean, in standard deviations
    mu = mean(vec)
    sigma = sqrt(variance(vec))
    return [(i - mu) / sigma for i in vec]
def remove_outliers(vec: list):
    normed = norm(vec)
    # deleting shifts later elements left, so track how many were removed
    offset = 0
    for e, i in enumerate(normed):
        if i > 2 or i < -2:
            del vec[e - offset]
            offset += 1
    return vec
It is important to remember that there is a balance to this. At the same time that we want to make plenty of functions so that we can recycle the same code, we also do not want to build a function for every little task. The whole point of making something a function is so the code is there to serve its purpose. Some code's purpose is in the scope of your function, some code's purpose is to be in the scope of another function, and some code even has to be global!
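One payoff of extraction is that the small helpers can be checked and reused on their own. The sketch below is an illustration rather than the code from this article: it assumes the same divide-by-n variance used above, adds a hypothetical standard_deviation helper and a hypothetical threshold parameter, and returns a new list instead of deleting from the input.

```python
from math import sqrt


def mean(vec: list) -> float:
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(vec) / len(vec)


def variance(vec: list) -> float:
    """Population variance (divides by n, matching the version above)."""
    mu = mean(vec)
    return sum((x - mu) ** 2 for x in vec) / len(vec)


def standard_deviation(vec: list) -> float:
    """Hypothetical helper: square root of the population variance."""
    return sqrt(variance(vec))


def norm(vec: list) -> list:
    """Z-scores: each value's distance from the mean, in standard deviations."""
    mu = mean(vec)
    sigma = standard_deviation(vec)
    return [(x - mu) / sigma for x in vec]


def remove_outliers(vec: list, threshold: float = 2.0) -> list:
    """Return a new list without values beyond `threshold` deviations."""
    return [x for x, z in zip(vec, norm(vec)) if abs(z) <= threshold]


data = [10, 11, 12, 13, 14, 15, 16, 100]
cleaned = remove_outliers(data)
print(cleaned)  # [10, 11, 12, 13, 14, 15, 16]
```

Because each helper does one thing, it can be tested in isolation. Note that this sketch does not guard against a list whose values are all identical, where the standard deviation is zero.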
№2: right arguments
The next piece of advice I would like to share is the importance of arguments. In order to write a great function, there are three key components we must master: the input, the output, and everything in between those two. For function calls, the input will be given in the form of arguments most of the time. That being said it is important to consider what tools are available for the input of our functions.
In most modern high-level programming languages, arguments come in three main forms:
- Positional arguments
- Optional positional arguments
- Keyword arguments
Knowing how and when to use these different types of arguments is going to be incredibly important to writing your functions. Not only this, but it is important to use the right arguments in the right places in order to create the best functions possible. For every output, there is an input. Every function serves a purpose, and this purpose is met in the beginning with input and finalized by the return. Consider the scope of this operation.
Positional arguments are arguments distinguished by their position. This means that to provide that specific argument, you would provide something inside of its position.
def example(a: int, b: int):
    pass
In the function example above, a and b are both positional arguments. a is a positional argument in the first position, whereas b is in the second position.
>>> def example(a: int, b: int):
...     print("a is", a)
...     print("b is", b)
...
>>> example(1, 2)
a is 1
b is 2
We may make these positional arguments optional by adding a default. Note that optional positional arguments must occur after non-optional positional arguments.
def example(a: int, b=5):
    pass
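To see how the default behaves in practice, here is a small sketch. The return value is added purely for illustration so that the effect of each call is visible.

```python
def example(a: int, b: int = 5) -> tuple:
    """Return both arguments so the effect of the default is visible."""
    return (a, b)


print(example(1))       # (1, 5): b falls back to its default
print(example(1, 2))    # (1, 2): b is supplied by position
print(example(1, b=3))  # (1, 3): b is supplied by name
```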
Keyword arguments are distinguished by being passed by name rather than by position. In Python, any parameters listed after a bare * in the signature are keyword-only (Julia, by contrast, separates them with a semicolon rather than a comma). These come at the end of your argument list. Each keyword argument needs a name and usually a default value, as these are not expected to be provided each time the function is called.
def example(a: int, b=5, *, length=5):
    pass
Here we added the keyword argument length. It is provided by name, in the same way it is written above.
example(5, length=5)
We also did not provide b above, as it is optional. When we ran this function, b was equal to 5, the default. Note that if we were to have a third argument, call it c, then c could not be provided by position unless b was also provided.
def example(a, b=5, c=5, *, length=5):
    print(a, b, c, length)
Even more important than understanding the nature of these different argument types is understanding their application. The choice as to which argument type is made is important, and needs to be deliberate. Nuances like the third positional argument above are reasons why this is so important. If we want a function to be effective at its job, that has to start at the top of the function with its input.
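As a concrete illustration of these choices in Python, the sketch below marks length as keyword-only by placing it after a bare * in the signature. The function returns its arguments only so the calls can be compared; that return value is an addition for this example.

```python
def example(a, b=5, c=5, *, length=5):
    """`a` is required, `b` and `c` are optional, `length` is keyword-only."""
    return (a, b, c, length)


print(example(1))             # (1, 5, 5, 5)
print(example(1, 2, 3))       # (1, 2, 3, 5)
print(example(1, c=9))        # (1, 5, 9, 5): naming c skips over b
print(example(1, length=10))  # (1, 5, 5, 10)

try:
    example(1, 2, 3, 4)       # a fourth positional value has nowhere to go
except TypeError as err:
    print("TypeError:", err)
```

Making length keyword-only means a caller can never set it by accident with a stray positional value, which is one reason to choose the argument type deliberately.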
№3: conditional usage
One way that a lot of programmers could improve their code drastically with a few tweaks to their technique is in conditional usage. In basic programming, we are used to a recurring pattern: if, else if (elif in Python), and else. It is not uncommon to see a function written like this:
def example(x: int):
    if x > 1:
        return True
    else:
        return False
While this is not the crime of the century by any measure, the else here is entirely useless. This type of thing feeds into a broader programming problem; one where there is too much happening in limited scopes. If we are in the scope of a function we want each scope below this — and everything in it — to be deliberate. Establishing a new else scope at the end of a function, for example, is usually pretty useless. Else is not just the throwaway at the end of the conditional, but a deliberate instruction. This function would be better written as
def example(x: int):
    if x > 1:
        return True
    return False
In most cases, we can make use of returns to make this type of thing a lot less janky. Any speed difference is negligible, since both versions do the same work; the real gain is that the function simply continues on to the return instead of opening another nested scope. This type of thing plays out in a multitude of instances. Getting used to using conditionals in this way will take time. However, taking some measured steps towards this type of programming is certainly worth the time, and will save your senior developers from tearing their eyes out when there are 50 lines happening under an else statement. This is a rather simple case, but there are going to be many instances like this that are much more complicated, and when software is complicated in this way, bad programming practices make it more complicated.
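The same idea scales to functions with several checks. Below is a minimal sketch using a hypothetical describe_sample function (not from this article): each special case returns early, so the main logic never ends up nested under an else.

```python
def describe_sample(values: list) -> str:
    """Hypothetical example: summarize a list of numbers with early returns."""
    # Handle the special cases first and leave immediately.
    if not values:
        return "empty"
    if len(values) == 1:
        return "single value"

    # The main path sits at the function's top level, with no else needed.
    spread = max(values) - min(values)
    if spread == 0:
        return "constant"
    return f"spread of {spread}"


print(describe_sample([]))         # empty
print(describe_sample([4]))        # single value
print(describe_sample([2, 2, 2]))  # constant
print(describe_sample([1, 5, 3]))  # spread of 4
```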
№4: generic functions
Something that can save a lot of time when it comes to programming is making functions as generically as possible. In programming there are going to be operations that have the ability to encompass more than one type. In some cases, there might be some specific difference that needs to be catered to for some instances of type. The best approach in these situations is always to create the most generic function that can possibly be mustered.
This simply means creating functions that can be used in as many contexts as possible. With this, less code becomes more effective and there is no need for quite as many functions. In Python, this advice is a little more limited because we are typically calling methods that belong to a class. However, in a language like Julia, for example, it becomes a far better idea to write generic functions that cover most types and then specific methods for the types that need special handling. The function below, round_add, is a simple function I wrote that adds numbers and rounds the result. In this particular case, we want different behavior when round_add is given an Integer: the sum should be rounded to the nearest multiple of 5. Whenever a Float is provided, we want to round the sum to the nearest decimal place. We also want the function to still add and round every other kind of number. We start with a function that is written as generically as possible:
function round_add(x::Number, y::Number)
    round(x + y)
end
Then we might be a little more specific depending on what is round/added.
round_step(x, step) = round(x / step) * step
round_add(x::Integer, y::Integer) = round_step(x + y, 5)
Likewise, for floats.
round_add(x::Float64, y::Float64) = round(x + y, digits = 1)
Writing the function generically was beneficial here because we did not end up needing to write a function for the other number types in Julia, of which there are many. This is a great example of how making generic functions can be incredibly helpful.
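Python has no built-in multiple dispatch, but the standard library's functools.singledispatch offers a rough analogue that dispatches on the type of the first argument only. The sketch below is an approximation of the Julia example, not a translation of it: the name round_add is reused for illustration, and the float rule (one decimal place) is an assumption.

```python
from fractions import Fraction
from functools import singledispatch
from numbers import Number


@singledispatch
def round_add(x: Number, y: Number):
    """Generic fallback: add, then round to the nearest whole number."""
    return round(x + y)


@round_add.register
def _(x: int, y: Number):
    """Integer first argument: round the sum to the nearest multiple of 5."""
    return round((x + y) / 5) * 5


@round_add.register
def _(x: float, y: Number):
    """Float first argument (assumed rule): round the sum to one decimal."""
    return round(x + y, 1)


print(round_add(7, 6))                            # 15
print(round_add(1.2, 2.56))                       # 3.8
print(round_add(Fraction(7, 2), Fraction(1, 4)))  # 4
```

Only the generic version had to be written for Fraction to work, which mirrors the benefit described above. Unlike Julia, the type of the second argument plays no part in choosing the implementation here.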
№5: IO is simple
Some simple advice that I always give to newer programmers is just how simple programming is at its core. Programming is about input and output, the hard part is what comes between. Our goal when writing a function is to take some input and translate it into some output. It might be easy to lose track of this simplicity, but keeping a reminder of this in the back of your head can be incredibly helpful to getting the goal done.
I know this point might seem silly, but I assure you this is very significant. The best function is simply the function that gets the output from the input with the least computation and memory. This advice is really simple, but I feel it is also very crucial and I am thankful to be able to share it with you.
Ultimately, everyone's programming journey is going to be different. Programming is simple, in a way, but ultimately it is the art of solving the same kind of logical puzzle over and over again: I have this input, I need this output; what do I do? I have to say, this is also a very rewarding art form. Programming is incredibly fun because you define your own world of logic. And then you are just building onto that world and making it better, creating little things that you and others may appreciate. You create a function and think it is awesome. It gets hard sometimes; I feel this craft is a weird beast. I am either completely tired of it or completely obsessed with it depending on the week. Sometimes it feels daunting, and sometimes it feels easy.
No matter what happens, the important thing to remember is how much this matters. The ability to weave something from nothing puts power into anyone's hands. Software is the exception in industry where this is the case. Your art, your creative work, nothing is more important than this, and if your passion is software, above all don't give up, and stick with it!!!
❤