The article provides seven practical coding tips for Python, emphasizing clean and efficient code practices.
Abstract
The author, a data scientist, shares seven essential tips for writing clean Python code, which include using f-strings for string formatting, platform-independent directory delimiters with pathlib, variable unpacking, the .get method for safe dictionary access, looping with the zip function, leveraging list comprehensions, and multiple assignments with * and **. These tips are aimed at enhancing code readability, reusability, and efficiency for daily tasks in data science and beyond. The article also touches on the importance of handling exceptions and the benefits of combining different Python features to create powerful code constructs.
Opinions
The author expresses enthusiasm for Python's f-strings, considering them a significant improvement in string formatting.
There is a clear preference for using pathlib for path operations to ensure code portability across different operating systems.
The .get method is recommended for dictionary access to avoid KeyError exceptions, but with a caution to use it judiciously and not overlook the importance of explicit exceptions.
The zip function is highly praised for its ability to iterate over multiple iterables simultaneously, simplifying code that would otherwise require nested loops.
List comprehensions are seen as a valuable tool for writing more concise and efficient code, reducing the need for verbose loops.
The * and ** operators are considered powerful for unpacking iterables and dictionaries, respectively, making it easier to handle variable numbers of arguments in functions.
The author advocates for combining Python's advanced features, such as zip, *, **, and list comprehensions, to write even more sophisticated and clean code.
Seven Tips To Clean Code With Python
Here are the seven tips and code bites that I use every day in my work as a data scientist.
In this story, I will share what I use in my day-to-day work and what has helped me improve my code. Check the list below to see if there’s anything new for you!
String formatting with f-strings
Platform independent directory delimiters
Variable unpacking and the _ operator
.get instead of [key] for dictionary iterations
Loop two iterators with the zip function
The power of list comprehensions
Multiple assignment with * and **
String formatting with f-strings
Hallelujah! That is what I thought when I learned about the Python 3.6+ update that includes a new way of formatting strings: the Python formatted string literal. String formatting in Python has come a long way.
F-strings evaluate everything inside {curly brackets} as a Python expression, so besides simple arithmetic you can also call functions and methods right inside the string!
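Here is a small illustration of expressions inside f-strings. The variable names and values are made up for the example. The `:.2f` after the colon is a format specifier that rounds the result to two decimals.

```python
name = "Pietje Puk"
age = 27
scores = [7.5, 8.0, 9.25]

# Plain variable substitution
intro = f"{name} is {age} years old."
# Simple arithmetic inside the braces
next_year = f"Next year, {name} will be {age + 1}."
# Function and method calls inside the braces, plus a format specifier
summary = f"{name.upper()} has {len(scores)} scores, averaging {sum(scores) / len(scores):.2f}."

print(intro)      # Pietje Puk is 27 years old.
print(next_year)  # Next year, Pietje Puk will be 28.
print(summary)    # PIETJE PUK has 3 scores, averaging 8.25.
```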
Platform-independent directory delimiters
Making your Python code as reusable as possible should be one of your main concerns. But what if you’re working on a Unix platform and your colleague is working on Windows?
The path delimiter on Windows is \, but on my Linux or Mac system it is /. Avoid dealing with these nuances by using the pathlib module, which has been part of the standard library since Python 3.4.
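The following is a minimal sketch of building paths with pathlib instead of hard-coding delimiters. The folder and file names are hypothetical. The / operator joins path segments, and the concrete Path class renders them with the separator of whatever operating system the code runs on. The pure classes PurePosixPath and PureWindowsPath show how the same segments look on each platform.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Join segments with the / operator instead of hard-coding a separator
data_file = Path("data") / "raw" / "measurements.csv"  # hypothetical path
print(data_file)  # uses the separator of the current OS

# The pure classes render the same segments for a specific platform
posix = PurePosixPath("data") / "raw" / "measurements.csv"
windows = PureWindowsPath("data") / "raw" / "measurements.csv"
print(posix)    # data/raw/measurements.csv
print(windows)  # data\raw\measurements.csv

# Handy attributes for the individual parts of a path
print(data_file.name)    # measurements.csv
print(data_file.suffix)  # .csv
print(data_file.stem)    # measurements
```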
pathlib converts your path strings into objects, which makes your code more explicit. Since PEP 519 added a file system path protocol (os.PathLike) in Python 3.6, path objects are becoming the universally recognized abstraction for file paths. pathlib also adds several useful methods:
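from pathlib import Path
directory = Path(".")  # e.g. the current working directory
# List the subdirectories
[x for x in directory.iterdir() if x.is_dir()]
# Open a file (file is a path string) and read its first line
with Path(file).open() as f: f.readline()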
Variable unpacking
Variable unpacking is probably used most often with functions that return multiple values, as in the example below.
def two_strings():
    return 'first', 'second'
x, y = two_strings()
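But unpacking is also useful for any data type that contains multiple items, such as lists, tuples, or strings. The one important notion here is that a function that "returns multiple values" actually returns a single tuple, and that tuple is what you get if you assign the result to just one variable.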
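To make the tuple point concrete, here is a small sketch. It reuses two_strings from the example above, redefined so the block runs on its own. The function really returns one tuple, and unpacking works the same way on other iterables such as lists and strings.

```python
def two_strings():
    return 'first', 'second'

# Assigning to a single name keeps the whole tuple
result = two_strings()
print(type(result), result)  # <class 'tuple'> ('first', 'second')

# Unpacking splits it into separate names
x, y = two_strings()

# Any iterable with the right number of items can be unpacked
a, b, c = [1, 2, 3]
first_char, second_char = "ab"

# Swapping two variables is tuple packing and unpacking as well
a, b = b, a
print(a, b)  # 2 1
```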
The _ is not really an operator but an ordinary variable name that, by convention, marks a value you’re not interested in and won’t be doing anything with, for instance in the following case:
person = {'name': 'Pietje Puk', 'age': 27, 'profession': 'Data Scientist'}
# Dictionaries preserve insertion order (guaranteed since Python 3.7)
_, age, _ = person.values()
>>> age
27
.get instead of [key] for dictionary iterations
Dictionaries are great data types for storing values under a key, in so-called key-value pairs. When extracting values from a dictionary, avoid running into KeyError exceptions by using the .get method instead of the more traditional [key] lookup. The .get method returns a default value if the key is not present. That default is None unless you pass your own as the second argument.
With [key], a missing key raises a KeyError and our program stops running. With .get, our program continues and uses the default value instead, for example an 'undefined' string passed as the second argument of the .get method.
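The contrast described above can be sketched as follows. The dictionary and its keys are made up for illustration, and the KeyError is caught here only so the block runs to the end.

```python
person = {'name': 'Pietje Puk', 'age': 27}

# [key] raises KeyError for a missing key; we catch it to show what happens
try:
    profession = person['profession']
except KeyError as error:
    print(f"KeyError: {error}")  # KeyError: 'profession'
    profession = None

# .get returns None by default, or the second argument if one is given
missing = person.get('profession')                # None
fallback = person.get('profession', 'undefined')  # 'undefined'
present = person.get('name', 'undefined')         # 'Pietje Puk'
```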
Please use the .get method with caution. Python’s explicit exceptions are preferable by design: it is good to know where your program halts and why.
Here’s a quick tip for mapping labels with a dictionary and the .get method. Notice that the "ORGANIZATION" key does not exist in the mapping dictionary, so looking it up with [key] would raise a KeyError. By passing the label itself as the default value to .get, "ORGANIZATION" simply stays unchanged.
mapping = {"FIRST_NAME": "PERSON", "ADDRESS": "LOCATION"}
labels = ["FIRST_NAME", "ORGANIZATION", "ADDRESS"]
# Fall back to the original label when it has no entry in mapping
labels = [mapping.get(label, label) for label in labels]
>>> labels
['PERSON', 'ORGANIZATION', 'LOCATION']
Other cases where I use .get are requesting information via an API or scraping web pages, where key-value pairs in the parsed JSON might change or be missing with the next call or page.
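Here is a hedged sketch of that pattern. The JSON payload is invented and stands in for an API response whose fields may be missing. Chaining .get with an empty dictionary as the default lets you reach into nested data without a KeyError at each level.

```python
import json

# Hypothetical API response; the "address" field is absent this time
payload = '{"name": "Pietje Puk", "contact": {"email": "pietje@example.com"}}'
data = json.loads(payload)

# Nested lookups: default to {} so the next .get still has a dict to call
email = data.get("contact", {}).get("email")
city = data.get("address", {}).get("city", "unknown")

print(email)  # pietje@example.com
print(city)   # unknown
```

One caveat: if a key is present but its JSON value is null (None in Python), .get returns None rather than the default, and the chained .get then fails. Writing (data.get("contact") or {}) covers that case too.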
When you can assign a sensible default value and can safely disregard the KeyError exception, use the .get method to keep your program from stopping prematurely.
Loop two iterators with the zip function
I love the zip function; it has saved me countless (nested) loops. I use it mostly for iterating over two sequences at the same time, pairing up the items that share the same index.
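The following is a minimal sketch of an index-matched loop, written first with an index and then with zip. The lists are illustrative.

```python
fruits = ['apple', 'banana', 'cherry']
prices = [0.5, 0.25, 3.0]

# Index-based loop: works, but the index only exists to line the lists up
pairs_by_index = []
for i in range(len(fruits)):
    pairs_by_index.append(f"{fruits[i]}: {prices[i]}")

# zip yields one tuple per position, so no index is needed
pairs_with_zip = []
for fruit, price in zip(fruits, prices):
    pairs_with_zip.append(f"{fruit}: {price}")

# zip stops as soon as the shortest iterable is exhausted
short = list(zip([1, 2, 3], 'ab'))  # [(1, 'a'), (2, 'b')]
```

Because zip silently stops at the shortest input, mismatched lengths can go unnoticed. On Python 3.10 and later, zip(..., strict=True) raises a ValueError instead.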
You can do this with any iterable, including generators. For instance, you can create a dictionary from two separate lists without looping over them yourself, by passing zip to dict().
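Here is a sketch of building a dictionary from two parallel lists with dict and zip. The keys and values are illustrative, and the second line shows a generator expression as one of the inputs.

```python
keys = ['name', 'age', 'profession']
values = ['Pietje Puk', 27, 'Data Scientist']

# dict() accepts an iterable of (key, value) pairs, which is exactly what zip yields
person = dict(zip(keys, values))

# zip also accepts generators, e.g. pairing numbers with their squares
squares = dict(zip(range(1, 4), (n * n for n in range(1, 4))))
```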
Later on, we’ll introduce the * operator, which is very powerful in combination with the zip function!
List comprehensions
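List, set, and dictionary comprehensions are ways to code more efficiently: they do the same work in fewer lines of code. (There is no tuple comprehension, but passing a generator expression to tuple() has the same effect.)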
A loop that builds up a list or dictionary item by item can usually be compressed into a single-line comprehension. This saves unnecessary loops and creates a cleaner codebase.
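The following sketch shows the kind of compression described above. Each loop is followed by the equivalent comprehension, and the data is illustrative.

```python
words = ['data', 'science', 'python', 'zip']

# Loop: collect the lengths of words longer than three characters
lengths = []
for word in words:
    if len(word) > 3:
        lengths.append(len(word))

# Same result as a list comprehension
lengths_comp = [len(word) for word in words if len(word) > 3]

# Loop: map each word to its upper-case form
upper = {}
for word in words:
    upper[word] = word.upper()

# Same result as a dictionary comprehension
upper_comp = {word: word.upper() for word in words}

# For a tuple, pass a generator expression to tuple()
lengths_tuple = tuple(len(word) for word in words)
```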
Multiple assignment with * and **
In Python 3, the * prefix operator was added to the multiple assignment syntax (PEP 3132). It allows us to collect the remaining items of an iterable into a list.
first, second, *rest = [1, 2, 3, 4, 5, 6]
>>> first
1
>>> second
2
>>> rest
[3, 4, 5, 6]
The ** operator does something similar for keyword arguments: it takes a dictionary of key-value pairs and unpacks it into keyword arguments in a function call.
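Here is a minimal sketch of both operators in a function call. The functions describe and collect are hypothetical and exist only for this example. The last part shows the mirror image: *args and **kwargs in a function definition collect extra arguments.

```python
def describe(name, age, profession='unknown'):
    # Hypothetical helper used only to demonstrate argument unpacking
    return f"{name} ({age}) works as {profession}."

person = {'name': 'Pietje Puk', 'age': 27, 'profession': 'Data Scientist'}

# ** unpacks the dictionary into keyword arguments, i.e.
# describe(name='Pietje Puk', age=27, profession='Data Scientist')
from_dict = describe(**person)

# * does the same for positional arguments taken from an iterable
from_list = describe(*['Pietje Puk', 27])

# In a definition, *args and **kwargs collect extra arguments instead
def collect(*args, **kwargs):
    return args, kwargs

collected = collect(1, 2, unit='cm')  # ((1, 2), {'unit': 'cm'})
```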
Putting it all together
The zip function, the * and ** operators, and list comprehensions are powerful on their own, but they become especially interesting when combined. Below are three examples that combine these operators, functions, and comprehensions.
In the first example, we need to keep documents and labels that come from two separate lists together, as (document, label) tuples matched by index. The zip function provides the perfect solution, and a list comprehension ties it all together in a single expression. Next, we shuffle the tuple pairs randomly. Finally, we need to unpack the list of tuples [(first_doc, first_label), (second_doc, second_label), ...] again, and we use the * operator together with zip for this. Variable X then contains the documents and variable Y the labels, with corresponding indexes.
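Here is a sketch of that first example under stated assumptions. The documents and labels are made-up placeholders, and random.shuffle stands in for the shuffling step. The last line is the "unzip" idiom: zip(*pairs) passes every tuple as a separate argument to zip, which regroups the items by position.

```python
import random

documents = ['first doc', 'second doc', 'third doc']  # placeholder data
labels = ['spam', 'ham', 'spam']

# Keep each document with its label, matched by index
pairs = [(doc, label) for doc, label in zip(documents, labels)]

# Shuffle the pairs in place; each document stays with its label
random.shuffle(pairs)

# Unzip: zip(*pairs) is zip((doc1, label1), (doc2, label2), ...)
X, Y = zip(*pairs)
```

Note that X and Y come back as tuples, and that unpacking into X, Y raises a ValueError if pairs is empty. The comprehension mirrors the description above; list(zip(documents, labels)) would produce the same list.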
In the second example, we’re able to merge two dictionaries in a single expression, without having to instantiate a new empty dictionary and iterate over the first and second dictionary separately.
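Here is a minimal sketch of the merge, with illustrative dictionaries. Dictionary unpacking inside a literal has worked since Python 3.5 (PEP 448). When both dictionaries contain the same key, the value from the one unpacked last wins.

```python
defaults = {'color': 'blue', 'size': 'M'}
overrides = {'size': 'L', 'price': 20}

# Unpack both into a new dict literal; later keys overwrite earlier ones
merged = {**defaults, **overrides}
```

Neither input is modified. On Python 3.9 and later, defaults | overrides gives the same result.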
In the third example, we call a function that returns more than we need, keep only the values we are interested in, and discard the rest. The starred target *_ (a * in front of the throwaway name _) collects the remaining items of the iterable, so we don't have to spell out how many there are. This is useful when you don’t know how many items or return values there are!