Christopher Tao

Photo by Free-Photos on Pixabay

Don’t Use Python OS Library Any More When Pathlib Can Do

More intuitive, More convenient, More features

In recent years, Python has become known to many more people who are not programmers. This is not only because of its popularity in Machine Learning but also because it can be used to automate a lot of repetitive work, such as bulk editing files that follow certain patterns.

In one of my previous articles, I introduced the OS library in Python, which handles almost all the basic file system operations. These operations are highly recommended for those who have just started using Python to automate repetitive tasks.

However, once we have mastered the OS library, I would also recommend stepping up to another library for most of the basic file system operations: Pathlib, which is also part of the Python standard library. It is more intuitive in terms of syntax, easier to use, and has more features out of the box.

In this article, I’ll compare some basic operations using the OS library and Pathlib to demonstrate the differences and to argue why Pathlib is recommended.

1. Show Current Directory

The first operation that I want to start with is showing the Current Working Directory (CWD). Because we use relative paths most of the time, it is important to know where we are at the moment.

OS Library

import os
os.getcwd()

Pathlib

from pathlib import Path
Path.cwd()

Difference

The outcome looks the same if we print them.

However, if we check the type of the objects, we’ll see that they are different.

The OS library returns a string, whereas Pathlib returns a PosixPath object (on Linux and macOS, including Google Colab; on Windows it is a WindowsPath). The benefit of getting a path object back is that we can call further operations directly on it. This will be demonstrated in later sections.
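To make the type difference concrete, here is a minimal sketch. The variable names are only illustrative, and the sample_data/README.md path is simply joined onto the CWD; it does not need to exist for this to run.

```python
import os
from pathlib import Path

cwd_str = os.getcwd()   # a plain string
cwd_path = Path.cwd()   # a path object (PosixPath or WindowsPath, depending on the OS)

print(type(cwd_str).__name__)
print(type(cwd_path).__name__)

# A path object can be extended with the / operator, with no string joining.
readme = cwd_path / 'sample_data' / 'README.md'
readme_name = readme.name           # last component of the path
parent_name = readme.parent.name    # name of the containing directory
```

With the string version, the same join would need os.path.join() and the result would again be a bare string.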

2. Check Whether a Directory or File Exists

In my case, I’m using Google Colab and there is a folder automatically created called “sample_data” every time a new notebook is provisioned. If I want to check whether there is such a directory, the following code will do.

OS Library

The function os.path.exists() takes a string type argument, which can be either the name of a directory or a file.

os.path.exists('sample_data/README.md')

Pathlib

When using Pathlib, we simply pass the path as a string to the “Path” class, which gives us a “PosixPath” object. To test whether the path exists in the file system, just call its exists() method, which is more intuitive than the OS library.

Path('sample_data/README.md').exists()

Difference

In terms of this operation, there is almost no difference between the two libraries. However, we can potentially write our code in this style when using Pathlib.

readme = Path('sample_data/README.md')
readme.exists()

Although this takes one more line of code, we now have a reference to the README.md file. In other words, we can use the variable readme later on for any other operations without having to pass the full path as a string again.
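As a rough illustration of reusing one path reference, the sketch below builds a throwaway sample_data folder inside a temporary directory (an assumption made so the example runs anywhere, not only in Colab) and then queries the same readme object several times.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / 'sample_data' / 'README.md'

    existed_before = readme.exists()    # nothing has been created yet

    readme.parent.mkdir()               # create sample_data
    readme.write_text('hello')          # create the file itself

    # The same object answers every follow-up question.
    exists_after = readme.exists()
    is_file = readme.is_file()
    parent_is_dir = readme.parent.is_dir()
```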

3. Create a Directory

Now, let’s create a directory called test_dir in our working directory and see what the differences are between the OS library and Pathlib.

OS Library

It is fairly easy to create a directory in the OS library.

os.mkdir('test_dir')

Pathlib

When it comes to the Pathlib, the syntax is quite intuitive.

Path('test_dir').mkdir()

The difference in Suppressing FileExistsError

In the OS library, when we try to create a directory that already exists, a FileExistsError is raised.

In my previous article, I suggested that we should always check whether a directory exists before creating it.

if not os.path.exists('test_dir'):
    os.mkdir('test_dir')

However, if we’re using Pathlib, it becomes much easier to handle this error.

Path('test_dir').mkdir(exist_ok=True)

The method mkdir() accepts a flag exist_ok. When it is set to True, the FileExistsError is ignored if the directory already exists, which is equivalent to the if-condition we added in the OS version. Note that the error is still raised if the existing path is a file rather than a directory.
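The behaviour of exist_ok can be checked with a small experiment. This is only a sketch run inside a temporary directory; the name not_a_dir is made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    test_dir = Path(tmp) / 'test_dir'
    test_dir.mkdir()

    # Without the flag, creating the directory a second time fails.
    try:
        test_dir.mkdir()
        raised_without_flag = False
    except FileExistsError:
        raised_without_flag = True

    # With the flag, the second call is silently accepted.
    test_dir.mkdir(exist_ok=True)
    still_a_dir = test_dir.is_dir()

    # The flag only forgives existing directories, not existing files.
    a_file = Path(tmp) / 'not_a_dir'
    a_file.write_text('x')
    try:
        a_file.mkdir(exist_ok=True)
        raised_for_file = False
    except FileExistsError:
        raised_for_file = True
```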

The difference in Creating Multi-level Depth Directory

Another major difference is in creating a directory whose parent directories do not exist yet. In the OS library, we have to use a different function to achieve this: makedirs() rather than mkdir().

os.makedirs(os.path.join('test_dir', 'level_1a', 'level_2a', 'level_3a'))

It does the job for sure. However, we have to always remember to use a different function.

If we’re using the Pathlib, again, we just need to set the flag parents to true.

Path('test_dir', 'level_1b', 'level_2b', 'level_3b').mkdir(parents=True)

Don’t forget these are all flags for the same function. In other words, we can use both exist_ok and parents flags at the same time!
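Here is a minimal sketch of both flags used together, run inside a temporary directory so nothing is left behind. For a fair comparison it also shows that os.makedirs() accepts an exist_ok argument of its own.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Pathlib: one method, two flags.
    nested = Path(tmp, 'test_dir', 'level_1b', 'level_2b', 'level_3b')
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)   # safe to repeat
    pathlib_ok = nested.is_dir()

    # OS library: makedirs() creates the parents and has its own exist_ok.
    os_nested = os.path.join(tmp, 'test_dir', 'level_1a', 'level_2a', 'level_3a')
    os.makedirs(os_nested, exist_ok=True)
    os.makedirs(os_nested, exist_ok=True)       # safe to repeat
    os_ok = os.path.isdir(os_nested)
```

Calling mkdir(parents=True, exist_ok=True) is a handy "make sure this folder is there" one-liner at the start of a script.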

4. Show Directory Content

OS Library

When we want to show the content of a directory, it is easy in the OS library.

os.listdir('sample_data')

Pathlib

The syntax for showing the content of a directory in Pathlib holds no surprises.

Path('sample_data').iterdir()

The difference in Returned Format

If we pay attention to what Pathlib returns, it is actually a generator of path objects rather than a list of strings. We can get everything by populating a list from this generator.

list(Path('sample_data').iterdir())

However, in most cases, the reason we want all the files in a directory is to perform some action on them one by one. Therefore, a generator is more efficient if we just want to loop over them once.
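A small sketch of looping over iterdir() once is shown below. The files are made up and live in a temporary directory. It also shows the catch of a one-shot iterator: once it has been consumed, it yields nothing more.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp)
    (sample / 'a.csv').write_text('1,2')
    (sample / 'b.json').write_text('{}')
    (sample / 'sub').mkdir()

    entries = sample.iterdir()

    # Each item is a path object, so we can filter with its own methods.
    file_names = sorted(p.name for p in entries if p.is_file())

    # The iterator is now exhausted; a second pass finds nothing.
    leftover = list(entries)
```

If the entries are needed more than once, wrap the call in list() first, as shown earlier.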

The difference in Using Glob

The os module itself doesn’t provide a way to search for files using wildcards. Therefore, we have to import another built-in module called glob to help.

from glob import glob
glob(os.path.join('sample_data', '*.csv'))

If we’re using the Pathlib, very luckily, it comes with the “glob” feature.

list(Path('sample_data').glob('*.csv'))
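Pathlib also offers rglob(), which applies the pattern recursively to all subdirectories. The sketch below uses made-up files in a temporary directory to contrast the two.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp)
    (sample / 'top.csv').write_text('')
    (sample / 'notes.txt').write_text('')
    (sample / 'nested').mkdir()
    (sample / 'nested' / 'deep.csv').write_text('')

    # glob() only looks at the directory itself for this pattern.
    top_level = sorted(p.name for p in sample.glob('*.csv'))

    # rglob() also descends into subdirectories.
    recursive = sorted(p.name for p in sample.rglob('*.csv'))
```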

5. Quick Read/Write to File

This feature is unique to Pathlib. When we want to read or write something to a file, it is quite common to use the following approach.

with open("test.txt", 'r') as file:
    print(file.read())

This is indeed the standard in Python. However, sometimes we just want to write a small piece of text to a file. In this case, we can do that very easily using Pathlib.

f = Path('test_dir/test.txt')
f.write_text('This is a sentence.')

Of course, we can also quickly read the content of the file to a string variable.

f.read_text()
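As a minimal round-trip sketch, the code below writes and reads a file in a temporary directory. Passing encoding='utf-8' explicitly is my own suggestion rather than something the calls above require. Note that write_text() replaces any existing content instead of appending to it.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / 'test.txt'

    # write_text() opens, writes, and closes the file in one call;
    # it returns the number of characters written.
    written = f.write_text('This is a sentence.', encoding='utf-8')
    content = f.read_text(encoding='utf-8')

    # A second write overwrites the file.
    f.write_text('Replaced.', encoding='utf-8')
    replaced = f.read_text(encoding='utf-8')
```

For large files or incremental writes, the usual with open(...) block is still the better tool.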

6. Metadata of the File

In practice, it is quite common that we need some specific information about a file. Now, let me demonstrate how Pathlib can extract information and statistics about a file easily for us.

Let’s keep using the file that we have used in the previous section, which is the variable f.

print(f'Path of the file is:\n{f}')

Although we want to use the relative path in most cases, it is sometimes still necessary to check the absolute path of a file. We can do this very easily using Pathlib.

f.resolve()

Sometimes, we may want only the file name without its extension. Or, conversely, we may want just the extension. Both are easy to get from the attributes of the path object.

f.stem
f.suffix
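A few related attributes are worth knowing as well. The sketch below works purely on the path text, so none of the files need to exist; backup.tar.gz is a made-up name used to show how multiple extensions behave.

```python
from pathlib import Path

f = Path('test_dir/test.txt')

name = f.name                     # file name with extension
stem = f.stem                     # file name without extension
suffix = f.suffix                 # extension, including the dot
parent = f.parent                 # containing directory as a path object
as_markdown = f.with_suffix('.md')  # same path with a different extension

# With several extensions, suffix is only the last one.
archive = Path('backup.tar.gz')
last_suffix = archive.suffix
all_suffixes = archive.suffixes
archive_stem = archive.stem
```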

Even more impressively, Pathlib can return the file’s statistics, such as its size and creation/modification times, directly from a PosixPath instance. The result is the same os.stat_result object that os.stat() returns, but it is easier to reach because we already hold the path.

f.stat()

For example, if we want to show the size of the file in bytes, we can do so as follows.

f.stat().st_size
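The timestamps in the result are plain numbers (seconds since the epoch), so they usually need converting before display. Below is a small sketch that uses a temporary file; converting with datetime.fromtimestamp() is one common choice, not the only one.

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / 'test.txt'
    f.write_text('This is a sentence.', encoding='utf-8')

    info = f.stat()
    is_stat_result = isinstance(info, os.stat_result)

    size_in_bytes = info.st_size                        # 19 ASCII characters, 19 bytes
    modified_at = datetime.fromtimestamp(info.st_mtime) # last modification time

    print(f'{f.name}: {size_in_bytes} bytes, modified {modified_at:%Y-%m-%d %H:%M}')
```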

Summary

Photo by Hermann on Pixabay

In this article, I have introduced another Python built-in library, Pathlib. It is more advanced and more convenient than the OS library, and it provides more features out of the box.

Of course, we still need to know how to use the OS library, as it is one of the most powerful and fundamental libraries in Python. However, for file system operations in typical scenarios, Pathlib is highly recommended.
