Up Your Coding Skills With Python: Files

Discover what files in Python are all about

This article aims to familiarize us with the various ways we can work with files and file objects in Python. More specifically, we’ll see how to open and close files, and how to perform operations on file objects to get what we need out of them. For simplicity, I will assume we all know what files are and what purpose they serve: they are simply data (the file contents) stored under a name (the filename) on non-volatile storage (SSD, HDD, USB/external flash drives, etc.). A typical cycle of operations performed on a file consists of three steps:

open the file;
read from / write to the file;
close the file.

Opening the file is achieved using the open() function:

my_file_object = open(filename, mode)

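In text mode (the default), this function returns an _io.TextIOWrapper object, which is then assigned to the my_file_object variable. In binary mode, it returns a binary object instead, such as an _io.BufferedReader when reading.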

# say we have a file in our current directory
# called "sample_text_file":

my_file_object = open("sample_text_file", "r")
print(type(my_file_object))
my_file_object.close()

Output:
<class '_io.TextIOWrapper'>

That’s the most common way of calling open(). The function accepts other keyword arguments as well (encoding, errors, newline, buffering, and more), and the official documentation for open() covers them in detail. For now, let’s focus on the two arguments we provided. The first, filename, is a string holding the file’s path, which can be either relative or absolute. If we only provide a bare file name, like example.txt, the path is resolved relative to the current working directory, so Python will try to open example.txt in the current working directory.
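To see how a bare file name gets resolved, here’s a small sketch that compares os.path.abspath() on a relative name with the current working directory joined to that name. The name example.txt is only a placeholder; nothing is opened, so the file doesn’t need to exist.

```python
import os

relative_name = "example.txt"  # placeholder name; the file need not exist

# A relative path is interpreted against the current working directory,
# which is what open() does when it's given a bare file name.
resolved = os.path.abspath(relative_name)
expected = os.path.join(os.getcwd(), relative_name)

print(resolved)
print(resolved == expected)  # True
```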

The other argument, mode, is a string specifying the mode we’re opening the file in. We can open a file for:

reading (this is the default): r;
writing (truncate the file if it exists, create it if it doesn’t): w;
exclusive writing (create the file, this will fail if it already exists): x;
appending (append to a file, create if it doesn’t exist): a.

There are also a few modifiers that can be combined with the four modes above:

open the file in text mode (this is the default): t;
open the file for updating (both reading and writing): +;
open the file in binary mode: b.

Let’s take a look at a few examples:

# note: each call below returns a new file object;
# real code should close every one of them

# open file in text mode, for reading
my_file = open("file.txt")

# same file, same mode (text mode, open for reading)
my_file = open("file.txt", "rt")

# open for reading and writing - the file must already exist; it is not
# truncated, and writes overwrite existing bytes at the cursor position
my_file = open("file.txt", "r+")

# open for writing and reading - truncates (wipes) previous contents
my_file = open("file.txt", "w+")

# open for appending and reading - writes always go to the end,
# so existing data is never overwritten
my_file = open("file.txt", "a+")

# open for reading, binary mode
my_file = open("file.txt", "rb")
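The examples above don’t exercise the x mode, so here’s a hedged sketch of how x and a behave. It works in a throwaway directory from tempfile (so no real file.txt is touched) and uses with blocks so each file is closed automatically when its block ends.

```python
import os
import tempfile

# work inside a throwaway directory so no real file.txt is touched
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    # "x" creates the file, and refuses to run if it already exists
    with open(path, "x") as f:
        f.write("first\n")

    try:
        with open(path, "x"):
            pass
    except FileExistsError:
        second_x_failed = True
    else:
        second_x_failed = False

    # "a" never truncates: every write lands at the end of the file
    with open(path, "a") as f:
        f.write("second\n")

    with open(path) as f:
        contents = f.read()

print(second_x_failed)  # True
print(repr(contents))   # 'first\nsecond\n'
```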

Opening a file in binary mode makes it return data as bytes objects, whereas in text mode, data read from the file comes back as str. Say we have a file called file.txt in our current working directory, containing the string abc. Let’s open the file in both binary and text mode and read one line from it:

# binary mode
my_file_obj = open("file.txt", "rb")
line = my_file_obj.readline()
print(f"{type(line)}: {line}")
my_file_obj.close()

# text mode
my_other_file_obj = open("file.txt", "rt")
line = my_other_file_obj.readline()
print(f"{type(line)}: {line}")
my_other_file_obj.close()

Output:
<class 'bytes'>: b'abc'
<class 'str'>: abc

Something to keep in mind: as we’ve just seen, the mode we open the file in (text/binary) determines the type of data we get back. Likewise, when opening a file for writing, we must pass data of the matching type: bytes in binary mode, str in text mode. Here’s what happens if we don’t:

# open a file for writing in binary mode
my_file_obj = open("file.txt", "wb")

# create a string and attempt to write it to the file
line = "abc"
my_file_obj.write(line)
my_file_obj.close()

Output:
Traceback (most recent call last):
  ...
    my_file_obj.write(line)
TypeError: a bytes-like object is required, not 'str'
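One way to fix the example above is to encode the string before writing it. The sketch below assumes UTF-8 is the encoding we want and uses a temporary directory instead of the current one.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    line = "abc"
    with open(path, "wb") as my_file_obj:
        # binary mode expects bytes, so encode the str explicitly
        my_file_obj.write(line.encode("utf-8"))

    with open(path, "rb") as my_file_obj:
        data = my_file_obj.read()

print(data)  # b'abc'
```

Writing a bytes literal such as b"abc" directly would work just as well; encode() is the general route when the text comes from elsewhere.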

It’s also important to note that the r+ and w+ modes behave quite differently, so they’re worth studying up close.

Say we have a file called file.txt, containing a single line of data: Python. Let’s see what happens if we open it in r+ and in w+ mode, respectively, write the string abc to it, and close it. Then we’ll check the contents of the file:

my_file = open("file.txt", "r+")
my_file.write("abc")
my_file.close()

File contents:
abchon

As we can see, r+ opened the file without truncating it and placed the cursor at the very start, so write() simply overwrote the first three characters. The rest of the data survived, because our string, abc, wasn’t long enough to overwrite all of the file’s contents.

Let’s see if the w+ mode behaves the same:

my_file = open("file.txt", "w+")
my_file.write("abc")
my_file.close()

File contents:
abc

This time around, opening the file in w+ mode truncated it immediately, so all existing data was lost before anything was written. The abc string was then written to the now-empty file, and that’s all we’re left with: the original Python string is gone.

In conclusion, we should be very careful when choosing the mode to open our file in.

Like any object in Python, file objects have attributes of their own. For a text file, these include:

encoding: the encoding of the text stream, for example UTF-8;
errors: the error-handling setting of the encoder/decoder. Common values are strict (raise a UnicodeDecodeError or UnicodeEncodeError), replace (substitute a replacement marker, such as U+FFFD, the replacement character, when decoding), ignore (leave the offending character out of the result), and backslashreplace (insert a backslashed escape sequence such as \xNN);
newlines: the newline styles encountered so far while reading. It is None until a newline has been read, then a string (for example '\n'), or a tuple of strings if several styles were seen. It is not a count, and write operations don’t affect it;
raw: strictly speaking, this belongs to the binary buffer rather than to the text wrapper. The buffer’s raw attribute is the underlying raw stream (a RawIOBase subclass, FileIO for regular files), reachable as my_file_object.buffer.raw. The specifics of that class are out of our scope, for now;

A few more attributes worth knowing:

buffer: the underlying binary buffer. Its type depends on the mode the file was opened in: a file opened for reading gets an _io.BufferedReader, one opened for writing or appending gets an _io.BufferedWriter, and one opened with + gets an _io.BufferedRandom;
closed: whether the file is closed or not;
line_buffering: whether line buffering is enabled, i.e. whether a write containing a newline triggers a flush;
write_through: whether writes are passed immediately to the underlying binary buffer;
closefd: whether the underlying file descriptor is closed together with the file object. It can only be set to False when open() is given a file descriptor instead of a filename. Like raw, it lives one level down, on the raw FileIO object (my_file_object.buffer.raw.closefd);
name: pretty straightforward, it’s the name (or file descriptor) the file was opened with;
mode: again, very straightforward, this is the mode our file has been opened in.
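Here’s a hedged sketch that prints several of these attributes for a text file created in a temporary directory (the file name is arbitrary). It also shows that newlines starts out as None and only reports newline styles once something has been read; newline="" is used when writing so the \r\n sequence reaches the file untranslated.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    # newline="" disables translation, so "\r\n" reaches the file as-is
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("first\r\nsecond\nthird\n")

    with open(path, "r", encoding="utf-8") as f:
        print(f.mode, f.encoding, f.errors)       # r utf-8 strict
        print(type(f.buffer).__name__)            # BufferedReader
        print(type(f.buffer.raw).__name__)        # FileIO
        print(f.buffer.raw.closefd)               # True
        print(f.line_buffering, f.write_through)  # False False
        newlines_before = f.newlines              # None: nothing read yet
        f.read()
        newlines_after = f.newlines               # both newline styles seen
    closed_after_with = f.closed

print(newlines_before, newlines_after, closed_after_with)
```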

Now that we’ve quickly covered some of the file object’s attributes, let’s look at the main methods we can use on such objects.

Closing a file

This one is straightforward: close() flushes any pending buffered data and releases the underlying file:

# open a file for appending
my_file_obj = open("file.txt", "a")

# close the file
my_file_obj.close()

# check if the file's been closed
print(my_file_obj.closed)

Output:
True
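Calling close() by hand is easy to forget, especially when an exception interrupts the code in between. File objects are context managers, so the with statement can close them for us. The sketch below, which uses a temporary directory and made-up file contents, checks the closed attribute both after a normal exit and after an exception.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    # close() is called automatically when the with block ends
    with open(path, "a") as my_file_obj:
        my_file_obj.write("some text\n")
        closed_inside = my_file_obj.closed
    closed_after = my_file_obj.closed

    # ...and that still happens if the block is left by an exception
    try:
        with open(path, "a") as other_file_obj:
            raise RuntimeError("something went wrong mid-write")
    except RuntimeError:
        pass
    closed_after_error = other_file_obj.closed

print(closed_inside, closed_after, closed_after_error)  # False True True
```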

Separate the buffer from the text stream

You may occasionally need to do this. The detach() method separates the underlying binary buffer from the text wrapper and returns it (the buffer). Word of caution: it’s the text wrapper, our file object, that is left in an unusable state, not the buffer. Attempting to close the file object or to perform read/write operations on it will raise exceptions:

# open a file for appending
my_file_obj = open("file.txt", "a")

# detach the buffer
buffer = my_file_obj.detach()

print(buffer)

# attempt to close the file
my_file_obj.close()

Output:

<_io.BufferedWriter name='file.txt'>
Traceback (most recent call last):
  ...
    my_file_obj.close()
ValueError: underlying buffer has been detached

We got the buffer, as seen above, but any further operations on the file object now result in an error.
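The detached buffer, on the other hand, remains perfectly usable; it just deals in bytes. This sketch (temporary directory, arbitrary text) writes through the wrapper, detaches, and then keeps writing through the buffer. It relies on detach() flushing the wrapper first, which is what CPython does.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    my_file_obj = open(path, "a", encoding="utf-8")
    my_file_obj.write("text via the wrapper; ")

    # detach() flushes the text wrapper, then hands back its binary buffer
    buffer = my_file_obj.detach()

    # the buffer itself still works; it just expects bytes instead of str
    buffer.write(b"bytes via the buffer")
    buffer.close()

    with open(path, "rb") as f:
        data = f.read()

print(data)  # b'text via the wrapper; bytes via the buffer'
```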

Get file descriptor of the file object

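The fileno() method returns the underlying file descriptor of the file object, which is an integer. Streams that have no file descriptor, such as the in-memory io.StringIO, raise io.UnsupportedOperation (an OSError subclass) instead: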

# open a file for appending
my_file_obj = open("file.txt", "a")

# get its file descriptor
file_descriptor = my_file_obj.fileno()

print(file_descriptor)

# close the file
my_file_obj.close()

Output:
3

We most likely get 3 because the operating system typically hands out the lowest available descriptor, no other files are open, and the 0, 1, and 2 file descriptors are already taken, as follows:

0 is the file descriptor for standard input, or stdin;
1 corresponds to the standard output, also known as stdout;
2 is the file descriptor responsible for standard error (stderr).
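A file descriptor is mainly useful for passing to low-level os functions. The sketch below (temporary directory, four bytes of made-up content) uses os.fstat() on the descriptor to get the file size, and confirms that an in-memory stream has no descriptor. The exact descriptor number may vary, so it is only printed.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    with open(path, "w") as my_file_obj:
        my_file_obj.write("ABCD")
        my_file_obj.flush()  # make sure the OS has seen the 4 bytes

        fd = my_file_obj.fileno()
        # the descriptor works with low-level os functions
        size = os.fstat(fd).st_size

print(fd, size)

# in-memory streams have no OS-level descriptor at all
try:
    io.StringIO("ABCD").fileno()
except io.UnsupportedOperation:
    no_descriptor = True
else:
    no_descriptor = False

print(no_descriptor)  # True
```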

Flushing the internal buffer

A good example for showcasing the flush() method is writing data to a file. Writes are buffered, so small amounts of data typically only make their way into the file once the buffer fills up or the file is closed:

# open a file for writing
my_file_obj = open("file.txt", "w")

# write 2 lines to the file
my_file_obj.write("First line\n")
my_file_obj.write("Second line\n")

# wait for Enter key. 
# check file.txt. Right now it should be empty.
input()

my_file_obj.close()
# check file.txt again. Now it should contain the data

If you followed along, you’ll have noticed that the data isn’t written to the file until the file’s been closed. Now let’s see what happens if we use flush() on the file before closing it:

# open a file for writing
my_file_obj = open("file.txt", "w")

# write a first line to the file
my_file_obj.write("First line\n")

# flush the buffer
my_file_obj.flush()

# write a second line to the file
my_file_obj.write("Second line\n")

# wait for Enter key. 
# check file.txt. It should contain the first line.
input()

my_file_obj.close()
# check file.txt again. Now it should contain all data

As we’ve seen, flush() forced the internal buffer’s data into the file before the file was closed. That comes in handy whenever another program needs to see the data right away, for instance a log file that someone is watching, so it’s a nice method to have in your Python file I/O tool belt.
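flush() only hands the data over to the operating system, which may keep it in its own cache for a while. When data must survive a crash or power loss, the usual companion is os.fsync() on the file descriptor. The sketch below (temporary directory, arbitrary contents) does both, then uses a second, independent file object to check that the flushed line is already visible before close().

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    with open(path, "w") as my_file_obj:
        my_file_obj.write("First line\n")

        # flush() moves Python's buffered data into the operating system...
        my_file_obj.flush()
        # ...and os.fsync() asks the OS to commit it to the storage device
        os.fsync(my_file_obj.fileno())

        # a second, independent reader already sees the flushed line
        with open(path) as reader:
            seen_before_close = reader.read()

print(repr(seen_before_close))  # 'First line\n'
```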

Determine if the file stream is interactive

The isatty() method returns True if the stream is interactive, that is, connected to a tty (or tty-like) device such as a terminal. Otherwise, it returns False:

# open a file for writing
my_file_obj = open("file.txt", "w")

# is it associated with a terminal?
result = my_file_obj.isatty()

print(result)

# close the file
my_file_obj.close()

Output:
False

Since we only opened a regular file for writing, the False outcome shouldn’t surprise us. A stream connected to a terminal, such as sys.stdout when a script runs directly in a terminal window, would return True instead. Note that a pipe is not a terminal, so a file object wrapping a pipe returns False as well.
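Here’s a hedged sketch comparing a few kinds of streams. What sys.stdout reports depends on how the script is launched. os.openpty() exists only on Unix-like systems and may fail in sandboxed environments, so the pseudo-terminal part is guarded and pty_result stays None when it can’t run.

```python
import os
import sys

# Depends on how the script is launched: typically True in an interactive
# terminal, False when output is redirected to a file or piped elsewhere.
print("stdout is a tty:", sys.stdout.isatty())

# A pipe is not a terminal, so both of its ends report False.
read_fd, write_fd = os.pipe()
with open(read_fd, "rb") as reader, open(write_fd, "wb") as writer:
    pipe_result = (reader.isatty(), writer.isatty())
print("pipe:", pipe_result)  # (False, False)

# On Unix-like systems, os.openpty() creates a pseudo-terminal pair,
# and its "slave" end behaves like a real terminal device.
pty_result = None
if hasattr(os, "openpty"):
    try:
        master_fd, slave_fd = os.openpty()
    except OSError:
        pass  # pseudo-terminals may be unavailable, e.g. in some sandboxes
    else:
        with open(slave_fd, "w") as tty_file:  # also closes slave_fd
            pty_result = tty_file.isatty()
        os.close(master_fd)
print("pty:", pty_result)
```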

Peek at the buffer

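The peek() method, available on files opened for reading in binary mode, returns bytes from the buffer without advancing the file position. If the buffer is empty, at most one read is made on the underlying raw stream to fill it, and the number of bytes returned may be more or fewer than requested: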

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCD\nEFGH\nIJKL:

# open the file for reading, in binary mode
my_file_obj = open("file.txt", "rb")

# return the current buffer contents
result = my_file_obj.peek()
print(result)

# perform a reading operation
result = my_file_obj.readline()
print(result)

# check the buffer once again
result = my_file_obj.peek()
print(result)

# finally close the file
my_file_obj.close()

Output:
b'ABCD\nEFGH\nIJKL'
b'ABCD\n'
b'EFGH\nIJKL'

We first printed the buffer’s contents: the first output line shows all three lines of the file as a bytes string. Next, we read one line and displayed it on the second output line. Finally, we peeked at the buffer again, and the returned bytes now start right after the line we had just read, as the third output line shows.
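A practical use of peek() is sniffing a file’s signature without consuming it. The sketch below checks for the two-byte gzip magic number (b"\x1f\x8b") in a plain file and in a gzip file, both created in a temporary directory with made-up names. Since peek() may return more than two bytes, the result is sliced, and tell() confirms the position hasn’t moved.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.txt")
    gz_path = os.path.join(tmp, "packed.gz")
    with open(plain_path, "wb") as f:
        f.write(b"ABCD\nEFGH\nIJKL")
    with gzip.open(gz_path, "wb") as f:
        f.write(b"ABCD\nEFGH\nIJKL")

    results = {}
    for path in (plain_path, gz_path):
        with open(path, "rb") as f:
            head = f.peek(2)[:2]  # may return more than 2 bytes, so slice
            results[os.path.basename(path)] = (head == GZIP_MAGIC, f.tell())

print(results)  # {'plain.txt': (False, 0), 'packed.gz': (True, 0)}
```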

Read from file — read()

The read() method reads data from a file. The optional integer argument, size, specifies how much to read: characters in text mode, bytes in binary mode. Omitting it or passing a negative value reads everything up to the end of the file (in binary mode, either by calling readall() or through multiple read operations on the underlying raw stream):

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCD\nEFGH\nIJKL:

# open the file for reading
my_file_obj = open("file.txt", "r")

# read 1 char
result = my_file_obj.read(1)
print(result)

# argumentless read operation
result = my_file_obj.read()
print(result)

# close the file
my_file_obj.close()

Output:
A
BCD
EFGH
IJKL

First, we’ve read 1 character. After that, we performed an argumentless read operation, giving us the rest of the file contents.
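Passing a size is also how large files are usually processed without loading them entirely into memory. Here’s a sketch with a hypothetical read_in_chunks() helper and a deliberately tiny chunk size of 4 characters, so the output stays readable; read() returning an empty string is the end-of-file signal.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=4):
    """Hypothetical helper: yield a text file's contents chunk_size characters at a time."""
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk == "":  # read() returns an empty string only at end of file
                break
            yield chunk

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as f:
        f.write("ABCD\nEFGH\nIJKL")
    chunks = list(read_in_chunks(path))

print(chunks)  # ['ABCD', '\nEFG', 'H\nIJ', 'KL']
```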

Read from file — read1()

No, it’s not a typo: read1() is available on files opened for reading in binary mode. It differs from read() in that at most one read operation is made on the underlying raw stream. If some bytes are already buffered, only those are returned (up to size); otherwise, a single raw read is performed, so fewer than size bytes may come back. If size is negative or omitted, an arbitrary number of bytes is returned, more than zero unless the end of the file has been reached:

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCD\nEFGH\nIJKL:

# open the file for reading, in binary mode
my_file_obj = open("file.txt", "rb")

# read 1 byte
result = my_file_obj.read1(1)
print(result)

# argumentless read operation
result = my_file_obj.read1()
print(result)

# close the file
my_file_obj.close()

Output:
b'A'
b'BCD\nEFGH\nIJKL'

The results are very similar to those of read(), except that this time they come back as bytes rather than str.

Determine if the file is readable

The readable() method will tell us if a file is readable or not:

# open a file in append mode
my_file_obj = open("file.txt", "a")

# this file isn't readable
result = my_file_obj.readable()
print(result)

# close the file
my_file_obj.close()

# open the same file, this time in read mode
my_file_obj = open("file.txt", "r")

# this time the file's readable
result = my_file_obj.readable()
print(result)

# close the file once more
my_file_obj.close()

Output:
False
True

Read bytes into a buffer — readinto()

To achieve this, we need a pre-allocated, writable bytes-like object (such as a bytearray) to act as the buffer, and a file opened for reading in binary mode. The readinto() method then fills the buffer and returns the number of bytes read. It’s worth noting that, under the hood, multiple read operations may be performed on the underlying raw stream:

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCD\nEFGH\nIJKL:

# open a file in binary mode, for reading
my_file_obj = open("file.txt", "rb")

# construct our buffer
buffer = bytearray(16)

# read from the file into the buffer
my_file_obj.readinto(buffer)
print(buffer)

# close the file
my_file_obj.close()

Output:
bytearray(b'ABCD\nEFGH\nIJKL\x00\x00')

The bytearray was initially filled with \x00 bytes; readinto() then overwrote its first 14 bytes with the file’s data (the call also returned 14), leaving the last two \x00 bytes untouched.
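That return value matters when the same buffer is reused in a loop: only the first n bytes of each fill are fresh, and anything after them is left over from the previous read. The sketch below uses a hypothetical read_chunks_into() helper with a 4-byte buffer and a memoryview, so slicing doesn’t copy until we take the final bytes() snapshot.

```python
import os
import tempfile

def read_chunks_into(path, buffer_size=4):
    """Hypothetical helper: read a file through one reusable buffer."""
    buffer = bytearray(buffer_size)
    view = memoryview(buffer)        # slicing a memoryview does not copy
    chunks = []
    with open(path, "rb") as f:
        while True:
            n = f.readinto(buffer)   # how many bytes were actually filled in
            if n == 0:               # 0 means end of file
                break
            chunks.append(bytes(view[:n]))  # bytes past n are stale leftovers
    return chunks

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "wb") as f:
        f.write(b"ABCD\nEFGH\nIJKL")
    chunks = read_chunks_into(path)

print(chunks)  # [b'ABCD', b'\nEFG', b'H\nIJ', b'KL']
```

Without the [:n] slice, the last chunk would come out as b'KLIJ', with two stale bytes from the previous fill.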

Read bytes into a buffer — readinto1()

Similar to readinto(), except that at most one read operation is performed on the underlying raw stream, whereas readinto() may perform several:

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCD\nEFGH\nIJKL:

# open a file in binary mode, for reading
my_file_obj = open("file.txt", "rb")

# construct our buffer
buffer = bytearray(16)

# read from the file into the buffer
my_file_obj.readinto1(buffer)
print(buffer)

# close the file
my_file_obj.close()

Output:
bytearray(b'ABCD\nEFGH\nIJKL\x00\x00')

Read one line from the file

When we need to read a single line (or read a file line by line), the readline() method is here to help:

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCD\nEFGH\nIJKL:

# open a file for reading
my_file_obj = open("file.txt", "r")

# read one line from the file
result = my_file_obj.readline()
print(result)

# read a second line from the file
result = my_file_obj.readline()
print(result)

# close the file
my_file_obj.close()

Output:
ABCD

EFGH

One thing to note: readline() doesn’t strip the trailing newline (\n) character from the line it returns, which is why print() outputs an extra blank line after each line above.
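A common pattern combines both points: call readline() until it returns an empty string (a blank line in the file would come back as "\n", not ""), and strip the trailing newline ourselves. A minimal sketch, using a temporary file with the same contents as above:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as f:
        f.write("ABCD\nEFGH\nIJKL")

    lines = []
    with open(path, "r") as my_file_obj:
        while True:
            line = my_file_obj.readline()
            if line == "":  # end of file; a blank line would be "\n"
                break
            lines.append(line.rstrip("\n"))  # drop only the trailing newline

print(lines)  # ['ABCD', 'EFGH', 'IJKL']
```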

Read lines from the file

If we need to read all the lines from a file into a list, readlines() is the method for us. It takes an optional integer argument, hint: once the total size of the lines read so far exceeds hint, no more lines are read. Let’s see the method in action:

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCD\nEFGH\nIJKL\nMNOP:

# open a file for reading
my_file_obj = open("file.txt", "r")

# read lines until their total size exceeds 5 characters
result = my_file_obj.readlines(5)
print(result)

# read all remaining lines
result = my_file_obj.readlines()
print(result)

# close the file
my_file_obj.close()

Output:
['ABCD\n', 'EFGH\n']
['IJKL\n', 'MNOP']

Directly iterating through the file

If all we want is to read the file line by line, without storing all the lines in a list, we can simply iterate over the file object itself. A file object (here, an _io.TextIOWrapper) is iterable, and iterating over it yields one line at a time. Let’s see how:

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCD\nEFGH\nIJKL\nMNOP:

# open a file for reading
my_file_obj = open("file.txt", "r")

# read directly from the file, line by line
for line in my_file_obj:
    print(line)

# close the file
my_file_obj.close()

Output:
ABCD

EFGH

IJKL

MNOP

The official Python tutorial recommends this approach, as it is memory efficient and fast, and it leads to simple code. Please note that iteration only continues while there are lines left to read. Once we’ve reached the end of the file, we need to call something like my_file_obj.seek(0) to move the cursor back to the start before we can iterate over the file object again.
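Here’s a sketch of both points, using a temporary file with the same four lines: enumerate() numbers the lines during iteration, a second loop over the exhausted file yields nothing, and seek(0) makes another pass possible.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as f:
        f.write("ABCD\nEFGH\nIJKL\nMNOP")

    with open(path, "r") as my_file_obj:
        # first pass: number the lines, dropping each trailing newline
        numbered = [f"{n}: {line.rstrip(chr(10))}"
                    for n, line in enumerate(my_file_obj, start=1)]

        # the file is now at its end, so a second loop yields nothing...
        leftover = list(my_file_obj)

        # ...until we rewind to the start of the file
        my_file_obj.seek(0)
        line_count = sum(1 for _ in my_file_obj)

print(numbered)    # ['1: ABCD', '2: EFGH', '3: IJKL', '4: MNOP']
print(leftover)    # []
print(line_count)  # 4
```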

Reconfiguring the text stream

Suppose we want a slightly different configuration of the text stream after the file has been opened, and closing and reopening the file isn’t an option. The reconfigure() method (available since Python 3.7) lets us set new values for a few parameters, listed below. Note that the encoding and newline settings can’t be changed once data has been read from the stream:

encoding
errors
newline
line_buffering
write_through

# say we have a file in our current directory
# called "file.txt"

# open a file for reading, UTF-16 encoding
my_file_obj = open("file.txt", "r", encoding="UTF-16")
print(my_file_obj.encoding)

# reconfigure the text stream for UTF-8 encoding
my_file_obj.reconfigure(encoding="UTF-8")
print(my_file_obj.encoding)

# close the file
my_file_obj.close()

Output:
UTF-16
UTF-8
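Changing the encoding mid-stream is allowed after writing, and reconfigure() flushes what was written before switching. The sketch below uses an in-memory io.BytesIO as a stand-in for a real file so the resulting bytes can be inspected directly; the choice of latin-1 as the second encoding is arbitrary.

```python
import io

raw = io.BytesIO()  # in-memory stand-in for a binary file
text = io.TextIOWrapper(raw, encoding="utf-8")

text.write("é")                       # encoded as UTF-8: two bytes
text.reconfigure(encoding="latin-1")  # implicit flush, then switch encoding
text.write("é")                       # encoded as Latin-1: one byte
text.flush()

print(raw.getvalue())  # b'\xc3\xa9\xe9'
```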

Seeking in files

The seek() method moves the cursor, which proves very useful when you only need to read specific fragments of a file. It takes 2 arguments and returns the new absolute position:

offset: an integer specifying the offset in bytes;
whence: an optional argument that defaults to 0 and has 3 possible values: 0 (os.SEEK_SET) means the offset is relative to the start of the file, 1 (os.SEEK_CUR) means it’s relative to the current position, and 2 (os.SEEK_END) means it’s relative to the end of the file;

This method is best used with files opened in binary mode. For files opened in text mode, its functionality is limited to seeking relative to the start of the file with an offset of 0 or a value previously returned by tell() (seek(cookie, 0)), and jumping straight to the end of the file via seek(0, 2).

Let’s see how this method behaves:

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCDEFGHIJKLMNOPQRSTUVWXYZ:

# open a file for reading, in binary mode
my_file_obj = open("file.txt", "rb")

# move the cursor 3 bytes from the start
location = my_file_obj.seek(3, 0)
print(location)

# from that location, read 2 bytes
result = my_file_obj.read(2)
print(result)

# advance cursor 1 byte from current location
location = my_file_obj.seek(1, 1)
print(location)

# from that new location, read 3 bytes
result = my_file_obj.read(3)
print(result)

# move cursor 5 bytes before end of file
location = my_file_obj.seek(-5, 2)
print(location)

# from that new location, read 4 bytes
result = my_file_obj.read(4)
print(result)

# close the file
my_file_obj.close()

Output:
3
b'DE'
6
b'GHI'
21
b'VWXY'
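A typical use is reading the tail of a file. Seeking to a position before the start of the file raises OSError, so this sketch, with a hypothetical read_tail() helper, uses the position returned by seek(0, os.SEEK_END) to clamp the offset for files shorter than n bytes.

```python
import os
import tempfile

def read_tail(path, n):
    """Hypothetical helper: return the last n bytes of a file (or all of it if shorter)."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        f.seek(max(size - n, 0), os.SEEK_SET)
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "wb") as f:
        f.write(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    tail = read_tail(path, 5)
    whole = read_tail(path, 100)

print(tail)   # b'VWXYZ'
print(whole)  # b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
```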

Determine if a file is seekable

Now that we’ve seen how seek() behaves, we can check whether a file supports it before even attempting a seek operation. The seekable() method returns True if the stream supports random access (seek(), tell(), and truncate()), and False otherwise, as is the case for pipes.

# open a file for reading
my_file_obj = open("file.txt", "rb")

# is the file seekable?
result = my_file_obj.seekable()
print(result)

# close the file
my_file_obj.close()

Output:
True
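Here’s a sketch of seekable() used as a guard, with a hypothetical rewind_if_possible() helper. It is tried on the read end of a pipe, which doesn’t support random access, and on an in-memory io.BytesIO, which does.

```python
import io
import os

def rewind_if_possible(f):
    """Hypothetical helper: rewind f when random access is supported."""
    if f.seekable():
        f.seek(0)
        return True
    return False

read_fd, write_fd = os.pipe()
with open(read_fd, "rb") as reader, open(write_fd, "wb") as writer:
    pipe_rewound = rewind_if_possible(reader)

memory = io.BytesIO(b"ABCD")
memory.read(2)
memory_rewound = rewind_if_possible(memory)

print(pipe_rewound, memory_rewound, memory.tell())  # False True 0
```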

Getting the current cursor location

Sometimes we need to know the current cursor position in the file, that is, where the next read or write will happen. Immediately after opening a file (in any mode other than append), the cursor sits at the beginning, so the position is 0. As we read or write data, it advances automatically, and we can also move it explicitly with seek(). To get the current position, we use the tell() method:

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCDEFGHIJKLMNOPQRSTUVWXYZ:

# open a file for reading
my_file_obj = open("file.txt", "r")

# read 5 characters (also 5 bytes, since the content is ASCII)
my_file_obj.read(5)

# get the current position within the file
result = my_file_obj.tell()
print(result)

# close the file
my_file_obj.close()

Output:
5

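As expected, reading 5 characters from the beginning of the file advanced the cursor by 5, so tell() now reports 0 + 5 = 5. Keep in mind that in text mode, tell() returns an opaque number that matches the byte offset only in simple cases like this one (ASCII content); it’s meant to be passed back to seek() rather than used for arithmetic.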
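That opaque value works well as a bookmark. In the sketch below (temporary file with the alphabet, UTF-8 assumed), we save the position with tell(), read ahead a little, and then return to the bookmark with seek() to read the same characters again.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

    with open(path, "r", encoding="utf-8") as my_file_obj:
        my_file_obj.read(5)
        bookmark = my_file_obj.tell()  # opaque in text mode; hand it back to seek()

        peeked = my_file_obj.read(3)   # read ahead a little...
        my_file_obj.seek(bookmark)     # ...then return to the saved position
        reread = my_file_obj.read(3)

print(bookmark)
print(peeked, reread)  # FGH FGH
```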

Truncate a file

The truncate() method resizes the file. Its optional argument, size, sets the new size in bytes; if it’s omitted, the file is cut off at the current position, discarding everything after it. The method returns the new size and doesn’t move the cursor.

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCDEFGHIJKLMNOPQRSTUVWXYZ:

# open a file for reading and updating, binary mode
my_file_obj = open("file.txt", "r+b")

# read 4 bytes
my_file_obj.read(4)

# truncate all remaining data
result = my_file_obj.truncate()

# wait for Enter key. 
# check file.txt. Right now it should contain 'ABCD'
input()

# truncate the file size to 3 bytes
result = my_file_obj.truncate(3)

# close the file
my_file_obj.close()

# now check file.txt again. Should contain 'ABC'

It’s important to note what happens if size is greater than the file’s current size: the file is extended, and according to the Python documentation, the contents of the new area depend on the platform (on most systems, the additional bytes are zero-filled).
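truncate() is also the missing piece in the earlier r+ experiment, where writing abc over Python left abchon behind. The sketch below repeats that experiment in a temporary directory and calls truncate() right after writing, so the stale tail is cut off at the current position.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as f:
        f.write("Python")

    with open(path, "r+") as my_file_obj:
        my_file_obj.write("abc")  # overwrites "Pyt", leaving "abchon"
        my_file_obj.truncate()    # cuts the file at the current position (3)

    with open(path) as f:
        result = f.read()

print(result)  # abc
```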

Check if the file is writable

The writable() method returns a boolean True or False, depending on whether the file is writable or not:

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCDEFGHIJKLMNOPQRSTUVWXYZ:

# open a file for appending
my_file_obj = open("file.txt", "a")

# this file should be writable
result = my_file_obj.writable()
print(result)

# close the file
my_file_obj.close()

# now open the same file, for reading
my_file_obj = open("file.txt", "r")

# this time the file should not be writable
result = my_file_obj.writable()
print(result)

# close the file again
my_file_obj.close()

Output:
True
False

Write to a file

The write() method writes the given string to the file and returns the number of characters written (in binary mode, where it takes bytes, it returns the number of bytes written).

# say we have a file in our current directory
# called "file.txt", having the following contents:
# ABCDEFGHIJKLMNOPQRSTUVWXYZ:

# open a file for appending
my_file_obj = open("file.txt", "a")

# write 'Hello, World!' (append it) to the file
result = my_file_obj.write("Hello, World!")
print(result)

# close the file
my_file_obj.close()

# the file should now contain:
# "ABCDEFGHIJKLMNOPQRSTUVWXYZHello, World!"

Output:
13
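For plain ASCII text, characters and bytes coincide, which is why we got 13 above. They diverge as soon as non-ASCII characters appear. In this sketch (temporary directory, UTF-8 assumed), the é in "héllo" takes two bytes on disk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    with open(path, "w", encoding="utf-8") as my_file_obj:
        count = my_file_obj.write("héllo")  # text mode counts characters

    size_on_disk = os.path.getsize(path)     # 'é' takes two bytes in UTF-8

    with open(path, "wb") as my_file_obj:
        # binary mode counts bytes
        byte_count = my_file_obj.write("héllo".encode("utf-8"))

print(count, size_on_disk, byte_count)  # 5 6 6
```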

Write a sequence of strings to a file

We can also write a sequence of strings to a file using the writelines() method. Note: it does not add line separators, so each string is written exactly as given.

# open a file for writing
my_file_obj = open("file.txt", "w")

# list of lines to write to the file
lines = ["ABCD", "EFGH", "IJKL", "MNOP"]

# write the list of lines
my_file_obj.writelines(lines)

# close the file
my_file_obj.close()

# the file should now be created in your current directory
# and contain: "ABCDEFGHIJKLMNOP"
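If we do want one item per line, we have to add the separators ourselves. Since writelines() accepts any iterable of strings, a generator expression does the job without building a second list. A sketch, writing to a temporary directory:

```python
import os
import tempfile

lines = ["ABCD", "EFGH", "IJKL", "MNOP"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    with open(path, "w") as my_file_obj:
        # append the newline to each item ourselves
        my_file_obj.writelines(f"{line}\n" for line in lines)

    with open(path) as my_file_obj:
        contents = my_file_obj.read()

print(repr(contents))  # 'ABCD\nEFGH\nIJKL\nMNOP\n'
```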

Hope this somewhat lengthy overview of files in Python was sort of helpful. If so, then its objective has been met. The Internet is full of information on this very subject, so be sure to use it to enrich your knowledge on files and best practices to employ while working with them in Python. You’ll thank yourself later. A small effort every day goes a very long way.

That being said, I thank you and look forward to the next one. Happy coding! Cheers!

Deck is a software engineer, mentor, writer, and sometimes even a teacher. With 12+ years of experience in software engineering, he is now a real advocate of the Python programming language, and his passion is helping people sharpen their Python (and general programming) skills. You can reach Deck on LinkedIn, Facebook, Twitter, and Discord: Deck451#6188, as well as follow his writing here on Medium.

More content at PlainEnglish.io.