Automate Incoming File Processing with Python
In this article I will show you how you can automatically process files after they enter a certain file folder. We do this by writing a directory watcher in Python.
What is a directory watcher?
A directory watcher is a system that checks a directory (file folder) to see whether files have entered it. When it notices that a file has entered the directory it will make sure it gets processed.

A directory watcher example
Before we look at how we can make a directory watcher in Python I will show you an example of what it could be used for.
Suppose we have a directory called Orders. In that directory there is a directory called New, in which files are placed by another system. Each file describes an incoming order. The filename contains the order number, and the file itself contains one line per article: the article name and the ordered quantity, separated by a comma. For example, order0007.csv:
Apple,7
Banana,14
Watermelon,2
The directory watcher is supposed to notice that a file has been placed in directory New and write the data to a file called collected_orders.csv where all orders are collected.
The 4 systems of a directory watcher in Python
The directory watcher that we will create will consist of 4 systems:
- a system to repeat a process every X seconds
- a system to check whether there are files in a directory
- a system that processes the files (this is specific to the use case)
- a system that moves a file out of the watched directory
Let’s see how to create each system in Python!
1. Make a process be repeated every X seconds
When I think of repetition in Python, I immediately think about loops. A for loop typically has a fixed number of repetitions, but a while loop does not. More specifically, if we use a while True loop, the loop will never stop on its own.
Let’s say we want the directory watcher to check the file folder every 10 seconds. We can then use the sleep function from the time module to make Python sleep (or pause) for 10 seconds. If we put time.sleep(10) inside the while loop, Python will wait 10 seconds before continuing with the next repetition of the while loop.
import time
while True:
    print('test')
    time.sleep(10)
Be sure that you know how to stop a forever-running Python script, for instance with a keyboard interrupt (Ctrl+C), by closing the terminal or by killing the process.
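To make stopping the loop a little friendlier, the repetition can be wrapped in a function that catches the keyboard interrupt. This is only a minimal sketch: the function name run_every and the max_runs limit are my own assumptions for illustration (max_runs just makes the loop easy to try out without running forever).

```python
import time


def run_every(interval_seconds, task, max_runs=None):
    """Call task() repeatedly, pausing interval_seconds after each call.

    max_runs is a hypothetical safety limit; None means run until interrupted.
    Returns the number of times task() was called.
    """
    runs = 0
    try:
        while max_runs is None or runs < max_runs:
            task()
            runs += 1
            time.sleep(interval_seconds)
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in the main thread; leave the loop quietly.
        print('Stopped by user')
    return runs
```

Calling run_every(10, some_function) would behave like the while True loop above, while run_every(10, some_function, max_runs=3) stops after three repetitions.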
2. Check whether there are files in a directory
To check whether there are files in a file folder and, if so, what their names are, we can use the listdir function from the os module.
You use the listdir function by passing a directory’s path to it, and it will return a list with the names of the directory’s contents. Here is an example:
import os
directory_path = 'C:/Users/Better_Everything/Documents/Orders/New/'
files = os.listdir(directory_path)
print(files)
If order0007.csv is the only file in the directory, the above code prints: ['order0007.csv'].
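Note that listdir returns the names of everything in the directory, including subdirectories. If that could be a problem in your situation, a filter like the one below may help. It is a sketch under my own assumptions: the function name list_incoming_files and the choice to keep only names ending in .csv are not part of the example above.

```python
import os


def list_incoming_files(directory, suffix='.csv'):
    """Return names of regular files in directory whose name ends with suffix."""
    names = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # Skip subdirectories and files with another extension.
        if os.path.isfile(full_path) and name.lower().endswith(suffix):
            names.append(name)
    return names
```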
3. Process the file according to a specific use case
Although the processing of a file is specific to its use case, looping over the files and determining the filepath is not:
for file in files:
    filepath = directory_path + file
Note 1: Keep in mind that when there are no files in the directory, files will be an empty list and the for loop lines will just be skipped.
Note 2: if the order in which the files are processed is important, you can sort the list files before looping over them.
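As a small illustration of Note 2, the filenames can be sorted before the paths are built. The helper name sorted_filepaths is hypothetical, and I use os.path.join here instead of string concatenation, which is a swapped-in alternative that also works when the directory path does not end with a slash.

```python
import os


def sorted_filepaths(directory, filenames):
    """Return the full path of each filename, in alphabetical order of filename."""
    return [os.path.join(directory, name) for name in sorted(filenames)]
```

Alphabetical order matches numerical order here only because the order numbers are zero-padded (order0007.csv sorts before order0010.csv).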
I’ll include the code specific to our previously described example as well to get a working example of a directory watcher.
We will split the order number from the filename, read the lines in the file, prefix each line with the order number and append the resulting lines to the collected_orders.csv file.
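To make that transformation concrete before reading and writing any files, here is a sketch of it as a function that works on plain strings. The function name build_order_lines is hypothetical, and skipping blank lines is an extra assumption of mine that the file-based code does not make.

```python
def build_order_lines(filename, raw_lines):
    """Prefix every non-empty line with the order number taken from filename."""
    # 'order0007.csv' -> '0007': drop the 5-character prefix 'order' and the extension.
    ordernumber = filename[5:].split('.csv')[0]
    result = []
    for line in raw_lines:
        line = line.strip()
        if line:  # assumption: blank lines are ignored
            result.append('{},{}\n'.format(ordernumber, line))
    return result
```

For order0007.csv, the line Apple,7 would become 0007,Apple,7 in the collected file.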
for file in files:
    filepath = directory_path + file
    ##BEGIN: USE CASE SPECIFIC##
    ordernumber = file[5:].split('.csv')[0]
    orderlines = []
    with open(filepath,'r') as f:
        for line in f:
            line = line.strip()
            orderlines.append('{},{}\n'.format(ordernumber,line))
    with open('C:/Users/Better_Everything/Documents/Orders/collected_orders.csv','a') as f:
        for line in orderlines:
            f.write(line)
    ##END: USE CASE SPECIFIC##
4. Removing the processed file from the watched directory
To prevent the same file from being processed again, we have to remove it from the watched directory. You can choose to delete it or move it to another directory.
I choose to make a directory called Processed in the Orders directory to which I will move the processed files.
This file removing system has to be placed in the for loop, after the lines to process the file.
To move a file to another directory I use the move function from the shutil module. The first argument passed to it should be the current filepath and the second argument the destination filepath, including the filename.
import shutil
destination_path = 'C:/Users/Better_Everything/Documents/Orders/Processed/'
destination_file = destination_path + file
shutil.move(filepath, destination_file)
Testing our directory watcher
Our completed code now looks like this:
import time
import os
import shutil
directory_path = 'C:/Users/Better_Everything/Documents/Orders/New/'
destination_path = 'C:/Users/Better_Everything/Documents/Orders/Processed/'
while True:
    files = os.listdir(directory_path)
    for file in files:
        filepath = directory_path + file
        ##BEGIN: USE CASE SPECIFIC##
        ordernumber = file[5:].split('.csv')[0]
        orderlines = []
        with open(filepath,'r') as f:
            for line in f:
                line = line.strip()
                orderlines.append('{},{}\n'.format(ordernumber,line))
        with open('C:/Users/Better_Everything/Documents/Orders/collected_orders.csv','a') as f:
            for line in orderlines:
                f.write(line)
        ##END: USE CASE SPECIFIC##
        destination_file = destination_path + file
        shutil.move(filepath, destination_file)
    time.sleep(10)
If we start the script and move files into the Orders/New directory we see that:
- the data gets written in collected_orders.csv
- the files get relocated to Orders/Processed
This means we have successfully created a directory watcher that automates incoming file processing! This will keep going as long as the script runs.
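If you would rather test the watcher without waiting on the endless loop, one option is to put a single pass over the directory in its own function and call that from the loop. The sketch below is a variation on the script, not a replacement: the function name process_new_orders and its three parameters are my own assumptions, and it expects the Processed directory to exist already.

```python
import os
import shutil


def process_new_orders(new_dir, processed_dir, collected_path):
    """Process every file currently in new_dir once and return the handled names."""
    handled = []
    for name in sorted(os.listdir(new_dir)):
        filepath = os.path.join(new_dir, name)
        # Same use-case-specific steps as in the script above.
        ordernumber = name[5:].split('.csv')[0]
        with open(filepath, 'r') as f:
            orderlines = ['{},{}\n'.format(ordernumber, line.strip()) for line in f]
        with open(collected_path, 'a') as f:
            f.writelines(orderlines)
        # Move the file out of the watched directory so it is not processed twice.
        shutil.move(filepath, os.path.join(processed_dir, name))
        handled.append(name)
    return handled
```

Inside the while True loop you would then call process_new_orders with the three paths, followed by time.sleep(10).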
Thank you for reading!
