avatarBetter Everything

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

4186

Abstract

re>directory_path = <span class="hljs-string">'C:/Users/Better_Everything/Documents/Orders/New/'</span> files = <span class="hljs-built_in">os</span>.listdir(directory_path) <span class="hljs-built_in">print</span>(files)</pre></div><p id="f51d">The above code prints: <code>[‘order0007.csv’]</code>.</p><h2 id="26ea">3. Process the file according to a specific use case</h2><p id="a221">Although the processing of a file is specific to its use case, looping over the files and determining the filepath is not:</p><div id="137c"><pre><span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> files: filepath = directory_path + file</pre></div><p id="6ff5"><b>Note 1: </b>Keep in mind that when there are no files in the directory, <code>files</code> will be an empty list and the <code>for</code> loop lines will just be skipped. <b>Note 2: </b>if the order in which the files are processed is important, you can sort the list <code>files</code> before looping over them.</p><p id="3085">I’ll include the code specific to our previously described example as well to get a working example of a directory watcher.</p><p id="f1f7">We will split the ordernumber from the filename, read the lines in the file and add lines to the <code>collected_orders.csv</code> file.</p><div id="137f"><pre><span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> files: filepath = directory_path + file <span class="hljs-comment">##BEGIN: USE CASE SPECIFIC##</span> ordernumber = file[<span class="hljs-number">5</span>:].split(<span class="hljs-string">'.csv'</span>)[<span class="hljs-number">0</span>] orderlines = [] <span class="hljs-keyword">with</span> <span class="hljs-built_in">open</span>(filepath,<span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> f: <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> f: line = line.strip() orderlines.append(<span class="hljs-string">'{},{}\n'</span>.<span class="hljs-built_in">format</span>(ordernumber,line))

<span class="hljs-keyword">with</span> <span class="hljs-built_in">open</span>(<span class="hljs-string">'C:/Users/Better_Everything/Documents/Orders/collected_orders.csv'</span>,<span class="hljs-string">'a'</span>) <span class="hljs-keyword">as</span> f:
    <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> orderlines:
        f.write(line)
<span class="hljs-comment">##END: USE CASE SPECIFIC##</span></pre></div><h2 id="5ae1">4. Removing the processed file from the watched directory</h2><p id="a8bd">To prevent the same file to be processed again we have to remove it from the watched directory. You can choose to delete it or move it to another directory.</p><p id="ef0a">I choose to make a directory called <code>Processed</code> in the <code>Orders</code> directory to which I will move the processed files.</p><p id="3c42">This file removing system has to be placed in the <code>for</code> loop, after the lines to process the file.</p><p id="a82f">To move a file to another directory I use the <code>move</code> function from the <code>shutil</code> package. The first argument passed to it should be the current filepath and the second argument the destination filepath including filename.</p><div id="2b9f"><pre>destination_path = <span class="hljs-string">'C:/Users/Better_Everything/Documents/Orders/Processed/'</span>

destination_file = destination_path + file shutil.move(filepath, destination_file)</pre></div><h2 id="b747">Testing our directory watcher</h2><p id="1e2a">Our completed code now looks like this:</p><div id="f280"><pre><span class="hljs-keyword">import</span> time <span class="hljs-keyword">import</span> os <span class="hljs-keyword">import</span> shutil

directory_path = <span class="hljs-string">'C:/Users/Better_Everything/Documents/Orders/New/'</span> destination_path = <span class="hljs-string">'C:/Users/Better_Everything/Documents/Orders/Processed/'</span>

<span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>: files = os.listdir(directory_path) <span class="hljs-ke

Options

yword">for</span> file <span class="hljs-keyword">in</span> files: filepath = directory_path + file <span class="hljs-comment">##BEGIN: USE CASE SPECIFIC##</span> ordernumber = file[<span class="hljs-number">5</span>:].split(<span class="hljs-string">'.csv'</span>)[<span class="hljs-number">0</span>] orderlines = [] <span class="hljs-keyword">with</span> <span class="hljs-built_in">open</span>(filepath,<span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> f: <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> f: line = line.strip() orderlines.append(<span class="hljs-string">'{},{}\n'</span>.<span class="hljs-built_in">format</span>(ordernumber,line))

    <span class="hljs-keyword">with</span> <span class="hljs-built_in">open</span>(<span class="hljs-string">'C:/Users/Better_Everything/Documents/Orders/collected_orders.csv'</span>,<span class="hljs-string">'a'</span>) <span class="hljs-keyword">as</span> f:
        <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> orderlines:
            f.write(line)
    <span class="hljs-comment">##END: USE CASE SPECIFIC##</span>

    destination_file = destination_path + file
    shutil.move(filepath, destination_file)

time.sleep(<span class="hljs-number">10</span>)</pre></div><p id="d31b">If we start the script and move files into the <code>Orders/New</code> directory we see that:</p><ul><li>the data gets written in <code>collected_orders.csv</code></li><li>the files get relocated to <code>Orders/Processed</code></li></ul><p id="1492">This means we have succesfully created a directory watcher that automates incoming file processing! This will keep going as long as the script runs.</p><h1 id="086e">Thank you for reading!</h1><p id="f5db">You can <b>get full access to all my posts by joining Medium</b>. Your membership fee directly supports me and other writers you read. You’ll also get full access to every story on Medium:</p><div id="80fc" class="link-block">
      <a href="https://medium.com/@BetterEverything/membership">
        <div>
          <div>
            <h2>Join Medium with my referral link — Better Everything</h2>
            <div><h3>Read every story from Better Everything (and thousands of other writers on Medium). Your membership fee directly…</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*aa4Y_6MHVoY6Wl-9)"></div>
          </div>
        </div>
      </a>
    </div><p id="bf17">You might also like:</p><div id="f978" class="link-block">
      <a href="https://readmedium.com/automate-removal-of-old-files-in-python-2085381fdf51">
        <div>
          <div>
            <h2>Automate Removal of Old Files in Python</h2>
            <div><h3>I explain how to build a Python program that checks if the files in a directory are older than a threshold and if so…</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*fwDsvAbuFKi3f1zVnLRt9g.png)"></div>
          </div>
        </div>
      </a>
    </div><div id="bb9e" class="link-block">
      <a href="https://readmedium.com/passing-data-to-a-python-file-when-running-it-with-commands-3cc437667b2b">
        <div>
          <div>
            <h2>Passing Data to a Python File when Running It with Commands</h2>
            <div><h3>Feed input data into your Python script by using command line arguments and collecting them in your Python program.</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*lhn4RrOGYuv3mSc0D4f_zg.jpeg)"></div>
          </div>
        </div>
      </a>
    </div></article></body>

Automate Incoming File Processing with Python

In this article I will show you how you can automatically process files after they enter a certain file folder. We do this by writing a directory watcher in Python.

What is a directory watcher?

A directory watcher is a system that checks a directory (file folder) to see whether files have entered it. When it notices that a file has entered the directory it will make sure it gets processed.

A directory watcher notices incoming files and processes them. Image by catalyststuff on Freepik

A directory watcher example

Before we look at how we can make a directory watcher in Python I will show you an example of what it could be used for.

Suppose we have a directory called Orders, in that directory there is a directory called New in which files are placed by another system. Each file is about an incoming order. The filename contains the ordernumber and the file itself contains a line for each article with the ordered quantity separated by a comma. For example order0007.csv:

Apple,7
Banana,14
Watermelon,2

The directory watcher is supposed to notice that a file has been placed in directory New and write the data to a file called collected_orders.csv where all orders are collected.

The 4 systems of a directory watcher in Python

The directory watcher that we will create will consist of 4 systems:

  1. a system to repeat a process every X seconds
  2. a system to check whether there are files in a directory
  3. a system that processes the files (this is specific to the use case)
  4. a system that moves a file out of the watched directory

Let’s see how to create each system in Python!

1. Make a process be repeated every X seconds

When I think of repetition in Python, I immediately think about loops. A for loop typically has a fixed amount of repetitions but a while loop doesn’t have that. More specifically, if we use a while True loop, the loop will never stop.

Let’s say we want the directory watcher to check the file folder every 10 seconds. We can then use the sleep function from the time package to make Python sleep (or pause) 10 seconds. If we put time.sleep(10) inside the while loop, Python will wait 10 seconds before continuing with the next repetition of the while loop.

import time

while True:
    print('test')
    time.sleep(10)

Be sure that you know how to stop a forever-running Python script, for instance with a key-board interrupt, by closing the terminal or killing the process.

2. Check whether there are files in a directory

To check whether there are files in a file folder and, if so, what the file names are we can use the listdir function from the os package.

You use the listdir function by passing a directory’s path to it and it will return a list of the directory’s content. Here is an example:

directory_path = 'C:/Users/Better_Everything/Documents/Orders/New/'
files = os.listdir(directory_path)
print(files)

The above code prints: [‘order0007.csv’].

3. Process the file according to a specific use case

Although the processing of a file is specific to its use case, looping over the files and determining the filepath is not:

for file in files:
    filepath = directory_path + file

Note 1: Keep in mind that when there are no files in the directory, files will be an empty list and the for loop lines will just be skipped. Note 2: if the order in which the files are processed is important, you can sort the list files before looping over them.

I’ll include the code specific to our previously described example as well to get a working example of a directory watcher.

We will split the ordernumber from the filename, read the lines in the file and add lines to the collected_orders.csv file.

for file in files:
    filepath = directory_path + file
    ##BEGIN: USE CASE SPECIFIC##
    ordernumber = file[5:].split('.csv')[0]
    orderlines = []
    with open(filepath,'r') as f:
        for line in f:
            line = line.strip()
            orderlines.append('{},{}\n'.format(ordernumber,line))

    with open('C:/Users/Better_Everything/Documents/Orders/collected_orders.csv','a') as f:
        for line in orderlines:
            f.write(line)
    ##END: USE CASE SPECIFIC##

4. Removing the processed file from the watched directory

To prevent the same file to be processed again we have to remove it from the watched directory. You can choose to delete it or move it to another directory.

I choose to make a directory called Processed in the Orders directory to which I will move the processed files.

This file removing system has to be placed in the for loop, after the lines to process the file.

To move a file to another directory I use the move function from the shutil package. The first argument passed to it should be the current filepath and the second argument the destination filepath including filename.

destination_path = 'C:/Users/Better_Everything/Documents/Orders/Processed/'
destination_file = destination_path + file
shutil.move(filepath, destination_file)

Testing our directory watcher

Our completed code now looks like this:

import time
import os
import shutil

directory_path = 'C:/Users/Better_Everything/Documents/Orders/New/'
destination_path = 'C:/Users/Better_Everything/Documents/Orders/Processed/'

while True:
    files = os.listdir(directory_path)
    for file in files:
        filepath = directory_path + file
        ##BEGIN: USE CASE SPECIFIC##
        ordernumber = file[5:].split('.csv')[0]
        orderlines = []
        with open(filepath,'r') as f:
            for line in f:
                line = line.strip()
                orderlines.append('{},{}\n'.format(ordernumber,line))

        with open('C:/Users/Better_Everything/Documents/Orders/collected_orders.csv','a') as f:
            for line in orderlines:
                f.write(line)
        ##END: USE CASE SPECIFIC##

        destination_file = destination_path + file
        shutil.move(filepath, destination_file)

    time.sleep(10)

If we start the script and move files into the Orders/New directory we see that:

  • the data gets written in collected_orders.csv
  • the files get relocated to Orders/Processed

This means we have succesfully created a directory watcher that automates incoming file processing! This will keep going as long as the script runs.

Thank you for reading!

You can get full access to all my posts by joining Medium. Your membership fee directly supports me and other writers you read. You’ll also get full access to every story on Medium:

You might also like:

Python
Programming
Software Development
Automation
Data Engineering
Recommended from ReadMedium