Deal with Multi-Level Folders in Python with os.walk

Let’s say we have a multi-level directory full of files that we want to analyze.
main
|- A
   |- 1.txt
   |- 2.txt
|- B
   |- 3.txt
|- C
   |- 4.txt
These can be images, audio files, CSV files, or whatever you wish to analyze, but for demonstration purposes here, I’ll use .txt files. Here, the main folder contains multiple .txt files in different folders.
The os.walk Function
The os.walk function essentially looks through everything in the "main" folder — every file, folder, subfolder and file within each subfolder.
import os

for root, subfolders, filenames in os.walk("main"):
    print(root, subfolders, filenames)
For every folder it visits, os.walk yields a tuple of 3 values: root, subfolders and filenames.
root is a string holding the path of the folder currently being visited, starting from the "main" folder
subfolders is a list of strings, each the name of a subfolder directly inside root
filenames is a list of strings, each the name of a file directly inside root
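To see those three values in action, here is a minimal sketch that rebuilds the example layout in a temporary folder and walks it. The temporary folder and the `layout` and `seen` names are my own assumptions for the demo and are not part of os.walk itself.

```python
import os
import tempfile

# Throwaway copy of the example layout (folder and file names follow the article).
layout = {"A": ["1.txt", "2.txt"], "B": ["3.txt"], "C": ["4.txt"]}

with tempfile.TemporaryDirectory() as tmp:
    main = os.path.join(tmp, "main")
    for folder, names in layout.items():
        os.makedirs(os.path.join(main, folder))
        for name in names:
            with open(os.path.join(main, folder, name), "w") as f:
                f.write("demo\n")

    # os.walk yields one (root, subfolders, filenames) tuple per folder visited.
    seen = {}
    for root, subfolders, filenames in os.walk(main):
        rel = os.path.relpath(root, tmp)  # path relative to tmp, e.g. "main"
        seen[rel] = (sorted(subfolders), sorted(filenames))
        print(rel, sorted(subfolders), sorted(filenames))
```

The lists are sorted before printing because the order in which os.walk reports names inside a folder depends on the operating system.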
Getting Every .txt File inside main
To get the path of every single file inside the folder, we can simply join root and filename together.
import os

for root, subfolders, filenames in os.walk("main"):
    for filename in filenames:
        # os.path.join inserts the right separator for the current OS
        filepath = os.path.join(root, filename)
        print(filepath)  # do stuff with filepath
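If you need the file paths in more than one place, the same loop can be wrapped in a small generator. This is only a sketch: `iter_file_paths` is a hypothetical helper name I made up, and the temporary folder exists purely so the example runs on its own.

```python
import os
import tempfile


def iter_file_paths(top):
    """Yield the path of every file below top (hypothetical helper name)."""
    for root, _subfolders, filenames in os.walk(top):
        for filename in filenames:
            yield os.path.join(root, filename)


# Demo on a temporary copy of the example layout.
with tempfile.TemporaryDirectory() as tmp:
    main = os.path.join(tmp, "main")
    for rel in ["A/1.txt", "A/2.txt", "B/3.txt", "C/4.txt"]:
        path = os.path.join(main, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("demo\n")

    # Paths are shown relative to main and sorted for a stable order.
    found = sorted(os.path.relpath(p, main) for p in iter_file_paths(main))
    print(found)
```

A generator keeps memory use low on large folders, since each path is produced only when the loop asks for it.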
Dealing with Files We Don’t Care About
Sometimes we might have random autogenerated files here and there for some reason, such as .DS_Store or the contents of __pycache__ folders. To stop our code from reading them accidentally, we can use a simple if statement to skip anything that isn’t a .txt file.
import os

for root, subfolders, filenames in os.walk("main"):
    for filename in filenames:
        # skip anything that is not a .txt file
        if not filename.endswith(".txt"):
            continue
        filepath = os.path.join(root, filename)
        print(filepath)  # do stuff with filepath
Conclusion
If you didn’t already know about this function, I hope that this makes your life easier!
