How to Extract Attachments from Gmail with Python
Use IMAP with Python to automate Gmail data extraction

Email has been around since the early days of the internet. Before messaging apps, video calls and now the metaverse, email was, and remains, one of the main channels of communication. People rely on Microsoft Outlook, Gmail, Proton Mail and others for business, newsletters, letters, exchanging documents and much more.
This diversity of use makes email a valuable source of data. Extracting attachments from Gmail, for instance, can bring several benefits:
- Data Analysis: If you have a subscription that regularly sends you PDF files, CSVs, TXTs, or other formats, you can extract and analyze them.
- Automation: Let’s say you have several clients that send you Purchase Orders every day, and you don’t want to waste time downloading and grabbing the information from them manually. You can extract the data, apply processing steps and upload the cleaned data to a database.
- Integration: Email can be used as a gateway of information for your application. For instance, files sent to your inbox can seamlessly be redirected to your platform’s database.
- Deep Learning: Image, video and text files sent to your inbox can be automatically downloaded and used to feed Deep Learning models.
These examples are just a sneak peek of what email attachments make possible. Python is well suited to all of them: it offers several libraries to manipulate data, integrates well with data visualisation tools, is widely used for Machine Learning and Deep Learning, and ships with the imaplib and email packages, which we can use to extract data from Gmail.
1 — Configure your Google and Gmail account
To access the Gmail inbox through a Python script we first need to do a simple configuration.
First, go to your Gmail account and click on Settings -> See all settings. You’ll see different tabs; go to Forwarding and POP/IMAP and select Enable IMAP, like in the image below:

Finally, we need a password to connect through the Internet Message Access Protocol (IMAP). The fastest way is to go to this link, where you should be able to create an App password. Note that Google only offers App passwords on accounts that have 2-Step Verification turned on.

When you click on Create, make sure to copy the password to a safe place.
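One way to keep the App password out of the script itself is to read it from environment variables. The snippet below is a minimal sketch of that idea; the variable names GMAIL_USER and GMAIL_APP_PASSWORD and the helper load_credentials are hypothetical and not part of any library.

```python
import os


def load_credentials(env=None):
    """Read the Gmail address and App password from environment variables.

    GMAIL_USER and GMAIL_APP_PASSWORD are hypothetical names chosen for
    this sketch; use whatever names suit your setup.
    """
    # fall back to the real process environment when no mapping is given
    env = os.environ if env is None else env
    user = env.get("GMAIL_USER")
    password = env.get("GMAIL_APP_PASSWORD")
    if not user or not password:
        raise RuntimeError("Set GMAIL_USER and GMAIL_APP_PASSWORD first")
    return user, password
```

The returned pair can then be passed to whatever function opens the IMAP connection, so the password never has to appear in the source file.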
2 — Get all messages from a Gmail folder
With Python’s imaplib library we can connect to an email server, access email folders (such as the inbox, sent items, or custom folders), and perform various operations on email messages. Let’s start by importing the library and writing a function to extract the messages from a folder.
import imaplib

from tqdm import tqdm


def get_messsages_gmail(
        user_email,
        user_password,
        last_email=None,
        email_folder='INBOX',
        from_email="All"):
    """This function extracts the messages objects from a gmail account"""
    # connect to gmail
    gmail = imaplib.IMAP4_SSL("imap.gmail.com")
    # sign in with your credentials
    gmail.login(user_email, user_password)
    # select the folder
    gmail.select(email_folder)
    # search every message, or only those from one sender
    if from_email == 'All':
        resp, items = gmail.search(None, "ALL")
    else:
        resp, items = gmail.search(None, f'(FROM "{from_email}")')
    # items[0] is a space-separated byte string of message identifiers
    items = items[0].split()
    msgs = []
    # last_email is the slice end: None keeps every identifier, while -1
    # would leave out the final one
    for num in tqdm(items[:last_email]):
        typ, message_parts = gmail.fetch(num, '(RFC822)')
        msgs.append(message_parts)
    # close the folder and end the session
    gmail.close()
    gmail.logout()
    return msgs
This function takes user_email; user_password, which is the App password obtained in the previous step; last_email, which is used as the end of the slice over the message identifiers; email_folder; and from_email, where we can specify a sender email address or put All to extract from all senders.
Then we create an instance of imaplib.IMAP4_SSL, which encapsulates an encrypted connection to an IMAP4 server, in this case the Gmail one: imap.gmail.com.
We call the connection object gmail and use it to log in, then we select the folder (INBOX by default). The if condition triggers the search for messages, either from all senders or from a specific one, and stores the result in the variables resp and items.
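The search criterion passed to gmail.search() is just a string in IMAP search syntax. As a minimal sketch, the hypothetical helper build_search_criteria below (not part of imaplib) shows the two kinds of string involved; quoting the sender address is a defensive choice made in this sketch.

```python
def build_search_criteria(from_email="All"):
    """Return an IMAP search criterion for all senders or a single one."""
    if from_email == "All":
        # ALL is the IMAP search key that matches every message in the folder
        return "ALL"
    # FROM matches messages whose From header contains the given string
    return f'(FROM "{from_email}")'


# orders@example.com is a placeholder address
print(build_search_criteria())
print(build_search_criteria("orders@example.com"))
```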
From the items list, we take the first element (items[0]), which holds the message identifiers as a single space-separated byte string, and we separate them with .split().
Finally, we iterate over the message identifiers and ask the server to return each email message in the RFC 822 format. We append all these messages to the msgs list and return it.
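To see what the split and the slice do without connecting to a server, the sketch below applies them to a hand-written value shaped like the second element returned by gmail.search(). The sample data and the helper name select_message_ids are made up for illustration.

```python
def select_message_ids(search_data, last_email=None):
    """Split the search result into identifiers and apply the slice end."""
    # search_data[0] is one byte string of space-separated identifiers
    ids = search_data[0].split()
    return ids[:last_email]


sample = [b"1 2 3 4"]  # hypothetical search result for a four-message folder
print(select_message_ids(sample))      # every identifier
print(select_message_ids(sample, -1))  # all but the final identifier
print(select_message_ids(sample, 2))   # only the first two
```

Because last_email is a slice end, a value of -1 leaves out the final identifier, while None keeps them all.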
3 — Extract PDF files from the messages list
Now we can write another function that takes the list generated by the previous one and extracts the PDF files from each message.
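import email
import os


def get_pdf_attachments(msgs, data_folder):
    """This function extracts the pdf files"""
    # make sure the destination folder exists
    os.makedirs(data_folder, exist_ok=True)
    for msg_raw in msgs:
        # a fetched message arrives as a (header, raw bytes) tuple
        if type(msg_raw[0]) is tuple:
            # parse the raw bytes directly, so no text decoding can fail
            msg = email.message_from_bytes(msg_raw[0][1])
            for part in msg.walk():
                # skip container parts and parts that are not attachments
                if part.get_content_maintype() == 'multipart':
                    continue
                if part.get('Content-Disposition') is None:
                    continue
                # get_filename() returns None when the part has no filename
                filename = part.get_filename()
                if filename and ".pdf" in filename.lower():
                    # keep only the base name so the file stays in data_folder
                    file_path = os.path.join(data_folder, os.path.basename(filename))
                    try:
                        # Save the file
                        with open(file_path, 'wb') as file:
                            file.write(part.get_payload(decode=True))
                    except Exception as error:
                        print(error)
It only takes two arguments: the list of messages and the data folder where we want to save the files.
We iterate over the msgs list and, in each loop, get the parsed message using email.message_from_bytes(), which builds a message object directly from the raw bytes returned by the server. Parsing the bytes avoids first decoding the whole message as UTF-8 with str(), which fails for messages that contain bytes that are not valid UTF-8.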
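The tuple check in the function reflects the shape of the value returned by gmail.fetch(): a list in which the message itself arrives as a (header, raw bytes) tuple, followed by a closing byte string. Here is a minimal sketch with a hand-written response of that shape; the sample bytes and the helper raw_message_bytes are hypothetical.

```python
import email


def raw_message_bytes(message_parts):
    """Return the raw RFC 822 bytes from one fetch response, or None."""
    for item in message_parts:
        if isinstance(item, tuple):
            # item[0] describes the fetch, item[1] is the message itself
            return item[1]
    return None


# hypothetical fetch response, shaped like the data imaplib returns
sample_response = [
    (b"1 (RFC822 {52}", b"Subject: Purchase order\r\nFrom: a@example.com\r\n\r\nHi\r\n"),
    b")",
]
msg = email.message_from_bytes(raw_message_bytes(sample_response))
print(msg["Subject"])
```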
The msg.walk() function is used to iterate over the parts of the message, such as text, images and other attachments.
If the main content type is multipart, the continue statement is executed, which means the code skips the current part and moves on to the next part of the email message. The same happens if the Content-Disposition header is None.
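To try this filtering without a mailbox, the sketch below builds a small message with the standard library’s EmailMessage class, serialises it, and walks the parsed result with the same two checks. The addresses, file name and PDF bytes are placeholders invented for the example.

```python
import email
from email.message import EmailMessage

# build a hypothetical message with a text body and one PDF attachment
outgoing = EmailMessage()
outgoing["From"] = "client@example.com"
outgoing["To"] = "me@example.com"
outgoing["Subject"] = "Purchase order"
outgoing.set_content("Please find the order attached.")
outgoing.add_attachment(
    b"%PDF-1.4 placeholder",
    maintype="application",
    subtype="pdf",
    filename="order.pdf",
)

# parse the serialised bytes, as we would with a fetched message
msg = email.message_from_bytes(outgoing.as_bytes())

attachments = {}
for part in msg.walk():
    if part.get_content_maintype() == "multipart":
        continue  # container part, it only holds other parts
    if part.get("Content-Disposition") is None:
        continue  # the plain-text body has no Content-Disposition header
    attachments[part.get_filename()] = part.get_payload(decode=True)

print(list(attachments))
```

Only the attachment survives the two checks: the multipart container and the text body are both skipped.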
Finally, we can grab the PDF files by checking whether there is a “.pdf” string in the filename of each part, obtained with part.get_filename().
The .get_filename() function only returns the name of the file; to get the content we need .get_payload(decode=True), which returns the decoded bytes that we write to the created file.
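The same filename test can be generalised to other formats. As a minimal sketch, the hypothetical helper has_allowed_extension below compares the file extension, case-insensitively, against a collection of accepted extensions; the default collection is only an example.

```python
import os


def has_allowed_extension(filename, extensions=(".pdf", ".csv", ".xlsx")):
    """Return True if filename ends with one of the given extensions."""
    if not filename:
        # get_filename() returns None for parts without a filename
        return False
    # splitext returns the extension including its leading dot
    _, ext = os.path.splitext(filename)
    return ext.lower() in extensions


print(has_allowed_extension("Report.PDF"))
print(has_allowed_extension("photo.png"))
print(has_allowed_extension(None))
```

Comparing the extension rather than searching the whole name also avoids matching names such as notes.pdf.zip.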
Conclusion
The implementation is simple: with two functions we are ready to extract attachments from Gmail. In the example we did it with PDFs, but it can be extended to other file formats, such as CSV, XLSX, TXT, PNG and so on. Now you have a script that can, for instance:
- Save you hours of work once it is integrated into your automation scripts.
- Gather valuable data and derive insights from your newsletters using data analysis.
- Download and upload data to your database.
Be inventive, as email won’t be leaving us anytime soon.