How to Edit PDF Hyperlinks using Python and pdfrw
Hyperlinks are an essential feature of PDF documents. They provide an easy way to navigate within a document or link to external resources. However, when you move files from one location to another, the links may break, making it difficult to access the desired resource. In this blog post, we will explore how to use Python to edit hyperlinks in PDF documents.
We will use the pdfrw library to edit the hyperlinks in PDF documents. The pdfrw library is a Python module that provides access to the internals of PDF files. It allows you to read, write, and modify PDF files using a simple syntax.
To get started, you need to install the pdfrw library by running the following command in the command prompt:
pip install pdfrwOnce you have installed the pdfrw library, you can use the following Python code to edit the hyperlinks in a PDF document:
import pdfrw
# Load the PDF file
pdf = pdfrw.PdfReader('original_document.pdf')
# Create an empty PDF file
new_pdf = pdfrw.PdfWriter()
# Loop through the pages in the PDF file
for page in pdf.pages:
# Check if the page has any hyperlinks
for annot in page.Annots or []:
old_url = annot.A.URI
if old_url == None:
continue
# Replace the old path with the new path
new_url = old_url.replace('C:\old_path', 'C:\new_path')
# Convert the new string for PDF strings
new_url = pdfrw.objects.pdfstring.PdfString(new_url)
# Update the hyperlink
annot.A.URI = new_url
# Add the modified page to the new PDF file
new_pdf.addpage(page)
# Save the modified PDF file
new_pdf.write('upddated_document.pdf')In this code, we first load the PDF file using the PdfReader() method. We then create an empty PDF file using the PdfWriter() method. We then loop through each page in the PDF file using the pages property of the PdfReader() object.
Within the loop, we check if the page has any hyperlinks by accessing the Annots property of the page.
- If the page has no hyperlinks, we use
continueto skip the rest of the code. - If the page has hyperlinks, we loop through each hyperlink and retrieve its URL using the
annot.A.URIproperty. If the hyperlink contains the old path, we replace it with the new path and update the hyperlink using thepdfrw.objects.pdfstring.PdfString()method. In this example, we replace the old path “C:\old_path” with the new path “C:\new_path”. You can replace these paths with your own paths as needed. Also note that the output file name and path can be customized to your liking.
Finally, we add the modified page to the new PDF file using the addpage() method of the PdfWriter() object. We then write the new PDF file to the desired location using the write() method.
Editing hyperlinks in PDF documents is essential when you move files from one location to another. The pdfrw library provides an easy and efficient way to modify hyperlinks in PDF documents using Python. With this code, you can easily update hyperlinks and ensure that your PDF documents remain accessible and usable.
If you would like to explore more PDF automation tools, please check out my articles:
- Scrape Data from PDF Files Using Python and PDFQuery
- Scrape Data from PDF Files Using Python and tabula-py
- How to Convert Scanned Files to Searchable PDF Using Python and Pytesseract
- Extract PDF Text While Preserving Whitespaces Using Python and Pytesseract
- How to Edit PDF Hyperlinks using Python and pdfrw
- How to Rotate PDF Pages using Python and pdfrw
Thank you for reading!
If you enjoy this article, please click the Clap icon. If you would like to see more articles from me and thousands of other writers on Medium. You can:
- Subscribe to my newsletter to get an email notification whenever I post a new article.
- Sign up for a membership to unlock full access to everything on Medium.






