This Python script extracts all URLs from a given PDF file and generates a new PDF with clickable hyperlinks. It detects URLs starting with http://, https://, or www. and cleans them up before writing them to the output.
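As a rough illustration of the detection and cleanup described above, here is a minimal sketch using only the standard library. The regular expression, the set of trailing punctuation that gets stripped, and the function names `extract_urls` and `count_links_by_page` are assumptions made for this example; the script itself may match and clean URLs differently.

```python
import re

# Assumed pattern: a URL starts with http://, https://, or www. and runs
# until whitespace or a character that rarely belongs to a URL.
URL_PATTERN = re.compile(r"\b(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)

# Punctuation that usually belongs to the sentence, not to the link.
TRAILING_PUNCTUATION = ".,;:!?)]}"


def extract_urls(text):
    """Return cleaned URLs in order of first appearance, without duplicates."""
    found = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        if url and url not in found:
            found.append(url)
    return found


def count_links_by_page(page_texts):
    """Map 1-based page numbers to the number of URLs found on that page."""
    counts = {}
    for number, text in enumerate(page_texts, start=1):
        urls = extract_urls(text)
        if urls:
            counts[number] = len(urls)
    return counts
```

Here `page_texts` stands for a list of strings, one per page, however the text was obtained from the PDF. Pages without links are left out of the result, so the dictionary can be printed directly as a per-page log.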
- GUI-based file picker (using tkinter)
- Scans the full content of a PDF
- Supports links starting with http://, https://, or www.
- Outputs a clean, clickable PDF with extracted links
- Logs which pages have links and how many were found
- Python 3.7+
- Libraries:
  - PyMuPDF (fitz)
  - fpdf
  - tkinter (built-in)
Install dependencies:
pip install pymupdf fpdf
- Clone or download this repository.
- Run the script:
python extract_urls_with_http_https.py
- Select a PDF when prompted.
- Wait for the scan to complete.
- A new PDF file will be created in the same directory with _extracted_links.pdf appended to its name.
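The output naming in the last step could be derived along these lines. This is a sketch under one assumption: the suffix is attached to the input file name without its .pdf extension. The helper name `output_path_for` is hypothetical and not taken from the script.

```python
from pathlib import Path


def output_path_for(input_pdf):
    """Return the output path next to the input file.

    Assumption: the suffix replaces the original extension, so
    report.pdf becomes report_extracted_links.pdf.
    """
    source = Path(input_pdf)
    return source.with_name(source.stem + "_extracted_links.pdf")
```

For an input named report.pdf, this yields report_extracted_links.pdf in the same directory.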
The output is a simple PDF with blue clickable hyperlinks like:
http://example.com
https://secure.site/login
www.website.org/page
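A link that starts with www. has no scheme, so it generally needs one before a PDF viewer will open it. The sketch below shows one way the output could be written with fpdf. It assumes that bare www. links are given an http:// prefix, and the function names `link_target` and `write_links_pdf` are hypothetical; the script may handle this differently.

```python
def link_target(url):
    """Return a target a PDF viewer can open.

    Assumption: links without a scheme are prefixed with http://.
    """
    if url.lower().startswith(("http://", "https://")):
        return url
    return "http://" + url


def write_links_pdf(urls, output_path):
    """Write one blue, clickable line per URL to a new PDF."""
    # Imported here so link_target stays usable without fpdf installed.
    from fpdf import FPDF

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    pdf.set_text_color(0, 0, 255)  # blue text
    for url in urls:
        # The visible text is the URL as found; the link uses the full target.
        pdf.cell(0, 8, url, ln=1, link=link_target(url))
    pdf.output(str(output_path))
```

Keeping the displayed text separate from the link target means the output still shows www.website.org/page exactly as it appeared in the source PDF.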
Zavier Chambers
Cybersecurity & Computer Science Student
May 25, 2025
This project is licensed under the MIT License.