Build a Web Scraper in Python

Build a web scraper from scratch in Python. It's an SEO analytics tool that reports on the internal linking profile of any website.

Also available in: Go, TypeScript

What will you learn?

Use Python to build a web crawler that scrapes pages. It's a tool that any SEO expert would be happy to have. You'll make HTTP requests and parse HTML to generate reports that can easily be marshaled to standard output or a file. If you're interested in getting a job doing data analysis, this project will teach you how Python can be used to build a command-line application.
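To give a feel for what "parsing HTML to report on internal links" might look like, here is a minimal sketch. It is not the course's implementation: the function names (`normalize_url`, `get_urls_from_html`, `count_internal_links`) are hypothetical, and it uses the standard library's `html.parser` rather than Beautiful Soup so that it runs with no extra installs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def normalize_url(url):
    """Reduce a URL to lowercase host + path (no scheme, no trailing slash)
    so that equivalent URLs compare equal. Hypothetical helper."""
    parsed = urlparse(url)
    return f"{parsed.netloc.lower()}{parsed.path.rstrip('/')}"


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin turns relative links like "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def get_urls_from_html(html, base_url):
    """Return every link found in the HTML as an absolute URL."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def count_internal_links(html, base_url):
    """Count links that point back at the same host as base_url."""
    base_host = urlparse(base_url).netloc.lower()
    counts = {}
    for link in get_urls_from_html(html, base_url):
        if urlparse(link).netloc.lower() != base_host:
            continue  # external link, not part of the internal profile
        key = normalize_url(link)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

In this sketch, two links that differ only in case or a trailing slash are counted as the same page, which is the kind of de-duplication an internal-link report usually needs.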

Chapter List

1. Setup
Get started with web scraping by setting up your Python environment and building essential functions to normalize URLs and extract links from HTML content.

2. Crawling
Build the core crawling engine by implementing the main crawler logic, HTTP request handling, and recursive page discovery to systematically traverse websites.

3. Concurrency
Learn to speed up your web crawler using asynchronous programming and concurrency, crawling multiple pages simultaneously while respecting configurable limits.

4. Reporting
Learn to analyze and export your web scraping results with custom reporting tools that generate both human-readable summaries and JSON datasets for further analysis.
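As a rough illustration of how the crawling, concurrency, and reporting chapters could fit together, the sketch below crawls a tiny in-memory "site" with `asyncio`, caps simultaneous fetches with a semaphore, and dumps the result as JSON. Everything here is an assumption made for the example: `FAKE_SITE`, `fetch_links`, `crawl`, and `to_json_report` are hypothetical names, and the fake fetch stands in for real HTTP requests so the code runs offline.

```python
import asyncio
import json

# Hypothetical in-memory site: each page maps to the pages it links to.
FAKE_SITE = {
    "example.com": ["example.com/a", "example.com/b"],
    "example.com/a": ["example.com/b"],
    "example.com/b": ["example.com"],
}


async def fetch_links(page):
    """Stand-in for an HTTP request plus HTML parsing."""
    await asyncio.sleep(0.01)  # simulates network latency
    return FAKE_SITE.get(page, [])


async def crawl(start, max_concurrency=2):
    """Recursively discover pages, counting inbound links per page.

    At most max_concurrency fetches are in flight at any moment.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    inbound = {start: 0}

    async def visit(page):
        async with semaphore:  # the configurable concurrency limit
            links = await fetch_links(page)
        new_pages = []
        for link in links:
            if link not in inbound:
                inbound[link] = 0
                new_pages.append(link)
            inbound[link] += 1
        # Visit newly discovered pages concurrently.
        await asyncio.gather(*(visit(p) for p in new_pages))

    await visit(start)
    return inbound


def to_json_report(inbound):
    """Serialize the link counts as a JSON array sorted by URL."""
    rows = [{"url": url, "inbound_links": n} for url, n in sorted(inbound.items())]
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(to_json_report(asyncio.run(crawl("example.com"))))
```

Because the work is dominated by waiting on I/O rather than computation, raising `max_concurrency` lets more fetches overlap, which is why a crawler like this can speed up even on a single thread.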

Join 527 students in the Build a Web Scraper in Python course

Read reviews of their learning experiences

This is a great course if you are comfortable with Python and recursion. I learned a lot, even though it was a hard course.

(5/5)

Katherine Sei

United States

Scraping the web is so much simpler than it used to be, but the caveat is that it's a minefield for the unsuspecting who might get a ban for doing it incorrectly! Really good course, full of useful detail.

(5/5)

Geoff Riley

Warrington, England

You do build a web scraper, but I reckon it serves as a much better intro to asyncio. That alone, I think, makes it more worth it. Have fun!

(4/5)

Divine-Beanbag

United States

Cute little project! Really helped cement the content from the 'HTTP clients in Python' course.

(5/5)

Jeremy Cribb

United States

This was a good chance to learn and practice concurrency with the asyncio library.

(5/5)

Dave Andrea

Sydney, Canada

Great project overall for exercising skills in file I/O and utilizing libraries like Beautiful Soup and asyncio.

(5/5)

Ryan Greco

Mobile, AL

I liked the project, though I need to read more on the asyncio library. I added a timer and the speed did go up with more concurrent runners. Still not sure why, given Python's GIL.

(5/5)

Tawfic A.Fatah

Montreal, Canada

I enjoyed this and am looking forward to doing the Go version!

(5/5)

Eisengrin

Houston, Texas

Really enjoyed it, highly recommend!

(5/5)

Attila Szász

Odorheiu Secuiesc, Romania

Mediocrity doesn't cut it anymore

The only way to become a great developer is to write a lot of code

Avoid tutorial hell by writing a ton of code

Stay motivated with a game-like curriculum

Build portfolio projects to prove your skills

Delve deeper into foundational concepts

Learn flexibly online without interrupting your life

For 1% the price of college to minimize your financial risk

Frequently Asked Questions

Got questions? We've got answers

Yes! It's free to create an account and start learning. You'll get all the immersive and interactive features for free for a few chapters. After that, if you still haven't paid for a membership, you'll be in read-only (content only) mode.