Requache = requests + cache

I do a lot of work scraping web content. This little function helps with two things: speed and manners.

Even if you optimize your scraping app well (using worker pools, concurrency, callbacks etc), you're still going to be limited by your network or the target server's speed.
If your processing of the data isn't right first time, you'll run again, and pull a bunch of data from their site. That's not cool, and in some cases will get you blocked.

To get around these two limitations, we simply cache the text returned by called to requests.

Whenever you'd normally call requests.get(url).text instead call requache.get(url). The first time you run your scraping app, everything works as usual. Second time round, everything's super fast and the target server is never touched. This means you can perfect your data processing on cached data as if you were running against live data.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
requache.py		requache.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Requache = requests + cache

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

oddsockmachine/requache

Folders and files

Latest commit

History

Repository files navigation

Requache = requests + cache

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages