This Python command-line tool fetches and downloads files from a pool of GitHub repositories found through the GitHub API (virtually every existing public, non-fork GitHub repository), filtered by a programming language and its associated file extension.
python3 fetchgithubfiles.py language file-extension api-key [OPTIONS]...

For the script to work, you must provide a GitHub Personal Access Token, which you can generate here. Once you have your Access Token, you can either set an environment variable named exactly GHScraperAPIToken or use the command line option listed below. Note that a token passed via the command line option takes precedence over the value of the GHScraperAPIToken environment variable.

- language is the programming language to search for
- file-extension is the file extension (without .) of the chosen programming language
- -apitoken apitoken, --ApiToken=apitoken is a GitHub Personal Access Token of yours. Don't have one? Generate it here
- -mr maxrepos, --MaxRepos=maxrepos is the maximum number of repos you want to fetch. Set to 100 by default
- -d dir, --Directory=dir is the directory where fetched files will be saved. Set to ./fetchedfiles by default. The directory should be specified in Linux style (./this/is/an/example) even if you are on Windows
- -t topic, --Topic=topic is the single topic by which repositories will be additionally filtered. It has no value by default, meaning this filter is not applied
- -k keywords, --Keywords=keywords are one or more blank-separated keywords by which repositories will be additionally filtered. It has no value by default, meaning this filter is not applied
python3 fetchgithubfiles.py rust rs -apitoken 123123123 -mr 20
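
To make the documented arguments and the token precedence concrete, here is a minimal sketch of how such a command line could be parsed. It is not the script's actual code: the function names build_parser and resolve_token and the attribute names are hypothetical. It also assumes that keywords are passed as separate words after -k, and it leaves out the api-key positional shown in the usage line, since the README only describes the option and the environment variable. The option names, the defaults, and the GHScraperAPIToken variable name are taken from the text above.

```python
import argparse
import os

# Environment variable name as documented in this README.
ENV_VAR = "GHScraperAPIToken"


def build_parser():
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(prog="fetchgithubfiles.py")
    parser.add_argument("language", help="programming language to search for")
    parser.add_argument(
        "file_extension",
        metavar="file-extension",
        help="file extension of the language, without the leading dot",
    )
    parser.add_argument("-apitoken", "--ApiToken", dest="api_token", default=None)
    parser.add_argument("-mr", "--MaxRepos", dest="max_repos", type=int, default=100)
    parser.add_argument("-d", "--Directory", dest="directory", default="./fetchedfiles")
    parser.add_argument("-t", "--Topic", dest="topic", default=None)
    # Assumption: keywords arrive as separate words, e.g. "-k web server".
    parser.add_argument("-k", "--Keywords", dest="keywords", nargs="+", default=None)
    return parser


def resolve_token(cli_token, environ=os.environ):
    """Return the command line token if given, else the environment variable."""
    token = cli_token or environ.get(ENV_VAR)
    if not token:
        raise SystemExit(
            f"No access token: pass -apitoken or set the {ENV_VAR} environment variable"
        )
    return token
```

With this sketch, parsing the example invocation above yields language "rust", file extension "rs", and a repository limit of 20, while the directory, topic, and keywords keep their defaults. Calling resolve_token with the parsed token returns the command line value even when GHScraperAPIToken is also set.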