Open Research Assessment Toolkit — building open infrastructure for responsible research assessment.
This repository is the working home for ORAT data collection, analysis experiments, and project documentation.
We are building a corpus of annual reports from important plant-based agricultural institutions in India (ICAR institutes, SAUs, and related bodies).
Right now the team is:
- Checking which institutions publish annual reports online
- Downloading reports where possible (PDF preferred; HTML index pages where useful)
- Recording problems as GitHub Issues — timeouts, 403 blocks, broken links, missing reports, mirror URLs, etc.
Progress so far: 6 of the first 10 institutions attempted have reports in the repo. See india_plant_reports/README.md for the live status table.
| Path | Purpose |
|---|---|
| docs/ | Reference lists, session logs, methodology notes |
| docs/india_plant_agricultural_institutions.md | Master list of institutions (78 unique) |
| india_plant_reports/ | Downloaded annual reports, stored as india_plant_reports/<institution>/report/ |
| ingest/ | Prototype pipeline for parsing and phrase indexing (experimental) |
| docs/sessions/ | Dated session notes |
New to the project? Start with GETTING_STARTED.md.
Collecting reports:
- Pick an institution from the institution list that does not yet have a folder under india_plant_reports/.
- Find the latest annual report on the institute website.
- Create india_plant_reports/<abbrev>/report/ and add the PDF (and index HTML if available).
- Commit on your personal branch and push; see the joining instructions.
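The folder steps above can be sketched in Python. This is a minimal illustration, not part of the repository: the function names `missing_institutions` and `make_report_dir` are hypothetical, and the list of abbreviations is assumed to be supplied by hand from the institution list.

```python
from pathlib import Path


def missing_institutions(abbrevs, reports_root):
    """Return the abbreviations that have no folder under reports_root.

    The input order is preserved so the result can be worked through
    top to bottom.
    """
    root = Path(reports_root)
    return [a for a in abbrevs if not (root / a).is_dir()]


def make_report_dir(abbrev, reports_root):
    """Create <reports_root>/<abbrev>/report/ if needed and return its path."""
    report_dir = Path(reports_root) / abbrev / "report"
    # parents=True creates the institution folder as well;
    # exist_ok=True makes the call safe to repeat.
    report_dir.mkdir(parents=True, exist_ok=True)
    return report_dir
```

Calling `make_report_dir("<abbrev>", "india_plant_reports")` from the repository root would create the folder into which the PDF is then copied before committing.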
When something fails, open a GitHub Issue with:
- Institution name and abbreviation
- URL you tried
- What happened (timeout, 403, no report found, corrupt PDF, etc.)
- Any alternate URL that worked (mirror domain, direct PDF link)
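To keep reports consistent, the four items above could be assembled into an issue body with a small helper. The sketch below is an assumption about a convenient layout, not a required template; `format_issue_body` and its parameter names are hypothetical.

```python
def format_issue_body(name, abbrev, url_tried, what_happened, alternate_url=None):
    """Build a Markdown issue body from the fields listed above.

    alternate_url is optional; the line is omitted when nothing worked.
    """
    lines = [
        f"- Institution: {name} ({abbrev})",
        f"- URL tried: {url_tried}",
        f"- What happened: {what_happened}",
    ]
    if alternate_url:
        lines.append(f"- Alternate URL that worked: {alternate_url}")
    return "\n".join(lines)
```

The returned text can be pasted directly into the description field of a new GitHub Issue.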
The project lead merges branches into main; you do not need to merge yourself.
Use GitHub Issues for anything that blocks or complicates collection:
| Issue type | Include |
|---|---|
| Site unreachable | Domain, error (timeout / DNS / SSL) |
| Access denied | URL, HTTP status (e.g. 403) |
| Report not found | Institution page URL, what you searched |
| Wrong or corrupt file | Filename, file size, how you obtained it |
| Mirror found | Primary URL (failed) + working mirror URL |
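For the first two rows of the table, the error detail can be read off the exception raised by a download attempt. The following is a rough sketch using only the Python standard library; `classify_failure` and `probe` are hypothetical helper names, the 30-second timeout is an arbitrary assumption, and the mapping of 401 to "Access denied" is a judgment call rather than project policy.

```python
import socket
import ssl
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map a download exception to (issue type, detail) for the table above."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):
            return ("Access denied", f"HTTP {exc.code}")
        return ("Other HTTP error", f"HTTP {exc.code}")
    # URLError wraps the underlying cause in .reason; other errors are used as-is.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, socket.gaierror):
        return ("Site unreachable", "DNS")
    if isinstance(reason, ssl.SSLError):
        return ("Site unreachable", "SSL")
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return ("Site unreachable", "timeout")
    return ("Unclassified", repr(reason))


def probe(url, timeout=30):
    """Try to open url and return ("OK", status) or a classified failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ("OK", f"HTTP {resp.status}")
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return classify_failure(exc)
```

The remaining rows (report not found, corrupt file, mirror found) depend on what a person sees on the site, so they are left to manual judgment here.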
Add labels if they are available (e.g. collection, blocked). When in doubt, open the issue anyway.
- Getting started: GETTING_STARTED.md
- Git workflow for new members: docs/joining-instructions.md
- Collection status: india_plant_reports/README.md
- Issues: https://github.com/semanticClimate/ORAT/issues
See the repository licence file once it is added. UN/institute report PDFs remain the property of their respective publishers; we store copies for research and analysis purposes.