semanticClimate/ORAT

ORAT

Open Research Assessment Toolkit — building open infrastructure for responsible research assessment.

This repository is the working home for ORAT data collection, analysis experiments, and project documentation.


Current phase — annual report collection (India)

We are building a corpus of annual reports from major plant-focused agricultural institutions in India: ICAR (Indian Council of Agricultural Research) institutes, State Agricultural Universities (SAUs), and related bodies.

Right now the team is:

  1. Checking which institutions publish annual reports online
  2. Downloading reports where possible (PDF preferred; HTML index pages where useful)
  3. Recording problems as GitHub Issues — timeouts, 403 blocks, broken links, missing reports, mirror URLs, etc.

Progress so far: 6 of the first 10 institutions attempted have reports in the repo. See india_plant_reports/README.md for the live status table.


Repository layout

| Path | Purpose |
| --- | --- |
| docs/ | Reference lists, session logs, methodology notes |
| docs/india_plant_agricultural_institutions.md | Master list of institutions (78 unique) |
| india_plant_reports/ | Downloaded annual reports, stored under india_plant_reports/<institution>/report/ |
| ingest/ | Prototype pipeline for parsing and phrase indexing (experimental) |
| docs/sessions/ | Dated session notes |
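To see which institutions already have a report on disk, the layout above can be scanned directly. The following is a minimal sketch, not part of the repository: the function name `institutions_with_reports` is hypothetical, and it assumes that a "collected" institution is one whose report/ folder contains at least one PDF.

```python
from pathlib import Path


def institutions_with_reports(root: Path) -> list[str]:
    """Return folder names under `root` whose report/ subfolder holds a PDF.

    Assumption: `root` is the india_plant_reports/ directory and each
    institution lives in <root>/<name>/report/.
    """
    if not root.is_dir():
        return []
    found = []
    for inst_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        report_dir = inst_dir / "report"
        if not report_dir.is_dir():
            continue
        # Count the institution as soon as one PDF is present.
        if any(f.is_file() and f.suffix.lower() == ".pdf"
               for f in report_dir.iterdir()):
            found.append(inst_dir.name)
    return found


if __name__ == "__main__":
    for name in institutions_with_reports(Path("india_plant_reports")):
        print(name)
```

Run from the repository root, this prints one folder name per line. Comparing the output with the master list in docs/ shows which institutions are still open for collection.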

How to contribute

New to the project? Start with GETTING_STARTED.md.

Collecting reports:

  1. Pick an institution from the institution list that does not yet have a folder under india_plant_reports/.
  2. Find the latest annual report on the institute website.
  3. Create india_plant_reports/<abbrev>/report/ and add the PDF (and index HTML if available).
  4. Commit on your personal branch and push — see joining instructions.
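Step 3 above can be sketched in a few lines of Python. This is an illustrative helper, not a script that ships with the repository: `add_report` and `looks_like_pdf` are hypothetical names. The header check relies on the fact that PDF files begin with the bytes `%PDF-`; it catches the common case where a blocked download saved an HTML error page under a .pdf name, but it does not prove the file is complete.

```python
import shutil
from pathlib import Path


def looks_like_pdf(path: Path) -> bool:
    """Cheap sanity check: a PDF file starts with the bytes b'%PDF-'."""
    with path.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


def add_report(repo_root: Path, abbrev: str, pdf: Path) -> Path:
    """Copy `pdf` into india_plant_reports/<abbrev>/report/ and return its new path."""
    if not looks_like_pdf(pdf):
        raise ValueError(
            f"{pdf} does not start with %PDF-; it may be an HTML error page"
        )
    target_dir = repo_root / "india_plant_reports" / abbrev / "report"
    target_dir.mkdir(parents=True, exist_ok=True)  # create folders if missing
    return Path(shutil.copy2(pdf, target_dir / pdf.name))
```

A file that fails the check is a good candidate for a "Wrong or corrupt file" issue rather than a commit.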

When something fails, open a GitHub Issue with:

  • Institution name and abbreviation
  • URL you tried
  • What happened (timeout, 403, no report found, corrupt PDF, etc.)
  • Any alternate URL that worked (mirror domain, direct PDF link)
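To keep issues consistent, the four fields above can be collected in a small record and rendered as text to paste into the issue form. This is a sketch under assumptions: the class name `CollectionProblem`, the title format, and the example institution and URLs are all hypothetical, not a template the project prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectionProblem:
    institution: str          # full institution name
    abbrev: str               # abbreviation used for the folder name
    url_tried: str
    what_happened: str        # e.g. "timeout", "403", "corrupt PDF"
    alternate_url: Optional[str] = None  # mirror or direct link, if any

    def title(self) -> str:
        return f"[{self.abbrev}] {self.what_happened}"

    def body(self) -> str:
        lines = [
            f"Institution: {self.institution} ({self.abbrev})",
            f"URL tried: {self.url_tried}",
            f"What happened: {self.what_happened}",
        ]
        if self.alternate_url:
            lines.append(f"Alternate URL that worked: {self.alternate_url}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    problem = CollectionProblem(
        institution="Example Institute",
        abbrev="EXI",
        url_tried="https://example.org/annual-report",
        what_happened="403 on report page",
    )
    print(problem.title())
    print(problem.body())
```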

The project lead merges branches into main; you do not need to merge yourself.


Reporting problems

Use GitHub Issues for anything that blocks or complicates collection:

| Issue type | Include |
| --- | --- |
| Site unreachable | Domain, error (timeout / DNS / SSL) |
| Access denied | URL, HTTP status (e.g. 403) |
| Report not found | Institution page URL, what you searched |
| Wrong or corrupt file | Filename, file size, how you obtained it |
| Mirror found | Primary URL (failed) + working mirror URL |

Add labels to issues where available (e.g. collection, blocked). When in doubt, open the issue anyway.
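If you script your download attempts, the first three issue types in the table can be derived from the failure itself. The mapping below is one possible sketch, not a project rule: the function name `issue_type`, the error keywords, and the choice to treat 401 like 403 and 404/410 as "Report not found" are assumptions. "Wrong or corrupt file" and "Mirror found" need human judgement and are not covered.

```python
from typing import Optional


def issue_type(http_status: Optional[int] = None,
               error: Optional[str] = None) -> str:
    """Suggest an issue type for a failed attempt.

    `error` is a short keyword for a connection-level failure
    (assumed values: "timeout", "dns", "ssl").
    """
    if error is not None and error.lower() in {"timeout", "dns", "ssl"}:
        return "Site unreachable"
    if http_status in {401, 403}:
        return "Access denied"
    if http_status in {404, 410}:
        return "Report not found"
    # Anything else should still be reported; describe it by hand.
    return "Unclassified"
```

For example, `issue_type(http_status=403)` suggests "Access denied", while a server error such as 500 falls through to "Unclassified" and should be described in the issue text.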


Licence

See the repository licence file once it is added. UN/institute report PDFs remain the property of their respective publishers; we store copies for research and analysis purposes.
