SPEAR is a library for data programming with semi-supervision. The package implements several recent data programming approaches, including facilities to programmatically label and build training data. SPEAR provides facilities for:
- development of labeling functions (LFs), rules, and heuristics for quick labeling
- incorporation of automatically generated LFs
- comparison against several data programming approaches
- comparison against semi-supervised data programming approaches
- subset selection, to make the best use of annotation effort
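Subset selection picks a small, representative set of unlabeled instances to spend annotation effort on. As a hedged illustration of the idea, the sketch below greedily maximizes the facility-location function f(S) = sum over i of max over j in S of sim(i, j), using cosine similarity between feature vectors, in plain Python. It is not SUBMODLIB's or SPEAR's API, and whether SPEAR uses facility location (or cosine similarity) for this step is an assumption made only for illustration.

```python
"""Greedy facility-location subset selection (illustrative sketch only).

Hand-rolled, NOT submodlib's or SPEAR's API. The choice of facility
location and of cosine similarity is an assumption.
"""
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0


def facility_location_greedy(features: List[Sequence[float]], budget: int) -> List[int]:
    """Return up to `budget` indices chosen greedily to maximise
    f(S) = sum_i max(0, max_{j in S} sim(i, j))."""
    n = len(features)
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    # best[i]: similarity of point i to its closest selected point so far
    # (starting at 0.0 clips negative similarities to zero)
    best = [0.0] * n
    selected: List[int] = []
    for _ in range(min(budget, n)):
        gains = []
        for j in range(n):
            if j in selected:
                gains.append(float("-inf"))  # never pick the same point twice
                continue
            # Marginal gain: how much adding j improves coverage of every point
            gains.append(sum(max(sim[i][j] - best[i], 0.0) for i in range(n)))
        j_star = max(range(n), key=lambda j: gains[j])
        selected.append(j_star)
        best = [max(best[i], sim[i][j_star]) for i in range(n)]
    return selected


if __name__ == "__main__":
    # Two obvious clusters: {0, 1} near the x-axis, {2, 3} near the y-axis
    points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(facility_location_greedy(points, budget=2))  # one index from each cluster
```

Facility location rewards covering every point with some similar selected point, so with a budget of two it spreads the picks across both clusters instead of choosing two near-duplicates.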
SPEAR requires Python 3.6 or later. First install submodlib, then install SPEAR:
git clone https://github.com/decile-team/spear.git
cd spear
pip install -r requirements/requirements.txt
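After installation, a quick sanity check is to confirm the Python version and whether the required packages can be found. The sketch below assumes the packages import as `submodlib` and `spear`. Since the commands above appear to install only the listed requirements, `spear` will likely be found only when Python runs from the cloned repository (or the repository is otherwise on the path).

```python
"""Post-install environment check (illustrative sketch).

Assumes the import names `submodlib` and `spear`.
"""
import importlib.util
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether the Python version and key packages look usable."""
    return {
        # SPEAR requires Python 3.6 or later
        "python>=3.6": sys.version_info >= (3, 6),
        # find_spec returns None when a top-level module cannot be located
        "submodlib": importlib.util.find_spec("submodlib") is not None,
        "spear": importlib.util.find_spec("spear") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```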
The data folder for the SMS dataset can be found here. To run the notebooks or examples, place this folder in the same directory that contains the notebooks folder.
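The placement rule above means the data folder must be a sibling of the notebooks folder. The sketch below checks that rule for a given notebooks path; the folder name `data` is an assumption, so adjust it if the downloaded folder is named differently.

```python
"""Check that the data folder sits next to the notebooks folder (sketch).

The folder name `data` is an assumption.
"""
import os
import tempfile
from pathlib import Path


def expected_data_dir(notebooks_dir: str) -> Path:
    """Where the data folder should live: alongside the notebooks folder."""
    return Path(notebooks_dir).resolve().parent / "data"


def data_folder_is_placed_correctly(notebooks_dir: str) -> bool:
    """True if a `data` folder exists next to `notebooks_dir`."""
    return expected_data_dir(notebooks_dir).is_dir()


if __name__ == "__main__":
    # Demo on a throwaway directory tree rather than the real repository
    with tempfile.TemporaryDirectory() as tmp:
        notebooks = os.path.join(tmp, "notebooks")
        os.mkdir(notebooks)
        print(data_folder_is_placed_correctly(notebooks))  # False: no data folder yet
        os.mkdir(os.path.join(tmp, "data"))
        print(data_folder_is_placed_correctly(notebooks))  # True
```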
SPEAR supports two types of LFs:
- Discrete LFs: users can define LFs that return discrete labels.
- Continuous LFs: LFs that return continuous scores (confidences) for the labels they assign.
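To make the distinction between discrete and continuous LFs concrete, the sketch below writes both kinds as plain Python functions for an SMS spam task and applies them to a few messages, producing a label matrix (one row per message, one column per LF) and a matching score matrix. This is NOT SPEAR's own LF API: the class ids, the `ABSTAIN` value, the keyword list, the scoring rule, and the convention of giving a firing discrete LF a score of 1.0 are all hypothetical.

```python
"""Discrete vs. continuous labeling functions (illustrative sketch).

Plain-Python stand-ins, NOT SPEAR's API; all names and values are hypothetical.
"""
from typing import Callable, List, Tuple

ABSTAIN = -1        # hypothetical "no opinion" value
HAM, SPAM = 0, 1    # hypothetical class ids for the SMS task
SPAM_WORDS = {"free", "win", "prize", "urgent"}


def lf_spam_keywords(text: str) -> int:
    """Discrete LF: SPAM if any spam keyword appears, else ABSTAIN."""
    return SPAM if SPAM_WORDS & set(text.lower().split()) else ABSTAIN


def lf_short_message(text: str) -> int:
    """Discrete LF: messages of at most three words are guessed to be HAM."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


def clf_spam_score(text: str) -> Tuple[int, float]:
    """Continuous LF: (label, confidence), confidence = share of spam keywords."""
    words = text.lower().split()
    score = sum(w in SPAM_WORDS for w in words) / max(len(words), 1)
    return (SPAM, score) if score > 0 else (ABSTAIN, 0.0)


def apply_lfs(
    texts: List[str],
    discrete: List[Callable[[str], int]],
    continuous: List[Callable[[str], Tuple[int, float]]],
) -> Tuple[List[List[int]], List[List[float]]]:
    """Return (label matrix, score matrix); discrete LFs score 1.0 when they fire."""
    labels, scores = [], []
    for t in texts:
        row_l = [lf(t) for lf in discrete]
        row_s = [1.0 if l != ABSTAIN else 0.0 for l in row_l]
        for clf in continuous:
            l, s = clf(t)
            row_l.append(l)
            row_s.append(s)
        labels.append(row_l)
        scores.append(row_s)
    return labels, scores


if __name__ == "__main__":
    msgs = ["WIN a FREE prize now", "ok see you", "are we still meeting for lunch tomorrow"]
    L, S = apply_lfs(msgs, [lf_spam_keywords, lf_short_message], [clf_spam_score])
    print(L)  # [[1, -1, 1], [-1, 0, -1], [-1, -1, -1]]
```

Rows where every LF abstains (the third message) carry no weak label; this is one reason the semi-supervised approaches also make use of a small labeled set.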
You can read this paper to learn about the approaches below:
- Only-L
- Learning to Reweight
- Posterior Regularization
- Imply Loss
- CAGE
- Joint Learning
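Several of the approaches above learn how to combine LF outputs rather than simply counting votes. For contrast, the sketch below shows the naive baseline, majority vote over a label matrix (one row per instance, one column per LF). Majority vote is not one of the listed approaches and is not presented here as part of SPEAR; the `-1` abstain value is an assumption.

```python
"""Majority-vote aggregation of LF outputs (naive baseline sketch).

Not one of SPEAR's approaches; ABSTAIN = -1 is an assumed convention.
"""
from collections import Counter
from typing import List

ABSTAIN = -1


def majority_vote(label_matrix: List[List[int]], abstain: int = ABSTAIN) -> List[int]:
    """Collapse each row of LF outputs into one label.

    Rows where all LFs abstain, or where the top two labels tie, stay `abstain`.
    """
    out = []
    for row in label_matrix:
        votes = Counter(l for l in row if l != abstain)
        top = votes.most_common(2)
        if not top or (len(top) == 2 and top[0][1] == top[1][1]):
            out.append(abstain)
        else:
            out.append(top[0][0])
    return out


if __name__ == "__main__":
    L = [[1, -1, 1], [-1, 0, -1], [-1, -1, -1], [1, 0, -1]]
    print(majority_vote(L))  # [1, 0, -1, -1]
```

Majority vote treats every LF as equally reliable and ignores continuous scores, which is the gap that learned aggregation methods aim to close.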
- SPEAR tutorials
- SPEAR documentation
- SMS SPAM: CAGE colab, JL colab
- DECILE website
- SubModLib - Summarize massive datasets using submodular optimization
- DISTIL - Deep Diversified Interactive Learning
- CORDS - COResets and Data Subset Selection
SPEAR takes inspiration from, builds upon, and uses pieces of code from several open-source codebases, including Snorkel, Snuba, and Imply Loss. SPEAR also uses SUBMODLIB, another DECILE project, for subset selection.
SPEAR is created and maintained by Ayush, Abhishek, Vineeth, Harshad, Parth, Pankaj, Rishabh Iyer, and Ganesh Ramakrishnan. We look forward to making SPEAR more community-driven. Please use it and contribute to it for your research, and feel free to use it in your commercial projects. We will add the major contributors here.
[1] Maheshwari, Ayush, et al. "Data Programming using Semi-Supervision and Subset Selection." In Findings of ACL (Long Paper), 2021.
[2] Chatterjee, Oishik, Ganesh Ramakrishnan, and Sunita Sarawagi. "Data Programming using Continuous and Quality-Guided Labeling Functions." In AAAI, 2020.
[3] Sahay, Atul, et al. "Rule Augmented Unsupervised Constituency Parsing." In Findings of ACL (Short Paper), 2021.
