🤖🤖🤖 BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning

Overview

BOTS is a unified framework for Bayesian Online Task Selection in LLM reinforcement finetuning.

BOTS operates in a continuous loop of task selection, model training, and posterior updating. (1) Selection: Thompson sampling from the posterior beliefs selects a batch of tasks whose estimated success probabilities are near a target difficulty (e.g., $p^*=0.5$). (2) Training & Evidence Collection: The LLM is finetuned on the selected batch, yielding direct success/failure counts (explicit evidence) for those tasks. For unselected tasks, predicted counts (implicit evidence) are produced by a plug-in. We introduce an ultra-lightweight interpolation-based variant with negligible overhead. (3) Posterior Updating: Explicit and implicit evidence are fused using our generalized Bayesian update rule.
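To make the loop concrete, here is a minimal Python sketch of one way the three stages could fit together. It is not the BOTS implementation. It assumes a Beta posterior per task, and the function names (`select_tasks`, `update_posterior`), the `implicit_weight` parameter, and the batch-mean predictor in the demo are all hypothetical. The weighted-count update is a simplified stand-in for the generalized Bayesian update rule described in the paper, and the batch-mean predictor is only a placeholder for the interpolation-based plug-in.

```python
import random


def select_tasks(alpha, beta, batch_size, target=0.5, rng=random):
    """Thompson sampling: draw one success probability per task from its
    Beta(alpha, beta) posterior and keep the tasks closest to `target`."""
    sampled = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    ranked = sorted(range(len(sampled)), key=lambda i: abs(sampled[i] - target))
    return ranked[:batch_size]


def update_posterior(alpha, beta, explicit, implicit, implicit_weight=0.1):
    """Fuse evidence into the Beta parameters in place.

    explicit / implicit map task index -> (successes, failures).
    Explicit counts are added at full weight; implicit (predicted) counts
    are down-weighted by `implicit_weight` (an assumed simplification) and
    ignored for tasks that already have explicit evidence.
    """
    for i, (s, f) in explicit.items():
        alpha[i] += s
        beta[i] += f
    for i, (s, f) in implicit.items():
        if i in explicit:
            continue
        alpha[i] += implicit_weight * s
        beta[i] += implicit_weight * f


def run_demo(steps=20, n_rollouts=8, seed=0):
    rng = random.Random(seed)
    true_p = [0.05, 0.3, 0.5, 0.7, 0.95]   # toy task difficulties (made up)
    alpha = [1.0] * len(true_p)            # uniform Beta(1, 1) priors
    beta = [1.0] * len(true_p)
    for _ in range(steps):
        batch = select_tasks(alpha, beta, batch_size=2, rng=rng)
        # Explicit evidence: simulated rollouts stand in for LLM training.
        explicit = {}
        for i in batch:
            s = sum(rng.random() < true_p[i] for _ in range(n_rollouts))
            explicit[i] = (s, n_rollouts - s)
        # Implicit evidence: placeholder predictor that reuses the batch
        # mean success count for every unselected task.
        mean_s = sum(s for s, _ in explicit.values()) / len(explicit)
        implicit = {i: (mean_s, n_rollouts - mean_s)
                    for i in range(len(true_p)) if i not in explicit}
        update_posterior(alpha, beta, explicit, implicit)
    # Posterior mean success probability per task.
    return [a / (a + b) for a, b in zip(alpha, beta)]


if __name__ == "__main__":
    print(run_demo())
```

In this sketch, tasks whose sampled success probability is far from the target are selected less often as their posteriors sharpen, while the sampling noise still leaves room for exploration of uncertain tasks.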

Usage

Step 1: Environment Preparation

Ensure Trinity-RFT is installed (see the Installation Guide). No extra dependencies are required.

Step 2: Model & Dataset Preparation

Download the model you want to train (e.g., Qwen2.5-1.5B-Instruct).

Download the GURU dataset. Also refer to the Data Preparation Guide and the Tech Report provided by the LLM360 team.

Remember to update the model and data paths in bots.yaml and random.yaml accordingly.

(Optional) Customize Reference Evaluation Results

Modify ref_eval_collect.yaml to set the reference model you want to evaluate, e.g., Qwen2.5-1.5B-Instruct.

Launch evaluation by executing:

BOTS_REF_EVAL_LOG_FILE="path/to/save/eval/logs" trinity run --config examples/bots/ref_eval_collect.yaml --plugin-dir examples/bots/workflow

The evaluation logs will be saved at the specified location. Then integrate the evaluation results as a new column into the original dataset:

python examples/bots/ref_eval_collect.py \
--data-path <your/path/to/original/dataset> \
--ref-eval-path <your/path/to/bots_ref_eval_log.jsonl> \
--ref-eval-key <column name, e.g., qwen2.5_1.5b_pass_rate>

Remember to update task_selector.feature_keys in bots.yaml so that it includes the new column name.

Step 3: Training

Launch training by executing:

trinity run --config examples/bots/bots.yaml

The improvement over the random-selection baseline can be obtained consistently 🤖🤖🤖.

Complete Reproduction

To fully reproduce the results in our paper, please use the verl-based implementation available here.

Citation

If you find the repo helpful, please cite:

@misc{TrinityRFT,
      title={Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models},
      author={Xuchen Pan and Yanxi Chen and Yushuo Chen and Yuchang Sun and Daoyuan Chen and Wenhao Zhang and Yuexiang Xie and Yilun Huang and Yilei Zhang and Dawei Gao and Weijie Shi and Yaliang Li and Bolin Ding and Jingren Zhou},
      year={2025},
      eprint={2505.17826},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.17826},
}

@misc{BOTS,
      title={BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning},
      author={Qianli Shen and Daoyuan Chen and Yilun Huang and Zhenqing Ling and Yaliang Li and Bolin Ding and Jingren Zhou},
      year={2025},
      eprint={2510.26374},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2510.26374},
}
