OpenClaw-RL: Train any agent simply by talking
Updated Mar 27, 2026 - Python
A user-friendly & efficient knowledge distillation framework for LLMs, supporting off-policy, on-policy (OPD), cross-tokenizer, multimodal, and on-policy self-distillation.
A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation of large language models.
🛠️ Apply on-policy distillation to enhance Qwen3-0.6B's performance on GSM8K by training on its own sampled outputs, reducing exposure bias at inference time.
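In on-policy distillation, the student samples its own sequences and is then scored against the teacher's next-token distribution at each position, typically with a reverse KL divergence, KL(student || teacher). A minimal toy sketch of that per-token loss (plain Python with made-up logits; the function names and shapes are illustrative, not any repo's actual API):

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for one token position.

    Mode-seeking: the student is penalized for putting probability
    mass where the teacher assigns little, which is why on-policy
    distillation tends to avoid over-smoothed student behavior."""
    p = softmax(student_logits)   # student distribution (the sampling policy)
    q = softmax(teacher_logits)   # teacher distribution (the target)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sequence_loss(student_token_logits, teacher_token_logits):
    """Mean per-token reverse KL over a student-sampled sequence."""
    kls = [reverse_kl(s, t)
           for s, t in zip(student_token_logits, teacher_token_logits)]
    return sum(kls) / len(kls)

# Toy example: a two-token sequence over a vocabulary of size 3.
student = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
teacher = [[2.2, 0.4, 0.0], [0.1, 1.7, 0.2]]
print(round(sequence_loss(student, teacher), 4))
```

In a real training loop this loss would be computed on logits from the student's own generations and backpropagated through the student only; the teacher is frozen.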
Train and customize OpenClaw agents using reinforcement learning with simple language feedback and fully asynchronous optimization.