This is the codebase for our paper "The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning" by M. Igl, G. Farquhar, J. Luketina, W. Boehmer and S. Whiteson.
It also includes an implementation of IBAC-SNI on ProcGen.
It comprises several sub-folders:
- `gym-minigrid` contains the grid-world environment (for the Multiroom experiments) and is adapted from https://github.com/maximecb/gym-minigrid. This environment is used together with `torch_rl`.
- `torch_rl` contains the agent to run on the `gym-minigrid` environment and is adapted from https://github.com/lcswillems/rl-starter-files.
- `multiroom_exps` contains the training code for the Multiroom experiments.
- `train-procgen` contains the code for the results on the ProcGen domain. It is adapted from https://github.com/openai/train-procgen.
- `cifar` contains the code for the supervised experiments. It is adapted from https://github.com/kuangliu/pytorch-cifar.
Plotting is explained at the very end.
All experiments can be run in the accompanying docker container. To build it, call
./build.sh
in the root folder.
Then, an interactive docker session can be started with
./runi.sh <GPU-ID> <Containername>
where <GPU-ID> is the GPU you want to use and <Containername> can be anything or left empty, e.g.

./runi.sh 0 iter
After starting the interactive session (./runi.sh in the root folder), move to the cifar folder:
cd cifar

Annealing the fraction of correct datapoints from 0 to 1
Run the baseline:
python main.py -p

Run with non-stationarity:
python main.py -p with annealing.every_n_epochs=1 annealing.type=<type>

where <type> can be either size (= Dataset size), random (= Noisy labels) or consistent (= Wrong labels).
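If you want to run all three variants in one go, a minimal shell loop over the type names listed above (using nothing beyond the command already shown) would be:

# run each of the three non-stationarity types in turn
for type in size random consistent; do
    python main.py -p with annealing.every_n_epochs=1 annealing.type=$type
done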
Results for self-distillation
For the baseline (i.e. no non-stationarity):
python main.py -p with epochs=2500 annealing.every_n_epochs=1 self_distillation=1500

And for non-stationarities:
python main.py -p with epochs=2500 annealing.every_n_epochs=1 self_distillation=1500 annealing.type=<type>

where again, please fill in <type> as desired.
Two phase training
python main.py -p with epochs=1500 annealing.duration=700 frozen_test_epochs=800 annealing.type=<type> annealing.start_fraction=<fraction>

where <type> and <fraction> should be filled in as desired.
In the experiments, we used the following values for <fraction>.
For Wrong labels and Noisy labels: 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0
For Dataset size: the same values as above, plus additionally 0.005, 0.01 and 0.02
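To sweep these fractions, a small shell loop reusing the two-phase command above could look like this (the choice annealing.type=consistent is only an example):

# sweep the start fractions for the Wrong-labels (consistent) setting
for fraction in 0.05 0.1 0.2 0.3 0.4 0.5 0.75 1.0; do
    python main.py -p with epochs=1500 annealing.duration=700 frozen_test_epochs=800 \
        annealing.type=consistent annealing.start_fraction=$fraction
done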
Preparation (Multiroom experiments)
- Start the interactive docker session: `./runi.sh` in the root folder
- Install `gym-minigrid`: `pip install -e gym-minigrid`
- Install `torch_rl`: `pip install -e torch_rl`
- Move to `multiroom_exps`: `cd multiroom_exps`
Running commands
python train.py -p with iter_type=<type>

where <type> can be either none (lower case!) or distill.
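For example, the two variants would be started as:

python train.py -p with iter_type=none
python train.py -p with iter_type=distill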
For ProcGen we need 4 GPUs at once, so we need to start the interactive docker container as
./runi.sh 0,1,2,3 procgen

for GPUs 0, 1, 2 and 3.
Then go to the subfolder:

cd train-procgen/train_procgen/
Baseline PPO:
mpiexec -np 4 python train.py -p with env_name=<env_name>

where <env_name> can be any of the ProcGen environments.
The ones used in this paper were starpilot, dodgeball, climber, ninja and bigfish.
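To run the PPO baseline on all five environments used in the paper, one option is a simple loop over the command above:

# baseline PPO on each ProcGen environment from the paper
for env in starpilot dodgeball climber ninja bigfish; do
    mpiexec -np 4 python train.py -p with env_name=$env
done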
PPO+ITER:
mpiexec -np 4 python train.py -p with env_name=<env_name> iter_loss.use=True

Baseline IBAC:
mpiexec -np 4 python train.py -p with env_name=<env_name> arch.reg=ibac

where we use selective noise injection as well.
IBAC+ITER:
mpiexec -np 4 python train.py -p with env_name=<env_name> arch.reg=ibac iter_loss.use=True

Sequential ITER:
mpiexec -np 4 python train.py -p with env_name=<env_name> iter_loss.use=True \
iter_loss.v2=True \
iter_loss.timesteps_initial=71_000_000 \
iter_loss.timesteps_anneal=9_000_000 \
iter_loss.timesteps_free=71_000_000

Careful: This will generate about 500GB of data!
ITER without RL terms in distillation
mpiexec -np 4 python train.py -p with env_name=<env_name> iter_loss.use=True \
iter_loss.alpha_reg.schedule=const \
iter_loss.use_burnin_rl_loss=False

The experiments use sacred for configuration and logging.
For more thorough use of this codebase, I'd recommend setting up a MongoDB to store the results.
At the moment, results are logged using the FileStorageObserver into a db folder in the root directory.
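Assuming sacred's standard FileStorageObserver layout, each run gets its own numbered folder under db/, which you can inspect directly:

ls db/<id>/
# typically contains config.json, cout.txt, metrics.json and run.json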
There is a very simple plotting script included: plot.py:
python plot.py --id <id> --metric <metric>

where <id> is the unique experiment id assigned to each run by sacred.
It is printed to stdout near the beginning when starting a new run.
<metric> is the name of what you want to plot.
This is train_acc and test_acc for the supervised experiments, rreturn_mean for Multiroom and eprewmean for ProcGen.
Many more quantities are logged; check the code or the metrics.json file to see what is available.
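One quick way to see every metric name recorded for a run (assuming the metrics.json written by sacred maps metric names to their logged values) is:

python -c "import json, sys; print('\n'.join(json.load(open(sys.argv[1])).keys()))" db/<id>/metrics.json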
A note on ProcGen: the ProcGen experiments run 4 threads, three for training and one for testing. Each of those 4 threads gets its own unique id, but only two of them actually log anything: one logs the training performance and one the test performance. Just try plotting each of the 4 ids; it will either crash (if that id wasn't logging) or the plotting script will print whether it holds the train or test performance.
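For example, if the four ProcGen workers were assigned the (hypothetical) ids 12 to 15, you can simply try all of them:

# replace 12..15 with the four ids printed by your ProcGen run
for id in 12 13 14 15; do
    python plot.py --id $id --metric eprewmean
done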