This repository contains a PyTorch implementation of deep reinforcement learning with the Negatively Correlated Natural Evolution Strategy (NCNES) algorithm.
- gym 0.12.1
- pytorch 1.0
- python 3.6

All dependencies can be installed into an Anaconda environment from `environment.yml`.
python main.py [--game] [--ncpu] [--lam] [--mu] [--parallel] [--lr_mean] [--lr_sigma] [--phi] [--sigma_init] [--eva]

- [game] Freeway, Enduro, Qbert, Alien, ... default = Freeway
- [ncpu] number of CPUs. default = 40
- [lam] number of populations. default = 5
- [mu] number of offspring in a population. default = 15
- [parallel] parallel mode. default = p (parallel), options = s (serial), i (individual)
- [lr_mean] learning rate of the mean. default = 0.2
- [lr_sigma] learning rate of sigma. default = 0.1
- [phi] negatively correlated search factor. default = 0.0001
- [sigma_init] initial value of sigma. default = 2
- [eva] maximum number of evaluations. default = 3
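
The options above could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the repository's actual parser: the function name `build_parser`, the argument types, and the help strings are assumptions; only the flag names and default values come from the list above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="NCNES training (illustrative sketch)")
    # "Freeway" is the assumed spelling, matching the list of game names.
    parser.add_argument("--game", type=str, default="Freeway", help="Atari game name")
    parser.add_argument("--ncpu", type=int, default=40, help="number of CPUs")
    parser.add_argument("--lam", type=int, default=5, help="number of populations")
    parser.add_argument("--mu", type=int, default=15, help="offspring per population")
    parser.add_argument("--parallel", type=str, default="p", choices=["p", "s", "i"],
                        help="p = parallel, s = serial, i = individual")
    parser.add_argument("--lr_mean", type=float, default=0.2, help="learning rate of the mean")
    parser.add_argument("--lr_sigma", type=float, default=0.1, help="learning rate of sigma")
    parser.add_argument("--phi", type=float, default=0.0001,
                        help="negatively correlated search factor")
    parser.add_argument("--sigma_init", type=float, default=2.0, help="initial sigma")
    parser.add_argument("--eva", type=int, default=3, help="maximum number of evaluations")
    return parser


# Parsing an empty argument list yields all documented defaults.
args = build_parser().parse_args([])
print(args)
```

Restricting `--parallel` to `p`, `s`, and `i` through `choices` makes an unsupported mode fail at startup rather than partway through a run.
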
├── readme.md // help
├── log // logs and models
├── src
│   ├── __init__.py // init file
│   ├── model.py // neural network (model) class
│   ├── optimizer.py // optimize and update functions
│   ├── preprocess.py // preprocessing transform class
│   ├── train.py // train and test functions
│   ├── util.py // other functions
│   └── vbn.py // classes and functions for virtual batch normalization
├── environment.yml // dependency installation file
└── main_all.py // run
We sampled more than 20 points during training and plotted the training curves shown in the following figure.
The experiment was repeated 3 times, and the scores are shown in Table 1.
Game | Score1 | Score2 | Score3 |
---|---|---|---|
BeamRider | 856.8 | 620.4 | 719.3 |
Freeway | 22.7 | 21.1 | 22.1 |
Enduro | 29.8 | 8.7 | 11.5 |
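
To compare games across the three runs, the scores in Table 1 can be summarized by their mean and sample standard deviation. The snippet below is only an illustrative helper (the name `summarize` is hypothetical and not part of this repository); the numbers are copied from the table.

```python
from statistics import mean, stdev

# Scores copied from Table 1 (three independent runs per game).
scores = {
    "BeamRider": [856.8, 620.4, 719.3],
    "Freeway": [22.7, 21.1, 22.1],
    "Enduro": [29.8, 8.7, 11.5],
}


def summarize(runs):
    """Return {game: (mean, sample standard deviation)} over the runs."""
    return {game: (mean(vals), stdev(vals)) for game, vals in runs.items()}


summary = summarize(scores)
for game, (avg, sd) in summary.items():
    print(f"{game}: mean={avg:.1f}, std={sd:.1f}")
```
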
- Fixed no-op frames: Added a function to view and change the 30-no-op frame setting. The no-op setting for each episode is logged.
- Modified weight update: Weights are updated through `named_parameters` and `params.data` rather than `tmp = getattr(tmp)`.
- Modified random seed: Random seeds are no longer fixed. The env seed, `np.random`, and the torch seed use `time.time()`.
- Deleted SGD optimizer: The SGD optimizer was removed; parameters are updated directly.
- Added `build_mean()`: Builds the Gaussian distribution dictionary and initializes the mean of the Gaussian as `mean = L + (H - L) * rand`.
- Deleted mirror sampling: Mirrored sample noise was removed.
- Modified `ARGS` class: Added settings for the folder path and checkpoint name.
- Modified noise sampling: In `get reward atari()`, noise is sampled directly and saved rather than drawn from a noise table. The noise table was deleted.
- Modified `optimize`: Diversity, fitness, and Fisher computations are incorporated into `optimize`.
- Modified `logger`: Logging output is more readable.
- Added `main_serial`: Runs main in a single process.
- Integrated `main_all`: Runs all parallel modes from one file.
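
As a rough illustration of two of the changes above (time-based seeding and the `mean = L + (H - L) * rand` initialization), here is a dependency-free sketch. It is not the repository's `build_mean()`: the signature, the `param_sizes` dictionary, the layer names, and the bounds are all assumptions, and it uses Python's `random` module in place of `np.random` or torch.

```python
import random
import time


def build_mean(param_sizes, low, high, rng):
    """Hypothetical stand-in for build_mean().

    Returns {parameter name: list of means}, where every mean is drawn as
    low + (high - low) * rand with rand uniform in [0, 1).
    """
    return {
        name: [low + (high - low) * rng.random() for _ in range(size)]
        for name, size in param_sizes.items()
    }


# Time-based seed: repeated runs start from different random states.
seed = int(time.time())
rng = random.Random(seed)

# Illustrative parameter names and sizes only; real layers will differ.
param_sizes = {"conv1.weight": 8, "fc.bias": 4}
mean = build_mean(param_sizes, low=-1.0, high=1.0, rng=rng)

for name, values in mean.items():
    print(name, len(values), min(values), max(values))
```

Because the seed comes from the clock, logging it alongside the results is the only way to reproduce a given run later.
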
[1] Yang, P., Yang, Q., Tang, K. et al. Parallel exploration via negatively correlated search. Front. Comput. Sci. 15, 155333 (2021). https://doi.org/10.1007/s11704-020-0431-0