Negative Correlated Natural Evolution Strategy

This repository contains a PyTorch implementation of deep reinforcement learning with the 'Negative Correlated Natural Evolution Strategy' (NCNES) algorithm.

Dependency

gym 0.12.1
pytorch 1.0
python 3.6

All dependencies can be installed into an Anaconda environment from the requirement file.

Usage

python main.py [--game] [--ncpu] [--lam] [--mu] [--parallel] [--lr_mean] [--lr_sigma] [--phi] [--sigma_init] [--eva]
[game]       game name: Freeway, Enduro, Qbert, Alien, ...; default = Freeway
[ncpu]       number of CPUs; default = 40
[lam]        population size; default = 5
[mu]         number of offspring in a population; default = 15
[parallel]   parallel mode; default = p (parallel), options = s (serial), i (individual)
[lr_mean]    learning rate of the mean; default = 0.2
[lr_sigma]   learning rate of sigma; default = 0.1
[phi]        negatively correlated search factor; default = 0.0001
[sigma_init] initial value of sigma; default = 2
[eva]        maximum number of evaluations; default = 3
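
The flags above could be parsed along the following lines. This is only a sketch that assumes argparse; the function name build_parser and the help strings are hypothetical, and the actual main.py may define its arguments differently.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the flags and defaults listed above."""
    parser = argparse.ArgumentParser(description="NCNES training (illustrative sketch)")
    parser.add_argument("--game", type=str, default="Freeway", help="Atari game name")
    parser.add_argument("--ncpu", type=int, default=40, help="number of CPUs")
    parser.add_argument("--lam", type=int, default=5, help="population size")
    parser.add_argument("--mu", type=int, default=15, help="offspring per population")
    parser.add_argument("--parallel", type=str, default="p", choices=["p", "s", "i"],
                        help="p = parallel, s = serial, i = individual")
    parser.add_argument("--lr_mean", type=float, default=0.2, help="learning rate of the mean")
    parser.add_argument("--lr_sigma", type=float, default=0.1, help="learning rate of sigma")
    parser.add_argument("--phi", type=float, default=0.0001,
                        help="negatively correlated search factor")
    parser.add_argument("--sigma_init", type=float, default=2.0, help="initial value of sigma")
    parser.add_argument("--eva", type=int, default=3, help="maximum number of evaluations")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the example self-contained.
args = build_parser().parse_args(["--game", "Enduro", "--parallel", "s"])
print(vars(args))
```

Any flag that is not passed keeps the default from the list above.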

File Tree

├── readme.md // help
├── log // logmodel
├── src
│ ├── __init__.py // init file
│ ├── model.py // class of neural network (model)
│ ├── optimizer.py // optimize and update function
│ ├── preprocess.py // class of preprocess transform
│ ├── train.py // train and test function
│ ├── util.py // other function
│ └── vbn.py // classes and functions for virtual batch normalization
├── environment.yml // dependency installation file
└── main_all.py // run

Results

We sampled more than 20 points during training and plotted the training curves shown in the following figure.

The experiment was repeated 3 times; the scores are shown in Table 1.

Game	Score1	Score2	Score3
BeamRider	856.8	620.4	719.3
Freeway	22.7	21.1	22.1
Enduro	29.8	8.7	11.5
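
To compare games across the three runs, the scores in Table 1 can be reduced to a mean and a sample standard deviation. The snippet below is a small sketch using only the Python standard library; the helper name summarize is hypothetical and not part of this repository.

```python
from statistics import mean, stdev

# Scores copied from Table 1 (three runs per game).
scores = {
    "BeamRider": [856.8, 620.4, 719.3],
    "Freeway": [22.7, 21.1, 22.1],
    "Enduro": [29.8, 8.7, 11.5],
}


def summarize(table):
    """Return {game: (mean, sample standard deviation)} over the runs."""
    return {game: (mean(runs), stdev(runs)) for game, runs in table.items()}


summary = summarize(scores)
for game, (avg, sd) in summary.items():
    print(f"{game:<10} mean={avg:.1f} std={sd:.1f}")
```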

Update log

Fixed noop frame: Added a function to view and change the 30-no-op frame setting; the no-op setting for each episode is logged.
Modified weight update: Weights are updated through named_parameters and params.data rather than tmp = getattr(tmp).
Modified random seed: Random seeds are no longer fixed. The env seed, np.random, and torch seed use time.time().
Deleted SGD optimizer: Removed the SGD optimizer; parameters are updated directly.
Added build_mean(): Builds the Gaussian distribution dictionary and initializes the mean of the Gaussian as mean = L + (H - L) * rand.
Deleted mirror sample: Removed mirrored noise sampling.
Modified ARGS class: Added settings for folderpath and checkpointname.
Modified sample noise: In get reward atari(), noise is sampled directly, rather than from a noise table, and saved. The noise table was deleted.
Modified optimize: Diversity, fitness, and Fisher are incorporated into optimize.
Modified logger: Logging output is more readable.
Added main_serial: Runs main in a single process.
Integrated main_all: Runs all parallel modes from one file.
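
The build_mean() entry in the update log gives the initialization rule mean = L + (H - L) * rand. A minimal sketch of that rule follows, using only the standard library rather than torch. The signature, the layer names, and the bounds are assumptions made for illustration, so the repository's own build_mean() may look different.

```python
import random
import time


def build_mean(param_shapes, low, high, rng):
    """Return {name: flat list of means}, each drawn as low + (high - low) * rand."""
    mean = {}
    for name, shape in param_shapes.items():
        size = 1
        for dim in shape:
            size *= dim
        # rng.random() is uniform in [0, 1), so every value lies in [low, high).
        mean[name] = [low + (high - low) * rng.random() for _ in range(size)]
    return mean


# Seed from the clock, as the update log describes, so runs are not fixed.
rng = random.Random(time.time())

# Hypothetical layer shapes; the real ones would come from the network in src/model.py.
shapes = {"fc1.weight": (4, 3), "fc1.bias": (4,)}
mean = build_mean(shapes, low=-1.0, high=1.0, rng=rng)
print({name: len(values) for name, values in mean.items()})
```

Keying the dictionary by parameter name matches the move to named_parameters noted in the update log, though the real code stores tensors rather than Python lists.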

Reference

[1] Yang, P., Yang, Q., Tang, K. et al. Parallel exploration via negatively correlated search. Front. Comput. Sci. 15, 155333 (2021). https://doi.org/10.1007/s11704-020-0431-0
