Performance comparison between TPE multivariate and CMAES
The goal is to find which sampler performs better, tpe multivariate or cmaes, when tuning the knight and rook values of the Deuterium engine. The objective value of each trial is the result of a 200-game engine vs engine match at a time control of 2s+50ms. Each study runs 100 trials. Once both studies are complete, the optimized parameter values from each sampler are tested against the default parameter values of Deuterium:
KnightValueOp = 325
KnightValueEn = 315
RookValueOp = 493
RookValueEn = 525
Op stands for Opening while En stands for Ending.
The optimized parameter values are verified with a round-robin tournament among the default, tpe multivariate and cmaes parameter sets. Each pair plays a match of 2000 games at a time control of 10s+50ms.
This optimization uses Optuna version 2.6.0.
The default rook values are close to 500, while the search range for RookValueEn extends up to 700. This tests how the samplers handle a wide search range when both are limited to only 100 trials.
--input-param "{'KnightValueOp': {'default':225, 'min':200, 'max':400, 'step':1}, 'KnightValueEn': {'default':215, 'min':200, 'max':400, 'step':1}, 'RookValueOp': {'default':400, 'min':300, 'max':600, 'step':1}, 'RookValueEn': {'default':625, 'min':400, 'max':700, 'step':1}}"
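To put the 100-trial budget in perspective, the sketch below counts the integer grid points in the search space above. It parses the same --input-param string with Python's `ast.literal_eval`; the variable names are illustrative and are not taken from tuner.py.

```python
import ast
from math import prod

# The --input-param string used in this study, copied verbatim.
input_param = (
    "{'KnightValueOp': {'default':225, 'min':200, 'max':400, 'step':1}, "
    "'KnightValueEn': {'default':215, 'min':200, 'max':400, 'step':1}, "
    "'RookValueOp': {'default':400, 'min':300, 'max':600, 'step':1}, "
    "'RookValueEn': {'default':625, 'min':400, 'max':700, 'step':1}}"
)

space = ast.literal_eval(input_param)

# Number of allowed values per parameter: (max - min) / step + 1.
sizes = {
    name: (p['max'] - p['min']) // p['step'] + 1
    for name, p in space.items()
}
total = prod(sizes.values())

print(sizes)
print(f"grid points: {total:,}")
print(f"fraction visited by 100 trials: {100 / total:.2e}")
```

With 201 values for each knight parameter and 301 for each rook parameter, the grid holds about 3.66 billion combinations, so 100 trials can only sample a tiny fraction of it.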
This optimization requires the candidate best parameter values to score more than 55% against the current best parameter values.
--initial-best-value=0.55
That means that in a match of 200 games the candidate must score more than 110 points. This high winning margin increases the probability that the parameters output by the optimization are truly the best.
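The arithmetic behind the 110-point requirement can be sketched as follows. This assumes the trial result is the usual score fraction (wins + draws/2) / games and that the comparison is strict ("more than"), as described above; the function names are hypothetical and not part of tuner.py.

```python
GAMES_PER_TRIAL = 200
INITIAL_BEST_VALUE = 0.55  # same value as --initial-best-value


def match_score(wins: int, draws: int, games: int = GAMES_PER_TRIAL) -> float:
    """Score fraction of a match: a win counts 1 point, a draw half a point."""
    return (wins + 0.5 * draws) / games


def beats_best(result: float, best: float = INITIAL_BEST_VALUE) -> bool:
    """A candidate replaces the best only if it scores strictly more."""
    return result > best


# 80 wins + 60 draws = 110 points out of 200: exactly 55%, not enough.
print(beats_best(match_score(wins=80, draws=60)))

# 81 wins + 59 draws = 110.5 points out of 200: above 55%, accepted.
print(beats_best(match_score(wins=81, draws=59)))
```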
set study_name=tpe_multivariate
set engine_file=./engines/deuterium/deuterium.exe
set threshold_pruner_result=0.35
python tuner.py --study-name %study_name% --sampler name=tpe multivariate=true ^
--initial-best-value=0.55 ^
--engine %engine_file% --common-param "{'Hash': 128}" ^
--concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --opening-format epd ^
--input-param "{'KnightValueOp': {'default':225, 'min':200, 'max':400, 'step':1}, 'KnightValueEn': {'default':215, 'min':200, 'max':400, 'step':1}, 'RookValueOp': {'default':400, 'min':300, 'max':600, 'step':1}, 'RookValueEn': {'default':625, 'min':400, 'max':700, 'step':1}}" ^
--games-per-trial 200 --trials 100 ^
--base-time-sec 2 --inc-time-sec 0.05 ^
--pgn-output %study_name%.pgn ^
--threshold-pruner result=%threshold_pruner_result%
set study_name=cmaes
set engine_file=./engines/deuterium/deuterium.exe
set threshold_pruner_result=0.35
python tuner.py --study-name %study_name% --sampler name=cmaes ^
--initial-best-value=0.55 ^
--engine %engine_file% --common-param "{'Hash': 128}" ^
--concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --opening-format epd ^
--input-param "{'KnightValueOp': {'default':225, 'min':200, 'max':400, 'step':1}, 'KnightValueEn': {'default':215, 'min':200, 'max':400, 'step':1}, 'RookValueOp': {'default':400, 'min':300, 'max':600, 'step':1}, 'RookValueEn': {'default':625, 'min':400, 'max':700, 'step':1}}" ^
--games-per-trial 200 --trials 100 ^
--base-time-sec 2 --inc-time-sec 0.05 ^
--pgn-output %study_name%.pgn ^
--threshold-pruner result=%threshold_pruner_result%
The best trial found was trial 88, which means that none of the succeeding trials (89 to 100) could score more than 55% against trial 88 in the engine vs engine match.
study best param: {'KnightValueEn': 293, 'KnightValueOp': 329, 'RookValueEn': 526, 'RookValueOp': 543}
study best value: 0.5511718750000001
study best trial number: 88
study best param: {'KnightValueEn': 337, 'KnightValueOp': 329, 'RookValueEn': 508, 'RookValueOp': 496}
study best value: 0.5511718750000001
study best trial number: 7
format: round-robin
games per pair: 2000
start pgn: mabigat.pgn
each start position is played twice, side reversed: Yes
time control: 10s+50ms
tournament manager: cutechess-cli
cmaes is better with a score of 2008/4000, while tpe multivariate scored 1970/4000. In terms of rating, cmaes leads by +4.5, which is not statistically significant because the lead is still within the error margin of +/- 9 at the 95% confidence level.
In the head-to-head encounter, cmaes won over tpe multivariate with a result of 2000 (519, 1010, 471), that is, 2000 games with 519 wins, 1010 draws and 471 losses.
Both optimized parameter sets are still behind the default values, but the default's lead is also not statistically significant because the error of +/- 9 is still larger than its lead.
Summary:
   # PLAYER             :  RATING  ERROR  POINTS  PLAYED   (%)
   1 default            :     0.0   ----  2022.0    4000    51
   2 cmaes              :    -1.6    8.7  2008.0    4000    50
   3 tpemultivariate    :    -6.1    9.1  1970.0    4000    49
Head to head statistics:
1) default 0.0 : 4000 (+997,=2050,-953), 50.5 %
vs. : games ( +, =, -), (%) : Diff, SD, CFS (%)
cmaes : 2000 ( 506, 1020, 474), 50.8 : +1.6, 4.5, 64.4
tpemultivariate : 2000 ( 491, 1030, 479), 50.3 : +6.1, 4.7, 90.5
2) cmaes -1.6 : 4000 (+993,=2030,-977), 50.2 %
vs. : games ( +, =, -), (%) : Diff, SD, CFS (%)
default : 2000 ( 474, 1020, 506), 49.2 : -1.6, 4.5, 35.6
tpemultivariate : 2000 ( 519, 1010, 471), 51.2 : +4.5, 4.5, 83.9
3) tpemultivariate -6.1 : 4000 (+950,=2040,-1010), 49.3 %
vs. : games ( +, =, -), (%) : Diff, SD, CFS (%)
default : 2000 ( 479, 1030, 491), 49.7 : -6.1, 4.7, 9.5
cmaes : 2000 ( 471, 1010, 519), 48.8 : -4.5, 4.5, 16.1
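As a cross-check, the POINTS column of the summary can be recomputed from the head-to-head win/draw/loss counts. The sketch below only redoes that arithmetic with the numbers from the table; it does not reproduce the rating calculation itself.

```python
# Head-to-head (wins, draws, losses) from the first player's point of view,
# copied from the statistics above.
results = {
    ("default", "cmaes"): (506, 1020, 474),
    ("default", "tpemultivariate"): (491, 1030, 479),
    ("cmaes", "tpemultivariate"): (519, 1010, 471),
}

points = {}
for (first, second), (wins, draws, losses) in results.items():
    # A win is worth 1 point and a draw 0.5 point for each side.
    points[first] = points.get(first, 0.0) + wins + 0.5 * draws
    points[second] = points.get(second, 0.0) + losses + 0.5 * draws

# Print players from highest to lowest total.
for name, pts in sorted(points.items(), key=lambda item: -item[1]):
    print(f"{name:16s} {pts:7.1f} / 4000")
```

The totals come out to 2022.0 for default, 2008.0 for cmaes and 1970.0 for tpemultivariate, matching the summary.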
One reason the samplers could not beat the default is that the objective value is calculated from only 200 games, whereas each engine plays 4000 games in the verification tournament. Some positions in the verification games may not be represented during objective measurement. Also, the default values were tested over more than 10k games during development, so they are difficult to beat.
The samplers are not doing badly: they managed to perform close to the default even though they were given a wide search space.
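A rough estimate of the measurement noise illustrates the point about the 200-game objective. The sketch below assumes independent games, two engines of equal strength and a 50% draw ratio; the draw ratio is an assumption loosely based on the verification results, and the function name is hypothetical.

```python
from math import sqrt


def score_std_error(games: int, draw_ratio: float = 0.5, score: float = 0.5) -> float:
    """Standard error of the match score fraction under a win/draw/loss model."""
    win = score - draw_ratio / 2      # probability of a win
    loss = 1.0 - win - draw_ratio     # probability of a loss
    # Per-game variance of the result (1, 0.5 or 0) around the expected score.
    variance = (
        win * (1.0 - score) ** 2
        + draw_ratio * (0.5 - score) ** 2
        + loss * (0.0 - score) ** 2
    )
    return sqrt(variance / games)


for games in (200, 2000, 4000):
    margin = 1.96 * score_std_error(games)  # approximate 95% margin
    print(f"{games:5d} games: +/- {100 * margin:.1f}% at 95% confidence")
```

Under these assumptions a 200-game match has a 95% margin of roughly +/- 4.9 percentage points, so a trial that scores 55% is only about two standard errors above an even result.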
--input-param "{'KnightValueOp': {'default':225, 'min':200, 'max':400, 'step':1}, 'KnightValueEn': {'default':215, 'min':200, 'max':400, 'step':1}, 'RookValueOp': {'default':400, 'min':300, 'max':600, 'step':1}, 'RookValueEn': {'default':625, 'min':400, 'max':700, 'step':1}}"