Benchmark SRBench
Data PMLB
SRBench tests 120 black-box and 133 ground-truth datasets (119 Feynman + 14 Strogatz). Every dataset is run with 10 random seeds, and the ground-truth datasets are additionally tested at 4 noise levels [0, 0.001, 0.01, 0.1]. Together that is (120 + 133*4)*10 = 6520 runs of the symbolic regressor.
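As a sanity check, the total run count follows directly from the figures above (a minimal sketch; dataset counts are taken from the text):

```python
# Run count for the SRBench suite described above.
black_box = 120        # black-box datasets
ground_truth = 133     # 119 Feynman + 14 Strogatz
noise_levels = 4       # [0, 0.001, 0.01, 0.1], applied to ground-truth only
seeds = 10             # random seeds per dataset

total_runs = (black_box + ground_truth * noise_levels) * seeds
print(total_runs)  # → 6520
```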
On a local computer, use the script srbench.sh (requires miniconda). Run time is reduced to 1 second per sample, so the total is 6520 s divided by the number of cores used, plus benchmark overhead.
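The wall-clock formula above can be sketched as a small helper (a hedged illustration: `cores` and `overhead_s` are placeholder inputs you would substitute for your machine, not measured values):

```python
def estimated_wall_clock_s(runs=6520, time_per_sample_s=1.0, cores=8, overhead_s=0.0):
    """Rough wall-clock estimate: runs spread evenly across cores,
    plus a fixed benchmark overhead."""
    return runs * time_per_sample_s / cores + overhead_s

# e.g. 6520 one-second samples on 8 cores, ignoring overhead:
print(estimated_wall_clock_s())  # → 815.0 seconds, i.e. under 14 minutes
```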
With the GitHub workflow sr_bench.yml, results are stored in the bbox_result and gt_result artifacts. There are 4 choices — HROCH_1s, HROCH_10s, HROCH_1m, HROCH_5m — with the time per sample set to 1 s, 10 s, 1 min, or 5 min. Total run time ranges from 47 minutes to 17 hours.
How often a method finds a model symbolically equivalent to the ground-truth process
How often a method finds a model with test-set R² > 0.999
Considering the accuracy and simplicity of models simultaneously, this figure illustrates the trade-offs made by each method. Methods lower and to the left produce models with better trade-offs between accuracy and simplicity.