Replies: 5 comments
---
## Improving the plots and descriptive statistics

Commit 3920413 tries to help us get insights into the mutation learners' usability by keeping track of their choices, the values of their inner parameters, and the metrics of the population. The structure of the study notebooks is still the same, as are the experiments and datasets. In addition to what I've already mentioned in the discussion, we have:

- new tables, which look like this: *(screenshot not preserved)*
- new plots, which look like these: *(screenshots not preserved)*

### Exploring when to update the learner

Additionally, I've changed how the learners are updated (just to explore whether we should update after each mutation or only at the end of the generation). Now we have a list that keeps track of every choice the learner makes (as well as its reward), and these values are used to update the learner only when the batch reaches a given size, as sketched below.
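Concretely, the batched update could look like the minimal sketch below (the class and method names are mine, not the actual code). A batch size of 1 recovers per-mutation updates, while a batch size equal to the number of mutations per generation recovers end-of-generation updates.

```python
class BatchedLearnerUpdates:
    """Buffer (choice, reward) pairs and flush them to the wrapped
    learner only when the batch reaches a given size."""

    def __init__(self, learner, batch_size):
        self.learner = learner
        self.batch_size = batch_size
        self.buffer = []

    def choose(self):
        # Selection always uses the learner's current state.
        return self.learner.choose()

    def update(self, choice, reward):
        self.buffer.append((choice, reward))
        if len(self.buffer) >= self.batch_size:
            for c, r in self.buffer:
                self.learner.update(c, r)
            self.buffer.clear()
```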
### Bugs

I've noticed a problem with binary columns in the dataset (commit d35edcd creates a test that fails due to a core dump, mentioned in issue #37).

### Design of implementation in C++

I am working on making mutation and crossover return …

### Next steps

I hope this new information helps us determine whether these learners are working and actually improving their results. I'll finish working on the …
---
## Changing how we calculate the size

EDIT: I tried running the experiments setting …

We are calculating the size of the programs without taking the weights in the nodes into account. I made a change in the C++ code locally and tested how the results look with and without this modification.
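For reference, a toy version of the size computation (a hypothetical `Node` with `children` and `has_weight` attributes; the exact cost a weighted node should add is an assumption here, not Brush's actual rule):

```python
def program_size(node, count_weights=False):
    """Count nodes in a program tree; optionally charge extra for node
    weights (here: +2, for the weight constant and the implicit
    multiplication -- an illustrative choice)."""
    size = 1
    if count_weights and getattr(node, "has_weight", False):
        size += 2
    return size + sum(program_size(c, count_weights) for c in node.children)
```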
### Without taking into account weights when calculating program size

*(results not preserved in this export)*

### Taking into account weights when calculating program size, with …

*(results not preserved in this export)*
---
## Learners are not showing improvements

The results above were obtained in the first experiment, with two datasets. Although I've been changing several aspects of the code, I couldn't get anything better than these, and I'm now using 6 datasets of different dimensions to better validate the implementations.

I am starting to think that --- since the underlying evolutionary algorithm is NSGA-II (which is interesting for us, since we want both accurate and interpretable models) --- we need to use multi-objective learners as well. There are some works on modifying the UCB1 algorithm to work with multiple objectives (Designing multi-objective multi-armed bandits algorithms: A study), and it seems that I should go in this direction. I'm also wondering: do we need dynamic bandits, or can we get rid of dynamically changing the learned distributions by using the context of the expressions? Along this line, I found a contextual multi-objective MAB (Multi-Objective contextual bandits with a dominant objective). For now I'll keep trying to improve the implementations and experiments, but I'm going to change what we use as learners, probably by implementing these two algorithms.

### Baseline in experiments

Last but not least, I've made two changes to the experiment settings:
---
## Incorporating context into the learners (and fixes)

This time, I've been working on making sure everything is implemented correctly, and on adding Pareto and context support to the learners. Now I think we have a complete environment to explore how Brush can benefit from learners. Hopefully, the next step is obtaining final results and starting to explore them.

I've implemented the algorithm from Designing multi-objective multi-armed bandits algorithms: A study, so we now have a Pareto learner.
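The core of that algorithm, as I understand it, looks roughly like the sketch below (I use the plain UCB1 exploration bonus rather than the paper's exact constant, and the class and method names are mine): build a UCB vector per arm, keep the arms whose vectors are Pareto-optimal, and pick uniformly among them.

```python
import math
import random

class ParetoUCB1:
    """Sketch of a Pareto UCB1-style learner for vector-valued rewards."""

    def __init__(self, n_arms, n_objectives):
        self.counts = [0] * n_arms
        self.means = [[0.0] * n_objectives for _ in range(n_arms)]

    def _ucb_vectors(self):
        total = sum(self.counts)
        vecs = []
        for c, mu in zip(self.counts, self.means):
            if c == 0:  # force initial exploration of unseen arms
                vecs.append([float("inf")] * len(mu))
            else:
                bonus = math.sqrt(2.0 * math.log(total) / c)
                vecs.append([m + bonus for m in mu])
        return vecs

    def choose(self):
        vecs = self._ucb_vectors()

        # An arm is dominated if another arm is >= in every objective
        # and > in at least one.
        def dominated(i):
            return any(
                all(a >= b for a, b in zip(vecs[j], vecs[i]))
                and any(a > b for a, b in zip(vecs[j], vecs[i]))
                for j in range(len(vecs)) if j != i
            )

        front = [i for i in range(len(vecs)) if not dominated(i)]
        return random.choice(front)

    def update(self, arm, reward_vec):
        self.counts[arm] += 1
        c = self.counts[arm]
        self.means[arm] = [
            m + (r - m) / c for m, r in zip(self.means[arm], reward_vec)
        ]
```

In our setting the reward vector could hold, e.g., one entry per NSGA-II objective, and picking uniformly from the front avoids having to scalarize the objectives.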
Tons of changes and bug fixes!

### Context learners

I also searched for ways to build a contextual learner. I found three main approaches in the literature:

1. learning a linear model that maps the context to expected rewards;
2. approximating the reward function with a neural network;
3. partitioning the context space and assigning an independent context-free learner to each partition.

The last one drew my attention because we get multiple independent learners instead of a single one. This means we can learn complex policies --- more general than a linear model, but without the burden of using a NN --- at the cost of having to manage the partitions. There are some approaches (zooming, smoothing, adapting) to do that. The authors propose a contextual learner that takes any context-free learner as an argument. I picked this one to implement because --- with a single implementation --- I can double the number of learners in the experiment. Since I have learners based on UCB1, uniform distributions, Thompson sampling, and the Pareto front, this lets us see how context can improve each of them. The idea of this algorithm is to manage sub-spaces, zooming into contexts, using the notion of balls in a finite metric space (I still need to debug and make sure everything is correctly implemented!). Another interesting property is that balls can be deactivated (a new one is then created to fill their place), with the learners associated with the balls. This works like a reset in sub-regions, which the paper demonstrates can handle some drift in the arms' reward distributions.
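A heavily simplified sketch of how I picture the ball bookkeeping (contexts as points with a metric, a fixed radius, and a hypothetical `make_learner` factory; the real algorithm also adapts ball radii as it zooms, which I omit here):

```python
import math

class ContextualZooming:
    """Rough sketch of the ball-based contextual learner described above."""

    def __init__(self, make_learner, radius=0.25):
        self.make_learner = make_learner  # builds any context-free learner
        self.radius = radius
        self.balls = []  # list of (center, learner)

    def _ball_for(self, context):
        # Use the nearest ball that covers the context; open a new ball
        # centered on the context if none does (this "zooms" the cover).
        covering = [
            (math.dist(center, context), i)
            for i, (center, _) in enumerate(self.balls)
            if math.dist(center, context) <= self.radius
        ]
        if covering:
            return self.balls[min(covering)[1]][1]
        learner = self.make_learner()
        self.balls.append((tuple(context), learner))
        return learner

    def choose(self, context):
        return self._ball_for(context).choose()

    def update(self, context, arm, reward):
        self._ball_for(context).update(arm, reward)

    def deactivate(self, index):
        # Reset a sub-region: drop the ball; a fresh one (with a fresh
        # learner) is created the next time a context falls there.
        self.balls.pop(index)
```

The deactivation rule itself (e.g., detecting stale estimates) is left external here; the point is that resetting a ball only resets the learner for that sub-region.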
### Experiment changes

Changes (both brush and brush-weights):
### Bug fixes
---
## Learning optimal mutation weights
TL;DR: Brush has 4 types of mutation, and I'm implementing learners to optimize which mutation to use during evolution. I have implemented two different methods in two notebooks (Dynamic Thompson Sampling and Dynamic Multi-Armed Bandit). The important parts of these notebooks are the statistics table and the plot of mutation choices over iterations (example below). There is one experiment for classification and one for regression.
### Brief description of the problem
I'm prototyping methods to learn the mutation weights during evolution.
Brush is designed to have weights used in sampling steps. This feature exists in a variety of places: operators and terminals in the search space, nodes in program trees, and the different types of mutation. The idea is that, during evolution, we learn which options give better rewards than others and increase their weights (i.e., their probabilities of being sampled). In the GA literature this is called adaptive pursuit, and it is closely related to the Multi-Armed Bandit and the Explore-vs-Exploit (EvE) problem.
An improvement to this weight learning is to dynamically change the weights as the context changes (during evolution, the characteristics of the population change, and we should respond to that). This is what I am trying to achieve: prototyping an evolutionary algorithm that uses some logic to increase the weights of choices that seem to have a higher average reward, and decrease the weights of those that don't (always balancing exploration and exploitation).
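As a concrete reference point, adaptive pursuit itself is only a few lines. This sketch uses illustrative constants and the four Brush mutation names; it is not the actual implementation:

```python
import random

P_MIN = 0.05   # floor so no operator's probability hits zero
ALPHA = 0.8    # learning rate for the reward estimates
BETA = 0.8     # pursuit rate toward the current best operator

arms = ["point", "insert", "delete", "toggle_weight"]
prob = {a: 1.0 / len(arms) for a in arms}   # sampling weights
value = {a: 1.0 for a in arms}              # running reward estimates
p_max = 1.0 - (len(arms) - 1) * P_MIN       # ceiling for the best arm

def update(chosen, reward):
    # Track the reward of the operator that was just applied...
    value[chosen] += ALPHA * (reward - value[chosen])
    # ...then pursue the operator with the best estimate: its probability
    # moves toward p_max while all others decay toward P_MIN.
    best = max(value, key=value.get)
    for a in arms:
        target = p_max if a == best else P_MIN
        prob[a] += BETA * (target - prob[a])

# Sampling the next mutation with the learned weights:
# mut = random.choices(arms, weights=[prob[a] for a in arms])[0]
```

Since the per-step changes sum to zero, the probabilities always remain a valid distribution.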
### What I have done
Right now I am testing only the mutation weights, since I'm prototyping in Python and these methods need to know exactly what is going on (and, for mutation, the C++ interface makes that most convenient).
We have 4 different mutations (point, insert, delete, and toggle weight), which produce different results. I've implemented two methods.
I managed to get it working in two notebooks, one per method.
The structure of these two notebooks is as follows:

- a description of the method (`>` blockquotes are personal thoughts);
- 30 executions of the original algorithm and the modified version;
- descriptive statistics of the executions;
- a plot showing the mutation choices over iterations.
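To make the first method concrete, here is a rough sketch of the dynamic Thompson sampling idea (assuming rewards in [0, 1]; the cap value and names here are illustrative): capping alpha + beta makes each arm's Beta posterior forget old evidence, which is what lets the learner track a drifting reward distribution.

```python
import random

class DynamicThompsonSampling:
    """Sketch of a dynamic Thompson sampling learner over the four
    mutation options."""

    def __init__(self, arms=("point", "insert", "delete", "toggle_weight"),
                 cap=100.0):
        self.arms = list(arms)
        self.cap = cap
        self.params = {a: [1.0, 1.0] for a in self.arms}  # [alpha, beta]

    def choose(self):
        # Sample a success probability per arm and pick the argmax.
        draws = {a: random.betavariate(*self.params[a]) for a in self.arms}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        a, b = self.params[arm]
        a += reward
        b += 1.0 - reward
        total = a + b
        if total > self.cap:  # rescale to forget old evidence
            a *= self.cap / total
            b *= self.cap / total
        self.params[arm] = [a, b]
```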
### Next steps
I am still checking for bugs in these modified versions. I feel that I need robust empirical evidence that these methods work with the mutation options before moving on to learning weights for other things (that would require writing more interfaces in the Python wrapper, and the problem would become much harder, as we would have more "arms").
Another problem I have is: how do I use these specific algorithms to change Brush's weights? Brush samples using the weights as a distribution, while each of the methods above works differently (the former keeps a Beta posterior per arm, and the latter makes deterministic choices). I need to think about how to make this implementation interoperable with how Brush is structured. The solution must be transparent to the user, so that the weights can still be interpreted as they are today.
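One possible direction (just a sketch, with hypothetical names): periodically project the learner's internal state onto a normalized weight vector, so downstream code keeps seeing plain sampling weights regardless of which learner produced them.

```python
def posterior_to_weights(params):
    """Hypothetical adapter: collapse per-arm Beta posteriors into
    Brush-style sampling weights by normalizing the posterior means."""
    means = {arm: a / (a + b) for arm, (a, b) in params.items()}
    total = sum(means.values())
    return {arm: m / total for arm, m in means.items()}
```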
Finally, the last problem (which I think we should address later) is the design of the implementation of these learners in C++.