Replies: 5 comments
---
## Improving the plots and descriptive statistics

Commit 3920413 tries to help us get insights into the mutation learners' usability by keeping track of their choices, the values of their inner parameters, and the metrics of the population. The structure of the study notebooks is still the same, as are the experiments and datasets. In addition to what I've already mentioned in the discussion, we have:

- new tables, which look like this: *(screenshot not preserved)*
- new plots, which look like these: *(screenshots not preserved)*

### Exploring when to update the learner

Additionally, I've changed how the learners are updated (just to explore whether we should update after each mutation or only at the end of the generation). Now we have a list that keeps track of every choice the learner makes (as well as its reward), and these values are used to update the learner only when the batch reaches a given size, as sketched below.
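Concretely, the batched update could look like the minimal sketch below (the class and method names are mine, not the actual code). A batch size of 1 recovers per-mutation updates, while a batch size equal to the number of mutations per generation recovers end-of-generation updates.

```python
class BatchedLearnerUpdates:
    """Buffer (choice, reward) pairs and flush them to the wrapped
    learner only when the batch reaches a given size."""

    def __init__(self, learner, batch_size):
        self.learner = learner
        self.batch_size = batch_size
        self.buffer = []

    def choose(self):
        # Selection always uses the learner's current state.
        return self.learner.choose()

    def update(self, choice, reward):
        self.buffer.append((choice, reward))
        if len(self.buffer) >= self.batch_size:
            for c, r in self.buffer:
                self.learner.update(c, r)
            self.buffer.clear()
```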
### Bugs

I've noticed a problem with binary columns in the dataset (commit d35edcd creates a test that fails due to a core dump, mentioned in issue #37).

### Design of implementation in C++

I am working on making mutation and crossover return …

### Next steps

I hope this new information helps us determine whether these learners are working and actually improving their results. I'll finish working on the …
---
## Changing how we calculate the size

EDIT: I tried running the experiments setting …

We are calculating the size of the programs without taking the weights in the nodes into account. I made a change in the C++ code locally and tested how the results look with and without this modification.
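For reference, a toy version of the size computation (a hypothetical `Node` with `children` and `has_weight` attributes; the exact cost a weighted node should add is an assumption here, not Brush's actual rule):

```python
def program_size(node, count_weights=False):
    """Count nodes in a program tree; optionally charge extra for node
    weights (here: +2, for the weight constant and the implicit
    multiplication -- an illustrative choice)."""
    size = 1
    if count_weights and getattr(node, "has_weight", False):
        size += 2
    return size + sum(program_size(c, count_weights) for c in node.children)
```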
### Without taking into account weights when calculating program size

*(results not preserved in this export)*

### Taking into account weights when calculating program size, with …

*(results not preserved in this export)*
---
## Learners are not showing improvements

The results above were obtained in the first experiment, with two datasets. Although I've been changing several aspects of the code, I couldn't get anything better than these, and I'm now using 6 datasets of different dimensions to better validate the implementations.

I am starting to think that --- since the underlying evolutionary algorithm is NSGA-II (which is interesting for us, since we want both accurate and interpretable models) --- we need to use multi-objective learners as well. There are some works on modifying the UCB1 algorithm to work with multiple objectives (Designing multi-objective multi-armed bandits algorithms: A study), and it seems that I should go in this direction. I'm also wondering: do we need dynamic bandits, or can we get rid of dynamically changing the learned distributions by using the context of the expressions? Along this line, I found a contextual multi-objective MAB (Multi-Objective contextual bandits with a dominant objective). For now I'll keep trying to improve the implementations and experiments, but I'm going to change what we use as learners, probably by implementing these two algorithms.

### Baseline in experiments

Last but not least, I've made two changes to the experiment settings:
---
## Incorporating context into the learners (and fixes)

This time, I've been working on making sure everything is implemented correctly, and on adding Pareto and context support to the learners. Now I think we have a complete environment to explore how Brush can benefit from learners. Hopefully, the next step is obtaining final results and starting to explore them.

I've implemented the algorithm from Designing multi-objective multi-armed bandits algorithms: A study, so we now have a Pareto learner.
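The core of that algorithm, as I understand it, looks roughly like the sketch below (I use the plain UCB1 exploration bonus rather than the paper's exact constant, and the class and method names are mine): build a UCB vector per arm, keep the arms whose vectors are Pareto-optimal, and pick uniformly among them.

```python
import math
import random

class ParetoUCB1:
    """Sketch of a Pareto UCB1-style learner for vector-valued rewards."""

    def __init__(self, n_arms, n_objectives):
        self.counts = [0] * n_arms
        self.means = [[0.0] * n_objectives for _ in range(n_arms)]

    def _ucb_vectors(self):
        total = sum(self.counts)
        vecs = []
        for c, mu in zip(self.counts, self.means):
            if c == 0:  # force initial exploration of unseen arms
                vecs.append([float("inf")] * len(mu))
            else:
                bonus = math.sqrt(2.0 * math.log(total) / c)
                vecs.append([m + bonus for m in mu])
        return vecs

    def choose(self):
        vecs = self._ucb_vectors()

        # An arm is dominated if another arm is >= in every objective
        # and > in at least one.
        def dominated(i):
            return any(
                all(a >= b for a, b in zip(vecs[j], vecs[i]))
                and any(a > b for a, b in zip(vecs[j], vecs[i]))
                for j in range(len(vecs)) if j != i
            )

        front = [i for i in range(len(vecs)) if not dominated(i)]
        return random.choice(front)

    def update(self, arm, reward_vec):
        self.counts[arm] += 1
        c = self.counts[arm]
        self.means[arm] = [
            m + (r - m) / c for m, r in zip(self.means[arm], reward_vec)
        ]
```

In our setting the reward vector could hold, e.g., one entry per NSGA-II objective, and picking uniformly from the front avoids having to scalarize the objectives.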
Tons of changes and bug fixes!

### Context learners

I also searched for ways to build a contextual learner. I found three main approaches in the literature:

1. learning a linear model that maps the context to expected rewards;
2. approximating the reward function with a neural network;
3. partitioning the context space and assigning an independent context-free learner to each partition.

The last one drew my attention because we get multiple independent learners instead of a single one. This means we can learn complex policies --- more general than a linear model, but without the burden of using a NN --- at the cost of having to manage the partitions. There are some approaches (zooming, smoothing, adapting) to do that. The authors propose a contextual learner that takes any context-free learner as an argument. I picked this one to implement because --- with a single implementation --- I can double the number of learners in the experiment. Since I have learners based on UCB1, uniform distributions, Thompson sampling, and the Pareto front, this lets us see how context can improve each of them. The idea of this algorithm is to manage sub-spaces, zooming into contexts, using the notion of balls in a finite metric space (I still need to debug and make sure everything is correctly implemented!). Another interesting property is that balls can be deactivated (a new one is then created to fill their place), with the learners associated with the balls. This works like a reset in sub-regions, which the paper demonstrates can handle some drift in the arms' reward distributions.
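A heavily simplified sketch of how I picture the ball bookkeeping (contexts as points with a metric, a fixed radius, and a hypothetical `make_learner` factory; the real algorithm also adapts ball radii as it zooms, which I omit here):

```python
import math

class ContextualZooming:
    """Rough sketch of the ball-based contextual learner described above."""

    def __init__(self, make_learner, radius=0.25):
        self.make_learner = make_learner  # builds any context-free learner
        self.radius = radius
        self.balls = []  # list of (center, learner)

    def _ball_for(self, context):
        # Use the nearest ball that covers the context; open a new ball
        # centered on the context if none does (this "zooms" the cover).
        covering = [
            (math.dist(center, context), i)
            for i, (center, _) in enumerate(self.balls)
            if math.dist(center, context) <= self.radius
        ]
        if covering:
            return self.balls[min(covering)[1]][1]
        learner = self.make_learner()
        self.balls.append((tuple(context), learner))
        return learner

    def choose(self, context):
        return self._ball_for(context).choose()

    def update(self, context, arm, reward):
        self._ball_for(context).update(arm, reward)

    def deactivate(self, index):
        # Reset a sub-region: drop the ball; a fresh one (with a fresh
        # learner) is created the next time a context falls there.
        self.balls.pop(index)
```

The deactivation rule itself (e.g., detecting stale estimates) is left external here; the point is that resetting a ball only resets the learner for that sub-region.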
### Experiment changes

Changes (both brush and brush-weights):
### Bug fixes
---
## Learning optimal mutation weights
TL;DR: Brush has 4 types of mutation, and I'm implementing learners to optimize which mutation to use during evolution. I have implemented two different methods in two notebooks (Dynamic Thompson Sampling and Dynamic Multi-Armed Bandit). The important parts of these notebooks are the statistics table and the plot of mutation choices over iterations (example below). There is one experiment for classification and one for regression.
### Brief description of the problem
I'm prototyping methods to learn the mutation weights during evolution.
Brush is designed to have weights used in sampling steps. This feature exists in a variety of places: operators and terminals in the search space, nodes in program trees, and the different types of mutation. The idea is that, during evolution, we learn which options give better rewards than others and increase their weights (i.e., their probabilities of being sampled). In the GA literature this is called adaptive pursuit, and it is closely related to the Multi-Armed Bandit and the Explore-vs-Exploit (EvE) problem.
An improvement to this weight learning is to dynamically change the weights as the context changes (during evolution, the characteristics of the population change, and we should respond to that). This is what I am trying to achieve: prototyping an evolutionary algorithm that uses some logic to increase the weights of choices that seem to have a higher average reward, and decrease the weights of those that don't (always balancing exploration and exploitation).
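As a concrete reference point, adaptive pursuit itself is only a few lines. This sketch uses illustrative constants and the four Brush mutation names; it is not the actual implementation:

```python
import random

P_MIN = 0.05   # floor so no operator's probability hits zero
ALPHA = 0.8    # learning rate for the reward estimates
BETA = 0.8     # pursuit rate toward the current best operator

arms = ["point", "insert", "delete", "toggle_weight"]
prob = {a: 1.0 / len(arms) for a in arms}   # sampling weights
value = {a: 1.0 for a in arms}              # running reward estimates
p_max = 1.0 - (len(arms) - 1) * P_MIN       # ceiling for the best arm

def update(chosen, reward):
    # Track the reward of the operator that was just applied...
    value[chosen] += ALPHA * (reward - value[chosen])
    # ...then pursue the operator with the best estimate: its probability
    # moves toward p_max while all others decay toward P_MIN.
    best = max(value, key=value.get)
    for a in arms:
        target = p_max if a == best else P_MIN
        prob[a] += BETA * (target - prob[a])

# Sampling the next mutation with the learned weights:
# mut = random.choices(arms, weights=[prob[a] for a in arms])[0]
```

Since the per-step changes sum to zero, the probabilities always remain a valid distribution.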
### What I have done
Right now I am testing only the mutation weights, since I'm prototyping in Python and these methods need to know exactly what is going on (and, for mutation, the C++ interface makes that most convenient).
We have 4 different mutations (point, insert, delete, and toggle weight), which produce different results. I've implemented two methods.
I managed to get it working in two notebooks, one per method.
The structure of these two notebooks is as follows:

- a description of the method (`>` blockquotes are personal thoughts);
- 30 executions of the original algorithm and the modified version;
- descriptive statistics of the executions;
- a plot showing the mutation choices over iterations.
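To make the first method concrete, here is a rough sketch of the dynamic Thompson sampling idea (assuming rewards in [0, 1]; the cap value and names here are illustrative): capping alpha + beta makes each arm's Beta posterior forget old evidence, which is what lets the learner track a drifting reward distribution.

```python
import random

class DynamicThompsonSampling:
    """Sketch of a dynamic Thompson sampling learner over the four
    mutation options."""

    def __init__(self, arms=("point", "insert", "delete", "toggle_weight"),
                 cap=100.0):
        self.arms = list(arms)
        self.cap = cap
        self.params = {a: [1.0, 1.0] for a in self.arms}  # [alpha, beta]

    def choose(self):
        # Sample a success probability per arm and pick the argmax.
        draws = {a: random.betavariate(*self.params[a]) for a in self.arms}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        a, b = self.params[arm]
        a += reward
        b += 1.0 - reward
        total = a + b
        if total > self.cap:  # rescale to forget old evidence
            a *= self.cap / total
            b *= self.cap / total
        self.params[arm] = [a, b]
```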
### Next steps
I am still checking for bugs in these modified versions. I feel that I need robust empirical evidence that these methods work with the mutation options before moving on to learning weights for other things (that would require writing more interfaces in the Python wrapper, and the problem would become much harder, as we would have more "arms").
Another problem I have is: how do I use these specific algorithms to change Brush's weights? Brush samples using the weights as a distribution, while each of the methods above works differently (the former keeps a Beta posterior per arm, and the latter makes deterministic choices). I need to think about how to make this implementation interoperable with how Brush is structured. The solution must be transparent to the user, so that the weights can still be interpreted as they are today.
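One possible direction (just a sketch, with hypothetical names): periodically project the learner's internal state onto a normalized weight vector, so downstream code keeps seeing plain sampling weights regardless of which learner produced them.

```python
def posterior_to_weights(params):
    """Hypothetical adapter: collapse per-arm Beta posteriors into
    Brush-style sampling weights by normalizing the posterior means."""
    means = {arm: a / (a + b) for arm, (a, b) in params.items()}
    total = sum(means.values())
    return {arm: m / total for arm, m in means.items()}
```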
Finally, the last problem (which I think we should address later) is the design of the implementation of these learners in C++.