Replies: 1 comment
We are working towards a proposed solution, but remain open to other views and ideas. In the meantime, we do require that AutoML frameworks open-source their meta-learning by:
Note: this was originally issue #18, but it has been moved to the discussions board as the more appropriate venue.
We encourage everyone to partake in this discussion, as it is an important but difficult issue to address.
The Problem
Meta-learning is the act of learning across datasets in order to generalize to new problems. AutoML frameworks which use meta-learning to aid their optimization gain an unfair advantage if the benchmark problem they are asked to solve was present in their meta-learning data.
For instance, auto-sklearn uses the results of experiments on a plethora of datasets to warm-start its optimization. It characterizes new problems, relates them to those meta-learning datasets, and proposes candidate solutions which worked well for those older problems. If auto-sklearn is asked to solve a problem it has seen before, it obviously benefits, as it has access to experiment data on that very dataset. This is the unfair advantage.
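To make the mechanism concrete, here is a minimal, hypothetical sketch of meta-learning-based warm-starting. It is not auto-sklearn's actual code; the `MetaRecord` structure, the choice of meta-features, and the distance measure are illustrative assumptions only.

```python
# Hypothetical sketch of meta-learning warm-starting (not auto-sklearn's code):
# characterize a new dataset with meta-features, find the most similar
# previously seen datasets, and seed the optimizer with the configurations
# that worked best on those datasets.
from dataclasses import dataclass
from math import dist


@dataclass
class MetaRecord:
    dataset_id: int                 # e.g. an OpenML dataset id
    meta_features: list[float]      # e.g. n_instances, n_features, class entropy, ...
    best_configs: list[dict]        # configurations that performed well on this dataset


def warm_start_candidates(new_meta_features: list[float],
                          meta_db: list[MetaRecord],
                          k: int = 5) -> list[dict]:
    """Return candidate configurations taken from the k most similar prior datasets."""
    # Rank prior datasets by Euclidean distance in meta-feature space.
    ranked = sorted(meta_db, key=lambda r: dist(r.meta_features, new_meta_features))
    candidates = []
    for record in ranked[:k]:
        # If the "new" dataset is itself in meta_db, its own record sits at
        # distance zero and its best configurations are proposed first --
        # exactly the unfair advantage described above.
        candidates.extend(record.best_configs)
    return candidates
```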
Discussion
The cleanest solution would be to not allow AutoML frameworks to use any of the selected benchmark datasets in their meta-models (or, depending on where we place the burden, to only select benchmark datasets which were not used in any framework's meta-learning process).
Both stances share the same problem: both parties are very interested in using the same data. Excluding these datasets from meta-learning would make the AutoML tools worse, while excluding them from the benchmark would make the benchmark worse.
In our ICML 2019 AutoML workshop paper we did not have a satisfying solution; we merely indicated where auto-sklearn had seen the dataset during meta-learning.
Any proposed solution should take into account that the datasets in this benchmark change over time. This means that a tool whose meta-learning datasets currently have no overlap with the benchmark may have overlap after datasets are added to the benchmark.
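As an illustration, a tiny sketch of such an overlap check is shown below. It is not part of the benchmark's codebase, and the dataset ids are made up; the point is that it would have to be re-run whenever the benchmark's dataset list changes.

```python
# Hypothetical overlap check between the benchmark's dataset ids and the ids a
# framework used for meta-learning. Re-running it after each benchmark update
# catches overlap that only appears once new datasets are added.
def meta_learning_overlap(benchmark_ids: set[int], meta_learning_ids: set[int]) -> set[int]:
    """Return the benchmark dataset ids that also appear in the meta-learning data."""
    return benchmark_ids & meta_learning_ids


if __name__ == "__main__":
    benchmark_v1 = {3, 31, 1067}            # illustrative dataset ids only
    framework_meta = {31, 1461, 1489}
    print(meta_learning_overlap(benchmark_v1, framework_meta))    # {31}

    # After the benchmark adds a dataset, a previously clean framework may overlap.
    benchmark_v2 = benchmark_v1 | {1461}
    print(meta_learning_overlap(benchmark_v2, framework_meta))    # {31, 1461}
```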
Forbidding meta-learning in the benchmark is also not an option. We want to evaluate the frameworks as a user would use them. Meta-learning has been shown to provide significant improvements (e.g. auto-sklearn, OBOE), and we hope to see further improvements from it in the future. These improvements should also be accurately reflected in our benchmark.
Solutions
I will try to keep this section up-to-date as the discussion grows.
One solution could be to require that AutoML frameworks which use meta-learning expose an interface to easily exclude a specific dataset (referenced by dataset id) from the meta-model.
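A rough sketch of what such an interface could look like is given below. The class and method names (`MetaLearningAutoML`, `exclude_from_meta_learning`) are hypothetical and not part of any existing framework; they only illustrate the kind of hook the benchmark would call before fitting.

```python
# Hypothetical interface sketch: the benchmark passes the id of the dataset
# under evaluation, and the framework must leave it out of its meta-model.
from abc import ABC, abstractmethod


class MetaLearningAutoML(ABC):
    @abstractmethod
    def exclude_from_meta_learning(self, dataset_ids: set[int]) -> None:
        """Remove the given dataset ids (e.g. OpenML ids) from the meta-model
        before fitting, so results on those datasets are not warm-started
        from experiments on the very same datasets."""


# The benchmark could then call, for each task it evaluates:
#     framework.exclude_from_meta_learning({task.dataset_id})
#     framework.fit(X_train, y_train)
```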