Fix/merge back v2 #474

Merged
merged 32 commits into master from fix/merge-back-v2 on Jul 12, 2022
Conversation

sebhrusen (Collaborator) commented Jul 12, 2022

The previous attempt was squash-merged, apparently deleting the common ancestor between stable-v2 and master.

PGijsbers and others added 30 commits September 17, 2021 10:58
* Add a workflow to tag latest `v*` release as `stable` (#399)

Currently limited to alphabetical ordering, which means that no single number in the version can exceed one digit.
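A minimal Python sketch of why purely alphabetical tag ordering breaks once a version component reaches two digits (illustrative only, not the workflow's actual code):

```python
# Lexicographic comparison misorders multi-digit version components:
tags = ["v2.9.0", "v2.10.0"]
print(max(tags))  # -> "v2.9.0", because the character "9" sorts after "1"

# Sorting on the numeric components gives the intended order:
numeric = lambda t: tuple(int(p) for p in t.lstrip("v").split("."))
print(max(tags, key=numeric))  # -> "v2.10.0"
```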

* Bump auto-sklearn to 0.14.0 (#400)
* Add the version tag to the image name if present

* Fix casing for MLNet framework definition
* Add volume meta data to aws meta info
* Add constraints for v2 benchmark

For ease of reproducibility, we want to include our experimental setup
in the constraints file. For our experiments we increase the volume size
to 100 GB and require gp3 volumes (general-purpose SSD).
* let the job runner handle the rescheduling logic to ensure that a job can no longer be acted upon by the current worker after it has been rescheduled

* remove commented code
Made the previous version abstract to avoid accidentally running the
wrong version of GAMA for the benchmark.
* Unsparsify target variables for (Tuned)RF

Sparse targets are not supported in scikit-learn 0.24.2, and are used
with tasks 360932 and 360933 (QSAR) in the benchmark.
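A hedged sketch of the workaround (the helper name is illustrative): densify sparse targets before handing them to scikit-learn 0.24.2.

```python
import numpy as np
from scipy import sparse

def unsparsify(y):
    """Convert a sparse target matrix to a dense array (illustrative helper)."""
    # scikit-learn 0.24.2 rejects sparse y, so densify before fitting (Tuned)RF.
    return np.asarray(y.todense()).squeeze() if sparse.issparse(y) else y
```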

* cosmetic change to make de/serialization easier to debug

Co-authored-by: Sebastien Poirier <sebastien@h2o.ai>
It's entirely possible that the processes were already terminating, but only completed termination between the process.children call and the proc.terminate/kill calls.
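A sketch of the guarded termination, assuming psutil is the process library in play (the commit text's `process.children` and `proc.terminate/kill` suggest it); names are illustrative:

```python
import psutil

def terminate_child_processes(pid):
    try:
        children = psutil.Process(pid).children(recursive=True)
    except psutil.NoSuchProcess:
        return  # parent already gone
    for proc in children:
        try:
            proc.terminate()  # or proc.kill() after a grace period
        except psutil.NoSuchProcess:
            # The process finished between children() and terminate():
            # exactly the race this commit guards against.
            pass
```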
* fixes #432: add precision to runtimes in results.csv

* Update amlb/results.py

Co-authored-by: seb. <sebastien@h2o.ai>

Co-authored-by: seb. <sebastien@h2o.ai>
* Iteratively build the forest to honor constraints

In particular, depending on the dataset size, either memory or time
constraints can become a problem, which makes it unreliable as a
baseline. Gradually growing the forest sidesteps both issues.

* Make iterative fit default, parameterize execution

* Step_size as script parameter, safer check if done

When final_forest_size is not an exact multiple of step_size, the
random forest should still terminate. Additionally, step_size is escaped
with an underscore as it is not a RandomForestEstimator hyperparameter.
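A sketch of the iterative fit under the stated assumptions; parameter names mirror the description above, but the actual script differs:

```python
import time
from sklearn.ensemble import RandomForestClassifier

def fit_forest_iteratively(X, y, final_forest_size=2000, _step_size=100,
                           deadline=None):
    # _step_size is underscore-prefixed because it is a script parameter,
    # not a RandomForestEstimator hyperparameter.
    rf = RandomForestClassifier(n_estimators=0, warm_start=True)
    while rf.n_estimators < final_forest_size:
        if deadline is not None and time.time() >= deadline:
            break  # constraint hit: keep the forest grown so far
        # The last increment may be smaller when final_forest_size is not
        # an exact multiple of _step_size.
        rf.n_estimators = min(rf.n_estimators + _step_size, final_forest_size)
        rf.fit(X, y)  # warm_start reuses the trees already grown
    return rf
```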
…ts (#441)

* Iterative fit to meet memory and time constraints

Specifically, for each value of `max_features` to try, an equal time
budget is allotted, with one additional budget reserved for the final
fit. This does mean that different `max_features` values can lead to
different numbers of trees, but it keeps things simple.

* Abort tuning when close to total time budget

The first fit of each iterative fit for a `max_features` value was not
guarded, which can lead to exceeding the total time budget. This adds a
check before the first fit to estimate whether the budget will be
exceeded and, if so, aborts further tuning and continues with the final
fit.
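A sketch of the guard; the `fit_one` callable and all names here are hypothetical:

```python
import time

def tune_max_features(candidates, fit_one, total_budget):
    """Try max_features values, aborting before a fit that would overshoot."""
    start, longest_fit, results = time.time(), 0.0, {}
    for max_features in candidates:
        # Estimate the cost of the next (previously unguarded) first fit
        # from the slowest fit observed so far.
        if (time.time() - start) + longest_fit > total_budget:
            break  # abort tuning; proceed to the final fit instead
        t0 = time.time()
        results[max_features] = fit_one(max_features)
        longest_fit = max(longest_fit, time.time() - t0)
    return results
```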

* Make k_folds configurable

* Add scikit-learn code with explanation

* Modify cross_validate, allow 1 estimator per split

This is useful when we maintain a warm-started model for each individual
split.

* Use custom cv function to allow warm-start

By default estimators are cloned in any scikit-learn cross_validate
function (which stops warm-start) and it is not possible to specify a
specific estimator-object per fold (which stops warm-start). The added
custom_validate module makes changes to the scikit-learn code to allow
warm-starting to work in conjunction with the cross-validate
functionality. For more info see scikit-learn#22044 and
scikit-learn#22087.
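A condensed sketch of the idea, assuming an ensemble created with `warm_start=True` (e.g. `RandomForestClassifier(n_estimators=0, warm_start=True)`); the real `custom_validate` module adapts scikit-learn's own cross-validation code rather than replacing it like this:

```python
from sklearn.base import clone
from sklearn.model_selection import KFold

def warm_start_cv(estimator, X, y, k_folds=5, rounds=4, step=50):
    splits = list(KFold(n_splits=k_folds).split(X))
    # Clone once up front, then reuse: cross_validate would re-clone on
    # every call, which discards the warm-started state.
    fold_models = [clone(estimator) for _ in splits]
    scores = []
    for _ in range(rounds):
        round_scores = []
        for model, (train, test) in zip(fold_models, splits):
            model.n_estimators += step  # grow this fold's forest in place
            model.fit(X[train], y[train])
            round_scores.append(model.score(X[test], y[test]))
        scores.append(round_scores)
    return fold_models, scores
```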

* Add parameter to set tune time, rest is for fit

In the previous iteration, the final fit was treated as a budget step
equivalent to any other optimization step, which sometimes left too
little time to train the final forest, in particular when the last fit
took longer than expected. This would often lead to very small forests
for the final model. The new system reserves roughly 10% of the budget
for the final forest, ensuring a better final fit.
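Illustrative arithmetic for the split; the exact fraction in the script may differ:

```python
total_budget = 3600                            # e.g. one hour for the task
tune_budget = 0.9 * total_budget               # shared across max_features values
final_fit_budget = total_budget - tune_budget  # ~10% reserved for the final forest
```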
In a previous iteration the probabilities were encoded as a numpy file,
but now they are serialized to JSON, which means that
results.probabilities is simply a string if imputation is required.
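A small sketch of the round trip (illustrative values):

```python
import json
import numpy as np

probabilities = np.array([[0.9, 0.1], [0.25, 0.75]])
encoded = json.dumps(probabilities.tolist())  # results.probabilities is this string
restored = np.asarray(json.loads(encoded))    # decode when imputation is required
```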
Technically, monkeypatch the xmltodict function used by openml when reading the features XML (see the sketch below).
Was supposed to be included with #443
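A sketch of the monkeypatch pattern for the xmltodict fix mentioned above; the patched element name and the use of `force_list` are assumptions, since the PR text does not show the actual transformation:

```python
import xmltodict

_original_parse = xmltodict.parse

def _patched_parse(xml_input, **kwargs):
    # Assumption: xmltodict collapses a single repeated element into a dict
    # instead of a one-element list; force_list keeps the features node
    # list-shaped. The element name below is a guess.
    kwargs.setdefault("force_list", ("oml:feature",))
    return _original_parse(xml_input, **kwargs)

xmltodict.parse = _patched_parse  # openml's calls now go through the patch
```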
seb added 2 commits July 12, 2022 14:23
…er (#468)

* change the workflow to correctly update the app version on releases and when force-merging a version back to master

* protect main branch from accidental releases
sebhrusen merged commit bcd2a28 into master on Jul 12, 2022
sebhrusen deleted the fix/merge-back-v2 branch on July 12, 2022 14:52