
Missing information (not reproducible) #1

Open
ledell opened this issue Feb 6, 2018 · 15 comments

Comments

@ledell

ledell commented Feb 6, 2018

Hi there,
Thanks for releasing this code. There are a few pieces of information missing that prevent the benchmarks from being reproducible.

  • Which version of H2O and auto-sklearn were used in this comparison?
  • What are the specs of the hardware (or AWS instance, etc.) used to run these benchmarks? Specifically, how many cores (assuming a single machine) and how much RAM?
  • If I were to run the MLJar benchmarks, would the free plan (plus some credits) suffice, computationally?
  • I don't see the script that shows the actual run of the benchmark (including dataset ids and seeds).
@pplonski
Contributor

pplonski commented Feb 6, 2018

Hi Erin!

  • h2o-3.17.0.4115 and auto-sklearn 0.2.1
  • AWS instances with 8 CPUs and 15 GB RAM (c4.2xlarge)
  • I was using the plan with 12 parallel jobs (the highest plan)
  • the script was run with seed values from 0 to 9 (10 repetitions of the train/validation split) on all datasets described in the table (see the sketch of the split protocol below)
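For reference, a minimal sketch of what such a repeated seeded split could look like (the 0.8/0.2 ratio, the file path, and the 'target' column name are illustrative assumptions, not values taken from the original script):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('./data/24.csv')            # one of the prepared OpenML datasets
X, y = df.drop(columns=['target']), df['target']

for seed in range(10):                       # seeds 0..9, as described above
    # hypothetical 80/20 split; the actual ratio is not stated in this thread
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    # ... train each AutoML tool on (X_train, y_train) under the time limit
    # and score it on (X_valid, y_valid)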

Are you able to reproduce the results now? If you have any questions, I'm happy to help.

@ledell
Author

ledell commented Feb 10, 2018

Hi,
Thanks for the info! I have not had a chance to run the benchmark code yet. Is there a reason that you used a nightly/dev version of H2O instead of a stable version? I am not sure it will change the results that much, but it's easier to document the benchmarks if you use a stable/released version.

@pplonski
Contributor

The only reason I used the nightly build was to have the most recent version, so as not to miss any feature or fix.

I was using 1 hour as the limit for model training time, but I'm very curious what the results would be for 2 or 3 hours of model training.
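For anyone who wants to try longer budgets: in the H2O Python API the training-time limit is set via max_runtime_secs on H2OAutoML. A minimal sketch, where the dataset path and column name are placeholders rather than values from the benchmark script:

import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("./data/24.csv")      # placeholder dataset path
train["target"] = train["target"].asfactor()  # binary classification target

# 3-hour budget instead of the 1-hour limit used in the benchmark
aml = H2OAutoML(max_runtime_secs=3 * 3600, seed=1)
aml.train(y="target", training_frame=train)
print(aml.leaderboard)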

@don-lab-dc

I'd also love to see the results on stable releases over longer run times. (@ledell, have you done any work on this?) Also, for the time-limited version of this, it seems like you might want to use 8 parallel jobs rather than 12, given that your AWS instance only has 8 CPUs. Or you could do something like select the best result from the t-33% set of results. Also curious -- any reason you did not include TPOT?

@ledell
Author

ledell commented Aug 22, 2018

@don-lab-dc I am working on a large benchmark right now with a few other open source AutoML package authors (right now includes auto-sklearn, H2O AutoML, TPOT, AutoWeka and a few others). We will publish our findings in the next 2 months or so.

@felipeportella

@ledell, have you published your findings yet?

@ledell
Author

ledell commented Jan 31, 2019

@felipeportella Not yet (we had to take a break from the work for a few months), but we are planning to submit a paper to a workshop this spring.

@felipeportella

Thanks, @ledell ... let us know (in this thread) when it's published.

Just as an additional reference, on 17/08/2018 a group from NCSU published this benchmark:
https://arxiv.org/abs/1808.06492v1

@ledell
Author

ledell commented Feb 2, 2019

@felipeportella I've seen that benchmark -- the H2O benchmarks are all wrong. This paper (and how bad it was) was the motivation for our work.

@ajoeajoe

I configured the API key and ran !python main.py -p h2o -d 24 -s 2, but got this error:

Exception: [Errno 2] File b'./data/24.csv' does not exist: b'./data/24.csv'

@pplonski
Contributor

Please run the following commands first:

mkdir data
python get_data.py

@ajoeajoe

ajoeajoe commented Jul 10, 2019

I have changed some code in the get_data.py file, since I use Python 3, as follows:

The dataset.get_data method no longer seems to accept the following arguments:

#return_categorical_indicator=True
#return_attribute_names=True

The code now looks like this:

import openml
import pandas as pd
import numpy as np

openml.config.apikey = 'mykey'

dataset_ids = [3, 24, 31, 38, 44, 179, 715, 718, 720, 722, 723, 727, 728, 734, 735, 737, 740, 741,
               819, 821, 822, 823, 833, 837, 843, 845, 846, 847]

for dataset_id in dataset_ids:
    print('Get dataset id', dataset_id)
    dataset = openml.datasets.get_dataset(dataset_id)
    # newer openml versions return X as a pandas DataFrame and no longer accept
    # return_categorical_indicator / return_attribute_names
    X, y, categorical, names = dataset.get_data(target=dataset.default_target_attribute)
    if len(np.unique(y)) != 2:
        print('Not binary classification')
        continue
    # write features plus target to CSV for the benchmark scripts
    vals = {name: X[name] for name in names}
    vals['target'] = y
    df = pd.DataFrame(vals)
    df.to_csv('./data/{0}.csv'.format(dataset_id), index=False)

I have downloaded all the data files, but problems remain when I try "!python main.py -p auto-sklearn -d 3 -s 2":
I use Python 3, and I don't know why you first download the data as CSV files; it seems this step isn't necessary. Can you try it in Python 3 first? Something seems to have gone wrong with the data format.

/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
Exception: could not convert string to float: 'f'
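This error usually means a string-valued column (here a value 'f') is reaching an estimator that expects numeric input. A minimal workaround sketch, assuming the CSVs produced above and simply integer-coding every non-numeric column (this is not part of the original benchmark code):

import pandas as pd

df = pd.read_csv('./data/3.csv')
# hypothetical fix: factorize every non-numeric column into integer codes
for col in df.columns:
    if df[col].dtype == object:
        df[col], _ = pd.factorize(df[col])
df.to_csv('./data/3.csv', index=False)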

@ajoeajoe

ajoeajoe commented Jul 10, 2019

@ledell, have you published your findings yet? Why are the H2O benchmarks in that paper all wrong?

@ledell
Author

ledell commented Jul 10, 2019

@ajoeajoe There were a number of issues in those benchmarks, but the main one related to H2O was how they used Java memory inside of Docker (they set the heap size to 100G on very small EC2 instances), which caused H2O to fail. They also prevented H2O from using all available cores (ncores was limited to 2, artificially crippling the performance).
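For context, a minimal sketch of how H2O's heap size and core count are set when the cluster is started; the 12G value below is an illustrative fit for the c4.2xlarge instances mentioned earlier in this thread, not the setting used in either benchmark:

import h2o

# use all available cores and a heap that fits the machine; on an 8-core / 15 GB
# instance something like 12G leaves OS headroom, whereas a 100G heap (larger
# than physical RAM) or restricting nthreads to 2 cripples or breaks the run
h2o.init(nthreads=-1, max_mem_size="12G")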

Yes, we published our benchmarks at ICML this year: https://arxiv.org/abs/1907.00909

@ajoeajoe

Thanks, good job. I think I should also use https://github.com/openml/automlbenchmark for my experiments.
