
Missing information (not reproducible) #1

Open
ledell opened this issue Feb 6, 2018 · 15 comments

Comments

@ledell

ledell commented Feb 6, 2018

Hi there,
Thanks for releasing this code. There are a few pieces of information missing that prevent the benchmarks from being reproducible.

  • Which version of H2O and auto-sklearn were used in this comparison?
  • What are the specs of the hardware (or AWS instance, etc.) used to run these benchmarks? Specifically, how many cores (assuming a single machine) and how much RAM?
  • If I were to run the MLJar benchmarks, would the free plan (plus some credits) suffice, computationally?
  • I don't see the script that shows the actual run of the benchmark (including dataset ids and seeds).
@pplonski
Contributor

pplonski commented Feb 6, 2018

Hi Erin!

  • h2o-3.17.0.4115 and auto-sklearn 0.2.1
  • AWS instances with 8 CPUs and 15 GB RAM (c4.2xlarge)
  • I was using the plan with 12 parallel jobs (the highest plan)
  • the script was run with seed values from 0 to 9 (10 repetitions of the train/validation split) on all datasets described in the table (see the sketch of the split protocol below)
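For reference, a minimal sketch of what such a repeated seeded split could look like (the 0.8/0.2 ratio, the file path, and the 'target' column name are illustrative assumptions, not values taken from the original script):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('./data/24.csv')            # one of the prepared OpenML datasets
X, y = df.drop(columns=['target']), df['target']

for seed in range(10):                       # seeds 0..9, as described above
    # hypothetical 80/20 split; the actual ratio is not stated in this thread
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    # ... train each AutoML tool on (X_train, y_train) under the time limit
    # and score it on (X_valid, y_valid)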

Are you able to reproduce the results now? If you have any questions, I'm happy to help.

@ledell
Author

ledell commented Feb 10, 2018

Hi,
Thanks for the info! I have not had a chance to run the benchmark code yet. Is there a reason that you used a nightly/dev version of H2O instead of a stable version? I am not sure it will change the results that much, but it's easier to document the benchmarks if you use a stable/released version.

@pplonski
Contributor

The only reason I used the nightly build was to have the most recent version, so as not to miss any feature or fix.

I was using 1 hour as the limit for model training time, but I'm very curious what the results would be for 2 or 3 hours of model training.
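For anyone who wants to try longer budgets: in the H2O Python API the training-time limit is set via max_runtime_secs on H2OAutoML. A minimal sketch, where the dataset path and column name are placeholders rather than values from the benchmark script:

import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("./data/24.csv")      # placeholder dataset path
train["target"] = train["target"].asfactor()  # binary classification target

# 3-hour budget instead of the 1-hour limit used in the benchmark
aml = H2OAutoML(max_runtime_secs=3 * 3600, seed=1)
aml.train(y="target", training_frame=train)
print(aml.leaderboard)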

@don-lab-dc

I'd also love to see the results on stable releases over longer run times. (@ledell, have you done any work on this?) Also, for the time-limited version of this, it seems like you might want to use 8 parallel jobs rather than 12, given that your AWS instance only has 8 CPUs. Or you could do something like select the best result from the t-33% set of results. Also curious -- any reason you did not include TPOT?

@ledell
Author

ledell commented Aug 22, 2018

@don-lab-dc I am working on a large benchmark right now with a few other open source AutoML package authors (right now includes auto-sklearn, H2O AutoML, TPOT, AutoWeka and a few others). We will publish our findings in the next 2 months or so.

@felipeportella

@ledell, have you published your findings yet?

@ledell
Author

ledell commented Jan 31, 2019

@felipeportella Not yet (we had to take a break from the work for a few months), but we are planning to submit a paper to a workshop this spring.

@felipeportella

Thanks, @ledell ... let us know (in this thread) when it's published.

Just as an additional reference, on 17/08/2018 a group from NCSU published this benchmark:
https://arxiv.org/abs/1808.06492v1

@ledell
Author

ledell commented Feb 2, 2019

@felipeportella I've seen that benchmark -- the H2O benchmarks are all wrong. This paper (and how bad it was) was the motivation for our work.

@ajoeajoe

I configured the API key and ran !python main.py -p h2o -d 24 -s 2, but got this error:

Exception: [Errno 2] File b'./data/24.csv' does not exist: b'./data/24.csv'

@pplonski
Contributor

Please run the following commands first:

mkdir data
python get_data.py

@ajoeajoe

ajoeajoe commented Jul 10, 2019

I have changed some code in the get_data.py file, since I use Python 3, as follows:

The dataset.get_data method no longer seems to accept the following arguments:

#return_categorical_indicator=True
#return_attribute_names=True

The code now looks like this:

import openml
import pandas as pd
import numpy as np

openml.config.apikey = 'mykey'

dataset_ids = [3, 24, 31, 38, 44, 179, 715, 718, 720, 722, 723, 727, 728, 734, 735, 737, 740, 741,
               819, 821, 822, 823, 833, 837, 843, 845, 846, 847]

for dataset_id in dataset_ids:
    print('Get dataset id', dataset_id)
    dataset = openml.datasets.get_dataset(dataset_id)
    # newer openml versions return X as a pandas DataFrame and no longer accept
    # return_categorical_indicator / return_attribute_names
    X, y, categorical, names = dataset.get_data(target=dataset.default_target_attribute)
    if len(np.unique(y)) != 2:
        print('Not binary classification')
        continue
    # write features plus target to CSV for the benchmark scripts
    vals = {name: X[name] for name in names}
    vals['target'] = y
    df = pd.DataFrame(vals)
    df.to_csv('./data/{0}.csv'.format(dataset_id), index=False)

I have downloaded all the data files, but problems remain when I try "!python main.py -p auto-sklearn -d 3 -s 2":
I use Python 3, and I don't know why you first download the data as CSV files; it seems this step isn't necessary. Can you try it in Python 3 first? Something seems to have gone wrong with the data format.

/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
Exception: could not convert string to float: 'f'
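This error usually means a string-valued column (here a value 'f') is reaching an estimator that expects numeric input. A minimal workaround sketch, assuming the CSVs produced above and simply integer-coding every non-numeric column (this is not part of the original benchmark code):

import pandas as pd

df = pd.read_csv('./data/3.csv')
# hypothetical fix: factorize every non-numeric column into integer codes
for col in df.columns:
    if df[col].dtype == object:
        df[col], _ = pd.factorize(df[col])
df.to_csv('./data/3.csv', index=False)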

@ajoeajoe

ajoeajoe commented Jul 10, 2019

@ledell, have you published your findings yet? Why are the H2O benchmarks in that paper all wrong?

@ledell
Author

ledell commented Jul 10, 2019

@ajoeajoe There were a number of issues in those benchmarks, but the main one related to H2O was how they used Java memory inside of Docker (they set the heap size to 100G on very small EC2 instances), which caused H2O to fail. They also prevented H2O from using all available cores (ncores was limited to 2, artificially crippling the performance).
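For context, a minimal sketch of how H2O's heap size and core count are set when the cluster is started; the 12G value below is an illustrative fit for the c4.2xlarge instances mentioned earlier in this thread, not the setting used in either benchmark:

import h2o

# use all available cores and a heap that fits the machine; on an 8-core / 15 GB
# instance something like 12G leaves OS headroom, whereas a 100G heap (larger
# than physical RAM) or restricting nthreads to 2 cripples or breaks the run
h2o.init(nthreads=-1, max_mem_size="12G")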

Yes, we published our benchmarks at ICML this year: https://arxiv.org/abs/1907.00909

@ajoeajoe

Thanks, good job. I think I should also use https://github.com/openml/automlbenchmark for my experiments.
