Missing information (not reproducible) #1
Comments
Hi Erin!
Are you able to reproduce the results now? If you have any questions, I'm happy to help.
Hi,
The only reason I used the nightly build was to have the most recent version, so as not to miss any features or fixes. I was using 1 hour as the limit for model training time, but I'm very curious what the results would be with 2 or 3 hours of training.
I'd also love to see the results on stable releases over longer run times. (@ledell, have you done any work on this?) Also, for the time-limited version of this, it seems like you might want to use 8 parallel jobs rather than 12, given that your AWS instance only has 8 CPUs. Or, you could do something like select the best result from t-33% set of results. Also curious -- any reason you did not include TPOT?
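The point about matching parallel jobs to available CPUs can be sketched as follows. This is a minimal illustration with assumed numbers (12 requested jobs, as mentioned above), not the benchmark's actual configuration:

```python
import os

# Assumed: the benchmark originally launched 12 parallel jobs.
requested_jobs = 12

# Detect how many CPUs this machine actually has (e.g. 8 on the AWS
# instance discussed above); fall back to 1 if detection fails.
available_cpus = os.cpu_count() or 1

# Never run more jobs than there are CPUs, to avoid oversubscription.
n_jobs = min(requested_jobs, available_cpus)
```

Oversubscribing (12 jobs on 8 cores) makes each time-limited run slower, which biases a wall-clock-budgeted benchmark against every tool being tested.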
@don-lab-dc I am working on a large benchmark right now with a few other open source AutoML package authors (it currently includes auto-sklearn, H2O AutoML, TPOT, AutoWeka, and a few others). We will publish our findings in the next 2 months or so.
@ledell, have you published your findings yet?
@felipeportella Not yet (we had to take a break from the work for a few months), but we are planning to submit a paper to a workshop this spring.
Thanks, @ledell ... let us know (in this thread) when it's published. As an additional reference, on 17/08/2018 the folks from NCSU published this benchmark:
@felipeportella I've seen that benchmark -- the H2O benchmarks are all wrong. This paper (and how bad it was) was the motivation for our work.
I configured the API key and ran `!python main.py -p h2o -d 24 -s 2`, but got this error:

```
Exception: [Errno 2] File b'./data/24.csv' does not exist: b'./data/24.csv'
```
Please run the following commands first:

```shell
mkdir data
python get_data.py
```
I have changed some code in the get_data.py file, since I use Python 3 and the `dataset.get_data` method no longer accepts `return_categorical_indicator=True`:

```python
import os
import openml
import pandas as pd
import numpy as np

openml.config.apikey = 'mykey'
dataset_ids = [3, 24, 31, 38, 44, 179, 715, 718, 720, 722, 723, 727, 728, 734, 735, 737, 740, 741,
               819, 821, 822, 823, 833, 837, 843, 845, 846, 847]
for dataset_id in dataset_ids:
    print('Get dataset id', dataset_id)
    dataset = openml.datasets.get_dataset(dataset_id)
    # Newer openml versions no longer accept return_categorical_indicator
    # or return_attribute_names here.
    (X, y, categorical, names) = dataset.get_data(target=dataset.default_target_attribute)
    if len(np.unique(y)) != 2:
        print('Not binary classification')
        continue
    vals = {}
    for name in names:
        # X is now a DataFrame, so index by column name instead of X[:, i]
        vals[name] = X[name]
    vals['target'] = y
    df = pd.DataFrame(vals)
    df.to_csv('./data/{0}.csv'.format(dataset_id), index=False)
```

I have downloaded all the data files, but a problem remains when I try `!python main.py -p auto-sklearn -d 3 -s 2`:

```
/usr/local/lib/python3.6/dist-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
Exception: could not convert string to float: 'f'
```
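The final exception above suggests the saved CSVs still contain raw categorical strings (such as `'f'`) that downstream code cannot cast to float. A minimal sketch of one possible workaround, using hypothetical column values (not the actual OpenML data), is to integer-encode object-typed columns before writing the CSV:

```python
import pandas as pd

# Hypothetical example: a categorical column with string levels 'f'/'t'.
df = pd.DataFrame({"feature": ["f", "t", "f", "t"], "target": [0, 1, 0, 1]})

# Integer-encode every object-typed column so it can later be cast to float.
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype("category").cat.codes

print(df["feature"].tolist())  # → [0, 1, 0, 1]
```

Whether this is appropriate depends on the AutoML tool: some handle categoricals natively and would be better served by passing the categorical indicator through rather than pre-encoding.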
@ledell, have you published your findings yet? And why were the H2O benchmarks in that paper all wrong?
@ajoeajoe There were a number of issues in those benchmarks, but the main one related to H2O was how they used Java memory inside of Docker (they set the heap size to 100G on very small EC2 instances), which caused H2O to fail. They also prevented H2O from using all available cores (ncores was limited to 2, artificially crippling performance). Yes, we published our benchmarks at ICML this year: https://arxiv.org/abs/1907.00909
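As an illustration of the heap-sizing problem, one could derive the JVM `-Xmx` value from the machine's physical RAM instead of hard-coding 100G. This is a Linux-only sketch with an assumed 75% policy, not H2O's actual defaults:

```python
import os

# Linux-specific: compute total physical RAM in GiB from sysconf values.
page_size = os.sysconf("SC_PAGE_SIZE")
phys_pages = os.sysconf("SC_PHYS_PAGES")
total_gb = page_size * phys_pages / (1024 ** 3)

# Assumed policy: give the JVM ~75% of RAM, never less than 1 GiB, so the
# heap actually fits inside the instance (or container) memory limit.
heap_gb = max(1, int(total_gb * 0.75))
jvm_flag = "-Xmx{}g".format(heap_gb)
```

A heap larger than the container's memory limit lets the JVM allocate past what the OS will grant, which is consistent with the failures described above.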
Thanks, good job. I think I should also use https://github.com/openml/automlbenchmark for my experiments.
Hi there,
Thanks for releasing this code. There are a few pieces of information missing that prevent the benchmarks from being reproducible.