
Add example #791

Merged: 3 commits merged into develop on Oct 2, 2019
Conversation

@mfeurer (Collaborator) commented Oct 1, 2019

  • adds example for Feurer et al. (2015)
  • removes the stub for Fusi et al. (2018) as they actually perform the
    same task. I can't create an example, though, as they used regression
    datasets for classification (and OpenML by now forbids creating such
    tasks).

@codecov-io commented Oct 1, 2019

Codecov Report

Merging #791 into develop will decrease coverage by 0.02%.
The diff coverage is n/a.


@@             Coverage Diff             @@
##           develop     #791      +/-   ##
===========================================
- Coverage    87.71%   87.68%   -0.03%     
===========================================
  Files           36       36              
  Lines         4208     4248      +40     
===========================================
+ Hits          3691     3725      +34     
- Misses         517      523       +6
Impacted Files                            Coverage Δ
openml/evaluations/evaluation.py          60.52% <0%> (-3.76%) ⬇️
openml/extensions/sklearn/__init__.py     100% <0%> (ø) ⬆️
openml/extensions/sklearn/extension.py    91.27% <0%> (+0.01%) ⬆️
openml/evaluations/functions.py           92.96% <0%> (+0.96%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f461732...811dfc6.

]

####################################################################################################
# The dataset IDs could be used directly to load the dataset and split the data into a training
Member commented:
Are you sure you want to start with the dataset IDs, rather than the task IDs?

If the answer is yes, this clearly signals that we do not have any good procedure for "getting tasks that belong to a given set of datasets". We should either extend the API to support this better, provide the functions below as convenience functions, or do a combination of both.

Collaborator (Author) replied:

The reasoning here is to stay close to the Auto-sklearn paper, where only dataset IDs are given. What kind of convenience function would you like to have? Something like:

from typing import List

def get_tasks_for_dataset(
    dataset_id: int,
    task_type_id: int,
    estimation_procedure: str,
    status: str,
    check_target_attribute: bool,
) -> List:
    pass
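
For context, a rough sketch of how such a helper could be built on top of the existing listing API; the filtering keys ('did', 'estimation_procedure') reflect the task-listing output, and the check_target_attribute handling is omitted here (this sketch is an assumption, not part of this PR):

from typing import List

import openml

def get_tasks_for_dataset(
    dataset_id: int,
    task_type_id: int,
    estimation_procedure: str,
    status: str = 'all',
) -> List[int]:
    # list_tasks returns a dict mapping task id -> task description dict
    tasks = openml.tasks.list_tasks(task_type_id=task_type_id, status=status)
    return [
        tid
        for tid, task in tasks.items()
        if task['did'] == dataset_id
        and task.get('estimation_procedure') == estimation_procedure
    ]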

Collaborator (Author) added:

I will also make the note more drastic.

# deactivated tasks
tasks_d = openml.tasks.list_tasks(
    task_type_id=1,
    status='deactivated',
)
Member commented:
Why not search for status "all"?

Collaborator (Author) replied:

Lack of knowledge, I'll update the example.
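
For reference, the updated call might look like this (a sketch: a single listing with status='all' replacing the separate active and deactivated queries):

# all tasks, regardless of status
tasks = openml.tasks.list_tasks(
    task_type_id=1,
    status='all',
)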

task_ids.sort()

# These are the tasks to work with:
print(task_ids)
Member commented:

logging.info?

Collaborator (Author) replied:

I think print is fine for examples. logging is only important for the library itself to make the amount of output controllable.
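
(If a user does want to control the library's own output in such an example, the standard logging machinery already allows it; a sketch, assuming openml logs under the "openml" logger name:)

import logging

# lower the verbosity of the library itself; print() output is unaffected
logging.getLogger('openml').setLevel(logging.WARNING)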

@janvanrijn (Member) left a comment:

Looks good, I could not find anything big.

@mfeurer (Collaborator, Author) commented Oct 2, 2019

Thanks for the review, I hope I could address your comments.

@mfeurer mfeurer merged commit 8cc302d into develop Oct 2, 2019
@mfeurer mfeurer deleted the add_examples_feurer_et_al_and_fusi_et_al branch October 2, 2019 16:08
@@ -10,9 +10,80 @@
 ~~~~~~~~~~~

 | Efficient and Robust Automated Machine Learning
-| Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum and Frank Hutter
+| Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum and Frank Hutter # noqa F401
Member commented:

@mfeurer Wrong usage of # noqa F401 in the text? It is not interpreted as a comment.
Maybe you meant here:

]

####################################################################################################
# The dataset IDs could be used directly to load the dataset and split the data into a training
Member commented:

*training set

# It is discouraged to work directly on datasets and only provide dataset IDs in a paper as
# this does not allow reproducibility (unclear splitting). Please do not use datasets but the
# respective tasks as basis for a paper and publish task IDS. This example is only given to
# showcase the use OpenML-Python for a published paper and as a warning on how not to do it.
Member commented:

the use of OpenML-Python*
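
For background on why the note points to tasks: a task bundles a dataset with a fixed estimation procedure, so the splits can be recovered exactly. A sketch (task_id is a placeholder for an ID as published in a paper):

import openml

# anyone with the task ID can reproduce the exact train/test splits
task = openml.tasks.get_task(task_id)
train_indices, test_indices = task.get_train_test_split_indices(fold=0, repeat=0)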
