Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/give possibility to not download the dataset qualities #1017

Merged
merged 11 commits into from
Feb 11, 2021
Merged

Feature/give possibility to not download the dataset qualities #1017

merged 11 commits into from
Feb 11, 2021

Conversation

a-moadel
Copy link
Contributor

Reference Issue

What does this PR implement/fix? Explain your changes.

We can Ignore downloading the dataset qualities if its not exist while getting datasets.

How should this PR be tested?

Clear the cache , and calling get_dataset with download_qualities set to false : this result in qulaities.xml to not be downloaded.
Clear the cache , and calling get_dataset with download_qualities set to default : this result in qulaities.xml to be downloaded.

Any other comments?

Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfeurer Do you think the qualities should be loaded if they are present in cache? That would mean different results of the function depending on the cache state. I would think we can skip it altogether?
@a-moadel Please also add unit tests that test the feature works as intended (after we agree on the desired behavior).

I noticed the pre-commit workflow is not triggered. Does anyone have an idea why this would not trigger?

Closes #1009.

openml/datasets/functions.py Outdated Show resolved Hide resolved
openml/datasets/functions.py Outdated Show resolved Hide resolved
a-moadel and others added 6 commits January 13, 2021 16:46
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
… cache status , adding unit test for get_dataset_qualities
…alities' of https://github.com/a-moadel/openml-python into feature/give_possibility_to_not_download_the_dataset_qualities
@a-moadel a-moadel requested a review from PGijsbers January 26, 2021 12:03
@mfeurer
Copy link
Collaborator

mfeurer commented Jan 27, 2021

This looks good to me now. @PGijsbers you had a comment about persistent behavior, that's not yet addressed, right?

Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off, sorry for the delay. I think it works as intended now 🥳 however I do have a few minor comments that I would prefer to see addressed before merging. I'd like to see references to the old behavior updated, and I suggested a minor code change which puts the responsibility of not loading qualities in the get_dataset function instead. I don't think it makes sense to pass down a bool that signifies you just want "" back - then we don't need to call down in the first place.

@@ -433,6 +433,11 @@ def test__get_dataset_qualities(self):
qualities_xml_path = os.path.join(self.workdir, "qualities.xml")
self.assertTrue(os.path.exists(qualities_xml_path))

def test__get_dataset_qualities_skip_download(self):
qualities = _get_dataset_qualities_file(self.workdir, 2, False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use keywords here (e.g. download_qualities=False)?

@@ -405,7 +412,9 @@ def get_dataset(
features_file = _get_dataset_features_file(did_cache_dir, dataset_id)

try:
qualities_file = _get_dataset_qualities_file(did_cache_dir, dataset_id)
qualities_file = _get_dataset_qualities_file(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to use download_qualities here and not call the _get_dataset_qualities_file if we don't want to process the qualities.

@@ -290,6 +290,8 @@ def _name_to_id(
error_if_multiple : bool (default=False)
If `False`, if multiple datasets match, return the least recent active dataset.
If `True`, if multiple datasets match, raise an error.
download_qualities : bool, optional (default=True)
If `True`, also download qualities.xml file. If false use the file if it was cached.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this and the other references where it says the cached version is used if false is passed, as this is not the behavior anymore.

…parameter, remove unnecessarily call for download qualities
@codecov-io
Copy link

Codecov Report

Merging #1017 (55c7196) into develop (fba6aab) will increase coverage by 0.45%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1017      +/-   ##
===========================================
+ Coverage    87.62%   88.08%   +0.45%     
===========================================
  Files           36       36              
  Lines         4574     4565       -9     
===========================================
+ Hits          4008     4021      +13     
+ Misses         566      544      -22     
Impacted Files Coverage Δ
openml/datasets/functions.py 94.46% <100.00%> (+1.01%) ⬆️
openml/_api_calls.py 89.23% <0.00%> (-3.08%) ⬇️
openml/runs/functions.py 83.16% <0.00%> (+0.25%) ⬆️
openml/testing.py 84.52% <0.00%> (+0.59%) ⬆️
openml/utils.py 91.33% <0.00%> (+0.66%) ⬆️
openml/datasets/dataset.py 87.92% <0.00%> (+3.48%) ⬆️
openml/exceptions.py 96.77% <0.00%> (+9.67%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fba6aab...1b6467d. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants