-
-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/give possibility to not download the dataset qualities #1017
Feature/give possibility to not download the dataset qualities #1017
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfeurer Do you think the qualities should be loaded if they are present in cache? That would mean different results of the function depending on the cache state. I would think we can skip it altogether?
@a-moadel Please also add unit tests that test the feature works as intended (after we agree on the desired behavior).
I noticed the pre-commit
workflow is not triggered. Does anyone have an idea why this would not trigger?
Closes #1009.
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
… cache status , adding unit test for get_dataset_qualities
…alities' of https://github.com/a-moadel/openml-python into feature/give_possibility_to_not_download_the_dataset_qualities
This looks good to me now. @PGijsbers you had a comment about persistent behavior, that's not yet addressed, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, sorry for the delay. I think it works as intended now 🥳 however I do have a few minor comments that I would prefer to see addressed before merging. I'd like to see references to the old behavior updated, and I suggested a minor code change which puts the responsibility of not loading qualities
in the get_dataset
function instead. I don't think it makes sense to pass down a bool
that signifies you just want ""
back - then we don't need to call down in the first place.
@@ -433,6 +433,11 @@ def test__get_dataset_qualities(self): | |||
qualities_xml_path = os.path.join(self.workdir, "qualities.xml") | |||
self.assertTrue(os.path.exists(qualities_xml_path)) | |||
|
|||
def test__get_dataset_qualities_skip_download(self): | |||
qualities = _get_dataset_qualities_file(self.workdir, 2, False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use keywords here (e.g. download_qualities=False
)?
openml/datasets/functions.py
Outdated
@@ -405,7 +412,9 @@ def get_dataset( | |||
features_file = _get_dataset_features_file(did_cache_dir, dataset_id) | |||
|
|||
try: | |||
qualities_file = _get_dataset_qualities_file(did_cache_dir, dataset_id) | |||
qualities_file = _get_dataset_qualities_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense to use download_qualities
here and not call the _get_dataset_qualities_file
if we don't want to process the qualities.
openml/datasets/functions.py
Outdated
@@ -290,6 +290,8 @@ def _name_to_id( | |||
error_if_multiple : bool (default=False) | |||
If `False`, if multiple datasets match, return the least recent active dataset. | |||
If `True`, if multiple datasets match, raise an error. | |||
download_qualities : bool, optional (default=True) | |||
If `True`, also download qualities.xml file. If false use the file if it was cached. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update this and the other references where it says the cached version is used if false
is passed, as this is not the behavior anymore.
…parameter, remove unnecessarily call for download qualities
Codecov Report
@@ Coverage Diff @@
## develop #1017 +/- ##
===========================================
+ Coverage 87.62% 88.08% +0.45%
===========================================
Files 36 36
Lines 4574 4565 -9
===========================================
+ Hits 4008 4021 +13
+ Misses 566 544 -22
Continue to review full report at Codecov.
|
Reference Issue
What does this PR implement/fix? Explain your changes.
We can Ignore downloading the dataset qualities if its not exist while getting datasets.
How should this PR be tested?
Clear the cache , and calling get_dataset with download_qualities set to false : this result in qulaities.xml to not be downloaded.
Clear the cache , and calling get_dataset with download_qualities set to default : this result in qulaities.xml to be downloaded.
Any other comments?