Improving the performance of check_datasets_active, modifying unit test #980

Merged on Oct 29, 2020 (4 commits)

ArlindKadra
Member

Reference Issue

Fixes #671

What does this PR implement/fix? Explain your changes.

Improves the performance of the function check_datasets_active by querying only for the given datasets instead of all datasets. Furthermore, the error-throwing behavior can now be controlled by the user.
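A rough sketch of the idea, not the actual openml-python code: the hypothetical `fake_list_datasets` stub stands in for the server query, `_FAKE_SERVER` is made-up data, and the `raise_error_if_not_exist` flag name and the skip-on-missing behavior are assumptions used only for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for the server's dataset table: {dataset id: status}.
_FAKE_SERVER = {2: "active", 17: "deactivated", 61: "active"}


def fake_list_datasets(data_id: List[int]) -> Dict[int, dict]:
    """Hypothetical stub: return metadata only for the requested ids."""
    return {
        did: {"did": did, "status": _FAKE_SERVER[did]}
        for did in data_id
        if did in _FAKE_SERVER
    }


def check_datasets_active_sketch(
    dataset_ids: Iterable[int], raise_error_if_not_exist: bool = True
) -> Dict[int, bool]:
    """Return {did: bool}, querying only the given ids instead of everything."""
    dataset_ids = list(dataset_ids)
    listing = fake_list_datasets(data_id=dataset_ids)
    active = {}
    for did in dataset_ids:
        entry = listing.get(did)
        if entry is None:
            if raise_error_if_not_exist:
                raise ValueError(f"Could not find dataset {did}.")
            continue  # Assumption: unknown ids are skipped when not raising.
        active[did] = entry["status"] == "active"
    return active
```

With the flag set to `False`, a caller can check a mixed list of ids without wrapping the call in a `try` block; with the default, an unknown id fails loudly.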

How should this PR be tested?

Existing unit test which was modified.

Collaborator

@PGijsbers PGijsbers left a comment

Looks good 👍 only two minor remarks

doc/progress.rst (outdated)
openml/datasets/functions.py (outdated)

     Returns
     -------
     dict
         A dictionary with items {did: bool}
     """
-    dataset_list = list_datasets(status="all")
+    dataset_list = list_datasets(
+        dataset_ids=dataset_ids,
Collaborator

Suggested change
-        dataset_ids=dataset_ids,
+        data_id=dataset_ids,

Collaborator

Actually this is a rather important one that hadn't caught my eye first time around 😅

Member Author

Strange, I got a suggestion for that and the function had no argument like that at all :S. Nice catch.

Collaborator

Likely because of the **kwargs (used to allow all filters): an unknown keyword is accepted without error.
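To illustrate the point with a minimal sketch: the two functions below are hypothetical and are not the real `list_datasets`; `KNOWN_FILTERS` is an assumed whitelist made up for this example. The first accepts a misspelled filter silently because of `**kwargs`, while the second shows one way such a typo could be caught.

```python
# Assumed set of valid filter names, for illustration only.
KNOWN_FILTERS = {"data_id", "tag"}


def list_items_sketch(status="active", **kwargs):
    """Hypothetical listing call: any extra keyword is forwarded as a filter."""
    return {"status": status, **kwargs}


def list_items_strict(status="active", **kwargs):
    """Same call, but unknown filter names are rejected up front."""
    unknown = set(kwargs) - KNOWN_FILTERS
    if unknown:
        raise TypeError(f"Unknown filter(s): {sorted(unknown)}")
    return {"status": status, **kwargs}


# The misspelled keyword passes through without any complaint.
filters = list_items_sketch(status="all", dataset_ids=[1, 2])
print(filters)
```

Because Python only rejects unexpected keywords when the signature has no `**kwargs`, neither the interpreter nor a linter flags `dataset_ids` in the permissive version.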

Collaborator

@PGijsbers PGijsbers left a comment

LGTM (pending CI) 👍 edit: oops, that would be an accept

@PGijsbers PGijsbers merged commit f2af798 into develop Oct 29, 2020
@PGijsbers PGijsbers deleted the fix#671 branch October 29, 2020 10:43
Development

Successfully merging this pull request may close these issues.

Checking active datasets