Checking active datasets #671

Closed
ArlindKadra opened this issue Apr 10, 2019 · 3 comments · Fixed by #980

@ArlindKadra
Member

dataset_list = list_datasets(status='all')
active = {}

Maybe we should call list_datasets(status='active') instead: if a dataset_id from the iterable of dataset_ids that we want to check is in the results, we mark it as True, otherwise False.

@PGijsbers
Collaborator

The distinction here is that you no longer check whether the dataset id exists at all. I.e., a dataset which is inactive and one that does not exist at all would both be returned as False. Unless I misunderstand your meaning.

@ArlindKadra
Member Author

@PGijsbers you understood the idea right and the point you are making is valid.

Then I would propose to refine my initial idea and call list_datasets(data_id=[dataset_ids]) only for the dataset_ids that we have. This is supported on the live server now (we should also update the documentation and function of list_datasets, because data_id is not shown as a valid filter).

In the end we can just compare len(results) == len(dataset_ids) in case we want to know if there was a dataset_id that does not exist. (If we also want the specific dataset_id, we can check the keys)
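A minimal sketch of that comparison, assuming the listing result is a dict keyed by dataset id (the helper name `find_missing_ids` and the sample `results` are hypothetical and only illustrate the idea, they are not part of the library):

```python
def find_missing_ids(results, dataset_ids):
    """Return the requested ids that are absent from the listing result."""
    requested = set(dataset_ids)  # a set, so duplicate ids do not skew the length check
    if len(results) == len(requested):
        # Cheap path: the listing was filtered by these ids, so equal size means none is missing.
        return []
    # Slower path: compare the keys to name the specific missing ids.
    return sorted(requested - set(results))


# Hypothetical listing result; the status values are illustrative.
results = {
    2: {"did": 2, "status": "active"},
    5: {"did": 5, "status": "deactivated"},
}
print(find_missing_ids(results, [2, 5, 99999]))
```

The length comparison only works because the listing is filtered by the requested ids; with an unfiltered listing the key comparison would be needed every time.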

@PGijsbers
Collaborator

I definitely think we should raise an error if a passed dataset id does not exist. Consider the following use case of the function:

dids = [ ... some ids ... ]
dids_active = check_datasets_active(dids)
if all(dids_active.values()):
    # All datasets are active, do something
    ...

If there is a more efficient way to retrieve active status for the datasets from the server, I am all for it.
Even if that means afterwards specifically checking if there is a dataset missing.
Having the mentioned clause

if len(results) != len(dataset_ids):
    <find out missing datasets>
    <raise error about missing datasets>

is absolutely fine, as in good-weather scenarios the cost of the simple length comparison should be negligible. Adding a parameter raise_error_if_not_exist to make raising the error optional is also acceptable, but it should (in my opinion) default to True.
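One way this could look, as a rough sketch rather than the actual implementation: the `list_by_ids` callable stands in for a server listing filtered by id, `fake_listing` is a hypothetical stub so the example runs offline, and the choice of `ValueError` and of omitting missing ids when the flag is False are assumptions.

```python
from typing import Callable, Dict, Iterable, List


def check_datasets_active(
    dataset_ids: Iterable[int],
    list_by_ids: Callable[[List[int]], Dict[int, dict]],
    raise_error_if_not_exist: bool = True,
) -> Dict[int, bool]:
    """Map each existing dataset id to True if it is active, else False."""
    ids = list(dataset_ids)
    results = list_by_ids(ids)
    # Good-weather case: this length check is the only extra work.
    if len(results) != len(set(ids)):
        missing = sorted(set(ids) - set(results))
        if raise_error_if_not_exist:
            raise ValueError(f"Could not find dataset(s): {missing}")
    # Assumption: ids that do not exist are left out when no error is raised.
    return {did: results[did]["status"] == "active" for did in ids if did in results}


def fake_listing(ids: List[int]) -> Dict[int, dict]:
    """Hypothetical stand-in for a listing call filtered by dataset id."""
    known = {2: {"status": "active"}, 5: {"status": "deactivated"}}
    return {did: known[did] for did in ids if did in known}


print(check_datasets_active([2, 5], fake_listing))
print(check_datasets_active([2, 99999], fake_listing, raise_error_if_not_exist=False))
```

With the default, `all(dids_active.values())` can no longer silently pass or fail because of an id that does not exist; the caller gets an error naming the missing ids instead.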
