Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include cardData in list_models and list_datasets #639

Merged
merged 9 commits into from
Feb 4, 2022
Merged

Include cardData in list_models and list_datasets #639

merged 9 commits into from
Feb 4, 2022

Conversation

muellerzr
Copy link
Contributor

To support listing the carbon emissions as well as any other metadata that can be returned with cardData:True, this PR now includes that as a parameter to both list_models and list_datasets.

Note: By default this is True because timing wise I did not see a difference between with vs without cardData, and returning the metadata for each model and dataset only increases the understandability of each one IMO.

cc @julien-c @LysandreJik

@muellerzr muellerzr requested a review from LysandreJik February 1, 2022 19:56
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR!

Would it be possible to add a bit of docs regarding how this attribute can be leveraged, for example with the co2_eq_emissions?

https://huggingface.co/docs/hub/searching-the-hub

I observed the same, putting cardData to True or False does not change the time required to list all models, so it's fine for me to put it enabled by default.

tests/test_hf_api.py Outdated Show resolved Hide resolved
tests/test_hf_api.py Outdated Show resolved Hide resolved
@julien-c
Copy link
Member

julien-c commented Feb 2, 2022

I observed the same, putting cardData to True or False does not change the time required to list all models, so it's fine for me to put it enabled by default.

can you compare the underlying API calls? I think you're not seeing too much bc we have good bandwidth, but the underlying calls outputs from /api/models would have very different content sizes imo.

muellerzr and others added 3 commits February 4, 2022 12:24
@muellerzr
Copy link
Contributor Author

@julien-c @LysandreJik should be good now.

I made the cardData none by default, but the explicit docstring tells what extra data you can get back from it.

@julien-c
Copy link
Member

julien-c commented Feb 4, 2022

lgtm!

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the tests, LGTM

tests/test_hf_api.py Outdated Show resolved Hide resolved
tests/test_hf_api.py Outdated Show resolved Hide resolved
muellerzr and others added 2 commits February 4, 2022 16:35
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
@muellerzr
Copy link
Contributor Author

Merging PR. For transparency:

        self.assertTrue(all([not hasattr(dataset, "cardData") for dataset in datasets]))

This will always fail because it is hard coded into the DatasetResult to set cardData to None, even if it's not present.

@LysandreJik let me know if you'd like me to clean that up a bit with some setattr's instead of hard-coding inputs.

@muellerzr muellerzr merged commit c3e07e3 into main Feb 4, 2022
@muellerzr muellerzr deleted the co2 branch February 4, 2022 23:03
@muellerzr muellerzr restored the co2 branch February 9, 2022 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants