Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"no-token" is passed to huggingface_hub when token is None #4990

Closed
Wauplin opened this issue Sep 19, 2022 · 6 comments · Fixed by #5031
Closed

"no-token" is passed to huggingface_hub when token is None #4990

Wauplin opened this issue Sep 19, 2022 · 6 comments · Fixed by #5031
Assignees
Labels
bug Something isn't working

Comments

@Wauplin
Copy link
Contributor

Wauplin commented Sep 19, 2022

Describe the bug

In the 2 lines listed below, a token is passed to huggingface_hub to get information from a dataset. If no token is provided, a "no-token" string is passed. What is the purpose of it ? If no real, I would prefer if the None value could be sent directly to be handle by huggingface_hub. I feel that here it is working because we assume the token will never be validated.

token=token if token else "no-token",

token=token if token else "no-token",

Expected results

Pass token=None to huggingface_hub.

Actual results

token="no-token" is passed.

Environment info

huggingface_hub v0.10.0dev

@albertvillanova
Copy link
Member

Hi @Wauplin, thanks for raising this potential issue.

The choice of passing "no-token" instead of None was made in this PR:

According to the PR description, the reason why it is passed is to avoid that HfApi.dataset_info uses the local token when no token should be used.

@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 20, 2022

Hi @albertvillanova , thanks for finding the original issue 👍

As of next release of huggingface_hub, the token argument will be deprecated in favor of the use_auth_token argument in dataset_info method. This change as been done by @SBrandeis in huggingface/huggingface_hub#928. use_auth_token is a bit different and allow the case "don't sent the cached token by default".

If you want to strictly avoid sending the cached token from datasets, you can use:

# token=token if token else "no-token", <- will fail because token is not valid

use_auth_token=token if token else False,  # using the new `use_auth_token` parameter

And as a note, I am currently updating the "don't send the cached token by default"-rule to "don't send the cached token on public repos by default but use it in private ones" in huggingface/huggingface_hub#1064. This will not change the fact that use_auth_token=False doesn't send the token at all.

@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 20, 2022

What is current strategy in term of updating huggingface_hub version in datasets ? I don't want to break stuff in the next release so let's find a proper solution :)

@lhoestq
Copy link
Member

lhoestq commented Sep 20, 2022

As soon as token is deprecated and hfh has a new release, we'll update datasets to use the new argument instead. Does it sound good to you ?

@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 20, 2022

Perfect 👌

@albertvillanova
Copy link
Member

Hi @Wauplin, thanks for the warning about the deprecation of token in favor of use_auth_token.

Indeed, in datasets we use internally use_auth_token, which in this case was transformed to token to call HfApi.dataset_info:

token = HfFolder.get_token() if self.download_config.use_auth_token else None

Therefore, for the new hfh release, the fix will be trivial: we will pass directly use_auth_token.

As discussed during our meeting yesterday, due to the fact that at datasets we support multiple hfh versions, I think we should handle passing token or use_auth_token depending on the hfh version.

@lhoestq lhoestq self-assigned this Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants