Adding the ability to configure etag timeout as env property #1703
Comments
Hey there 👋 I think an environment variable makes sense to override the default value here. It feels to me like quite a low-level setting that a user would want to set once and for all, for all libraries depending on So yeah, definitely in favor of the suggested env variable. I would call it
Hi @Wauplin, I will do it shortly and open a new PR. |
Great, looking forward to a PR! Thank you for raising the topic! |
@Wauplin @LysandreJik Hi team, |
@Shahafgo those 2 timeouts are indeed pretty different. I think it's already the case that the timeouts have different values (10s for the HEAD and 60s for the GET, if I remember correctly). However, I would not add another parameter for the GET call. I think the default value is good enough. Here is my reasoning:

The reason etag_timeout is more important to configure is that, when the model weights are already cached, we still make a HEAD call to the Hub to check that the model has not been updated since we cached it. This HEAD call is important to stay up to date but not mandatory: if it fails, we fall back to the cached files. So reducing the HEAD call timeout would definitely help on slow connections when the files are already cached and we don't want to wait before loading them.

On the contrary, if the GET request is triggered, we need this call to work. If it doesn't, there is no fallback and the script fails with a TimeoutError. So setting a high value (60s) for the timeout should be good. Also, this timeout period is in any case expected to be shorter than the download time itself.

What I'm afraid of as well is if we have a What do you think? I would also be interested in @LysandreJik's opinion here in case I missed something with these timeout settings. |
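The reasoning in this comment can be sketched in a few lines of Python. This is only an illustration of the described control flow, not the library's code: `resolve_file`, `fetch_etag`, `download`, and the `(etag, path)` cache tuple are hypothetical names, the built-in `TimeoutError` stands in for whatever exception the HTTP client raises, and the default timeout values are placeholders.

```python
from typing import Callable, Optional, Tuple


def resolve_file(
    fetch_etag: Callable[[float], str],   # hypothetical: performs the HEAD call
    download: Callable[[float], str],     # hypothetical: performs the GET call
    cached: Optional[Tuple[str, str]],    # (etag, local_path) or None
    etag_timeout: float = 10.0,           # placeholder value
    download_timeout: float = 10.0,       # placeholder value
) -> str:
    """Return a local path, preferring the cache when the HEAD call fails."""
    try:
        remote_etag = fetch_etag(etag_timeout)
    except TimeoutError:
        # The HEAD call is useful but not mandatory: fall back to the cache.
        if cached is not None:
            return cached[1]
        raise  # nothing cached, so there is nothing to fall back to

    # Cached copy is still current: no download needed.
    if cached is not None and cached[0] == remote_etag:
        return cached[1]

    # The GET has no fallback: a timeout here propagates to the caller.
    return download(download_timeout)
```

With this shape, a short `etag_timeout` only shortens the wait before the cached files are used, while a short `download_timeout` turns directly into a failure.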
Your analysis sounds good to me @Wauplin |
Ok thanks, then let's keep only a |
@Wauplin @LysandreJik, guys, I don't know how to thank you for the explanation, I appreciate it. @Wauplin as a quote: That is why I want a configurable timeout, so that the user can set a timeout higher than 60 seconds if needed. Also, I took a look at the code and it seems that the http_get function has a 10-second timeout by default. |
Correct. My apologies, I thought it was way higher than this. Then yes, it makes sense for the user to be able to increase this value. What do you think of EDIT: for both |
Ok, thank you. I will make the requested changes and open the PR. |
@Wauplin @LysandreJik This is the PR: #1720 |
Reviewed the PR! Thanks for the good job 🎉 |
@Wauplin Thank you for your review! |
Sorry for the confusion! What I wanted to say is that for each value (etag timeout and download timeout) we should check the set value against the default value only in 1 place which will be:
And nothing to check in |
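One way to read "check the set value against the default value in only one place" is sketched below. It is an assumption-laden illustration, not the code from the PR: the environment variable names, the `effective_timeout` helper, and the choice to let an explicitly passed value win are all hypothetical. The 10-second defaults are the values mentioned earlier in the thread.

```python
import os

DEFAULT_ETAG_TIMEOUT = 10.0      # seconds; HEAD default mentioned in the thread
DEFAULT_DOWNLOAD_TIMEOUT = 10.0  # seconds; http_get default mentioned in the thread

# Hypothetical variable names, chosen only for this sketch.
ETAG_ENV = "EXAMPLE_ETAG_TIMEOUT"
DOWNLOAD_ENV = "EXAMPLE_DOWNLOAD_TIMEOUT"


def effective_timeout(passed: float, default: float, env_name: str) -> float:
    """Pick the timeout for one request.

    A value that differs from the default was set explicitly by the caller
    and wins. Otherwise the environment variable, when set, replaces the
    default.
    """
    if passed != default:
        return passed
    raw = os.environ.get(env_name)
    return float(raw) if raw else default
```

Keeping the comparison in a single helper means lower-level functions can simply use the number they are given, without re-reading the environment.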
Hello HuggingFace Team!
I noticed that snapshot_download accepts a custom etag_timeout, but for some reason this option is not exposed in the pipelines and tokenizers of transformers, and perhaps not in other libraries either. There, the timeout is set to 10 seconds by default, with no way to override it.
This ability is important to me because I want to let the user configure the etag_timeout. Since there is no option to do so in the tokenizer, I thought it might be a good idea to make the etag timeout overridable for all the libraries that use huggingface_hub (as huggingface-cli does).
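For reference, the per-call override that already exists looks roughly like this. `snapshot_download` and its `etag_timeout` parameter are taken from the issue text above; the wrapper name and the 30-second value are illustrative only, and the import is kept inside the function so the sketch can be loaded without the package installed.

```python
def download_with_timeout(repo_id: str, etag_timeout: float = 30.0) -> str:
    """Download a repo snapshot, allowing more time for the ETag (HEAD) check."""
    # Imported lazily: requires the huggingface_hub package and network access.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, etag_timeout=etag_timeout)
```

Libraries that call the download functions internally without forwarding this argument offer no such hook, which is the gap this issue asks to close.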