Max retries exceeded with url: /o/oauth2/token #91
Comments
Hm, not too much to go on there, except that it's clearly trying to re-authenticate. I wonder if we could be doing a better job of caching the GCSFileSystem instances in a given worker, or if this is just a too-many-concurrent-requests kind of thing. In any case, I would first suggest trying to throttle the number of workers that are writing, to see if that helps.
Can I accomplish that using a write lock?
Certainly, but then you would lose parallelism. Perhaps Variable would allow you to limit the number of workers/threads (@mrocklin, suggestions?)
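For what it's worth, a minimal sketch of throttling concurrent writes, assuming a recent dask.distributed that ships `Semaphore` (the comment above suggests `Variable`; `Semaphore` is a later, more direct primitive). The `write_chunk` function and the sleep are stand-ins for the real GCS upload, not code from this thread:

```python
import time
from dask.distributed import Client, Semaphore

def write_chunk(i, sem):
    # Take a lease before the (stubbed) GCS write, so only a few tasks
    # hit the storage / token endpoint at any one time.
    with sem:
        time.sleep(0.1)  # stand-in for the real upload of one chunk
    return i

if __name__ == "__main__":
    client = Client()                                 # local cluster
    sem = Semaphore(max_leases=4, name="gcs-writes")  # at most 4 concurrent writers
    results = client.gather(client.map(write_chunk, range(20), sem=sem))
    print(len(results), "chunks written")
```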
I tried this with zarr's thread synchronizer to prevent simultaneous writes to GCS. No luck, same errors. I am still stuck on this issue and unable to move forward. I am also seeing these errors in my worker logs.
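For context, a minimal sketch of that attempt, assuming a recent gcsfs with `get_mapper`; the project, bucket, and input file names are placeholders, not the original code:

```python
import gcsfs
import xarray as xr
import zarr

fs = gcsfs.GCSFileSystem(project='my-project', token='cloud')  # placeholder project
store = fs.get_mapper('my-bucket/dataset.zarr')                # placeholder bucket/path

ds = xr.open_dataset('local_data.nc', chunks={'time': 100})    # placeholder input

# ThreadSynchronizer serializes zarr writes from threads in this process,
# but the token errors persisted anyway.
ds.to_zarr(store, synchronizer=zarr.ThreadSynchronizer())
```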
I guess it could be related.
I wonder, would it be useful to provide an insecure token mode, i.e., where the actual access token is passed to all the instances, rather than using local renew tokens, which cause the calls to the /token/ endpoint? I call this insecure since the tokens would be passed over open channels, but that is not an issue within the isolated network of Kubernetes. I think the following should do it: set up a gcsfs instance, perform any operation on it (the first operation will cause the token refresh), and then pass the refreshed token in the storage parameters (be sure to also explicitly give the project when you do this).
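A hedged sketch of that idea; the attribute holding the refreshed credentials has moved between gcsfs versions, so treat `gcs.credentials` as an assumption and check your installed version (project, bucket, and dataset names are placeholders):

```python
import gcsfs
import xarray as xr

# Authenticate once; the first operation forces the token refresh.
gcs = gcsfs.GCSFileSystem(project='my-project')
gcs.ls('my-bucket')

# Reuse the already-refreshed credentials everywhere instead of letting
# each worker call the /o/oauth2/token endpoint itself.
token = gcs.credentials  # attribute name is an assumption, version-dependent

ds = xr.Dataset({'x': ('t', list(range(10)))})  # stand-in dataset
ds.to_zarr(
    'gcs://my-bucket/example.zarr',
    storage_options={'project': 'my-project', 'token': token},
)
```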
Was this issue ever completely resolved? I've been running into this exact problem when moving very large datasets (~1 TB). Reducing the Dask cluster size seems to help. I am using the token-passing approach mentioned by @martindurant above.
No, I don't think we have a concrete solution; the problem comes from some sort of rate limit on access to the Google metadata service.
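One generic way to live with that rate limit is exponential backoff around the failing calls; this is a plain-Python sketch, not something built into gcsfs:

```python
import random
import time

def with_backoff(fn, *args, attempts=5, base=1.0, **kwargs):
    """Retry fn with exponential backoff plus jitter (1s, 2s, 4s, ...)."""
    for i in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:  # narrow this to the connection error you actually see
            if i == attempts - 1:
                raise
            time.sleep(base * 2 ** i + random.random())

# e.g. with_backoff(fs.put, 'local_chunk.bin', 'my-bucket/chunk.bin')
```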
I ran into the same problem (in multi-process CloudFiles). I found a Stack Overflow answer saying it could also be too many open file descriptors (i.e. network connections), but I think you are probably right that it's a Google rate limit: https://stackoverflow.com/questions/15286288/what-does-this-python-requests-error-mean. I wonder if it would be possible to let these connections share the DNS / auth information.
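On the file-descriptor angle from that Stack Overflow link, here is a small POSIX-only check of the per-process open-file limit, in case sockets are the bottleneck:

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit toward the hard limit if many sockets are needed.
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```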
I am trying to push a very large dataset to GCS via the xarray / zarr / gcsfs / dask stack. I have encountered a new error at the gcsfs level.
Here's a summary of what I am doing:
I'm doing this via a distributed client connected to a local multithreaded cluster.
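In outline, the pipeline looks roughly like the sketch below; every name here (project, bucket, input files, chunking) is a placeholder rather than the actual code:

```python
import gcsfs
import xarray as xr
from dask.distributed import Client

client = Client()  # local cluster with many threads

ds = xr.open_mfdataset('data/*.nc', chunks={'time': 100})  # placeholder input

fs = gcsfs.GCSFileSystem(project='my-project', token='cloud')
store = fs.get_mapper('my-bucket/big-dataset.zarr')

ds.to_zarr(store)  # kicks off the (very large) write graph
```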
There are almost a million tasks in the graph. It will generally get about 5% in and then hit some sort of intermittent, non-reproducible error.
This is the error I have now.