google.auth.exceptions.RefreshError with excessive concurrent requests. #71
Comments
Retry on requests failing due to `google.auth.exceptions.RefreshError`, partial resolution of fsspec#71.
When the scheduler farms out tasks to workers, they indeed all individually authenticate themselves (unless you explicitly pass a token, which is unsafe and generally still needs to be refreshed). This should only happen the first time, after which the existing instance should be reused. I am including this for information only.
Can you think of a case where |
I haven't had a chance to read through the @martindurant to your comment above, do you mean "this should only happen the first time" as in "this is the currently implemented behavior of the system" or "this behavior should be implemented"? I believe your comment in pangeo-data/pangeo#112 using
Is
gcsfs is used routinely with Dask, but does not guarantee thread-safety. Specifically, if you have the same set of parameters when instantiating (which would be true for your example), you only create one instance and share it, so only one auth request is sent. However, the underlying library Directory listings could also potentially fall out of sync, but the code aggressively purges the cache when writing, and in the dask scenario, listings are usually done just once in the client.
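To illustrate the instance sharing described in this comment, here is a minimal sketch of a parameter-keyed instance cache guarded by a lock. It is not the gcsfs implementation: `CachedFS`, its `get` classmethod, the `auth_calls` counter, and the `"my-project"` name are all hypothetical and exist only for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CachedFS:
    """Hypothetical filesystem class; NOT the gcsfs implementation."""

    _cache = {}
    _lock = threading.Lock()
    auth_calls = 0  # counts how many times the (simulated) auth step ran

    def __init__(self, project, token):
        self.project = project
        self.token = token
        # Stands in for an expensive authentication request.
        type(self).auth_calls += 1

    @classmethod
    def get(cls, project, token="google_default"):
        """Return the shared instance for these parameters, creating it once."""
        key = (project, token)
        # Hold the lock across lookup and construction so two threads
        # cannot both miss the cache and authenticate twice.
        with cls._lock:
            if key not in cls._cache:
                cls._cache[key] = cls(project, token)
            return cls._cache[key]


# Many threads asking for the same parameters end up sharing one instance.
with ThreadPoolExecutor(max_workers=16) as pool:
    instances = list(pool.map(lambda _: CachedFS.get("my-project"), range(64)))
```

With the lock in place, all 64 lookups should return the same object and the simulated auth step should run only once. Without it, two threads could race past the cache check and authenticate separately.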
gcsfs propagates a `google.auth.exceptions.RefreshError` when executing many concurrent requests from a single node using the `google_default` credentials class. This is likely due to an excessive number of repeated requests to the internal metadata service. This is a known bug of the external library at googleapis/google-auth-library-python#211.

Anecdotally, I've primarily observed this in `dask.distributed` workers and believe this might occur due to the way `GCSFile`s are distributed. This primarily occurs when a large number of small files are being read from storage and many worker threads are performing concurrent reads. I believe the `GCSFile`s serialized in dask tasks then each instantiate a separate `GCSFilesystem`, resolve credentials and open a session.

If this is the case it would be preferable to store a fixed set of `AuthenticatedSession` handles, ideally via cache on the `GCSFilesystem` class, and dispatch to an auth-method-specific session in the `GCSFilesystem._connect_*` connection functions.

As a more specific solution, `google.auth.exceptions.RefreshError` or its base class should be added to the retrying exception list in `_call`; however, this may mask legitimate authentication errors. The credentials should probably be "tested" via some call that does not retry this error during session initialization. This may be as simple as calling `session.credentials.refresh` or performing a single authenticated request after session initialization.
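As a rough sketch of the retry proposal above, the following shows retry with exponential backoff on a refresh error, plus a one-time credential check that is deliberately not retried. It does not use gcsfs or google-auth: the `RefreshError` class here is a local stand-in for `google.auth.exceptions.RefreshError`, and `call_with_retry`, `connect_and_call`, and the `refresh`/`request` callables are hypothetical names chosen for illustration.

```python
import time


class RefreshError(Exception):
    """Local stand-in for google.auth.exceptions.RefreshError (assumption)."""


def call_with_retry(func, retries=5, base_delay=0.1, sleep=time.sleep):
    """Call func(), retrying with exponential backoff when it raises RefreshError."""
    for attempt in range(retries):
        try:
            return func()
        except RefreshError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


def connect_and_call(refresh, request, sleep=time.sleep):
    """Test credentials once without retry, then issue the request with retry."""
    refresh()  # a RefreshError here propagates immediately, so real auth failures are not masked
    return call_with_retry(request, sleep=sleep)


# Demo: a request that fails twice with a transient error, then succeeds.
attempts = []
delays = []  # records the backoff delays instead of actually sleeping


def flaky_request():
    attempts.append(1)
    if len(attempts) < 3:
        raise RefreshError("transient metadata-service failure")
    return "ok"


result = call_with_retry(flaky_request, sleep=delays.append)


# Demo: invalid credentials fail fast during the initial check.
def bad_refresh():
    raise RefreshError("invalid credentials")


try:
    connect_and_call(bad_refresh, flaky_request, sleep=delays.append)
    failed_fast = False
except RefreshError:
    failed_fast = True
```

Separating the unretried check from the retried request is one way to keep transient metadata-service failures from surfacing while still reporting genuinely bad credentials at session setup.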