Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection broken: error 104, ECONNRESET #12

Closed
jean-philippe-martin opened this issue Sep 21, 2017 · 11 comments
Closed

Connection broken: error 104, ECONNRESET #12

jean-philippe-martin opened this issue Sep 21, 2017 · 11 comments

Comments

@jean-philippe-martin
Copy link
Contributor

jean-philippe-martin commented Sep 21, 2017

I used gcsfs to read about 800 files and it failed for 3 of them with this error:

Connection broken: error("(104, \'ECONNRESET\')",)

The call stack looks like this:

  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 536, in open
    return GCSFile(self, path, mode, block_size)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 622, in __init__
    self.details = gcsfs.info(path)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 464, in info
    files = self.ls(path, True)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 379, in ls
    files = self._list_bucket(bucket)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 320, in _list_bucket
    out = self._call('get', 'b/{}/o/', bucket, maxResults=max_results, pageToken=next_page_token)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 299, in _call
    json=json)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 658, in send
    r.content
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 823, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 748, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: error("(104, \'ECONNRESET\')",)', error("(104, 'ECONNRESET')",))

From the outside it looks like sometimes GCS has some sort of transient errors and GCSFS ought to be able to survive those and retry.

@martindurant
Copy link
Member

797/800 isn't bad :)
Interesting that this seems to happen during ls. Yes, agree that there could be some retry logic for anything within _call. Would you be interested in implementing this and submitting a PR?

@jean-philippe-martin
Copy link
Contributor Author

I agree it's not that bad - especially since I think I opened a few more files, so it's closer to 1000.

Since you're open to it, yes, I can try my hand at a PR. Can't guarantee it'll come today though.

@martindurant
Copy link
Member

The try/retry should go around this line: https://github.com/dask/gcsfs/blob/master/gcsfs/core.py#L298

@martindurant
Copy link
Member

OK to close?

@jean-philippe-martin
Copy link
Contributor Author

Absolutely!

@yan-hic
Copy link
Contributor

yan-hic commented Nov 12, 2018

@martindurant can you reopen the issue ?

I was misled in thinking that error 104 was not caught as retriable but it is the logging of it that is confusing, especially when one looks at Stackdriver logging - we use gcsfs in Cloud Functions.

In particular:

        logger.exception("_call exception: %s", e)
            if retry == self.retries - 1:
                raise e
            if is_retriable(e):
                # retry
                continue
            raise e

If retriable, would we want to log as an exception ? Warning no good for us as this is mapped to Error in SD logging too.

I would remove the logging.exception and add a logging.info in the is_retriable block instead.

@martindurant
Copy link
Member

How about:

--- a/gcsfs/core.py
+++ b/gcsfs/core.py
@@ -467,12 +467,13 @@ class GCSFileSystem(object):
                 validate_response(r, path)
                 break
             except (HtmlError, RequestException, GoogleAuthError) as e:
-                logger.exception("_call exception: %s", e)
                 if retry == self.retries - 1:
+                    logger.exception("_call out of retries on exception: %s", e)
                     raise e
                 if is_retriable(e):
-                    # retry
+                    logger.debug("_call retrying after exception: %s", e)
                     continue
+                logger.exception("_call non-retriable exception: %s", e)
                 raise e
         try:

@yan-hic
Copy link
Contributor

yan-hic commented Nov 12, 2018

Very good ! The more - "switchable" - info the better !

@yan-hic
Copy link
Contributor

yan-hic commented Dec 28, 2018

@martindurant Can you reopen the issue to include your latest code proposal ?

@martindurant
Copy link
Member

@yan-hic
Copy link
Contributor

yan-hic commented Dec 28, 2018

Thx ! Had missed that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants