does not handle 502 errors (returned by GCS when server is overloaded) #31

Closed
yan-hic opened this issue Nov 14, 2017 · 8 comments

Comments

@yan-hic
Contributor

yan-hic commented Nov 14, 2017

I have hit the same 502 error several times when writing in chunks. The errors are sporadic and look like this:

  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 896, in __exit__
    self.close()
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 868, in close
    self.flush(force=True)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 755, in flush
    self._upload_chunk(final=force)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 777, in _upload_chunk
    validate_response(r, self.location)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 111, in validate_response
    raise RuntimeError(m)
RuntimeError: <!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>502.</b> <ins>That's an error.</ins>
  <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.  <ins>That's all we know.</ins>

Exception ValueError: ValueError('Force flush cannot be called more than once',) in <bound method GCSFile.__del__ of <GCSFile tempfile.txt>> ignored

Google says this can happen when the system or network is under stress. Our script does indeed read and write several GB of files from and to GCS in chunks, after on-the-fly transformations.
See "Handling Errors" at https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload.
They recommend implementing exponential backoff: https://cloud.google.com/storage/docs/exponential-backoff

Is this something that could be implemented in GCSFS, i.e. resubmitting a chunk when such a 5xx error occurs?

If not, wrapping the write statement in a try/except, deleting the target file written so far and reprocessing is not an option for me, because the records I write are derived from data in a compressed file that I read through GCSFS. ZipExtFile objects are not seekable, so what has been read is gone. I would have to delete the whole batch and restart.
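
For illustration, the exponential backoff described in the Google documentation linked above could look roughly like the sketch below. This is not GCSFS code: `call_with_backoff`, `TransientServerError` and all parameter names are hypothetical, and the delays (1s, 2s, 4s, ... plus random jitter) are only one reasonable choice.

```python
import random
import time


class TransientServerError(Exception):
    """Hypothetical exception standing in for a 5xx response."""


def call_with_backoff(func, retries=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep):
    """Call func(), retrying on TransientServerError with exponential backoff."""
    for attempt in range(retries):
        try:
            return func()
        except TransientServerError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Wait 1s, 2s, 4s, ... capped at max_delay, plus up to 1s of jitter.
            delay = min(base_delay * 2 ** attempt, max_delay) + random.random()
            sleep(delay)
```

In this sketch `func` would wrap the upload of a single chunk. Since the chunk is still held in memory when the request fails, it could be resent without rereading the non-seekable source.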

@martindurant
Member

We apparently skip 502 in https://github.com/dask/gcsfs/blob/master/gcsfs/utils.py#L116; feel free to add it. Actually, any status >= 500 may be reasonable to add here, although I'm not sure what other errors are possible. As the message suggests, though, retrying immediately may not be the right thing to do.
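
As a rough sketch of the "any >= 500" idea, a status check could treat the whole 5xx class as transient. The function name `is_server_error` is hypothetical and this is not the code in `gcsfs/utils.py`; it only shows the shape of such a test.

```python
def is_server_error(status_code):
    """Return True for any 5xx status, which this sketch treats as retriable."""
    # Assumption: every server-side error (500-599) is worth retrying.
    return 500 <= status_code <= 599
```

Whether every 5xx code really deserves a retry is a design choice; a narrower list is safer if some of those errors are known to be permanent.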

@martindurant
Member

To solve the ValueError('Force flush cannot be called more than once',), could you try moving this block to the end of that method, after self._upload_chunk() ?

@yan-hic
Contributor Author

yan-hic commented Nov 14, 2017

@martindurant how many lines of the above block did you mean to move: this one or fewer?

@yan-hic yan-hic closed this as completed Nov 14, 2017
@yan-hic yan-hic reopened this Nov 14, 2017
@martindurant
Member

@yan-hic
Contributor Author

yan-hic commented Nov 18, 2017

The code move helped produce a more explanatory message for another error, #33.

@bachsh

bachsh commented Apr 6, 2019

This issue is marked as closed but the relevant change has been overwritten and now the retriable error codes are [500, 505]. Can anyone say why 502 was removed from the list of retriable errors?

@yan-hic
Contributor Author

yan-hic commented Apr 6, 2019

@bachsh what makes you say it was removed? The code is list(range(500, 505)), so it covers all codes from 500 through 504 (the upper bound, 505, is excluded).
Did you hit a 502 error? If so, remember that it retries only a few times.
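
A quick check in plain Python, independent of GCSFS, shows what that expression contains (the variable name `retriable_codes` is only illustrative):

```python
retriable_codes = list(range(500, 505))

print(retriable_codes)         # [500, 501, 502, 503, 504]
print(502 in retriable_codes)  # True
print(505 in retriable_codes)  # False: range() excludes its upper bound
```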

@bachsh

bachsh commented Apr 6, 2019

whoops, what a n00b :)

yes, we are hitting 502 errors when saving using gcsfs. It makes sense to retry only a few times. Thanks for the answer

3 participants