This repository has been archived by the owner on Dec 9, 2022. It is now read-only.
Apply retries around the whole download operation #44
Merged
… and then read from that. By mimicking the way the rest of this class works, we'll be able to reuse code better in the following steps.
aleon approved these changes (Jan 22, 2020)
LGTM, though I am still very much a n00b to Go, so more 👀 on this would be very much worthwhile.
I have completed the testing on staging of the new …
alexstoick approved these changes (Jan 27, 2020)
me approved these changes (Jan 27, 2020)
Nice!
Ticket: https://deliveroo.atlassian.net/browse/NSA-500
What
Change the retry logic in paddle data get so it covers additional transient failure cases that are worth retrying.
Why
NSA have seen some failures in our pipelines because copying data from S3 into a local file failed for a transient reason. Before this PR, the CLI retried the request asking S3 for the object (s3.GetObject(input *s3.GetObjectInput) (*s3.GetObjectOutput, error)) but not the step that pulls the bytes across the network and copies them into the local file (io.Copy(foo, s3.GetObjectOutput.Body)). Failures during that copy were therefore never retried, and they caused our pipelines to fail.
This PR moves the retries so they cover both connecting to S3 and copying the data locally.
How
Firstly, I refactored how the class fetches files, including the special case of fetching the HEAD file. Previously, regular files were fetched as you would expect, but the HEAD file's contents were copied directly into a string. The retry wrapped only the fetching of the contents, so if I moved the retry further up the stack for fetching files, the readHEAD method would no longer benefit from it. So I changed readHEAD to copy the object into a temporary file and then read the contents. This unified the code paths and made the refactor easier.
This gives a single method, copyS3ObjectToFile, which both code paths use and which is easier to test in isolation. I moved the retry logic into this method, so connecting to S3 and copying the bytes into the file are now both inside the retry loop. If a copy fails, we have to reset the file before retrying, because some bytes may already have been written to it.
Testing
Previously there were no tests for reading files from S3. I've now added tests for the happy case (a successful read on the first attempt), for failures at each step, and for success after a number of retries. I made the time between retries configurable so these tests don't take too long.
The tests work by pulling out a structural type (a Go interface), S3Getter, which encapsulates the GetObject(input *s3.GetObjectInput) (*s3.GetObjectOutput, error) method we need from the full AWS client.
I have not yet tested this on a pipeline but will do so once I've had feedback on this PR.
Where to start
The change only covers 2 classes. I'm not familiar with Go, so the individual commits are very small; I'm not sure whether it is useful to follow them.
Open Questions
This is my first time writing Go, so I may have made some dumb mistakes, missed some obvious library code that could have helped, or written code in a style that isn't idiomatic.
This class also lists the keys with a given prefix; should I add retries around this operation too? We haven't seen it fail yet, but it could.
Urgency
Not urgent