Prevent file overwrite by manage-study (SCP-2698) #109
handle 422 response from server
bugfix: propagate cluster-name in cluster file upload
Codecov Report
@@            Coverage Diff             @@
##           master     #109      +/-   ##
==========================================
+ Coverage   22.36%   23.01%   +0.65%
==========================================
  Files          17       17
  Lines        2835     2876      +41
==========================================
+ Hits          634      662      +28
- Misses       2201     2214      +13
Continue to review full report at Codecov.
A nice robustness improvement :). I suggested a non-blocking refinement.
82 exit-failed-to-gsutil-delete-file
83 exit-uploaded-file-deleted
84 exit-no-file-cleanup-needed
85 exit-file-not-found-in-remote-bucket
I don't consider it blocking, but now or later we should refactor to use Python errors instead of Bash errors.
As is, upon encountering a state we consider an error, this approach does:
print(specific_error_message)
exit(specific_error_code)
That complicates future downstream error handling in Python, e.g. logging the error to Sentry or Mixpanel. Instead, raising a Python error (like we do in Ingest Pipeline here) would ease writing to terminal, log file, and external services like Sentry or Mixpanel. If custom exit codes are indeed needed, that can be done in Python error handling as shown here.
The current approach is fine for now (it doesn't cause any functional issues), but if you prefer to refactor later then please open a tech debt ticket and note this as a TODO in the code.
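One way the suggested refactor could look is sketched below. This is not code from the PR: the class names, `run_cli`, and `delete_uploaded_file` are hypothetical, and the sketch assumes the numbers 82 and 85 in the list above are the exit codes paired with those names. Each error state becomes an exception that carries its exit code, and a single handler decides how to report it.

```python
class ManageStudyError(Exception):
    """Hypothetical base error; subclasses carry the process exit code."""

    exit_code = 1


class GsutilDeleteError(ManageStudyError):
    exit_code = 82  # assumed to match exit-failed-to-gsutil-delete-file


class FileNotFoundInBucketError(ManageStudyError):
    exit_code = 85  # assumed to match exit-file-not-found-in-remote-bucket


def run_cli(action, log=print):
    """Run action() and translate known errors into an exit code.

    Reporting happens in one place, so `log` could be swapped for a
    function that also writes to a log file or an external service.
    """
    try:
        action()
    except ManageStudyError as error:
        log(f"{type(error).__name__}: {error}")
        return error.exit_code
    return 0


def delete_uploaded_file(path):
    """Hypothetical stand-in for a cleanup step that fails."""
    raise GsutilDeleteError(f"could not delete {path}")
```

An entry point could then pass the result of `run_cli(...)` to `sys.exit`, which keeps the custom exit codes while letting callers catch the exceptions directly.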
added SCP-2790 to backlog for future refactoring
Overall looks good, but I'm a little confused about the dual implementation of exists_in_bucket
scripts/scp_api.py
@@ -99,6 +99,12 @@
cmdline = Commandline.Commandline()


def exists_in_bucket(bucket_file_path, mute=False):
Why is this method implemented twice, but once as a @staticmethod? If they have different functions (one uses gsutil stat and another uses gsutil ls), then I don't understand why they would have the same method name.
Whoops! Eno helped me refactor my code to make exists_in_bucket a staticmethod in the class. I didn't realize I had neglected to delete the original... deleted in 03633e1
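For readers following the thread, a single static implementation might look roughly like the sketch below. It is not the code from 03633e1: the class name `BucketChecker` and the `run` parameter are hypothetical, and the sketch assumes the `gsutil -q stat` variant, which signals existence through its exit status.

```python
import subprocess


class BucketChecker:  # hypothetical class name, not the one used in the PR
    @staticmethod
    def exists_in_bucket(bucket_file_path, mute=False, run=subprocess.run):
        """Return True if `gsutil -q stat` exits with status 0 for the path.

        `run` is injectable (an assumption of this sketch) so the check
        can be exercised without gsutil installed.
        """
        result = run(
            ["gsutil", "-q", "stat", bucket_file_path],
            capture_output=True,
            text=True,
        )
        found = result.returncode == 0
        if not mute:
            state = "found" if found else "not found"
            print(f"{bucket_file_path}: {state}")
        return found
```

Keeping one definition avoids the ambiguity raised above, where two functions with the same name ran different gsutil commands.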
switch from bash to python errors
Currently, manage-study uploads files before performing ingest without checking whether a file of the same name already exists in the study bucket. manage-study should not overwrite existing, valid study files.
With this update, manage-study handles file upload more appropriately, with the following behaviors:
This fulfills SCP-2698.