-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sdks/python: enable recursive deletion for GCSFileSystem Paths #33611
sdks/python: enable recursive deletion for GCSFileSystem Paths #33611
Conversation
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
1ea7e3b
to
01a4b65
Compare
Assigning reviewers. If you would like to opt out of this review, comment R: @tvalentyn for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Hi @liferoad, It seems the test cases passed now. I would love to receive any feedback on that PR! 🙏 |
Ack. Thanks for contributing to Beam. I will review the code today. |
Hi @shunping, I understand you've been busy last week. Any updates on this PR review? 🙏 |
Sorry for the late reply, @mohamedawnallah. I left a comment about potential performance impact with the new change. |
Reminder, please take a look at this pr: @tvalentyn @shunping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the changes and I am happy to review them again when ready.
991ca70
to
661f822
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #33611 +/- ##
============================================
+ Coverage 57.45% 57.46% +0.01%
Complexity 1474 1474
============================================
Files 980 984 +4
Lines 155541 155804 +263
Branches 1076 1076
============================================
+ Hits 89360 89539 +179
- Misses 63973 64057 +84
Partials 2208 2208
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think some changes are still needed in gcsfilesystem.py.
In this commit, we enable recursive deletion for GCS (Google Cloud Storage) paths, including directories and blobs. Changes include: - Updated the `delete` method to support recursive deletion of GCS directories (prefixes). - If the path points to a directory, all blobs under that prefix are deleted. - Refactored logic to handle both single blob and directory deletion cases.
In this commit, we update the delete test to verify recursive deletion of directories (prefixes) in GCS. Changes include: - Added test for deleting a GCS directory (prefix) with multiple files. - Verified files under a directory are deleted recursively when using the delete method.
661f822
to
3491c70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks again!
Hi @shunping, Thanks for your follow-up review! I've addressed your feedback. This might be unrelated, but I’d like to use this PR as an opportunity to learn more about Beam development practices as I am getting used to it. My interest was sparked by this discussion: #33672 (comment). When addressing a review and make a follow-up commit to fix the issue, Someone should avoid squashing reviewed and unreviewed commits. After the follow-up review is completed, would it be necessary to squash the follow-up commit (I am thinking about delay in the merging process e.g for the CI to be triggered again), or is it acceptable to leave the follow-up commit as part of the commit history? |
The root problem here is about modifying an "already pushed" commit. In your previous PR and this one as well, you pushed your code to github and then you did an operation locally to modify your commit history (eg. squashing the commits or simply amending an already pushed commit). You may notice that, when you pushed your code again, git reminded you to do a "forced" push, which is then shown up in github as something like:
This is not a good practice for code that is already in the reviewing pipeline. Specifically, when you force-push a commit (which changes the original commit sha1), github can no longer associate it with the reviews that was previously submitted to it. As a result, it will be inconvenient for reviewers to go back and forth to compare what you had changed and what were the previous feedback. Check the following pages for some more reading. |
Can we add this to CHANGES.md? |
Good point. @mohamedawnallah, could you add a line under "New Features/Improvement" in https://github.com/apache/beam/blob/master/CHANGES.md to mention your change? The format is like the following
|
Thank you! |
Description
In this PR, we enable recursive deletion for Google Cloud Storage (GCS) paths, including directories and blobs. The
delete
method is updated to remove all blobs under a directory (prefix) when deleting GCS directories.Additionally, we update the delete test to verify the recursive deletion of directories containing multiple files.
Changes include:
delete
method to support recursive directory deletion.Fixes #27605
Current Behavior
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.