[Python] Disable soft delete policy when creating new default bucket. #31344
Changes to `sdks/python/apache_beam/io/gcp/gcsio_integration_test.py` (the rendered diff lost its `+`/`-` markers; shown here as the resulting code):

```
@@ -32,9 +32,11 @@
import unittest
import uuid

import mock
import pytest

from apache_beam.io.filesystems import FileSystems
from apache_beam.options.pipeline_options import GoogleCloudOptions
from apache_beam.testing.test_pipeline import TestPipeline

try:
```
```
@@ -141,6 +143,44 @@ def test_batch_copy_and_delete(self):
    self.assertFalse(
        result[1], 're-delete should not throw error: %s' % result[1])

  @pytest.mark.it_postcommit
  @mock.patch('apache_beam.io.gcp.gcsio.default_gcs_bucket_name')
  def test_create_default_bucket(self, mock_default_gcs_bucket_name):
    google_cloud_options = self.test_pipeline.options.view_as(
        GoogleCloudOptions)
    # Overwrite the KMS option here, because
    # get_or_create_default_gcs_bucket() requires this option to be unset.
    google_cloud_options.dataflow_kms_key = None

    import random
    from hashlib import md5
    # Add a random number to avoid collisions if multiple test instances
    # are run at the same time. To avoid too many dangling buckets if
    # bucket removal fails, we limit the max number of possible bucket
    # names in this test to 1000.
    overridden_bucket_name = 'gcsio-it-%d-%s-%s' % (
        random.randint(0, 999),
        google_cloud_options.region,
        md5(google_cloud_options.project.encode('utf8')).hexdigest())

    mock_default_gcs_bucket_name.return_value = overridden_bucket_name

    # Remove any existing bucket with the same name as the default bucket.
    existing_bucket = self.gcsio.get_bucket(overridden_bucket_name)
    if existing_bucket:
      existing_bucket.delete()

    bucket = gcsio.get_or_create_default_gcs_bucket(google_cloud_options)
    self.assertIsNotNone(bucket)
    self.assertEqual(bucket.name, overridden_bucket_name)

    # Verify the soft delete policy is disabled by default in the default
    # bucket after creation.
    self.assertEqual(bucket.soft_delete_policy.retention_duration_seconds, 0)
    bucket.delete()

    self.assertIsNone(self.gcsio.get_bucket(overridden_bucket_name))


if __name__ == '__main__':
  logging.getLogger().setLevel(logging.INFO)
```

Reviewer comment (on the `gcsio.get_or_create_default_gcs_bucket` call):

Just realized: in case upstream code changes and the mock is no longer effective, the following code will delete the real default bucket. We should assert that the created bucket is the one with the injected name, and thus guard against deleting the real default bucket.

Author reply:

Good idea. Added the check. PTAL.
Reviewer comment:

It doesn't sound quite right that a PostCommit test needs a mock. And this mock isn't mocking a fake service; it's used to override the naming of the temp bucket. What happens if we don't hack it?

Also, this test does not run a pipeline, so should we configure it to only run on test-suites:direct:py3xx:postCommitIT? Presumably it currently runs on the Dataflow PostCommit IT suites, which isn't quite right.

Author reply:

Thanks for the review, @Abacn. Below are my responses.

For a given project, the function `default_gcs_bucket_name` will return a fixed bucket name as the default. If we don't override this, we would need to create a dedicated project (other than apache-beam-testing, or whatever project users provide when running this test) to test this. Per the offline discussion with @damccorm, it seems a bit overkill to create a project and then remove it afterward just for this test. I agree that mocking is something of a "hack", but the code is clean. I am open to better suggestions, though.

If you look at the other tests under `gcsio_integration_test.py`, they also exercise gcsio functionality with actual GCS operations; they don't trigger any pipeline run either.
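The mock-versus-real-service question above comes down to how `mock.patch` works: it replaces a module attribute for the duration of the patch, so any caller that looks the name up through that module receives the injected value. A minimal, self-contained sketch (toy function bodies, not Beam's actual module layout):

```python
from unittest import mock


# Toy stand-ins for gcsio.default_gcs_bucket_name and its caller.
def default_gcs_bucket_name(project, region):
    return 'dataflow-staging-%s-%s' % (region, project)


def get_or_create_default_gcs_bucket(project, region):
    # Looks up default_gcs_bucket_name in this module's namespace at
    # call time, so patching the module attribute changes the result.
    return default_gcs_bucket_name(project, region)


with mock.patch(__name__ + '.default_gcs_bucket_name',
                return_value='gcsio-it-42-us-central1-deadbeef'):
    # Inside the patch, the caller sees the injected bucket name.
    assert (get_or_create_default_gcs_bucket('proj', 'us-central1') ==
            'gcsio-it-42-us-central1-deadbeef')

# Once the patch exits, the original function is restored.
assert (get_or_create_default_gcs_bucket('proj', 'us-central1') ==
        'dataflow-staging-us-central1-proj')
```

This is also why the reviewer's earlier concern matters: if gcsio ever stopped calling `default_gcs_bucket_name` through the patched attribute, the mock would silently stop taking effect, which is exactly what the added `bucket.name` assertion guards against.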