-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Storage] Make sky storage delete parallel #1990 #2058
[Storage] Make sky storage delete parallel #1990 #2058
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ksasi, and welcome to SkyPilot! Left a comment. Can we also include a test for deleting multiple buckets? Something like we have in test_smoke::TestStorageWithCredentials::test_new_bucket_creation_and_deletion
, but for multiple buckets?
sky/cli.py
Outdated
@@ -3220,8 +3220,9 @@ def storage_delete(names: List[str], all: bool): # pylint: disable=redefined-bu | |||
else: | |||
names = _get_glob_storages(names) | |||
|
|||
for name in names: | |||
sky.storage_delete(name) | |||
pool_obj = multiprocessing.Pool() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use subprocess_utils.run_in_parallel
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, subprocess_utils.run_in_parallel
makes more sense.
Also, a polite reminder to run |
@romilbhardwaj , as suggested I had added test for deleting multiple buckets along with other updates. |
This reverts commit c28e4ec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ksasi! The functionality works well - storage delete is much faster now! Left some comments for the test.
Tested (run the relevant ones):
|
timestamp = str(time.time()).replace('.', '') | ||
store_obj = storage_lib.Storage(name=f'sky-test-{timestamp}') | ||
storage_mult_obj.append(store_obj) | ||
yield storage_mult_obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some way to get storage objects from yield_storage_object
here? That way deletion after test is handled by that code. If not, we should add deletion code after the yield statement here (refer to yield_storage_object
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are unable to use multiple yield statements in fixture as they are not supported. Hence, had to resort to this specific way of implementation. I am adding the deletion code after the yield statement as suggested.
…atch_storage_obj to account for multiple store types
tests/test_smoke.py
Outdated
# Staggering tests for each store type to prevent BucketDeleteError | ||
# exception due to concurrent delete all | ||
if store_type.value == 'S3': | ||
time.sleep(0) | ||
elif store_type.value == 'GCS': | ||
time.sleep(10) | ||
storage_obj_name = int( | ||
subprocess.check_output( | ||
["sky storage ls|wc -l"], | ||
shell=True).decode('utf-8').splitlines()[0]) - 1 | ||
while storage_obj_name > 0: | ||
time.sleep(5) | ||
storage_obj_name = int( | ||
subprocess.check_output( | ||
["sky storage ls|wc -l"], | ||
shell=True).decode('utf-8').splitlines()[0]) - 1 | ||
elif store_type.value == 'R2': | ||
time.sleep(20) | ||
storage_obj_name = int( | ||
subprocess.check_output( | ||
["sky storage ls|wc -l"], | ||
shell=True).decode('utf-8').splitlines()[0]) - 1 | ||
while storage_obj_name > 0: | ||
time.sleep(5) | ||
storage_obj_name = int( | ||
subprocess.check_output( | ||
["sky storage ls|wc -l"], | ||
shell=True).decode('utf-8').splitlines()[0]) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of staggering tests, we can use @pytest.mark.xdist_group('multiple_bucket_deletion')
to run all parameterized tests serially. Let's remove this staggering code and add @pytest.mark.xdist_group
to this test :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointer. I removed the staggering code and added @pytest.mark.xdist_group('multiple_bucket_deletion')
to this test.
…eletion test to run all parameterized tests serially
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks @ksasi for adding this!
@romilbhardwaj Thanks for your guidance during this entire PR. |
This update parallelize sky storage deletion as raised in issue#1990
Tested (run the relevant ones):
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_multiple_buckets_creation_and_deletion
bash tests/backward_comaptibility_tests.sh