
cleanup tiered storage temporary cache file if exceptions are thrown during download #24000

Merged
merged 6 commits into from
Nov 7, 2024


@nvartolomei nvartolomei commented Nov 4, 2024

See CORE-8113

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Bug Fixes

  • Clean up the tiered storage temporary cache file if an exception is thrown during download.
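The fix in the release note follows a common pattern: write the download to a temporary file, rename it into place on success, and remove the temporary file on any failure. The sketch below shows that pattern with blocking `std::filesystem` calls rather than the Seastar coroutines (`co_await`, `ss::rename_file`) used in the actual change; the `.part` suffix and the `download_fn` and `download_to_cache` names are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <stdexcept>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-in for a remote download: writes bytes into `out`
// and may throw part-way through.
using download_fn = std::function<void(std::ofstream&)>;

// Download into `<dest>.part`, then rename it into place. If anything
// throws, remove the partial temporary file before propagating the error.
void download_to_cache(const fs::path& dest, const download_fn& download) {
    fs::path tmp = dest;
    tmp += ".part";  // hypothetical temporary-file naming scheme
    try {
        {
            std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
            if (!out) {
                throw std::runtime_error("cannot open " + tmp.string());
            }
            download(out);
            out.flush();
            if (!out) {
                throw std::runtime_error("write failed: " + tmp.string());
            }
        }  // the stream is closed here, before the rename
        fs::rename(tmp, dest);
    } catch (...) {
        // Non-throwing overload, so a cleanup failure cannot mask the
        // original error.
        std::error_code ec;
        fs::remove(tmp, ec);
        throw;
    }
}
```

Because the stream lives in an inner scope, it is destroyed during stack unwinding before the handler runs, so the temporary file is already closed when it is removed.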

    co_await delete_file_and_empty_parents(
      (dir_path / tmp_filename).native());
} catch (...) {
    vlog(
Contributor Author

We're still leaving an untracked file here if this fails, but if deleting it fails I believe the system has worse problems to deal with. We could commit the reservation instead, but I don't think the extra code complexity is worth it.


throw disk_full_error;
std::rethrow_exception(eptr);
}
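The `catch (...)` and `std::rethrow_exception(eptr)` fragments above are consistent with a capture-then-clean-up shape: in a C++20 coroutine, `co_await` may not appear inside a catch handler, so the exception is stored in a `std::exception_ptr`, the cleanup runs afterwards, and the original error is rethrown. A minimal synchronous sketch under that assumption, using a hypothetical helper `run_with_cleanup`; a failing cleanup is only logged, which leaves the file behind untracked, as the comment above accepts.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Run `body`; if it throws, run `cleanup` outside the catch handler and
// then rethrow the ORIGINAL exception. A failing cleanup is only logged,
// so it never replaces the error the caller needs to see.
void run_with_cleanup(const std::function<void()>& body,
                      const std::function<void()>& cleanup) {
    std::exception_ptr eptr;
    try {
        body();
    } catch (...) {
        // In a coroutine, co_await is not allowed in a handler, so the
        // exception is captured here and the cleanup happens below.
        eptr = std::current_exception();
    }
    if (!eptr) {
        return;  // success: nothing to clean up
    }
    try {
        cleanup();  // e.g. delete the temporary file
    } catch (const std::exception& e) {
        std::cerr << "cleanup failed: " << e.what() << '\n';
    } catch (...) {
        std::cerr << "cleanup failed: unknown error\n";
    }
    std::rethrow_exception(eptr);
}
```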

// commit write transaction
Contributor Author

We can still throw below and leave the file untracked on disk. I think that is fine, though: if getting the file size or renaming fails, we have worse problems to deal with.

    auto put_size = co_await ss::file_size(src);
    co_await ss::rename_file(src, dest);
    // We will now update
    reservation.wrote_data(put_size, 1);

@nvartolomei (Contributor Author) left a comment

Self review.

WillemKauf previously approved these changes Nov 4, 2024

@WillemKauf (Contributor) left a comment

Nice RCA, test, and fix!

src/v/cloud_storage/tests/cache_test.cc (resolved)
src/v/cloud_storage/cache_service.cc (outdated, resolved)
src/v/cloud_storage/tests/cache_test.cc (resolved)
WillemKauf previously approved these changes Nov 4, 2024
@vbotbuildovich
Collaborator

the below tests from https://buildkite.com/redpanda/redpanda/builds/57550#0192f8da-b7c5-4233-9e0a-631773ce65ea have failed and will be retried

catalog_schema_manager_rpunit

@vbotbuildovich (Collaborator) commented Nov 5, 2024

@nvartolomei
Contributor Author

/cdt
rp_version=pr

@nvartolomei
Contributor Author

/cdt
rp_version=pr
tests/rptest/tests/partition_move_interruption_test.py

@nvartolomei
Contributor Author

/cdt
tests/rptest/tests/partition_move_interruption_test.py

@andrwng (Contributor) left a comment

LGTM, though +1 to Evgeny's comment about shutdown errors

src/v/cloud_storage/cache_service.cc (resolved)
src/v/cloud_storage/cache_service.cc (outdated, resolved)
@nvartolomei nvartolomei enabled auto-merge November 7, 2024 07:28
@nvartolomei nvartolomei merged commit 969693b into redpanda-data:dev Nov 7, 2024
16 checks passed
@vbotbuildovich
Collaborator

/backport v24.2.x

@vbotbuildovich
Collaborator

/backport v24.1.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v24.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-24000-v24.1.x-403 remotes/upstream/v24.1.x
git cherry-pick -x 6e1e998021 be77f709bc 5a93114f8a 50fef29c79 b34aab6da1 e90eafbbfb

Workflow run logs.

@vbotbuildovich
Collaborator

Failed to create a backport PR to v24.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-24000-v24.2.x-215 remotes/upstream/v24.2.x
git cherry-pick -x 6e1e998021 be77f709bc 5a93114f8a 50fef29c79 b34aab6da1 e90eafbbfb

Workflow run logs.

Comment on lines +1316 to +1321
      && !ssx::is_shutdown_exception(delete_tmp_fut.get_exception())) {
        vlog(
          cst_log.error,
          "Failed to delete tmp file {}: {}",
          tmp_filepath.native(),
          delete_tmp_fut.get_exception());
Member

@nvartolomei I don't think it is safe to call get_exception twice. From what I can tell, it moves the _state out of the future (see future::get_available_state_ref).

Contributor Author

Fixed in #24189.
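To illustrate the hazard raised in the review, here is a toy model (not Seastar's `future`) in which `get_exception()` moves the stored exception out, so a second call yields an empty `std::exception_ptr`. Taking the exception once into a local, then checking and logging that local, avoids the problem. `toy_future`, `shutdown_error` and `log_delete_failure` are hypothetical names, and this is not necessarily how #24189 implements the fix.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy one-shot future: get_exception() moves the stored exception out,
// leaving the future empty. This is NOT Seastar's API.
struct toy_future {
    std::exception_ptr state;
    bool failed() const { return state != nullptr; }
    std::exception_ptr get_exception() {
        return std::exchange(state, nullptr);
    }
};

// Hypothetical marker type standing in for a shutdown exception.
struct shutdown_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

bool is_shutdown_exception(const std::exception_ptr& e) {
    if (!e) {
        return false;  // rethrowing a null exception_ptr is undefined
    }
    try {
        std::rethrow_exception(e);
    } catch (const shutdown_error&) {
        return true;
    } catch (...) {
        return false;
    }
}

// Take the exception exactly once, then inspect and log the local copy.
// Returns the messages that would have been logged.
std::vector<std::string> log_delete_failure(toy_future& fut) {
    std::vector<std::string> logged;
    if (!fut.failed()) {
        return logged;
    }
    std::exception_ptr ex = fut.get_exception();  // single call
    if (is_shutdown_exception(ex)) {
        return logged;  // expected during shutdown; not an error
    }
    try {
        std::rethrow_exception(ex);
    } catch (const std::exception& e) {
        logged.emplace_back(std::string("Failed to delete tmp file: ") +
                            e.what());
    } catch (...) {
        logged.emplace_back("Failed to delete tmp file: unknown error");
    }
    return logged;
}
```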

@nvartolomei nvartolomei deleted the nv/CORE-8113 branch November 20, 2024 01:17
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Nov 20, 2024
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Nov 20, 2024
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Nov 20, 2024
6 participants