-
Notifications
You must be signed in to change notification settings - Fork 591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud_storage: fix segment materialisation race #13796
cloud_storage: fix segment materialisation race #13796
Conversation
dde1d87
to
1656069
Compare
1656069
to
fc1c08a
Compare
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/37935#018ae0d0-e1bc-492d-bd8b-628d7ef1c1c4 |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/37935#018ae0d0-e1bf-456c-b7d1-ebee89e6d16b |
In order to create a new materialised segment, one needs to grab units from `materialized_resources` first. This is an async operation. By the time units are acquired, said segment might have already been via a different code path, resulting in the assertion in `remote_partition::materialize_segment` triggering. `remote_partition::aborted_transactions` was particularly susceptible to this. This patch fixes the issue by checking for the existence of the segment and creating a segment (if needed) in the same scheduling task. Functionally, for the read path, nothing should change.
fc1c08a
to
3100f40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, but might be nice to avoid changes to the non-transactional path if we can avoid it
@@ -163,7 +180,7 @@ remote_partition::borrow_result_t remote_partition::borrow_next_segment_reader( | |||
} | |||
if (iter == _segments.end()) { | |||
auto path = manifest.generate_segment_path(*mit); | |||
iter = materialize_segment(path, *mit, std::move(segment_unit)); | |||
iter = get_or_materialize_segment(path, *mit, std::move(segment_unit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should have a separate call that expects callers to have made the segment check in the same task. Or maybe we can tweak the above find()
to use get_or_materialize_segment()
?
Just noting that this is seeking twice for the same offset with no scheduling points in between.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the offloading logic above should move in get_or_materialize_segment
, but I chose not to do it in this PR to keep the change less intrusive.
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/37961#018ae173-4b0a-4ec4-945c-7129e98068e1 |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/37961#018ae17e-9ec4-48a7-94d5-85ea1b264cfc |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/37961#018ae173-4b05-4356-b2bf-851ca1428c9a |
Kinda hard to interpret with all the retries but failures are: |
Will trigger another run to see if the instability is caused by this change or if we go unlucky. |
/ci-repeat |
/backport v23.2.x |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/37985#018ae246-dee9-4316-9c80-b67619bb0cd8 |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/37985#018ae246-def1-4340-9215-5af5ca51f8eb |
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/37985#018ae254-800a-46a3-8307-64a6adee4249 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
In order to create a new materialised segment, one needs to grab units
from
materialized_resources
first. This is an async operation. By thetime units are acquired, said segment might have already been via a
different code path, resulting in the assertion in
remote_partition::materialize_segment
triggering.remote_partition::aborted_transactions
was particularly susceptible tothis.
This patch fixes the issue by checking for the existence of the segment
and creating a segment (if needed) in the same scheduling task.
Functionally, for the read path, nothing should change.
Backports Required
Release Notes
Bug Fixes