
[CORE-7058]: storage: fix race condition in segment::release_appender_in_background() #24483

Merged: 4 commits merged into redpanda-data:dev from segment_appender_race_fix on Dec 13, 2024

Conversation

@WillemKauf (Contributor) commented Dec 7, 2024

This function presented a race between segment::close() and a segment roll.

Consider the following sequence of events:

  1. _gate.close() called in segment::close()
  2. auto a = std::exchange(_appender, nullptr) called in segment::release_appender_in_background()
  3. ssx::spawn_with_gate() called in segment::release_appender_in_background()
  4. return ignore_shutdown_exceptions() in ssx::spawn_with_gate()
  5. rest of release_appender_in_background() is ignored
  6. a goes out of scope in release_appender_in_background() without close()ing the appender
  7. one sad panda:

segment_appender.cc:78) '_bytes_flush_pending == 0 && _closed' Must flush & close before deleting {no_of_chunks:64, closed:0, fallocation_offset:33554432, stable_offset:5895, flushed_offset:5895, committed_offset:5895, inflight:0, dispatched:0, merged:0, bytes_flush_pending:0}"

Add an explicit check of _gate.is_closed() in release_appender_in_background() to defer the closing of the appender to segment::close() and avoid the potential race condition here.
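For illustration, a minimal sketch of the guard described here (not the exact patch: the spawned body is simplified, and the final version of this PR uses `_gate.check()`, which throws, rather than an `is_closed()` early return):

```cpp
// Sketch only: the member names (_gate, _appender) and ssx::spawn_with_gate
// are the ones quoted above; the spawned body is simplified and is not the
// exact redpanda source.
void segment::release_appender_in_background(readers_cache* /*cache*/) {
    if (_gate.is_closed()) {
        // Shutdown has already started: leave the appender in place so that
        // segment::close() can flush and close() it, instead of exchanging it
        // away here and letting it go out of scope unclosed.
        return;
    }
    auto a = std::exchange(_appender, nullptr);
    ssx::spawn_with_gate(_gate, [a = std::move(a)]() mutable {
        // close the appender in the background (real cleanup body elided)
        return a->close().finally([a = std::move(a)] {});
    });
}
```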

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Fixes a bug in which rolling a segment and closing it could race, triggering a vassert.

@WillemKauf requested review from dotnwat and andrwng December 7, 2024 01:13
@WillemKauf changed the title from "storage: fix race condition in segment::release_appender_in_background()" to "[CORE-7058]: storage: fix race condition in segment::release_appender_in_background()" Dec 7, 2024
src/v/storage/segment.cc (outdated review thread, resolved)
@vbotbuildovich (Collaborator)

Non-flaky failures in https://buildkite.com/redpanda/redpanda/builds/59427#01939f0d-e5b3-4448-b9d1-51dae0694acb:

"rptest.tests.cluster_config_test.ClusterConfigLegacyDefaultTest.test_legacy_default_explicit_before_upgrade.wipe_cache=False"

@vbotbuildovich (Collaborator)

Retry command for Build#59427

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cluster_config_test.py::ClusterConfigLegacyDefaultTest.test_legacy_default_explicit_before_upgrade@{"wipe_cache":false}

@WillemKauf force-pushed the segment_appender_race_fix branch from 1999901 to 1cb2bcc on December 9, 2024 20:59
@WillemKauf (Contributor, Author)

Force push to:

  • Add _gate.is_closed() early return path in segment::release_appender_in_background()
  • Modify disk_log_impl::force_roll() to use model::next_offset() instead of a possibly error-prone addition operation
  • Update storage_e2e_fixture::remove_segment_permanently() to also erase segment from log->segments() to avoid a double close()

@WillemKauf requested a review from andrwng December 9, 2024 21:01
andrwng previously approved these changes Dec 9, 2024

@dotnwat (Member) left a comment

Can you add to the description which fibers are performing which actions? For example:

F1: _gate.close in segment::close
F2: auto a = std::exchange in segment::release_appender_in_background()
F1: ...

One lingering question: if the interleaving exists, is it actually a race, or should it be possible at all? For instance, if the segment is being closed, where does the concurrency originate that allows other fibers to operate on it?

@dotnwat (Member) left a comment

The race seems plausible, but there doesn't seem to be enough information in the cover letter to retrace the path.

For example, assume that a fiber reached release_appender_in_background(readers_cache); via disk_log_impl::apply_segment_ms which invokes last->release_appender. This fiber would be holding _compaction_housekeeping_gate.

So where did segment::close() originate from? One option would be disk_log_impl::close(). But the first thing that does is close the _compaction_housekeeping_gate.

But I don't think this interleaving is possible. Either closing the gate blocks because the release is in progress, or the releasing fiber isn't allowed to enter the gate. Does that sound right?

In order to fully evaluate the fix it's useful to be able to understand how the race occurred.

Can you provide that information in the cover letter?
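For reference, a minimal sketch of the two gate outcomes being discussed, using plain seastar primitives (illustrative only; redpanda's `ssx::spawn_with_gate()` additionally swallows shutdown exceptions, which is what lets steps 4 and 5 of the cover letter happen silently):

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/gate.hh>

seastar::gate g;

// Outcome 1: close() returns a future that resolves only once every fiber
// already inside the gate has left.
seastar::future<> shut_down() {
    return g.close();
}

// Outcome 2: a fiber arriving after close() has started is not admitted;
// with_gate() throws seastar::gate_closed_exception instead of running the task.
seastar::future<> late_release() {
    return seastar::with_gate(g, [] { return seastar::make_ready_future<>(); });
}
```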

@WillemKauf (Contributor, Author) commented Dec 11, 2024

"So where did segment::close() originate from?"

From the logs we have here, it originates from disk_log_impl::remove_prefix_full_segments().

TRACE 2024-12-06 16:20:26,087 [shard 0:main] storage - disk_log_impl.cc:1667 - creating log appender for: {redpanda/controller/0}
INFO  2024-12-06 16:20:26,087 [shard 0:main] storage - disk_log_impl.cc:2546 - Removing "/var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log" (remove_prefix_full_segments)

Will add more detail to the cover letter after I finish sketching the sequence of events out further.

@WillemKauf (Contributor, Author) commented Dec 11, 2024

  1. F1: prefix truncate {redpanda/controller/0} in disk_log_impl::do_truncate_prefix()
  2. F2: creating log appender for {redpanda/controller/0} in disk_log_impl::make_appender()
  3. F1: ptr->write_lock() in disk_log_impl::remove_prefix_full_segments() finally returns, we create a future for the result of disk_log_impl::remove_segment_permanently(), but we also drop our write_lock before returning the future/waiting for it to finish.
  4. F2: disk_log_appender::initialize() is then able to obtain a ptr->read_lock() while write_lock is dropped
  5. F1: Removing "/var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log" (remove_prefix_full_segments) in disk_log_impl::remove_segment_permanently()
  6. F1: Because of read_lock obtained in appender, we see: Segment has outstanding locks. Might take a while to close:/var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log in disk_log_impl::remove_segment_permanently()
  7. F1: closing segment: /var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log in segment::close()
  8. F1: in segment::close(), we co_await _gate.close()
  9. F3: From an unknown origin, we roll the segment and call segment::release_appender()
  10. F3: We reach segment::release_appender_in_background() because of the outstanding read_lock, now with a closed gate, and we encounter the originally presented issue.

^ I think the above is close to the situation we are seeing here.
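To make step 3 concrete, here is a minimal sketch of the lock-lifetime hazard with a plain seastar rwlock (illustrative only, not the actual `disk_log_impl` code; `remove_segment()` is a hypothetical stand-in):

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/rwlock.hh>

seastar::rwlock segment_lock;

// Hypothetical stand-in for disk_log_impl::remove_segment_permanently().
seastar::future<> remove_segment();

// F1: the write-lock holder `h` is destroyed as soon as this lambda returns
// the removal future, i.e. when the removal is *started*, not when it
// finishes, so the write lock is dropped mid-removal.
seastar::future<> prefix_truncate_fiber() {
    return segment_lock.hold_write_lock().then(
      [](seastar::rwlock::holder h) { return remove_segment(); });
}

// F2: once the write lock is dropped, the appender's read lock is granted
// even though the segment is still being removed and closed.
seastar::future<> appender_fiber() {
    return segment_lock.hold_read_lock().then(
      [](seastar::rwlock::holder h) { return seastar::make_ready_future<>(); });
}
```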

- auto next_offset = offsets().dirty_offset + model::offset(1);
+ auto next_offset = model::next_offset(offsets().dirty_offset);
Review comment (Member):

how is this related to the race?

@WillemKauf (Contributor, Author):

Unrelated to the race; it's related to the test case I wrote, in which I call force_roll() on a log that has had nothing produced to it. In that case offsets().dirty_offset == -9223372036854775808, and the offset value -9223372036854775808 + 1 would trigger an assert in disk_log_impl::new_segment().

If, instead, we use model::next_offset(-9223372036854775808) == 0, we are okay 👍
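To make the arithmetic concrete, a self-contained sketch of the semantics being relied on (`next_offset()` below is a hypothetical stand-in, not the real `model::next_offset()` implementation):

```cpp
#include <cstdint>
#include <limits>

using offset_t = int64_t;

// An empty log reports the minimum representable offset as its dirty offset.
constexpr offset_t uninitialized = std::numeric_limits<offset_t>::min();

// Hypothetical stand-in for model::next_offset(): map the uninitialized
// case to offset 0, otherwise advance by one.
constexpr offset_t next_offset(offset_t o) {
    return o == uninitialized ? offset_t{0} : o + 1;
}

static_assert(next_offset(uninitialized) == 0); // safe for an empty log
static_assert(next_offset(5895) == 5896);       // normal case
static_assert(uninitialized + 1 < 0);           // the "+ 1" path stays negative and
                                                // would trip the assert in
                                                // disk_log_impl::new_segment()
```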

Review comment (Member):

ahh, got it. i forgot that force_roll is test-only code. yolo!

Commit messages:

storage: fix race condition in `segment::release_appender_in_background()`

This function caused race conditions between `segment::close()` and
a segment roll.

Consider the following sequence of events:

1. `_gate.close()` called in `segment::close()`
2. `auto a = std::exchange(_appender, nullptr)` called
    in `segment::release_appender_in_background()`
3. `ssx::spawn_with_gate()` called in `segment::release_appender_in_background()`
4. `return ignore_shutdown_exceptions()` in `ssx::spawn_with_gate()`
5. rest of `release_appender_in_background()` is ignored
6. `a` goes out of scope in `release_appender_in_background()` without
   `close()`ing the `appender`
7. one sad panda

Add an explicit call to `_gate.check()` in `release_appender_in_background()`
to throw an exception in case the gate is closed, and defer the closing of the
appender to `segment::close()` in order to avoid the potential race condition here.

If the dirty offset is default-initialized to `int64_t::min()`
and not updated before a segment is force rolled, an assert will fail
in `disk_log_impl::new_segment()` for the offset being < 0.

Use `model::next_offset()` instead of simply adding `1` to the dirty offset
to avoid this case.

To test race conditions between `segment::close()` and a segment roll,
particularly one which goes through `release_appender_in_background()`.
@WillemKauf (Contributor, Author)

Force push to:

  • Change the early return on _gate.is_closed() to _gate.check() in release_appender_in_background()
  • Wrap log->force_roll() in its own function with a try/catch block in storage_e2e_fixture_test.cc
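A rough sketch of what such a wrapper could look like (the helper name, the `force_roll()` signature, and the caught exception type are assumptions, not the actual `storage_e2e_fixture_test.cc` code):

```cpp
// Hypothetical helper: force a roll, but tolerate the gate_closed_exception
// that release_appender_in_background() may now propagate via _gate.check()
// when the roll races with segment::close().
ss::future<> force_roll_segment(ss::shared_ptr<storage::log> log) {
    try {
        co_await log->force_roll(ss::default_priority_class());
    } catch (const ss::gate_closed_exception&) {
        // expected when the segment is concurrently being closed
    }
}
```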


@WillemKauf (Contributor, Author)

/ci-repeat 5
debug
skip-units
skip-redpanda-build
dt-repeat=100
tests/rptest/tests/availability_test.py

1 similar comment

@dotnwat merged commit e5baaf4 into redpanda-data:dev Dec 13, 2024
17 checks passed
@vbotbuildovich (Collaborator)

/backport v24.3.x

@vbotbuildovich (Collaborator)

/backport v24.2.x
