
[CORE-7058]: storage: fix race condition in segment::release_appender_in_background() #24483

Merged: 4 commits merged into redpanda-data:dev from segment_appender_race_fix on Dec 13, 2024

Conversation

@WillemKauf (Contributor) commented Dec 7, 2024

This function presented a race between segment::close() and a segment roll.

Consider the following sequence of events:

  1. _gate.close() called in segment::close()
  2. auto a = std::exchange(_appender, nullptr) called in segment::release_appender_in_background()
  3. ssx::spawn_with_gate() called in segment::release_appender_in_background()
  4. return ignore_shutdown_exceptions() in ssx::spawn_with_gate()
  5. rest of release_appender_in_background() is ignored
  6. a goes out of scope in release_appender_in_background() without close()ing the appender
  7. one sad panda:

segment_appender.cc:78) '_bytes_flush_pending == 0 && _closed' Must flush & close before deleting {no_of_chunks:64, closed:0, fallocation_offset:33554432, stable_offset:5895, flushed_offset:5895, committed_offset:5895, inflight:0, dispatched:0, merged:0, bytes_flush_pending:0}"

Add an explicit check of _gate.is_closed() in release_appender_in_background() to defer the closing of the appender to segment::close() and avoid the potential race condition here.
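For illustration, a minimal sketch of the guard described here (not the exact patch: the spawned body is simplified, and the final version of this PR uses `_gate.check()`, which throws, rather than an `is_closed()` early return):

```cpp
// Sketch only: the member names (_gate, _appender) and ssx::spawn_with_gate
// are the ones quoted above; the spawned body is simplified and is not the
// exact redpanda source.
void segment::release_appender_in_background(readers_cache* /*cache*/) {
    if (_gate.is_closed()) {
        // Shutdown has already started: leave the appender in place so that
        // segment::close() can flush and close() it, instead of exchanging it
        // away here and letting it go out of scope unclosed.
        return;
    }
    auto a = std::exchange(_appender, nullptr);
    ssx::spawn_with_gate(_gate, [a = std::move(a)]() mutable {
        // close the appender in the background (real cleanup body elided)
        return a->close().finally([a = std::move(a)] {});
    });
}
```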

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Fixes a bug in which rolling a segment and closing it could race, triggering a vassert.

@WillemKauf requested review from dotnwat and andrwng December 7, 2024 01:13
@WillemKauf changed the title from "storage: fix race condition in segment::release_appender_in_background()" to "[CORE-7058]: storage: fix race condition in segment::release_appender_in_background()" Dec 7, 2024
src/v/storage/segment.cc (outdated review thread, resolved)
@vbotbuildovich (Collaborator)

Non-flaky failures in https://buildkite.com/redpanda/redpanda/builds/59427#01939f0d-e5b3-4448-b9d1-51dae0694acb:

"rptest.tests.cluster_config_test.ClusterConfigLegacyDefaultTest.test_legacy_default_explicit_before_upgrade.wipe_cache=False"

@vbotbuildovich (Collaborator)

Retry command for Build#59427

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cluster_config_test.py::ClusterConfigLegacyDefaultTest.test_legacy_default_explicit_before_upgrade@{"wipe_cache":false}

@WillemKauf force-pushed the segment_appender_race_fix branch from 1999901 to 1cb2bcc on December 9, 2024 20:59
@WillemKauf (Contributor, Author)

Force push to:

  • Add _gate.is_closed() early return path in segment::release_appender_in_background()
  • Modify disk_log_impl::force_roll() to use model::next_offset() instead of a possibly error-prone addition operation
  • Update storage_e2e_fixture::remove_segment_permanently() to also erase segment from log->segments() to avoid a double close()

@WillemKauf requested a review from andrwng December 9, 2024 21:01
andrwng previously approved these changes Dec 9, 2024

@dotnwat (Member) left a comment

Can you add to the description which fibers are performing which actions? For example:

F1: _gate.close in segment::close
F2: auto a = std::exchange in segment::release_appender_in_background()
F1: ...

One lingering question: if the interleaving exists, is it actually a race, or should it be possible at all? For instance, if the segment is being closed, where does the concurrency originate that allows other fibers to operate on it?

@dotnwat (Member) left a comment

The race seems plausible, but there doesn't seem to be enough information in the cover letter to retrace the path.

For example, assume that a fiber reached release_appender_in_background(readers_cache); via disk_log_impl::apply_segment_ms which invokes last->release_appender. This fiber would be holding _compaction_housekeeping_gate.

So where did segment::close() originate from? One option would be disk_log_impl::close(). But the first thing that does is close the _compaction_housekeeping_gate.

But I don't think this interleaving is possible. Either closing the gate blocks because the release is in progress, or the releasing fiber isn't allowed to enter the gate. Does that sound right?

In order to fully evaluate the fix it's useful to be able to understand how the race occurred.

Can you provide that information in the cover letter?
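For reference, a minimal sketch of the two gate outcomes being discussed, using plain seastar primitives (illustrative only; redpanda's `ssx::spawn_with_gate()` additionally swallows shutdown exceptions, which is what lets steps 4 and 5 of the cover letter happen silently):

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/gate.hh>

seastar::gate g;

// Outcome 1: close() returns a future that resolves only once every fiber
// already inside the gate has left.
seastar::future<> shut_down() {
    return g.close();
}

// Outcome 2: a fiber arriving after close() has started is not admitted;
// with_gate() throws seastar::gate_closed_exception instead of running the task.
seastar::future<> late_release() {
    return seastar::with_gate(g, [] { return seastar::make_ready_future<>(); });
}
```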

@WillemKauf (Contributor, Author) commented Dec 11, 2024

"So where did segment::close() originate from?"

From the logs we have here, it originates from disk_log_impl::remove_prefix_full_segments().

TRACE 2024-12-06 16:20:26,087 [shard 0:main] storage - disk_log_impl.cc:1667 - creating log appender for: {redpanda/controller/0}
INFO  2024-12-06 16:20:26,087 [shard 0:main] storage - disk_log_impl.cc:2546 - Removing "/var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log" (remove_prefix_full_segments)

Will add more detail to the cover letter after I finish sketching the sequence of events out further.

@WillemKauf (Contributor, Author) commented Dec 11, 2024

  1. F1: prefix truncate {redpanda/controller/0} in disk_log_impl::do_truncate_prefix()
  2. F2: creating log appender for {redpanda/controller/0} in disk_log_impl::make_appender()
  3. F1: ptr->write_lock() in disk_log_impl::remove_prefix_full_segments() finally returns, we create a future for the result of disk_log_impl::remove_segment_permanently(), but we also drop our write_lock before returning the future/waiting for it to finish.
  4. F2: disk_log_appender::initialize() is then able to obtain a ptr->read_lock() while write_lock is dropped
  5. F1: Removing "/var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log" (remove_prefix_full_segments) in disk_log_impl::remove_segment_permanently()
  6. F1: Because of read_lock obtained in appender, we see: Segment has outstanding locks. Might take a while to close:/var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log in disk_log_impl::remove_segment_permanently()
  7. F1: closing segment: /var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log in segment::close()
  8. F1: in segment::close(), we co_await _gate.close()
  9. F3: From an unknown origin, we roll the segment and call segment::release_appender()
  10. F3: We reach segment::release_appender_in_background() because of the outstanding read_lock, now with a closed gate, and we encounter the originally presented issue.

^ I think the above is close to the situation we are seeing here.
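To make step 3 concrete, here is a minimal sketch of the lock-lifetime hazard with a plain seastar rwlock (illustrative only, not the actual `disk_log_impl` code; `remove_segment()` is a hypothetical stand-in):

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/rwlock.hh>

seastar::rwlock segment_lock;

// Hypothetical stand-in for disk_log_impl::remove_segment_permanently().
seastar::future<> remove_segment();

// F1: the write-lock holder `h` is destroyed as soon as this lambda returns
// the removal future, i.e. when the removal is *started*, not when it
// finishes, so the write lock is dropped mid-removal.
seastar::future<> prefix_truncate_fiber() {
    return segment_lock.hold_write_lock().then(
      [](seastar::rwlock::holder h) { return remove_segment(); });
}

// F2: once the write lock is dropped, the appender's read lock is granted
// even though the segment is still being removed and closed.
seastar::future<> appender_fiber() {
    return segment_lock.hold_read_lock().then(
      [](seastar::rwlock::holder h) { return seastar::make_ready_future<>(); });
}
```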

- auto next_offset = offsets().dirty_offset + model::offset(1);
+ auto next_offset = model::next_offset(offsets().dirty_offset);
Review comment (Member):

how is this related to the race?

@WillemKauf (Contributor, Author):

Unrelated to the race; it's related to the test case I wrote, in which I call force_roll() on a log that has had nothing produced to it. In that case offsets().dirty_offset == -9223372036854775808, and the offset value -9223372036854775808 + 1 would trigger an assert in disk_log_impl::new_segment().

If, instead, we use model::next_offset(-9223372036854775808) == 0, we are okay 👍
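To make the arithmetic concrete, a self-contained sketch of the semantics being relied on (`next_offset()` below is a hypothetical stand-in, not the real `model::next_offset()` implementation):

```cpp
#include <cstdint>
#include <limits>

using offset_t = int64_t;

// An empty log reports the minimum representable offset as its dirty offset.
constexpr offset_t uninitialized = std::numeric_limits<offset_t>::min();

// Hypothetical stand-in for model::next_offset(): map the uninitialized
// case to offset 0, otherwise advance by one.
constexpr offset_t next_offset(offset_t o) {
    return o == uninitialized ? offset_t{0} : o + 1;
}

static_assert(next_offset(uninitialized) == 0); // safe for an empty log
static_assert(next_offset(5895) == 5896);       // normal case
static_assert(uninitialized + 1 < 0);           // the "+ 1" path stays negative and
                                                // would trip the assert in
                                                // disk_log_impl::new_segment()
```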

Review comment (Member):

ahh, got it. i forgot that force_roll is test-only code. yolo!

Commit messages:

storage: fix race condition in `segment::release_appender_in_background()`

This function caused race conditions between `segment::close()` and
a segment roll.

Consider the following sequence of events:

1. `_gate.close()` called in `segment::close()`
2. `auto a = std::exchange(_appender, nullptr)` called
    in `segment::release_appender_in_background()`
3. `ssx::spawn_with_gate()` called in `segment::release_appender_in_background()`
4. `return ignore_shutdown_exceptions()` in `ssx::spawn_with_gate()`
5. rest of `release_appender_in_background()` is ignored
6. `a` goes out of scope in `release_appender_in_background()` without
   `close()`ing the `appender`
7. one sad panda

Add an explicit call to `_gate.check()` in `release_appender_in_background()`
to throw an exception in case the gate is closed, and defer the closing of the
appender to `segment::close()` in order to avoid the potential race condition here.

If the dirty offset is default-initialized to `int64_t::min()`
and not updated before a segment is force rolled, an assert will fail
in `disk_log_impl::new_segment()` for the offset being < 0.

Use `model::next_offset()` instead of simply adding `1` to the dirty offset
to avoid this case.

To test race conditions between `segment::close()` and a segment roll,
particularly one which goes through `release_appender_in_background()`.
@WillemKauf (Contributor, Author)

Force push to:

  • Change the early return on _gate.is_closed() to _gate.check() in release_appender_in_background()
  • Wrap log->force_roll() in its own function with a try/catch block in storage_e2e_fixture_test.cc
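A rough sketch of what such a wrapper could look like (the helper name, the `force_roll()` signature, and the caught exception type are assumptions, not the actual `storage_e2e_fixture_test.cc` code):

```cpp
// Hypothetical helper: force a roll, but tolerate the gate_closed_exception
// that release_appender_in_background() may now propagate via _gate.check()
// when the roll races with segment::close().
ss::future<> force_roll_segment(ss::shared_ptr<storage::log> log) {
    try {
        co_await log->force_roll(ss::default_priority_class());
    } catch (const ss::gate_closed_exception&) {
        // expected when the segment is concurrently being closed
    }
}
```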


@WillemKauf (Contributor, Author)

/ci-repeat 5
debug
skip-units
skip-redpanda-build
dt-repeat=100
tests/rptest/tests/availability_test.py

1 similar comment

@dotnwat merged commit e5baaf4 into redpanda-data:dev Dec 13, 2024
17 checks passed
@vbotbuildovich (Collaborator)

/backport v24.3.x

@vbotbuildovich (Collaborator)

/backport v24.2.x
