Segfault in storage::segment_reader_handle::close() from tiered storage #18213
Labels: area/cloud-storage (Shadow indexing subsystem), kind/bug (Something isn't working), sev/high (loss of availability, pathological performance degradation, recoverable corruption)
andrwng added the kind/bug, area/cloud-storage, and sev/high labels on May 2, 2024
andrwng added a commit to andrwng/redpanda that referenced this issue on May 2, 2024:
It's possible that when a client disconnects, the abort source subscription associated with the client connection ends up evicting the current segment reader:

1. The remote partition reader is in the process of reading a segment.
2. The read eventually scans past the LSO and stops. This stop() is asynchronous, closing IO streams, etc.
3. The Kafka client disconnects, triggering the abort source subscription callback to call evict_segment_reader(), moving the segment reader to the remote partition's eviction list.
4. The remote partition's eviction loop sees this reader, runs stop(), and destructs the underlying parser.
5. The read fiber, still stopping, resumes control, but the underlying parser has been destructed.

This is roughly the sequence of events seen below.

```
DEBUG 2024-05-02 14:55:06,231 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1197 - [{{0.0.0.0:0}}] accept_batch_start skip because record batch type is batch_type::raft_configuration
DEBUG 2024-05-02 14:55:06,231 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1271 - [{{0.0.0.0:0}}] skip_batch_start called for 1398
DEBUG 2024-05-02 14:55:06,231 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1288 - added offset translation gap [1398-1398], current state: {{base offset/delta: {1397}/1221, map size: 1, last delta: 1221}}
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1197 - [{{0.0.0.0:0}}] accept_batch_start skip because record batch type is batch_type::archival_metadata
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1271 - [{{0.0.0.0:0}}] skip_batch_start called for 1399
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1288 - added offset translation gap [1399-1400], current state: {{base offset/delta: {1397}/1221, map size: 2, last delta: 1222}}
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1197 - [{{0.0.0.0:0}}] accept_batch_start skip because record batch type is batch_type::archival_metadata
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1271 - [{{0.0.0.0:0}}] skip_batch_start called for 1401
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1288 - added offset translation gap [1401-1401], current state: {{base offset/delta: {1397}/1221, map size: 3, last delta: 1224}}
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1197 - [{{0.0.0.0:0}}] accept_batch_start skip because record batch type is batch_type::tx_fence
DEBUG 2024-05-02 14:55:06,232 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1271 - [{{0.0.0.0:0}}] skip_batch_start called for 1402
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber848~132~149~0 kafka/topic/0] - remote_segment.cc:1185 - [{{0.0.0.0:0}}] accept_batch_start stop parser because 1403 > 177(kafka offset)
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber41664 kafka/topic/0] - remote_partition.cc:435 - No results, current rp offset: 1403, current kafka offset: 178, max rp offset: 177
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber848~132~149 kafka/topic/0] - remote_segment.cc:1476 - [{{0.0.0.0:0}}] remote_segment_batch_reader::stop
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber848~132~149 kafka/topic/0] - remote_segment.cc:1482 - [{{0.0.0.0:0}}] remote_segment_batch_reader::stop - parser-close
DEBUG 2024-05-02 14:55:06,234 [shard 2:main] cloud_storage - [fiber41664 kafka/topic/0] - remote_partition.cc:247 - abort requested via config.abort_source
DEBUG 2024-05-02 14:55:06,234 [shard 2:fetc] cloud_storage - [fiber41665 defa47c0/kafka/topic/0_18762/1398-1405-846-134-v1.log.137] - segment_chunk_data_source.cc:48 - chunk data source destroyed
DEBUG 2024-05-02 14:55:06,235 [shard 2:main] cloud_storage - [fiber848~132~149 kafka/topic/0] - remote_segment.cc:1476 - [{{0.0.0.0:0}}] remote_segment_batch_reader::stop
DEBUG 2024-05-02 14:55:06,235 [shard 2:main] cloud_storage - [fiber848~132~149 kafka/topic/0] - remote_segment.cc:1482 - [{{0.0.0.0:0}}] remote_segment_batch_reader::stop - parser-close
ERROR 2024-05-02 14:55:06,235 [shard 2:fetc] assert - Assert failure: (/var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-009787d74859cb582-1/redpanda/redpanda/src/v/storage/segment_reader.cc:210) '_parent == nullptr' Must close before destroying
```

Note that the tx_fence batch is assigned Kafka offset 178, but this isn't reflected by the LSO-1=177 calculation used as an upper bound for the log reader config, hence the early return.

This commit fixes the issue by ensuring there is only one concurrent caller of the segment reader's stop(). It also adds a test that simulates a failure where a client disconnect racing with the natural stopping of a segment reader led to a crash. Without the fix, and with some sleeps added in remote_segment_batch_reader::stop(), this would crash fairly consistently, albeit with slightly different backtraces than reported in the issue.

Fixes redpanda-data#18213
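For illustration, here is a minimal, self-contained C++ sketch of the shape of the race described above and of the single-caller idea behind the fix. The types (parser, segment_reader) and the boolean guard are hypothetical stand-ins rather than Redpanda code, and the real stop() is asynchronous (Seastar futures), so the actual fix has to serialize an asynchronous teardown rather than simply flip a flag.

```cpp
// Hypothetical sketch of the double-stop race and a single-caller guard.
// Not Redpanda code: `parser`, `segment_reader`, and `_stopped` are
// illustrative stand-ins for the real types in remote_segment.cc.
#include <cassert>
#include <memory>

struct parser {
    bool closed = false;
    void close() { closed = true; }
    // Mirrors the assert in the log: the parser must be closed before it is
    // destroyed.
    ~parser() { assert(closed && "Must close before destroying"); }
};

class segment_reader {
public:
    // Unsafe shape: if the read fiber and the eviction loop both reach this,
    // the second caller dereferences a parser the first one already released.
    void stop_unsafe() {
        _parser->close();
        _parser.reset();
    }

    // Guarded shape: only the first caller tears the parser down; any later
    // caller observes the flag and returns immediately.
    void stop() {
        if (_stopped) {
            return;
        }
        _stopped = true;
        if (_parser) {
            _parser->close();
            _parser.reset();
        }
    }

private:
    std::unique_ptr<parser> _parser = std::make_unique<parser>();
    bool _stopped = false;
};

int main() {
    segment_reader r;
    r.stop(); // the read fiber finishing naturally
    r.stop(); // the eviction loop stopping the same reader: now a no-op
}
```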
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue on May 3, 2024:
(Same commit message as andrwng's commit above; cherry picked from commit 226c43e.)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue on May 3, 2024:
(Same commit message as andrwng's commit above; cherry picked from commit 226c43e.)
Version & Environment

Redpanda version (use rpk version): v23.3.12, arm

We saw in one cluster several crashes that all appear to have similar root causes.
From these, we can see:

- the crashes involve remote_segment_batch_reader
- the failure occurs in ~segment_reader_handle() rather than in close(), further pointing to a lifecycle bug

One suspicious piece of code, from:
redpanda/src/v/cloud_storage/remote_segment.cc
Line 1483 in 50b992b
stop() is not thread-safe. Multiple concurrent calls to stop() may call _parser->close() on the same parser and set it to null. It's unclear if this is the exact failure path, given reader eviction happens from a single fiber, and set_end_of_stream() appears to be called at most once in do_load_slice().
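One way to make an asynchronous stop() tolerant of multiple callers is to memoize the teardown future so every caller waits on the same, single close. The sketch below is an assumption: it uses std::shared_future, std::async, and a mutex only to stay portable and runnable, whereas Redpanda/Seastar code is single-threaded per shard and uses Seastar futures, so the real mechanism would differ.

```cpp
// Sketch: serialize an asynchronous stop() so that however many callers race,
// the parser is closed exactly once and every caller waits for that one
// teardown. Hypothetical stand-in types, not the remote_segment.cc code.
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <optional>

struct parser {
    void close() { std::cout << "parser closed\n"; }
};

class segment_reader {
public:
    std::shared_future<void> stop() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_stop_future) {
            // The first caller performs the actual (asynchronous) teardown.
            _stop_future = std::async(std::launch::async, [this] {
                               _parser->close();
                               _parser.reset();
                           }).share();
        }
        // Every later caller receives the same future and simply waits.
        return *_stop_future;
    }

private:
    std::unique_ptr<parser> _parser = std::make_unique<parser>();
    std::optional<std::shared_future<void>> _stop_future;
    std::mutex _mutex;
};

int main() {
    segment_reader r;
    auto f1 = r.stop(); // e.g. the read fiber stopping naturally
    auto f2 = r.stop(); // e.g. the eviction loop, racing with the read fiber
    f1.wait();
    f2.wait();          // both observe the same single close()
}
```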
Another suspicious piece of code, from:
redpanda/src/v/cloud_storage/remote_partition.cc
Lines 248 to 250 in 50b992b
_segment_reader may be moved. In most places, we check whether _segment_reader is a nullptr before using it, but it seems possible that, e.g. here:

redpanda/src/v/cloud_storage/remote_partition.cc
Lines 815 to 818 in 50b992b
we use _segment_reader without such a check.
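To make that hazard concrete, here is a small hypothetical sketch (not the remote_partition.cc code) of a member reader that an abort callback can move away: any null check performed before a scheduling point is stale, so each later use needs its own check.

```cpp
// Sketch: a _segment_reader-style member that an abort-source callback can
// move out for eviction. Hypothetical types; the point is that every use
// after a potential scheduling point must re-check the pointer.
#include <iostream>
#include <memory>
#include <utility>

struct segment_reader {
    void read_some() { std::cout << "reading\n"; }
    void stop() { std::cout << "reader stopped\n"; }
};

struct partition_reader {
    std::unique_ptr<segment_reader> _segment_reader =
      std::make_unique<segment_reader>();

    // Called from the abort-source subscription when the client disconnects:
    // moves the reader out so the eviction loop can stop it later.
    std::unique_ptr<segment_reader> evict_segment_reader() {
        return std::move(_segment_reader);
    }

    void maybe_read() {
        // Re-check on every use: an earlier check is stale once eviction may
        // have run (in Seastar, after any co_await / scheduling point).
        if (_segment_reader) {
            _segment_reader->read_some();
        } else {
            std::cout << "reader was evicted; nothing to do\n";
        }
    }
};

int main() {
    partition_reader pr;
    pr.maybe_read();                          // reader present
    auto evicted = pr.evict_segment_reader(); // abort callback fires
    pr.maybe_read();                          // member is now null (moved-from)
    evicted->stop();                          // eviction loop stops it
}
```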
JIRA Link: CORE-2750