storage: fix windowed compaction error handling #16928
Conversation
The staging file was missing a '.'. Note that these files are transient, so there isn't a compactibility concern here.
An upcoming test will inject an exception to simulate a failure case seen in the wild that is otherwise hard to reproduce. I'd considered using the file_sanitizer-based error injection, but it seems that infra only really shines at making Redpanda hit asserts, so it's not quite what I'm looking for (my upcoming test will assert that we _don't_ crash in a particular error scenario).
This will be useful to test methods that need a feature table (e.g. deduplicate_segments() has a feature table as an argument).
We were previously using the l-value variant of model::record_batch_reader::consume(), which doesn't have a built-in call to finally(). For a log_reader, this meant that in some scenarios (e.g. if the reducer were to fail) we could end up not calling finally(), and therefore wouldn't clear the underlying segment reader. This is exactly what happened: we hit

std::runtime_error (lz4f_compressend:ERROR_dstMaxSize_tooSmall)

errors as we compressed/decompressed batches, and then ultimately crashed because the log_reader was not closed:

Assert failure: (.../redpanda/redpanda/src/v/storage/log_reader.h:141) '!_iterator.reader' log reader destroyed with live reader
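The distinction the fix relies on can be sketched in plain C++. The types below are illustrative stand-ins, not the real model::record_batch_reader or seastar APIs: the l-value consume() overload runs the consumer with no cleanup of its own, while the r-value overload always releases the underlying reader, even when the consumer throws (as the lz4 error did).

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical simplified model of the bug described above.
struct mock_reader {
    bool underlying_reader_live = true;

    // l-value overload: no built-in cleanup, mirroring consume()&.
    template <typename Consumer>
    int consume(Consumer c) & {
        return c();
    }

    // r-value overload: cleanup always runs, mirroring the
    // consume()&& path that chains finally().
    template <typename Consumer>
    int consume(Consumer c) && {
        try {
            int result = c();
            underlying_reader_live = false;
            return result;
        } catch (...) {
            underlying_reader_live = false; // released even on failure
            throw;
        }
    }
};

// A consumer that fails mid-stream, like the lz4 decompression error.
inline int failing_consumer() {
    throw std::runtime_error("simulated lz4 failure");
}
```

With the l-value call, an exception leaves the underlying reader live, which is the state that later trips the `'!_iterator.reader'` assert on destruction; writing `std::move(rdr).consume(...)` selects the r-value overload, whose cleanup runs on every path.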
@@ -227,7 +227,7 @@ ss::future<index_state> deduplicate_segment(
       &cmp_idx_writer,
       inject_reader_failure);

-    auto new_idx = co_await rdr.consume(
+    auto new_idx = co_await std::move(rdr).consume(
nice. i just lost my foot looking at this.
I think this is fine, but I would leave a ticket to fix record_batch_reader::impl::consume, by chaining a then_wrapped() inside consume()& to clean up in case of exceptions.
we have append_entries_request::serde_async_write that does a consume like this, maybe it warrants a similar treatment
or at least a comment around this in reader::consume about the exception safety
Yeah, I wasn't sure what the rationale was, but I assumed it was intentional that one doesn't call the other. At least I see that some other call-sites of the l-value variant explicitly include a finally() call.
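The call-site pattern mentioned above, pairing the l-value consume() with an explicit cleanup step, can be sketched without seastar by using a scope guard in place of chaining .finally() on the returned future. All names here are hypothetical; this only illustrates the shape of the workaround, not Redpanda's actual code.

```cpp
#include <utility>

// Minimal scope guard: runs its callback on every exit path, which is the
// role .finally([&] { ... }) plays on a seastar future chain.
template <typename F>
struct scope_guard {
    F fn;
    ~scope_guard() { fn(); }
};
template <typename F>
scope_guard<F> make_guard(F fn) {
    return scope_guard<F>{std::move(fn)};
}

// Illustrative stand-in for a reader whose l-value consume() has no
// built-in cleanup.
struct lvalue_reader {
    bool live = true;
    void release() { live = false; }

    template <typename Consumer>
    int consume(Consumer c) & { return c(); } // no cleanup of its own
};

// Consume through the l-value overload, but guarantee release() runs
// whether the consumer returns or throws.
inline int consume_with_cleanup(lvalue_reader& rdr) {
    auto guard = make_guard([&] { rdr.release(); });
    return rdr.consume([] { return 42; });
}
```

The guard makes the l-value path as exception-safe as the r-value one, at the cost of every call-site having to remember it, which is the maintenance concern the review thread is raising.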
/backport v23.3.x
Failed to create a backport PR to v23.3.x branch. I tried:
Backports Required
Release Notes
Bug Fixes