-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE-1912] mvlog: add basic segment reader #18245
[CORE-1912] mvlog: add basic segment reader #18245
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this LGTM, still groking the details from RFC, but I am liking the simplified interfaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. Just two minor things.
// Parses and collects the record batches from the given entry stream. | ||
// Ignores other kinds of entries. | ||
ss::future<result<collect_stream_outcome, errc>> | ||
collect_batches_from_stream(entry_stream* entries, batch_collector* collector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed a few places where this uses raw pointers instead of references or managed pointers. Is there a reason for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally prefer raw pointers for in-out and out parameters, a la Google style guide.
As for member pointer variables, unless there's some complicated lifetime relationship between classes, I prefer having a single owner being denoted by a unique_ptr and passing around raw pointers. I'm hoping for the lifetimes of these obejcts to be straightforward enough that it's easily verifiable (e.g. readable_segment is long lived, segment_readers must not outlive the readable_segment, etc), but if that ends up not being the case, I wouldn't be against some kind of smart pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrwng I see google style guide recommending references https://google.github.io/styleguide/cppguide.html#Inputs_and_Outputs
non-optional output and input/output parameters should usually be references (which cannot be null)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I'll be! Let me fix
src/v/storage/mvlog/entry_stream.cc
Outdated
auto header_buf = co_await read_iobuf_exactly( | ||
stream_, packed_entry_header_size); | ||
if (header_buf.size_bytes() != packed_entry_header_size) { | ||
co_return std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this wouldn't be an error? It is an unexpected end of stream. Unless size_bytes() == 0
, which might be an acceptable end of stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this could be an error. Will fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why wouldn't we hard stop / crash the system or not check at all? what i mean is that the contract of read_iobuf_exactly is to guarantee this or throw, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i'm reading correctly, read_iobuf_exactly doesn't have this guarantee. If the stream is too short, it has no choice but to return a shorter buffer.
Maybe adding to the confusion, input_stream::read_up_to and input_stream::read_exactly both exist, but they actually mean "read some amount as long as it's under n" and "read as much as you can up to n". The contract of read_iobuf_exactly appears identical to the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, i wasn't remembering correctly.
ss::future<result<std::optional<entry>, errc>> next(); | ||
|
||
private: | ||
ss::input_stream<char> stream_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this missing a close() method? It is not clear if it is safe to destroy an ss::input_stream without calling close first.
/// Detaches the \c input_stream from the underlying data source.
///
/// Waits for any background operations (for example, read-ahead) to
/// complete, so that the any resources the stream is using can be
/// safely destroyed. An example is a \ref file resource used by
/// the stream returned by make_file_input_stream().
///
/// \return a future that becomes ready when this stream no longer
/// needs the data source.
future<> close() noexcept {
return _fd.close();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very possible. I'll add a close in subsequent PRs when this gets plugged into more stuff.
// Parses and collects the record batches from the given entry stream. | ||
// Ignores other kinds of entries. | ||
ss::future<result<collect_stream_outcome, errc>> | ||
collect_batches_from_stream(entry_stream* entries, batch_collector* collector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrwng I see google style guide recommending references https://google.github.io/styleguide/cppguide.html#Inputs_and_Outputs
non-optional output and input/output parameters should usually be references (which cannot be null)
Adds an wrapper around an input_stream<char> to pull out entries. The expectation here is that the stream will contain a dense collection of entries. This, along with the batch_collector, will be used to build an equivalent to the log_reader.
@nvartolomei hope you don't mind I'd like to merge this and address feedback in a followup change |
Adds a couple utility methods for parsing record batch entries from an entry_stream.
Introduces a couple of abstractions to be used as a part of the upcoming new log_reader implementation. These abstractions are: - readable_segment: A long-lived representation of a segment file, used to construct segment_readers. - segment_reader: A short-lived accessor to the file used to construct input_streams. This doesn't enforce any lifecycle relationships between segment_readers and input_streams, but it is expected that the readers will outlive the streams. These abstractions are pretty bare bones compared to the existing storage::segment, storage::segment_reader, and reader handle. There is some basic accounting done for the readers, but the complexity of a ref-counted handle doesn't seem critical for now.
Adds a test that uses the segment appender and segment readers end-to-end, exercising various batch collecting utilities along the way. This serves as an example of how a future log reader may use these methods to read.
A short read can be indicative of corruption and shouldn't be ignored. Adds a specific error code for this.
a03ac97
to
1ee2e4f
Compare
Nevermind, the changes ended up being not very intrusive. Latest force push:
|
// Parses and collects the record batches from the given entry stream. | ||
// Ignores other kinds of entries. | ||
ss::future<result<collect_stream_outcome, errc>> | ||
collect_batches_from_stream(entry_stream& entries, batch_collector& collector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we pass raw pointers to avoid clang-tidy warnings on reference parameters to coroutines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that reminds me this is actually partially the reason I had pointers in the first place. I opted to change them to abide by GSG, as called out by others, though maybe it's worth an exception for coroutines... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dotnwat the benefit would be that the need for explicit &
will make you think twice about the lifetimes? Or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it can also prevent cases where a temporary would be otherwise destroyed?
i.e. co_await collect_batches_from_stream(entry_stream{input}, c)
? does entry_stream here gets destroyed on suspension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the clang-tidy warning is a good callout. Allowing passing a temporary as a reference is a crash waiting to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyone can word a rule nicely for https://github.com/redpanda-data/redpanda/blob/dev/src/v/coding-style.md?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, pointer is cleaner for coroutine. i mean we could also disable the clang-tidy warning.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48945#018f643f-732c-441f-b84e-e23dbac38d8c |
CI failure: #18329 |
Introduces a few abstractions to be used as a part of the upcoming
new log_reader implementation. These abstractions are:
entries.
construct segment_readers.
input_streams. This doesn't enforce any lifecycle relationships
between segment_readers and input_streams, but it is expected that the
readers will outlive the streams.
These abstractions are pretty bare bones compared to the existing
storage::segment, storage::segment_reader, and reader handle. There is
some basic accounting done for the readers, but the complexity of a
ref-counted handle doesn't seem critical for now.
Some tests are added to exercise appends and reading record batch entries.
Fixes #17255
Backports Required
Release Notes