Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Provide read access to commit/message log and odb #265

Merged
merged 7 commits into from
Oct 3, 2023

Conversation

kim
Copy link
Contributor

@kim kim commented Sep 1, 2023

Description of Changes

This is a first approximation to provide data access for Kafka-style replication. It punts on notifications of segment rotation (which could also be solved via filesystem events).

ObjectDB access is fairly crude, and will likely require it's own replication subsystem.

Note that this is against cloud-next.

API and ABI

  • This is a breaking change to the module ABI
  • This is a breaking change to the module API
  • This is a breaking change to the ClientAPI
  • This is a breaking change to the SDK API

If the API is breaking, please state below what will break

This is a first approximation to provide data access for Kafka-style
replication. It punts on notifications of segment rotation (which
could also be solved via filesystem events).

ObjectDB access is fairly crude, and will likely require it's own
replication subsystem.
@kim kim changed the base branch from kim/cloud-next to master September 5, 2023 06:25
@kim
Copy link
Contributor Author

kim commented Sep 5, 2023

This will probably do, so rebased into master

Copy link
Contributor

@cloutiertyler cloutiertyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More or less seems okay, although I have some important questions inline.

/// The iterator represents a _snapshot_ of the log at the time this method
/// is called. That is, segments created after the method returns will not
/// appear in the iteration. The last segment yielded by the iterator may be
/// incomplete (i.e. still be appended to).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These doc comments are A+ 🙏


/// A read-only view of a [`CommitLog`].
pub struct CommitLogView {
mlog: Option<Arc<Mutex<MessageLog>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might suggest implementing the CommitLog by giving it a CommitLogView internally, since it does already share the same fields. Hmm that doesn't seem like the right solution either, but I wonder if we can have a clearer abstraction here.

It seems like we are almost trying to create the concept of a "File" (i.e. CommitLogView) and a "BufferedFile" (i.e. CommitLog) where the buffered thing is unwritten commit which is only periodically flushed. I wonder if something like that might fit better down the road.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of CommitLogView is for RelationalDB to hand out a view over its internal state in order to serve up log segments. It must hide append, but changing the visibility of append isn't helpful.

CommitLogView could be thought of as a newtype around CommitLog, but internally that'd be copying more data than strictly required. Perhaps if there was an Arc around the whole CommitLog, but I'm always wary about contention.

/// The underlying file is opened lazily when calling [`SegmentView::try_into_iter`]
/// or [`SegmentView::try_into_file`].
#[derive(Clone, Debug)]
pub struct SegmentView {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this required instead of MessageLogIter? Also presumably we should implement the internals of MessageLogIter with SegementView because we're duplicating some code that really needs to stay in sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask about MessageLogIter, and perhaps submit a separate patch: the issue is, it has very different semantics. For one, it panics on I/O errors, which isn't great when it's not a matter of database consistency. More importantly, though, segments_for_offset returns the first segment if offset does not fall into the last segment -- the significance of which I don't understand.

So imho this would require to write a number of tests at least, which I wouldn't want to conflate with this patch (which does not change any database internals).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this required instead of MessageLogIter?

It is also not helpful to borrow MessageLog when that value is behind a mutex.

Copy link
Contributor Author

@kim kim Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and also note that it is useful to yield whole segments (in addition to a stream of messages) to allow consumers which are very far behind to fetch all of them in parallel.

Copy link
Contributor

@cloutiertyler cloutiertyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ultimately, although I'd like to think about a different philosophy on this

@cloutiertyler cloutiertyler enabled auto-merge (squash) October 3, 2023 22:00
@cloutiertyler cloutiertyler merged commit e8aed85 into master Oct 3, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants