storage: remove PersistMonotonicWriteWorker #29087
Conversation
There are some test failures to shake out, but it feels like this is a worthwhile simplification?
1000%! Ok if I defer the review on this one to @aljoscha? I'm not too familiar with this code these days. (@ParkMyCar may also be able to take a look? I feel like he was the one to add the monotonic write worker originally.)
@@ -1153,6 +1049,55 @@ where
    (tx, handle.abort_on_drop(), shutdown_tx)
}

async fn monotonic_append<T: Timestamp + Lattice + Codec64 + TimestampManipulation>(
I'm not entirely sure I ported the intention here correctly; it was kinda hard to follow.
// aware of read-only mode and will not attempt to write before told
// to do so.
//
self.register_introspection_collection(id, *typ, write)
Is it safe to pull this call up here? It's hard to tell what in the method is a delicate dance of doing things in just the right order and what's historical cruft.
I think it should be alright. `register_introspection_collection` itself does a dance around first doing the "truncation" and then registering with the worker, but that's described in comments in there.
LGTM from the Monotonic writer perspective; not sure about the 0dt and read-only bits, though.
self.collections.insert(id, collection_state);
new_source_statistic_entries.insert(id);
// This collection of statistics is periodically aggregated into
// `source_statistics`.
new_webhook_statistic_entries.insert(id);
// Register the collection so our manager knows about it.
//
// NOTE: Maybe this shouldn't be in the collection manager,
I've thought the same thing in the past, and the current situation is very much for historic reasons, IMO.
edit: because I wrote this comment ... 😂 (only realized this when scrolling down)
😂
src/storage-controller/src/lib.rs (Outdated)
.await
.expect("sender hung up")?;
.unwrap();
let write_handle = self
Is it maybe a bit heavy-handed to open a write handle just for this? On the other hand, we only use this method on rare occasions.
Write handles are cheap as long as you never run a CaA with them! That's subtle, so probably worth a comment. I'll stick one in when I circle back to this.
let res = write_handle
    .compare_and_append(
        updates,
        Antichain::from_elem(lower),
This is where it differs from the original impl, which uses `current_upper` as the expected upper and only uses `lower` to determine a timestamp for the timestampless updates. As written here, I would expect this to fail whenever `at_least` is higher than `current_upper`, because the `compare_and_append`'s expected upper wouldn't line up.
Interesting! I'll have to load this up again to feel like I totally understand that, but sounds like a pretty mechanical fix.
Something I don't understand is that previously, the `MonotonicAppend` call could return a `StorageError` that the upper didn't match. But why? Isn't the contract just that the thing gets written down at some timestamp that's at least the `at_least`? Do you know if this was just an artifact of the implementation (assigning a timestamp and then afterward treating it like a normal Append), or is that necessary for some reason?
I think the intention was "we pick a timestamp that's guaranteed to increase, but we still do a compare_and_append to ensure that there's no other writer". I also believe that the reasoning around these "monotonic appends" is a bit hand-wave-y, especially when you consider potential concurrent writers, in a use-case isolation future, and just in general. It comes from a world where we didn't have the differential collections and from a world where we didn't think about 0dt/uci and relied a lot on "yeah yeah, `envd` fencing makes this all correct" (which I never fully trusted... 😅).
So it isn't what I would call a blind write, where the semantics really are: write down the thing, no matter what; if there's a concurrent write and the upper doesn't match, you can retry.
It's part of the reason I introduced the differential collections, which have a sync-and-retry loop and generally feel much more robust. And now that the "partially truncate" logic also uses a proper compare_and_append (I changed that in 1), I feel better about our consistency story. And it might be that all users of `monotonic_append` now are okay with them being blind writes.
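The sync-and-retry loop mentioned above can be sketched with a toy model. This is not the real differential-collection code; `Shard`, `append_with_retry`, and the `u64` uppers are illustrative stand-ins for the pattern: attempt a compare_and_append with the last-known upper and, on a mismatch, adopt the reported current upper and retry:

```rust
/// Toy model of a persist shard: just an upper frontier, as a `u64`.
struct Shard {
    upper: u64,
}

impl Shard {
    /// Succeeds only if `expected` matches the current upper; on a
    /// mismatch, reports the actual current upper so the caller can sync.
    fn compare_and_append(&mut self, expected: u64, new_upper: u64) -> Result<(), u64> {
        if expected == self.upper {
            self.upper = new_upper;
            Ok(())
        } else {
            Err(self.upper)
        }
    }
}

/// Sync-and-retry: keep trying until our expected upper lines up with
/// reality, rebasing on the upper reported by each failed attempt.
fn append_with_retry(shard: &mut Shard, mut expected: u64, advance_by: u64) -> u64 {
    loop {
        let new_upper = expected + advance_by;
        match shard.compare_and_append(expected, new_upper) {
            Ok(()) => return new_upper,
            // Someone else wrote concurrently: sync to their upper, retry.
            Err(current) => expected = current,
        }
    }
}

fn main() {
    // Our cached upper (5) is stale (the shard is already at 10); the
    // loop syncs to 10 and then succeeds, advancing the upper to 11.
    let mut shard = Shard { upper: 10 };
    assert_eq!(append_with_retry(&mut shard, 5, 1), 11);
}
```

In the real code the retry would also need to re-timestamp the pending updates to land at or beyond the new upper, which is exactly the part a blind write gets to skip.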
src/storage-controller/src/lib.rs (Outdated)
@@ -869,9 +893,6 @@ where
    }
}

self.append_shard_mappings(new_collections.into_iter(), 1)
AFAICT, this call was a no-op before because it would hit this short-circuit:
let id = match self
    .introspection_ids
    .lock()
    .expect("poisoned")
    .get(&IntrospectionType::ShardMapping)
{
    Some(id) => *id,
    _ => return,
};
Oh no, this was incorrect. It was a no-op for the first call, but not on the later ones.
Okay @aljoscha this seems to be passing tests now. Modulo a lint failure and the comment I promised, this should be ready for a review!
Overall, this is excellent! 🙌
I had one inline concern about `initialize_shard_mapping()`.
) -> Result<(), StorageError<T>> {
    tracing::info!(%id, ?introspection_type, "preparing introspection collection for writes");

    match introspection_type {
        IntrospectionType::ShardMapping => {
            self.initialize_shard_mapping().await;
            // Done by the `self.append_shard_mappings` call.
Don't we still need this? The `append_shard_mappings` call in `create_collections` only does it for new collections, and `initialize_shard_mapping()` made sure we write down the collections that were already known at the time at which we "create" the shard-mappings shard.
Yeah, not sure how we want this to work. I removed this because they were getting double registered. It seems that before, during initial boot we were hitting the following short-circuit when `append_shard_mappings` got called as part of `create_collections_for_bootstrap`. But this PR slightly shifts the order of things in `create_collections_for_bootstrap` and ends up putting `ShardMapping` in the `introspection_ids` map before `append_shard_mappings` gets called. This resulted in them getting added twice, once from `append_shard_mappings` and once from `initialize_shard_mapping`:
let id = match self
    .introspection_ids
    .lock()
    .expect("poisoned")
    .get(&IntrospectionType::ShardMapping)
{
    Some(id) => *id,
    _ => return,
};
(We hashed this out in a huddle. Decision was to make this short-circuit a panic if that's possible, or add a comment if not.)
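The panic variant of that short-circuit can be sketched as follows. This is a hedged, simplified model, not the actual controller code: the map type and `u64` ids stand in for the real `introspection_ids` structure, and `shard_mapping_id` is a hypothetical helper name:

```rust
use std::collections::BTreeMap;

// Simplified stand-in for the real IntrospectionType enum.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
enum IntrospectionType {
    ShardMapping,
}

/// Looks up the ShardMapping collection id. Where the old code silently
/// returned (`_ => return`) when the id was missing, this version treats
/// a missing registration as a programming error and panics.
fn shard_mapping_id(ids: &BTreeMap<IntrospectionType, u64>) -> u64 {
    *ids.get(&IntrospectionType::ShardMapping)
        .expect("ShardMapping introspection collection must be registered before appending shard mappings")
}

fn main() {
    let mut ids = BTreeMap::new();
    ids.insert(IntrospectionType::ShardMapping, 42);
    assert_eq!(shard_mapping_id(&ids), 42);
    // With an empty map, shard_mapping_id would panic instead of
    // silently skipping the append, surfacing ordering bugs like the
    // double registration discussed above.
}
```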
@aljoscha Nightlies seem happy with replacing the `append_shard_mappings` short-circuit with a panic. Turns out it was already documented to do that, anyway. Wanna stamp?
Sorry, should have stamped yesterday after we talked! 🙈
It was originally introduced to wrap persist WriteHandles in a task, but everything that was using it to write data was already in a task. So at this point, it's just an extra layer of indirection to reason about.
\o/ TFTR!
Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.