-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add app txn visitors and struct #170
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, this looks great! just a couple of small comments so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a stab at this missing functionality!
The code generally looks good, but I think we can simplify a fair bit?
kernel/src/actions/visitors.rs
Outdated
if let Some(app_id) = getters[0].get_opt(i, "txn.appId")? { | ||
self.transactions.push(Self::visit_txn(i, app_id, getters)?); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly, we're paying to collect an unbounded number of app ids in the hash map -- regardless of whether the caller requested a specific one -- and then we return the entire hash map unless it fails to contain the requested match.
This seems expensive in the common case that the caller only cares about one specific app id.
Could we add a "push-down filter" to the visitor? Still have a hash map, but the visitor ignores mismatching app ids if a specific app id was requested so the map size is bounded to 1?
if let Some(app_id) = getters[0].get_opt(i, "txn.appId")? { | |
self.transactions.push(Self::visit_txn(i, app_id, getters)?); | |
} | |
if let Some(app_id) = getters[0].get_opt(i, "txn.appId")? { | |
// If caller requested a specific app id, only visit matches: | |
// | |
// !(is_some && (requested_app_id != app_id)) | |
// ==> !is_some || !(requested_app_id != app_id) | |
// ==> !is_some || (requested_app_id == app_id) | |
if !requested_app_id.is_some_and(|requested| requested != app_id) { | |
val txn = Self::visit_txn(i, app_id, getters)?; | |
if !self.transactions.contains_key(&txn.app_id) { | |
self.transactions.insert(txn.app_id.clone(), txn); | |
} | |
} | |
} |
We could also implement the actual log replay here, so we don't need to do it later and don't need a second visitor for log replay.
(a follow-on optimization, probably not in this PR, could push down the requested app id as a filter to the underlying file reads, so we don't even materialize mismatching app ids in the first place)
Thanks so much. Given your suggestions I removed the irrelevant bits and simplified the logic. From an overall unit testing perspective it would be nice if I could assert which files were read so an assert for the minimum amount of read can be performed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
Since it anyway needs to rebase and fix clippy errors, suggest also making a pass to add more doc and code comments so people don't have to look at this PR to understand some of the nuances that were discussed.
pub fn new(snapshot: Arc<Snapshot>) -> Self { | ||
TransactionScanner { snapshot } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside: @nicklan If I counted correctly, this brings us to six structs whose new
is just a thin wrapper for convenience. Do you think it's worth bringing in the derive-new crate? Tho this reddit thread has some arguments for why trivial new might be an antipattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm, it's an interesting point. I think the only reason I'd want to keep it this way is that we could easily change things in the new method and/or add new fields with default values (which we could do using derive-new
) and not require upstream changes.
I created #181 to remind us to think about this
a463e47
to
e8c2a18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, thanks for the contribution! lgtm
pub fn new(snapshot: Arc<Snapshot>) -> Self { | ||
TransactionScanner { snapshot } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm, it's an interesting point. I think the only reason I'd want to keep it this way is that we could easily change things in the new method and/or add new fields with default values (which we could do using derive-new
) and not require upstream changes.
I created #181 to remind us to think about this
Implements visitors and structures required for reading application transactions that exist on a snapshot.
A similar pattern to the
LogReplayScanner
was implemented where users can explicitly scan metadata for the latest transaction for all application or obtain the latest transaction for particular application.I decided against adding the
TransactionMap
toSnapshot
since snapshot only require protocol + metadata to be built and obtaining the latest transaction may require scanning past that point. Also the overall design ofSnapshot
seems to be immutable and it would not make much sense to have methods that mutate it.