Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add app txn visitors and struct #170

Merged
merged 1 commit into from
Apr 29, 2024
Merged

add app txn visitors and struct #170

merged 1 commit into from
Apr 29, 2024

Conversation

Blajda
Copy link
Contributor

@Blajda Blajda commented Apr 9, 2024

Implements visitors and structures required for reading application transactions that exist on a snapshot.

A similar pattern to the LogReplayScanner was implemented where users can explicitly scan metadata for the latest transaction for all application or obtain the latest transaction for particular application.

I decided against adding the TransactionMap to Snapshot since snapshot only require protocol + metadata to be built and obtaining the latest transaction may require scanning past that point. Also the overall design of Snapshot seems to be immutable and it would not make much sense to have methods that mutate it.

@Blajda Blajda marked this pull request as draft April 9, 2024 02:04
@Blajda Blajda changed the title :WIP: add app txn visitors and struct add app txn visitors and struct Apr 18, 2024
@Blajda Blajda marked this pull request as ready for review April 18, 2024 02:00
Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this looks great! just a couple of small comments so far.

kernel/src/transaction.rs Outdated Show resolved Hide resolved
kernel/src/transaction.rs Outdated Show resolved Hide resolved
kernel/src/transaction.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a stab at this missing functionality!
The code generally looks good, but I think we can simplify a fair bit?

Comment on lines 297 to 329
if let Some(app_id) = getters[0].get_opt(i, "txn.appId")? {
self.transactions.push(Self::visit_txn(i, app_id, getters)?);
}
Copy link
Collaborator

@scovich scovich Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, we're paying to collect an unbounded number of app ids in the hash map -- regardless of whether the caller requested a specific one -- and then we return the entire hash map unless it fails to contain the requested match.

This seems expensive in the common case that the caller only cares about one specific app id.

Could we add a "push-down filter" to the visitor? Still have a hash map, but the visitor ignores mismatching app ids if a specific app id was requested so the map size is bounded to 1?

Suggested change
if let Some(app_id) = getters[0].get_opt(i, "txn.appId")? {
self.transactions.push(Self::visit_txn(i, app_id, getters)?);
}
if let Some(app_id) = getters[0].get_opt(i, "txn.appId")? {
// If caller requested a specific app id, only visit matches:
//
// !(is_some && (requested_app_id != app_id))
// ==> !is_some || !(requested_app_id != app_id)
// ==> !is_some || (requested_app_id == app_id)
if !requested_app_id.is_some_and(|requested| requested != app_id) {
val txn = Self::visit_txn(i, app_id, getters)?;
if !self.transactions.contains_key(&txn.app_id) {
self.transactions.insert(txn.app_id.clone(), txn);
}
}
}

We could also implement the actual log replay here, so we don't need to do it later and don't need a second visitor for log replay.

(a follow-on optimization, probably not in this PR, could push down the requested app id as a filter to the underlying file reads, so we don't even materialize mismatching app ids in the first place)

kernel/src/transaction.rs Outdated Show resolved Hide resolved
kernel/src/transaction.rs Outdated Show resolved Hide resolved
@Blajda
Copy link
Contributor Author

Blajda commented Apr 27, 2024

Thanks for taking a stab at this missing functionality! The code generally looks good, but I think we can simplify a fair bit?

Thanks so much. Given your suggestions I removed the irrelevant bits and simplified the logic. From an overall unit testing perspective it would be nice if I could assert which files were read so an assert for the minimum amount of read can be performed.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

Since it anyway needs to rebase and fix clippy errors, suggest also making a pass to add more doc and code comments so people don't have to look at this PR to understand some of the nuances that were discussed.

kernel/src/transaction.rs Show resolved Hide resolved
kernel/src/actions/visitors.rs Show resolved Hide resolved
kernel/src/actions/visitors.rs Outdated Show resolved Hide resolved
Comment on lines +15 to +17
pub fn new(snapshot: Arc<Snapshot>) -> Self {
TransactionScanner { snapshot }
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside: @nicklan If I counted correctly, this brings us to six structs whose new is just a thin wrapper for convenience. Do you think it's worth bringing in the derive-new crate? Tho this reddit thread has some arguments for why trivial new might be an antipattern?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, it's an interesting point. I think the only reason I'd want to keep it this way is that we could easily change things in the new method and/or add new fields with default values (which we could do using derive-new) and not require upstream changes.

I created #181 to remind us to think about this

kernel/src/transaction.rs Outdated Show resolved Hide resolved
@Blajda Blajda force-pushed the app-txn branch 2 times, most recently from a463e47 to e8c2a18 Compare April 27, 2024 20:26
kernel/src/transaction.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, thanks for the contribution! lgtm

Comment on lines +15 to +17
pub fn new(snapshot: Arc<Snapshot>) -> Self {
TransactionScanner { snapshot }
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, it's an interesting point. I think the only reason I'd want to keep it this way is that we could easily change things in the new method and/or add new fields with default values (which we could do using derive-new) and not require upstream changes.

I created #181 to remind us to think about this

@nicklan nicklan merged commit b0f94f0 into delta-io:main Apr 29, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants