-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: update_table returns stats #2867
Conversation
@@ -193,7 +209,7 @@ pub struct UpdateJob { | |||
} | |||
|
|||
impl UpdateJob { | |||
pub async fn execute(self) -> Result<Arc<Dataset>> { | |||
pub async fn execute(self) -> Result<(Arc<Dataset>, UpdateFragmentStats)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do a check for public API. whether this is a breaking change.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2867 +/- ##
==========================================
+ Coverage 77.87% 77.89% +0.01%
==========================================
Files 231 231
Lines 70513 70523 +10
Branches 70513 70523 +10
==========================================
+ Hits 54915 54936 +21
- Misses 12465 12469 +4
+ Partials 3133 3118 -15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! This approach is valid but I wonder if we want to return something other than a tuple. For example, we could change the method to:
struct UpdateResult {
new_dataset: Arc<Dataset>,
rows_updated: u64
}
impl UpdateJob {
pub async fn execute(self) -> Result<UpdateResult> {
...
}
...
This is slightly more readable and, if we decide to return more information in the future we will be able to do so more easily.
// Apply deletions | ||
let removed_row_ids = Arc::into_inner(removed_row_ids) | ||
.unwrap() | ||
.into_inner() | ||
.unwrap(); | ||
let (old_fragments, removed_fragment_ids) = self.apply_deletions(&removed_row_ids).await?; | ||
|
||
let num_updated_rows = new_fragments | ||
.iter() | ||
.map(|f| f.physical_rows.unwrap_or_default() as u64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.map(|f| f.physical_rows.unwrap_or_default() as u64) | |
.map(|f| f.physical_rows.unwrap() as u64) |
It might be better to just unwrap
here. If physical_rows
is None
then our assumptions our incorrect and it would be better to return an error than a potentially misleading value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are changing the return type of the public method (UpdateJob::execute()
), this should be marked as a breaking-change
. Users should be made aware the return value will be different now.
python/src/dataset.rs
Outdated
@@ -941,7 +941,7 @@ impl Dataset { | |||
.block_on(None, operation.execute())? | |||
.map_err(|err| PyIOError::new_err(err.to_string()))?; | |||
|
|||
self.ds = new_self; | |||
(self.ds, _) = new_self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to also expose this in Python. But if you want to leave that as TODO, that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added. please review. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor comment suggestion but looks good otherwise
updates : dict | ||
A dictionary containing the number of rows updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updates : dict | |
A dictionary containing the number of rows updated. | |
statistics : dict | |
A dictionary containing statistics about the update operation |
Minor nit: the word updates
feels as if it is returning the updated values themselves.
BREAKING CHANGE:
UpdateJob::execute()
now returns statistics about number of rows changed.update_table returns