Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: update_table returns stats #2867

Merged
merged 8 commits into from
Sep 13, 2024
Merged

feat!: update_table returns stats #2867

merged 8 commits into from
Sep 13, 2024

Conversation

QianZhu
Copy link
Contributor

@QianZhu QianZhu commented Sep 11, 2024

BREAKING CHANGE: UpdateJob::execute() now returns statistics about number of rows changed.

update_table returns

  • num of rows updated for billing - writes

@github-actions github-actions bot added the enhancement New feature or request label Sep 11, 2024
@QianZhu QianZhu requested a review from eddyxu September 11, 2024 23:55
@@ -193,7 +209,7 @@ pub struct UpdateJob {
}

impl UpdateJob {
pub async fn execute(self) -> Result<Arc<Dataset>> {
pub async fn execute(self) -> Result<(Arc<Dataset>, UpdateFragmentStats)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do a check for public API. whether this is a breaking change.

@wjones127 wjones127 self-requested a review September 12, 2024 00:58
@codecov-commenter
Copy link

codecov-commenter commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.

Project coverage is 77.89%. Comparing base (60797a6) to head (8da0515).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance/src/dataset/write/update.rs 91.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2867      +/-   ##
==========================================
+ Coverage   77.87%   77.89%   +0.01%     
==========================================
  Files         231      231              
  Lines       70513    70523      +10     
  Branches    70513    70523      +10     
==========================================
+ Hits        54915    54936      +21     
- Misses      12465    12469       +4     
+ Partials     3133     3118      -15     
Flag Coverage Δ
unittests 77.89% <92.85%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! This approach is valid but I wonder if we want to return something other than a tuple. For example, we could change the method to:

struct UpdateResult {
  new_dataset: Arc<Dataset>,
  rows_updated: u64
}

impl UpdateJob {
  pub async fn execute(self) -> Result<UpdateResult> {
    ...
  }
...

This is slightly more readable and, if we decide to return more information in the future we will be able to do so more easily.

// Apply deletions
let removed_row_ids = Arc::into_inner(removed_row_ids)
.unwrap()
.into_inner()
.unwrap();
let (old_fragments, removed_fragment_ids) = self.apply_deletions(&removed_row_ids).await?;

let num_updated_rows = new_fragments
.iter()
.map(|f| f.physical_rows.unwrap_or_default() as u64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.map(|f| f.physical_rows.unwrap_or_default() as u64)
.map(|f| f.physical_rows.unwrap() as u64)

It might be better to just unwrap here. If physical_rows is None then our assumptions our incorrect and it would be better to return an error than a potentially misleading value.

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are changing the return type of the public method (UpdateJob::execute()), this should be marked as a breaking-change. Users should be made aware the return value will be different now.

@@ -941,7 +941,7 @@ impl Dataset {
.block_on(None, operation.execute())?
.map_err(|err| PyIOError::new_err(err.to_string()))?;

self.ds = new_self;
(self.ds, _) = new_self;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to also expose this in Python. But if you want to leave that as TODO, that's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added. please review. Thank you!

@wjones127 wjones127 changed the title feat: update_table returns stats feat!: update_table returns stats Sep 12, 2024
Copy link
Contributor

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor comment suggestion but looks good otherwise

Comment on lines +1077 to +1078
updates : dict
A dictionary containing the number of rows updated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
updates : dict
A dictionary containing the number of rows updated.
statistics : dict
A dictionary containing statistics about the update operation

Minor nit: the word updates feels as if it is returning the updated values themselves.

@QianZhu QianZhu merged commit e441ab3 into main Sep 13, 2024
22 of 23 checks passed
@QianZhu QianZhu deleted the update_return_stats branch September 13, 2024 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants