feat!: update_table returns stats #2867

QianZhu · 2024-09-11T23:54:53Z

BREAKING CHANGE: UpdateJob::execute() now returns statistics about number of rows changed.

update_table returns the number of rows updated, for billing of writes.

eddyxu · 2024-09-11T23:57:09Z

rust/lance/src/dataset/write/update.rs

@@ -193,7 +209,7 @@ pub struct UpdateJob {
 }

 impl UpdateJob {
-    pub async fn execute(self) -> Result<Arc<Dataset>> {
+    pub async fn execute(self) -> Result<(Arc<Dataset>, UpdateFragmentStats)> {


Could we check the public API to see whether this is a breaking change?

codecov-commenter · 2024-09-12T02:13:40Z

Codecov Report

Attention: Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.

Project coverage is 77.89%. Comparing base (60797a6) to head (8da0515).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
rust/lance/src/dataset/write/update.rs	91.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2867      +/-   ##
==========================================
+ Coverage   77.87%   77.89%   +0.01%     
==========================================
  Files         231      231              
  Lines       70513    70523      +10     
  Branches    70513    70523      +10     
==========================================
+ Hits        54915    54936      +21     
- Misses      12465    12469       +4     
+ Partials     3133     3118      -15

Flag	Coverage Δ
unittests	`77.89% <92.85%> (+0.01%)`	⬆️

westonpace

Thanks for working on this! This approach is valid but I wonder if we want to return something other than a tuple. For example, we could change the method to:

struct UpdateResult {
    new_dataset: Arc<Dataset>,
    rows_updated: u64,
}

impl UpdateJob {
    pub async fn execute(self) -> Result<UpdateResult> {
        ...
    }
    ...
}

This is slightly more readable and, if we decide to return more information in the future, we will be able to do so more easily.
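To illustrate the trade-off, here is a minimal, self-contained sketch. Only the `UpdateResult` shape comes from the suggestion above. `Dataset` is a stand-in type rather than the real Lance dataset, its `version` field is invented for the example, and `execute_tuple` and `execute_struct` are hypothetical synchronous helpers, not the actual `UpdateJob::execute()`.

```rust
use std::sync::Arc;

/// Stand-in for the real dataset type (assumption for this sketch).
#[derive(Debug)]
pub struct Dataset {
    pub version: u64,
}

/// Named result, following the shape proposed in the comment above.
#[derive(Debug)]
pub struct UpdateResult {
    pub new_dataset: Arc<Dataset>,
    pub rows_updated: u64,
}

/// Hypothetical tuple-returning variant: the caller must remember
/// which position holds which value.
pub fn execute_tuple(rows: u64) -> Result<(Arc<Dataset>, u64), String> {
    Ok((Arc::new(Dataset { version: 2 }), rows))
}

/// Hypothetical struct-returning variant: values are named at the call site.
pub fn execute_struct(rows: u64) -> Result<UpdateResult, String> {
    Ok(UpdateResult {
        new_dataset: Arc::new(Dataset { version: 2 }),
        rows_updated: rows,
    })
}

fn main() -> Result<(), String> {
    // Tuple: positional destructuring.
    let (dataset, count) = execute_tuple(3)?;
    println!("tuple: version {}, {} rows updated", dataset.version, count);

    // Struct: access by field name.
    let result = execute_struct(3)?;
    println!(
        "struct: version {}, {} rows updated",
        result.new_dataset.version, result.rows_updated
    );
    Ok(())
}
```

Callers that read `result.rows_updated` by name keep compiling if another field is added to the struct later, whereas adding a third element to the tuple would break every `let (dataset, count) = ...` pattern.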

westonpace · 2024-09-12T14:18:02Z

rust/lance/src/dataset/write/update.rs

        // Apply deletions
        let removed_row_ids = Arc::into_inner(removed_row_ids)
            .unwrap()
            .into_inner()
            .unwrap();
        let (old_fragments, removed_fragment_ids) = self.apply_deletions(&removed_row_ids).await?;

+        let num_updated_rows = new_fragments
+            .iter()
+            .map(|f| f.physical_rows.unwrap_or_default() as u64)


Suggested change

-            .map(|f| f.physical_rows.unwrap_or_default() as u64)
+            .map(|f| f.physical_rows.unwrap() as u64)

It might be better to just unwrap here. If physical_rows is None then our assumptions are incorrect, and it would be better to return an error than a potentially misleading value.
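The behaviours under discussion can be compared in a small sketch. `Fragment` here is a stand-in struct carrying only a `physical_rows` field, and the function names and error message are assumptions for illustration, not code from this PR. Note that `unwrap()` panics on `None` rather than producing an `Err`; the third function shows one possible way to surface the problem as an error value instead, which is not something proposed in the thread.

```rust
/// Stand-in for a fragment's metadata (assumption for this sketch).
pub struct Fragment {
    pub physical_rows: Option<usize>,
}

/// Lenient: a missing count is silently treated as 0.
pub fn count_lenient(fragments: &[Fragment]) -> u64 {
    fragments
        .iter()
        .map(|f| f.physical_rows.unwrap_or_default() as u64)
        .sum()
}

/// Strict: a missing count panics, making the broken assumption visible.
pub fn count_strict(fragments: &[Fragment]) -> u64 {
    fragments
        .iter()
        .map(|f| f.physical_rows.unwrap() as u64)
        .sum()
}

/// Fallible: a missing count becomes an `Err` the caller can handle.
pub fn count_checked(fragments: &[Fragment]) -> Result<u64, String> {
    fragments
        .iter()
        .map(|f| {
            f.physical_rows
                .map(|n| n as u64)
                .ok_or_else(|| "fragment is missing physical_rows".to_string())
        })
        .sum()
}

fn main() {
    let partial = vec![
        Fragment { physical_rows: Some(10) },
        Fragment { physical_rows: None },
    ];
    // The lenient version under-counts without any warning.
    println!("lenient: {}", count_lenient(&partial));
    // The checked version reports the missing value instead.
    println!("checked: {:?}", count_checked(&partial));
}
```

With the lenient variant, a fragment that lacks a row count contributes zero to the total, which is the "potentially misleading value" the comment warns about.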

wjones127

Since you are changing the return type of the public method (UpdateJob::execute()), this should be marked as a breaking-change. Users should be made aware the return value will be different now.

wjones127 · 2024-09-12T15:23:53Z

python/src/dataset.rs

@@ -941,7 +941,7 @@ impl Dataset {
            .block_on(None, operation.execute())?
            .map_err(|err| PyIOError::new_err(err.to_string()))?;

-        self.ds = new_self;
+        (self.ds, _) = new_self;


It would be nice to also expose this in Python. But if you want to leave that as TODO, that's fine.

Added. Please review. Thank you!

westonpace

A minor comment suggestion, but looks good otherwise.

westonpace · 2024-09-13T12:54:28Z

python/python/lance/dataset.py

+        updates : dict
+            A dictionary containing the number of rows updated.


Suggested change

-        updates : dict
-            A dictionary containing the number of rows updated.
+        statistics : dict
+            A dictionary containing statistics about the update operation

Minor nit: the name "updates" reads as if the updated values themselves are being returned.

update_table returns stats

2724ee6

github-actions bot added the enhancement New feature or request label Sep 11, 2024

QianZhu requested a review from eddyxu September 11, 2024 23:55

eddyxu reviewed Sep 11, 2024

no breaking change

af3b9eb

github-actions bot added the python label Sep 12, 2024

wjones127 self-requested a review September 12, 2024 00:58

fix issues

5b72393

QianZhu requested a review from westonpace September 12, 2024 02:06

westonpace reviewed Sep 12, 2024

wjones127 reviewed Sep 12, 2024

wjones127 changed the title from "feat: update_table returns stats" to "feat!: update_table returns stats" Sep 12, 2024

github-actions bot added the breaking-change label Sep 12, 2024

QianZhu added 4 commits September 12, 2024 16:54

address comments

441fc73

Merge branch 'main' into update_return_stats

d41babe

fix update doc

9aaef40

lint

d454fcd

westonpace approved these changes Sep 13, 2024

Merge branch 'main' into update_return_stats

8da0515

QianZhu merged commit e441ab3 into main Sep 13, 2024
22 of 23 checks passed

QianZhu deleted the update_return_stats branch September 13, 2024 18:03
