Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: change the default data storage version to "stable" (e.g. v2.0) #2829

Merged
merged 9 commits into from
Sep 11, 2024

Conversation

westonpace
Copy link
Contributor

@westonpace westonpace commented Sep 4, 2024

Closes #2394

This PR changes a few remaining tests. Also, by changing the default to v2 we exposed a few minor inconsistencies with v1 that we fixed.

  • When creating a fragment we reported progress before adding the filename to the fragment. We now add the filename to the fragment before reporting progress.
  • Nested projection was broken (existing nested projection tests passed by luck). This required some slight change to how we calculate projection.

BREAKING CHANGE: new datasets will no longer be readable by versions older than 0.16

@github-actions github-actions bot added enhancement New feature or request python labels Sep 4, 2024
@codecov-commenter
Copy link

codecov-commenter commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 91.82058% with 31 lines in your changes missing coverage. Please review.

Project coverage is 78.11%. Comparing base (f4e3300) to head (088e082).

Files with missing lines Patch % Lines
rust/lance-encoding/src/decoder.rs 26.66% 5 Missing and 6 partials ⚠️
rust/lance-file/src/v2/reader.rs 90.90% 2 Missing and 5 partials ⚠️
rust/lance-datagen/src/generator.rs 89.18% 0 Missing and 4 partials ⚠️
rust/lance/src/dataset/fragment.rs 91.42% 0 Missing and 3 partials ⚠️
rust/lance/src/dataset/transaction.rs 72.72% 3 Missing ⚠️
java/core/lance-jni/src/utils.rs 0.00% 1 Missing ⚠️
rust/lance-encoding-datafusion/src/zone.rs 0.00% 1 Missing ⚠️
rust/lance-index/src/scalar/lance_format.rs 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2829      +/-   ##
==========================================
+ Coverage   78.04%   78.11%   +0.06%     
==========================================
  Files         229      229              
  Lines       70357    70492     +135     
  Branches    70357    70492     +135     
==========================================
+ Hits        54911    55064     +153     
+ Misses      12361    12315      -46     
- Partials     3085     3113      +28     
Flag Coverage Δ
unittests 78.11% <91.82%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably state the earliest version of Lance that can read these new V2 files. I think we are breaking a certain backwards compatibility by changing the default, right?

@westonpace
Copy link
Contributor Author

I think we are breaking a certain backwards compatibility by changing the default, right?

Backwards compatibility, no, I don't think so. We won't lose any capability to read older files.

Forwards compatibility: kind of.

This only applies the v2 default to new datasets. So old readers will continue to be able to read old datasets but yes, readers beyond a certain point (probably 0.16 to be safe but in reality maybe even older) will not be able to read new datasets created by new versions. I'll update some docs somewhere.

@westonpace westonpace changed the title feat: change the default data storage version to "stable" (e.g. v2.0) feat!: change the default data storage version to "stable" (e.g. v2.0) Sep 9, 2024
@westonpace
Copy link
Contributor Author

We should probably state the earliest version of Lance that can read these new V2 files. I think we are breaking a certain backwards compatibility by changing the default, right?

I've marked this a breaking change. I verified that we have a "minimum lance version" in the format section (and updated it from 0.15 to 0.16). I've added some basic "future compatibility" tests to make sure we can, in fact, read lance files written with the latest version in 0.16.

Comment on lines +115 to +118
- name: Run forward compatibility tests
run: |
source venv/bin/activate
pytest python/tests/forward_compat --run-forward
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fancy! ✨


dataset = lance.dataset(base_dir)

print(format_fragment(dataset.get_fragment(0).metadata, dataset))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be keeping this print?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. Removed :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do we support pushdown in V2 yet? Is that coming soon?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Soon? I'll try and work on that next once I'm done with this mini-block stuff.

@github-actions github-actions bot added the java label Sep 11, 2024
@westonpace westonpace merged commit c0e1f15 into lancedb:main Sep 11, 2024
21 of 23 checks passed
wjones127 added a commit that referenced this pull request Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change the writer default to write v2 files
3 participants