Convert to Qbeast #152

Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##             main     #152      +/-   ##
==========================================
+ Coverage   93.18%   93.56%   +0.37%
==========================================
  Files          76       80       +4
  Lines        1775     1879     +104
  Branches      133      146      +13
==========================================
+ Hits         1654     1758     +104
  Misses        121      121
```
Hello! I did the first review; feel free to discuss any of my comments in the thread. One more suggestion: I think it's better to call it Convert To Qbeast (or Incremental Conversion to Qbeast). Compatibility might be a confusing word (since we are already compatible), and the scope of this PR is actually being able to convert the table. The hybrid support is an inevitable change on the way to the final result.
Description

This PR adds the ability to read a hybrid `qbeast + delta` table using qbeast. It also introduces the `ConvertToQbeastCommand` to qbeast-spark, which allows reading a `parquet` or `delta` table without indexing and rewriting it. Partitioned tables are not supported by this operation.

These features are achieved by putting the non-qbeast `AddFile`s in a staging Revision, which is created during the first qbeast write (including overwrites) or when running the conversion command. The non-qbeast files are at the moment characterized as having null tags, and are all placed in the root of the staging Revision. During reads, these files are (for now) only processed in memory; no filtering is done at the file level, since all the files are in the root.

The converted table can be read using either `delta` or `qbeast`, and appending to the converted table using `delta` puts the data into the staging Revision without indexing. Appends that use `qbeast` follow the usual qbeast indexing procedure. `Compaction` can be executed on the staging revision to reduce its number of files.

Fixes #102, #121, #149
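The flow above (convert in place, then read or append with either format) could be sketched roughly as follows. This is a minimal sketch, not the confirmed API of this PR: the import path, the `ConvertToQbeastCommand` parameter names, the identifier string, and the indexing columns (`user_id`, `ts`) are assumptions to illustrate the intent, and may differ from the merged code.

```scala
import org.apache.spark.sql.SparkSession
// Assumed package path; check the actual location in qbeast-spark.
import io.qbeast.spark.internal.commands.ConvertToQbeastCommand

val spark = SparkSession.builder().appName("convert-to-qbeast").getOrCreate()

// Convert an existing parquet (or delta) table in place. Per the PR
// description, this only writes metadata: the existing AddFiles go into
// the staging Revision instead of being indexed and rewritten.
// Hypothetical signature and values:
ConvertToQbeastCommand(
  identifier = "parquet.`/tmp/my_table`", // or "delta.`/tmp/my_table`"
  columnsToIndex = Seq("user_id", "ts"),
  cubeSize = 50000
).run(spark)

// The converted table is readable with either format.
val viaQbeast = spark.read.format("qbeast").load("/tmp/my_table")
val viaDelta  = spark.read.format("delta").load("/tmp/my_table")

// A delta append lands in the staging Revision without indexing...
newData.write.mode("append").format("delta").save("/tmp/my_table")
// ...while a qbeast append follows the usual indexing procedure.
newData.write.mode("append").format("qbeast")
  .option("columnsToIndex", "user_id,ts")
  .save("/tmp/my_table")
```

Since the command writes only metadata, the conversion cost is independent of table size, which is the main advantage over re-ingesting the data with `format("qbeast")`.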
This PR also makes the following changes:
- `RemoveFile` in `DeltaMetadataWriter`

Type of change
- `EmptyTransformer`s and `EmptyTransformation`s, and most importantly `RevisionID = 0`
- `QbeastFormat`
Checklist:
How Has This Been Tested? (Optional)

Test Configuration:
- 3.2.2
- 3.3.1
- Local