Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SQL statement INSERT INTO with Qbeast format #116

Merged
merged 14 commits into from
Jul 28, 2022
Merged

Support SQL statement INSERT INTO with Qbeast format #116

merged 14 commits into from
Jul 28, 2022

Conversation

Adricu8
Copy link
Contributor

@Adricu8 Adricu8 commented Jul 15, 2022

Description

  • Fixing this issue

  • We have removed the class QbeastBaseRelation. Now we only make use of its object to instantiate a BaseRelation (or HadoopFsRelation)
    In this way, the function createRelation is implemented in object QbeastBaseRelation and the insertable relation is not lost.
    Moreover, we do not need the ReplaceFileIndex extension and has also been removed. LogicalRelation now makes use of a HadoopFsRelation instead of QbeastBaseRelation.

Now we are able to use SQL statement INSERT INTO like:

spark.sql("insert into table t select * from t2")

We had to modify more code to make the following statement work aswell:

spark.sql("insert into table t (value) values (4)")

  • We have to support IdentityTransformation to be merged with a qbeast table when we insert a single value. Right now we only check the case where we merge a new LinearTransformation. We added the logic to LinearTransformation and LinearTransformer files, modifying the necessary objects to make it work.

It would be nice to create a test with all the possible insert statements that spark can handle
https://spark.apache.org/docs/3.1.2/sql-ref-syntax-dml-insert-into.html
Furthermore, we might want to test that the implemented logic is correct.

When we use this syntax we get an error that the transformation is unknown. Instead of a Linear transformation the transformation type is: io.qbeast.core.transform.IdentityTransformation.

Type of change

Describe the change you're making: how it affects the API, user experience...

  • We are fixing a bug, qbeast format should be able to process INSERT INTO sql statements.

Checklist:

  • New feature / bug fix has been committed following the Contribution guide.
  • Add comments to the code (make it easier for the community!).
  • Change the documentation.
  • Add tests.
  • Your branch is updated to the main branch (dependent changes have been merged).

How Has This Been Tested? (Optional)

Please describe the tests that you ran to verify your changes.

  • Test QbeastInsertIntoTest: We run the different INSERT INTO SQL statements and assert that the final table containes the expected DataFrame.
  • Test LinearTransformationTest: I updated this test to assert that the merge function works as intended when merging an IdentityTransformation

Test Configuration:

  • Spark Version: 3.1.1
  • Hadoop Version: 3.2
  • Local environment (M1)

@osopardo1 osopardo1 self-requested a review July 18, 2022 07:02
@osopardo1
Copy link
Member

Hey @Adricu8 , it seems like the description of the PR is not updated, could you explain more about the issue? Thanks!

@Adricu8
Copy link
Contributor Author

Adricu8 commented Jul 18, 2022

Yes of course, I updated the description. Let me know if there is anything unclear.

@Jiaweihu08 Jiaweihu08 added the type: bug Something isn't working label Jul 20, 2022
@codecov
Copy link

codecov bot commented Jul 21, 2022

Codecov Report

Merging #116 (fecc9f5) into main (0d31627) will increase coverage by 0.05%.
The diff coverage is 97.56%.

@@            Coverage Diff             @@
##             main     #116      +/-   ##
==========================================
+ Coverage   90.64%   90.69%   +0.05%     
==========================================
  Files          60       59       -1     
  Lines        1325     1344      +19     
  Branches      102       99       -3     
==========================================
+ Hits         1201     1219      +18     
- Misses        124      125       +1     
Impacted Files Coverage Δ
...cala/io/qbeast/core/transform/Transformation.scala 50.00% <ø> (ø)
...t/spark/internal/QbeastSparkSessionExtension.scala 100.00% <ø> (ø)
...o/qbeast/core/transform/LinearTransformation.scala 88.73% <94.11%> (+1.63%) ⬆️
...a/io/qbeast/core/transform/LinearTransformer.scala 100.00% <100.00%> (ø)
.../main/scala/io/qbeast/spark/delta/OTreeIndex.scala 88.88% <100.00%> (+1.38%) ⬆️
...la/io/qbeast/spark/internal/rules/SampleRule.scala 89.47% <100.00%> (ø)
...st/spark/internal/sources/QbeastBaseRelation.scala 100.00% <100.00%> (+20.00%) ⬆️
...ain/scala/io/qbeast/spark/table/IndexedTable.scala 90.66% <100.00%> (ø)
...n/scala/io/qbeast/spark/delta/DeltaQbeastLog.scala 0.00% <0.00%> (-100.00%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us.

@Adricu8 Adricu8 marked this pull request as ready for review July 26, 2022 15:32
@osopardo1
Copy link
Member

Looks good to me! Just add simple documentation on the README and that's it!

@Adricu8 Adricu8 merged commit 8b96c68 into Qbeast-io:main Jul 28, 2022
@eavilaes eavilaes linked an issue Aug 3, 2022 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include INSERT INTO in Qbeast save contract
3 participants