[WIP][SPARK-24907][SQL] DataSourceV2 based connector for JDBC #25211
Conversation
…E2E test case added in MsSqlServerIntegrationSuite
…ourcev2.
- df.write.format("jdbcv2").mode("append") appends to the table if it exists; CREATE TABLE is not supported as of now (see the usage sketch after this list).
- Validated with SQL Server 2017 only.
- Detailed logging added to help understand the flows.
- E2E test cases added in MsSqlServerIntegrationSuite.scala.
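A minimal usage sketch of the append path above, assuming the connector accepts the same url/dbtable/user/password options as the V1 JDBC source (an assumption; the connection details are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("jdbcv2-append-sketch")
  .getOrCreate()

// Append to an existing table via the V2 connector; the table must already
// exist, since CREATE TABLE is not supported at this stage.
val df = spark.range(0, 10).toDF("id")
df.write
  .format("jdbcv2")
  .option("url", "jdbc:sqlserver://localhost:1433;databaseName=test") // placeholder URL
  .option("dbtable", "dbo.people")                                    // placeholder table
  .option("user", "sa")                                               // placeholder credentials
  .option("password", "<password>")
  .mode("append")
  .save()
```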
…le and only append if the table exists. If the table does not exist, a dbtable::schema request returns a null schema and the framework raises an exception.
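As an illustration of that behavior, the connector could resolve the remote schema by reusing the existing V1 helper; this is a hedged sketch, and resolveSchemaOrFail is a hypothetical name:

```scala
import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JDBCRDD}
import org.apache.spark.sql.types.StructType

// Resolve the remote table's schema over JDBC. If the table does not exist,
// the probe query fails and the exception surfaces to the framework, which
// matches the append-only behavior described above.
def resolveSchemaOrFail(options: JDBCOptions): StructType =
  JDBCRDD.resolveTable(options)
```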
MVP read and write paths are in place now. I have a few issues/questions that I will add to org/apache/spark/sql/execution/datasources/v2/jdbc/Readme.md and start some discussion on the mailing list. @rdblue @cloud-fan @gengliangwang @brkyvz. Not ready for a complete review, but a directional review would help greatly (is this on the right track?).
…ionSuite.scala
- Overwrite (with truncate): semantics are TRUNCATE TABLE and then overwrite with new data. The existing table schema is preserved.
- Overwrite (w/o truncate): scaffolding in place. Utils::CreateTable is a dummy and still needs to be implemented. Semantics are DROP TABLE, CREATE TABLE with the newly passed schema, and then overwrite with new data.
- Problems: the framework keeps calling WriteBuilder::truncate() even when the truncate option is not specified or is explicitly set to false. Test updated with truncate=false. Also added a test that applies df.filter and then overwrite (w/o truncate) to write only the rows matching the filter; the framework still calls truncate.
- Read path fixed to return the schema with pruned columns, as suggested by Scan::readSchema. SELECT with pruned columns still does not work.
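For context on the truncate issue, a minimal sketch of a WriteBuilder that records whether the framework requested truncation. It is written against the Spark 3.x org.apache.spark.sql.connector.write API (the equivalent interfaces lived under org.apache.spark.sql.sources.v2 when this PR was opened), and JdbcBatchWrite is a hypothetical placeholder:

```scala
import org.apache.spark.sql.connector.write.{BatchWrite, SupportsTruncate, WriteBuilder}

class JdbcWriteBuilder(options: Map[String, String])
  extends WriteBuilder with SupportsTruncate {

  private var truncateFirst = false

  // Invoked by the framework when it wants TRUNCATE-then-append semantics.
  // The issue noted above is that this is called even with truncate=false.
  override def truncate(): WriteBuilder = {
    truncateFirst = true
    this
  }

  override def buildForBatch(): BatchWrite =
    new JdbcBatchWrite(options, truncateFirst) // hypothetical batch write
}
```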
…and CREATE TABLE. Reuses JDBCUtils to DROP and CREATE. JDBCUtils had to be refactored to take a schema rather than a DataFrame; the functions that take a DataFrame are retained for V1 compatibility. The V2 implementation is not E2E tested, as the framework continues to send truncate rather than overwrite. A sketch of the intended DROP/CREATE logic follows below.
- V1 regression following the JDBCUtils change: unit tests (./build/mvn -pl :spark-sql_2.12 clean install) were run and passed, with the usual failures that are also seen on the master branch. Total number of tests run: 5896. Suites: completed 288, aborted 0. Tests: succeeded 5893, failed 3, canceled 1, ignored 45, pending 0.
- V1 integration tests (./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12) were run and all passed. Run completed in 36 seconds, 352 milliseconds. Total number of tests run: 22. Suites: completed 5, aborted 1. Tests: succeeded 22, failed 0, canceled 0, ignored 6, pending 0.
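A hedged sketch of the DROP-then-CREATE overwrite semantics using plain JDBC and a Catalyst schema, with a deliberately naive type mapping (real code would go through a JdbcDialect); recreateTable is an illustrative name:

```scala
import java.sql.Connection
import org.apache.spark.sql.types.StructType

def recreateTable(conn: Connection, table: String, schema: StructType): Unit = {
  // Naive column DDL derived from the Catalyst schema; DataType.sql is a
  // Spark SQL rendering and is not necessarily valid DDL for every database.
  val columns = schema.fields
    .map(f => s"${f.name} ${f.dataType.sql}")
    .mkString(", ")
  val stmt = conn.createStatement()
  try {
    stmt.executeUpdate(s"DROP TABLE IF EXISTS $table")
    stmt.executeUpdate(s"CREATE TABLE $table ($columns)")
  } finally {
    stmt.close()
  }
}
```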
Have a first draft of the DataSourceV2 based JDBC connector available now (PR #25211). The goal was an MVP implementation with support for batch read/write. I am looking forward to your review comments to help guide the direction. Note that I am still understanding/addressing some issues. The plan, status, and issues, along with a summary of changes, are captured in the Readme.md.
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it!
@shivsood any progress on this PR?
What changes were proposed in this pull request?
This is a work-in-progress PR for a DataSourceV2 based connector for JDBC. The goal is an MVP for both the read and write paths based on the latest Data Source V2 APIs. As of now the PR is not complete, but it is provided here for visibility into this work and for comments to set us in the right direction.
Another PR with related work is #21861. It uses the older V2 APIs, but some of the work there may still be relevant. I have requested the author to consider a merge if possible.
FYI to @tengpeng and @xianyin, who volunteered to contribute to this work going forward.
Readme.md added for high-level work items. Find it at org/apache/spark/sql/execution/datasources/v2/jdbc/Readme.md
The current PR implements the following (I will keep this updated as we make progress):
How was this patch tested?
The patch was mainly integration tested on the write (append) path.
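For reference, an append-path check of that shape might look like the following, assuming `spark` and `jdbcUrl` are provided by the docker-based suite and `appendtest` is an illustrative, pre-existing table:

```scala
// Write a small DataFrame through the V2 connector, then read it back and
// verify that the row count round-trips.
val input = spark.range(0, 100).toDF("x")
input.write
  .format("jdbcv2")
  .option("url", jdbcUrl)
  .option("dbtable", "appendtest")
  .mode("append")
  .save()

val roundTrip = spark.read
  .format("jdbcv2")
  .option("url", jdbcUrl)
  .option("dbtable", "appendtest")
  .load()
assert(roundTrip.count() == 100)
```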