
Conversation

@shivsood
Contributor

What changes were proposed in this pull request?

This is a work-in-progress PR for a DataSourceV2-based connector for JDBC. The goal is an MVP for both the read and write paths based on the latest Data Source V2 APIs. The PR is not complete yet, but it is provided here for visibility into this work and for comments to set us in the right direction.

Another PR on related work is #21861. That uses the older V2 APIs, but some of the work there may still be relevant; I have asked the author to consider merging it if possible.
FYI @tengpeng @xianyin, who volunteered to contribute to this work going forward.

A Readme.md with the high-level work items has been added; find it at org/apache/spark/sql/execution/datasources/v2/jdbc/Readme.md.

The current PR implements the following (I will keep this updated as we make progress):

  • Scaffolding for read/write paths.
  • First draft implementation of the DataFrame write (append) flow. The connector name is "jdbcv2": df.write.format("jdbcv2").mode("append") appends to the table if it exists. Creating the table is not supported yet (see the usage sketch after this list).
  • E2E test cases added in MsSqlServerIntegrationSuite.scala
  • JDBCUtils is reused wherever easily possible; there is further scope for refactoring it to work for both the V1 and V2 flows.
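
A minimal usage sketch of the append flow described above, assuming a SparkSession `spark` and an existing SQL Server table; the `url` and `dbtable` option names are assumed to mirror the V1 JDBC options and are not confirmed by this PR:

```scala
// Hypothetical connection details; only format("jdbcv2") and mode("append")
// come from this PR, the option names mirror the V1 JDBC connector.
val url = "jdbc:sqlserver://host:1433;databaseName=test"

val df = spark.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")

df.write
  .format("jdbcv2")          // connector name introduced by this PR
  .option("url", url)
  .option("dbtable", "dbo.people")
  .mode("append")            // append only; create-table is not supported yet
  .save()
```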

How was this patch tested?

  • Validation with SQLServer 2017 only.
  • No unit test cases added for now.

The patch was mainly integration tested on the write (append) path.


shivsood added 5 commits July 15, 2019 21:30
…E2E test case added in MsSqlServerIntegrationSuite
…ourcev2.

- df.write.format("jdbcv2").mode("append") appends to the table if it exists; create table is not supported yet.
- Validation with SQLServer 2017 only.
- Added logging to help understand the flows.
- E2E test cases added in MsSqlServerIntegrationSuite.scala
@shivsood shivsood changed the title [SPARK-24907][SQL][WIP] DataSourceV2 based connector for JDBC [WIP][SPARK-24907][SQL] DataSourceV2 based connector for JDBC Jul 22, 2019
@shivsood
Contributor Author

The MVP read and write paths are in place now. I have a few issues/questions that I will add to org/apache/spark/sql/execution/datasources/v2/jdbc/Readme.md and use to start some discussion on the mailing list.
@tengpeng @xianyin @priyanka-gomatam please review and contribute as relevant. All of you should have contributor rights to this repo. Also please note #25291.

@rdblue @cloud-fan @gengliangwang @brkyvz: not ready for a complete review, but a directional review would help greatly (is this on the right track?).

…ionSuite.scala

Semantics are TRUNCATE TABLE and then overwrite with the new data. The existing table schema is preserved.

Overwrite (w/o truncate): scaffolding in place. Utils::CreateTable is a dummy and still needs to be implemented.
Semantics are DROP TABLE, CREATE TABLE with the newly passed schema, and then overwrite with the new data.
Problems
- The framework keeps calling WriteBuilder::truncate() even when the truncate option is not specified or truncate is explicitly set to false. Tested with truncate=false.
- Added a test: df.filter and then overwrite (w/o truncate) to write only the set of rows that match the filter. The framework still calls truncate. (A sketch of both overwrite flows follows below.)
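
A sketch of the two overwrite flows being exercised above, reusing `spark`, `url`, and `df` from the earlier append example; the `truncate` option name follows the V1 JDBC connector and is an assumption here:

```scala
// Overwrite with truncate: expected semantics are TRUNCATE TABLE then write,
// preserving the existing table schema.
df.write.format("jdbcv2")
  .option("url", url)
  .option("dbtable", "dbo.people")
  .option("truncate", "true")
  .mode("overwrite")
  .save()

// Overwrite without truncate: expected semantics are DROP TABLE, CREATE TABLE
// with the new schema, then write. The problem above: the framework still
// routes this through WriteBuilder::truncate().
df.filter(df("id") > 1).write.format("jdbcv2")
  .option("url", url)
  .option("dbtable", "dbo.people")
  .option("truncate", "false")
  .mode("overwrite")
  .save()
```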

Read path fixed to return the schema with pruned columns, as suggested in Scan::readSchema.
A select with pruned columns still does not work (see the read sketch below).
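
A minimal read sketch that exercises the column pruning described above; the option names are assumed to mirror V1, and `url` is the hypothetical URL from the write examples:

```scala
// Selecting a subset of columns should cause the scan to report a pruned
// schema via Scan::readSchema.
val people = spark.read
  .format("jdbcv2")
  .option("url", url)
  .option("dbtable", "dbo.people")
  .load()

people.select("name").show()   // pruned-column select; currently still failing
```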
…and CREATE TABLE. Reuses JDBCUtils to DROP and CREATE.

JDBCUtils had to be refactored to take a schema rather than a DataFrame. The functions that take a DataFrame are retained for V1 compatibility.
The V2 implementation is not e2e tested, as the framework continues to send truncate rather than overwrite.
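
A minimal sketch of the refactoring direction described above; the names and signatures are illustrative, not the actual JdbcUtils API:

```scala
import java.sql.Connection
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

object JdbcUtilsSketch {
  // V2-friendly variant: takes only the schema, so it can be called from a
  // WriteBuilder where no DataFrame is available.
  def createTable(conn: Connection, table: String, schema: StructType): Unit = {
    val columns = schema.fields
      .map(f => s"${f.name} ${f.dataType.sql}") // naive type mapping, for illustration
      .mkString(", ")
    val stmt = conn.createStatement()
    try stmt.executeUpdate(s"CREATE TABLE $table ($columns)")
    finally stmt.close()
  }

  // DataFrame overload retained for V1 compatibility; delegates to the
  // schema-based variant.
  def createTable(conn: Connection, table: String, df: DataFrame): Unit =
    createTable(conn, table, df.schema)
}
```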

V1 regression tests following the JDBCUtils change:
Unit tests (./build/mvn -pl :spark-sql_2.12 clean install) were run. Tests passed, with the usual failures that are also seen on the master branch.
Total number of tests run: 5896
Suites: completed 288, aborted 0
Tests: succeeded 5893, failed 3, canceled 1, ignored 45, pending 0

V1 integration tests (./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12) were run and all passed.
Run completed in 36 seconds, 352 milliseconds.
Total number of tests run: 22
Suites: completed 5, aborted 1
Tests: succeeded 22, failed 0, canceled 0, ignored 6, pending 0
@shivsood
Contributor Author

shivsood commented Aug 2, 2019

A first draft of the DataSourceV2-based JDBC connector is available now (PR #25211). The goal was an MVP implementation with support for batch read/write. I am looking forward to your review comments to help guide the direction. Note that I am still understanding/addressing some issues. The plan, status, and issues are captured in the Readme.md.

Summary of changes

  • The V2 connector changes are under sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc. The implementation heavily reuses the infrastructure provided by JDBCUtils.
  • The JDBCUtils file (sql/core/../datasources/jdbc/JdbcUtils.scala) is refactored (for a few functions) to suit V2 needs.
  • E2E test cases are in external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala

@AmplabJenkins

Can one of the admins verify this patch?

@github-actions

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just
a way of keeping the PR queue manageable.

If you'd like to revive this PR, please reopen it!

@github-actions github-actions bot added the Stale label Dec 26, 2019
@github-actions github-actions bot closed this Dec 27, 2019
@baibaichen
Contributor

@shivsood any progress on this PR?

