[FlightRPC][Java] Implement Flight SQL JDBC Driver #23982

Closed
asfimport opened this issue Feb 2, 2020 · 6 comments

Comments

asfimport commented Feb 2, 2020

As a Java developer, I would like the ability to use JDBC to interact with Flight servers. For example, the Arrow repo now includes an example Flight server that wraps DataFusion and supports executing SQL against CSV and Parquet files. I would like to be able to call this from Java.

An Arrow Flight JDBC driver would also simplify developing integrations with other Apache projects, such as building a Spark V2 Data Source or a Drill storage plugin. It would also be directly usable from many BI tools.

I propose that the class name of the driver should be "org.apache.arrow.jdbc.Driver" and the connection string should be "jdbc:arrow://host:port?[properties]". I'm purposely leaving "flight" out of these because I don't think it makes sense to support multiple protocols now that we have Flight, and it is easier for users to remember "arrow" than to know about the protocol. This is easy to change if there are objections.
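To make the proposed URL form concrete, here is a minimal sketch of how a driver might recognize and parse "jdbc:arrow://host:port?[properties]". The class name `ArrowJdbcUrl` and the property names `user` and `useTls` are hypothetical and used only for illustration; nothing in this issue specifies which properties the driver would accept.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical helper that parses the proposed "jdbc:arrow://host:port?[properties]" form. */
final class ArrowJdbcUrl {
    static final String PREFIX = "jdbc:arrow://";

    final String host;
    final int port;
    final Properties properties;

    private ArrowJdbcUrl(String host, int port, Properties properties) {
        this.host = host;
        this.port = port;
        this.properties = properties;
    }

    /** Plays the role of java.sql.Driver#acceptsURL: true only for the proposed scheme. */
    static boolean accepts(String url) {
        return url != null && url.startsWith(PREFIX);
    }

    static ArrowJdbcUrl parse(String url) {
        if (!accepts(url)) {
            throw new IllegalArgumentException("Not an Arrow JDBC URL: " + url);
        }
        // Drop the "jdbc:" prefix so java.net.URI sees "arrow://host:port?k=v".
        URI uri = URI.create(url.substring("jdbc:".length()));
        if (uri.getHost() == null || uri.getPort() < 0) {
            throw new IllegalArgumentException("URL must include host and port: " + url);
        }
        Properties props = new Properties();
        String query = uri.getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                if (pair.isEmpty()) {
                    continue; // tolerate stray "&" separators
                }
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                props.setProperty(decode(key), decode(value));
            }
        }
        return new ArrowJdbcUrl(uri.getHost(), uri.getPort(), props);
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "user" and "useTls" are made-up property names for this sketch.
        ArrowJdbcUrl parsed = parse("jdbc:arrow://localhost:50051?user=andy&useTls=false");
        System.out.println(parsed.host + ":" + parsed.port + " " + parsed.properties);
    }
}
```

A real driver would perform this check inside its `java.sql.Driver` implementation and return `null` from `connect` for URLs it does not accept, which is how `DriverManager` selects among registered drivers.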

JDBC is designed around sending queries as strings and then receiving results. These strings could be SQL queries, JSON-encoded query plans, or something else. The JDBC driver will not make any assumptions about the format or dialect of these strings. Queries would be executed using the "DoGet" method.
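The pass-through idea can be sketched as follows. In the Flight protocol a ticket is an opaque byte payload, so a driver could forward the query string verbatim as the ticket body. The types below (`OpaqueTicket`, `DoGetTransport`, `PassThroughStatement`) are stand-ins invented for this sketch, not the Arrow Java classes, and rows are simplified to strings; a real driver would call the Flight client's DoGet and read Arrow record batches.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

/** Stand-in for a Flight ticket: an opaque byte payload. Not the real Arrow class. */
final class OpaqueTicket {
    private final byte[] bytes;

    OpaqueTicket(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    byte[] bytes() {
        return bytes.clone();
    }
}

/** Hypothetical transport; a real driver would issue a Flight DoGet call here. */
interface DoGetTransport {
    List<String> doGet(OpaqueTicket ticket);
}

/** Forwards query strings without parsing, validating, or rewriting them. */
final class PassThroughStatement {
    private final DoGetTransport transport;

    PassThroughStatement(DoGetTransport transport) {
        this.transport = transport;
    }

    List<String> executeQuery(String query) {
        // The driver makes no assumption about dialect: SQL, JSON plans,
        // or anything else are all just bytes in the ticket.
        OpaqueTicket ticket = new OpaqueTicket(query.getBytes(StandardCharsets.UTF_8));
        return transport.doGet(ticket);
    }

    public static void main(String[] args) {
        // Fake server that echoes the ticket payload back as a single "row".
        DoGetTransport echo = ticket ->
            Collections.singletonList(new String(ticket.bytes(), StandardCharsets.UTF_8));
        PassThroughStatement stmt = new PassThroughStatement(echo);
        System.out.println(stmt.executeQuery("SELECT a, b FROM t"));
        System.out.println(stmt.executeQuery("{\"plan\": \"scan\"}"));
    }
}
```

Keeping the transport behind an interface also fits the stated goal of a base driver that server-specific extensions can build on.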

The JDBC metadata functionality for reading schema information could possibly use ListFlights but I haven't looked into this part yet.

I expect that this JDBC driver will serve as a base that can be extended with specific functionality for different Flight servers, rather than attempting to support them all itself.

Reporter: Andy Grove / @andygrove
Assignee: James Duong / @jduo

Note: This issue was originally created as ARROW-7744. Please see the migration documentation for further details.

Jacques Nadeau / @jacques-n:
Given my previous experience with these APIs, I suggest you use Avatica as the basis for this rather than implementing by hand. I noticed you haven't done that in your WIP. Was it something you considered?

Andy Grove / @andygrove:
Hi Jacques,

I was wary of adding more dependencies unless/until they are really needed. I've implemented production JDBC drivers before, and there is definitely some tedious work involved in implementing the result set type conversion code and some of the associated metadata functionality, but my gut feeling so far is that the long-term burden would be less than that of designing around Avatica. Avatica seems to provide much more than we need, with a server process, wire protocol, etc. It also has its own type system, so we would have to convert between Avatica and Arrow types. It seems preferable to design this from the ground up based on Arrow types. I also was not able to find comprehensive documentation for building a JDBC driver like this with Avatica, which concerned me.
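As a rough illustration of the type conversion work mentioned above, the following sketch maps a handful of Arrow logical types directly to `java.sql.Types` constants, with no intermediate type system. The `ArrowKind` enum is a hypothetical, deliberately incomplete stand-in for Arrow's type hierarchy, and the mapping is one conventional choice rather than anything decided in this thread.

```java
import java.sql.Types;

/** Hypothetical, incomplete subset of Arrow logical types, for illustration only. */
enum ArrowKind { BOOL, INT32, INT64, FLOAT32, FLOAT64, UTF8, BINARY, DATE32, TIMESTAMP }

final class ArrowToJdbcTypes {
    private ArrowToJdbcTypes() {}

    /** Returns the java.sql.Types code a ResultSetMetaData could report for this kind. */
    static int jdbcType(ArrowKind kind) {
        switch (kind) {
            case BOOL:      return Types.BOOLEAN;
            case INT32:     return Types.INTEGER;
            case INT64:     return Types.BIGINT;
            case FLOAT32:   return Types.REAL;      // JDBC REAL is single precision
            case FLOAT64:   return Types.DOUBLE;
            case UTF8:      return Types.VARCHAR;
            case BINARY:    return Types.VARBINARY;
            case DATE32:    return Types.DATE;
            case TIMESTAMP: return Types.TIMESTAMP;
            default:
                throw new IllegalArgumentException("Unmapped Arrow type: " + kind);
        }
    }

    public static void main(String[] args) {
        for (ArrowKind kind : ArrowKind.values()) {
            System.out.println(kind + " -> java.sql.Types code " + jdbcType(kind));
        }
    }
}
```

A complete driver would also need per-type value accessors (`getInt`, `getString`, and so on) and handling for nested, decimal, and timezone-aware types, which is where most of the tedious work lies.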

I guess I could try a mini bake-off here and create a PR based on Avatica as well so we can compare the approaches. I can also ask some questions on the appropriate mailing list about Avatica's suitability for this use case.

Thanks,

Andy.

On Mon, Feb 3, 2020 at 2:58 PM Jacques Nadeau (Jira) jira@apache.org

Wes McKinney / @wesm:
I haven't seen activity on this, so I'm removing it from the 1.0.0 milestone for now.

John Kew:
Given my experience as well, I think a pure Java/Flight/Arrow version would be preferred, if only to reduce the number of dependencies. The primary advantage of a JDBC driver here is that it lets us explore lots of integration scenarios, so I suspect a lightweight approach would be ideal. While I really like Calcite, for the purposes of Flight I may explore Andy's original CL as a starting point within Tableau, as a wrapped-up connector plugin, to see what can be made of it.

Andy Grove / @andygrove:
I am still interested in exploring this, but I have unassigned myself for now because it will be a long time before I can get to it.

David Li / @lidavidm:
Issue resolved by pull request 13800
#13800
