-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-12922: [Java] Add flight-sql to the flight package. #10906
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. I think the FlightSQL proto itself is pretty much there, I left some comments. I also looked over the Java implementation and left some comments.
java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/StatementContext.java
Outdated
Show resolved
Hide resolved
java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/FlightSqlExample.java
Outdated
Show resolved
Hide resolved
java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/FlightSqlExample.java
Outdated
Show resolved
Hide resolved
java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/FlightSqlExample.java
Outdated
Show resolved
Hide resolved
java/flight/flight-sql/src/test/java/org/apache/arrow/flight/TestFlightSql.java
Show resolved
Hide resolved
I might have missed this, but lets add some comments that this experimental. i think for formal adoption of this protocol, we likely need an implementation in C++ and then a vote on the mailing list. |
@emkornfield - Updated the PR description indicating this is experimental, let me know if that's enough. |
Sorry, I think we should clarify in the proto file itself as well (assuming we want this to be merged before the other necessities for formal adoption are ready). |
Ah - thanks @emkornfield - do you have an example, or did you just want the proto file to say experimental somewhere? |
Probably the best thing to do would be to create our own custom option for annotating items as experimental and use that for all the types. If that is too much work, then I think just something in the proto file is fine for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. I left a few more comments.
At this point, it looks like the spec itself is pretty much there, and the implementation has a bit more work. It might be good to kick this up to the mailing list again to make sure people see the new PR, and we can start thinking about a format vote.
java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSqlClient.java
Outdated
Show resolved
Hide resolved
java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSqlClient.java
Show resolved
Hide resolved
java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSqlProducer.java
Outdated
Show resolved
Hide resolved
java/flight/flight-sql/src/test/java/org/apache/arrow/flight/TestFlightSql.java
Show resolved
Hide resolved
|
||
// Opaque handle for the prepared statement on the server. | ||
bytes prepared_statement_handle = 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the client handle not still necessary here and below? It's still used in the client code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was actually needed for CommandStatementQuery and not for CommandPreparedStatementQuery nor CommandPreparedStatementUpdate.
I mistakenly removed it from CommandStatementQuery in the last commit and now fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I suppose I don't see what the intended use or semantics of the field are. It's used to index into a cache, but that cache seems like it should be indexed by query, not by client handle (else if a client maliciously or accidentally reuses a handle they can get the wrong FlightInfo). And if it's unique, then why cache anything anyways? It'll never come up again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lidavidm ,
Just to make sure, we are actually talking about CommandStatementQuery
, right?
If so, we could not index the cache by using the query itself, as it can have the same query running concurrently.
What was confusing about this is that we are using CommandStatementQuery as both the command in getFlightInfo (which needs the query) and ticket when calling getStream (which needs the handle).
I think a solution to that is to split CommandStatementQuery into two messages:
message CommandStatementQuery {
option (experimental) = true;
// The SQL syntax.
string query = 1;
}
message TicketStatementQuery {
option (experimental) = true;
// Unique identifier for the instance of the statement to execute.
bytes statement_handle = 1;
}
CommandStatementQuery
would be used by the client when calling getFlightInfo, the server would create and cache the statement and return a FlightInfo with a ticket containing TicketStatementQuery
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - I see now, thanks. Yes, this is the intent of the Ticket, and I agree splitting messages like this makes more sense. Also, I think it should be stated (at least for now) that clients should not attempt to parse TicketStatementQuery / clients should treat the Ticket payload as opaque.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lidavidm , we just made this change! Can you take a look please? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've kicked off CI, I will take a look later this evening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. On the format side, I think things look good. (I left two minor nits just about TODOs and inconsistencies.) Let's give people a bit of time to review things here, but I think we should start a vote on the format additions this coming week.
During/after that, the Java implementation here can be polished (I still see some TODOs) and reviewed more thoroughly. I might ask to split the format additions into its own PR, so it can be merged after the vote, and then we can focus on the Java side.
Also something worth thinking about is if the old spec document can be updated and included as part of the Sphinx docs in the repo, since it included useful information like the request sequences, and would also serve as a more accessible introduction to FlightSQL. It doesn't have to be done here, though, and actually I can offer to help put that together after the fact.
format/FlightSql.proto
Outdated
* More information types can be added in future releases. | ||
* E.g. more SQL syntax support types, scalar functions support, type conversion support etc. | ||
* | ||
* // TODO: Flesh out the available set of metadata below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time to remove the TODO? Or replace it with a note that the set of metadata may expand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another pass through the Java code and left some questions; I also noticed that the Schemas here aren't being IPC encapsulated (very common mistake since the obvious Java methods aren't clear on this point).
* | ||
* @param query The query to execute. | ||
* @param options RPC-layer hints for this call. | ||
* @return a FlightInfo object representing the stream(s) to fetch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @return a FlightInfo object representing the stream(s) to fetch. | |
* @return the number of rows affected. |
* @param options RPC-layer hints for this call. | ||
*/ | ||
public SchemaResult getSchema(FlightDescriptor descriptor, CallOption... options) { | ||
return client.getSchema(descriptor, options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't really see the value of having getSchema/getStream since they just wrap the FlightClient method with no other changes. It would make more sense if this class was intended to completely wrap FlightClient, but this class doesn't manage the lifecycle of the underlying client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were added to avoid having to go back to the underlying FlightClient for most things, perhaps the FlightSqlClient should fully implement and manage the lifecycle of the underlying FlightClient. We were debating this, so happy for suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either FlightSqlClient can keep these methods, close the FlightClient it contains, and perhaps offer a getter to retrieve the underlying client; or it can drop these methods and continue not closing the FlightClient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way it would also be clear that some Flight methods are not quite relevant (e.g. DoExchange).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose simply adding a close to the FlightSqlClient and closing the FlightClient, and not exposing a getter for the underlying FlightClient. The idea would be to fully rely on the FlightSqlClient now, so adding a getter seems to me to allow for messing more with the underlying client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. I suppose the client can do any configuration (middleware, auth, etc.) before creating the FlightSqlClient in any case.
if (bytes.isEmpty()) { | ||
return new Schema(Collections.emptyList()); | ||
} | ||
resultSetSchema = Schema.deserialize(bytes.asReadOnlyByteBuffer()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema.deserialize assumes the input is the raw Schema flatbuffer and not the IPC-encapsulated message as laid out in FlightSQL.proto, this should probably be MessageSerializer.deseralizeSchema(Message.getRootAsMessage(bytes.asReadOnlyByteBuffer()))
.
bytes = ByteString.EMPTY; | ||
} else { | ||
bytes = ByteString.copyFrom( | ||
jdbcToArrowSchema(metaData, DEFAULT_CALENDAR).toByteArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, this serializes the schema to a raw flatbuffer and doesn't IPC-encapsulate it - this should be MessageSerializer.serializeMetadata(schema)
.
} | ||
final ActionCreatePreparedStatementResult result = ActionCreatePreparedStatementResult.newBuilder() | ||
.setDatasetSchema(bytes) | ||
.setParameterSchema(copyFrom(parameterSchema.toByteArray())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same issue applies here.
if (bytes.isEmpty()) { | ||
return new Schema(Collections.emptyList()); | ||
} | ||
parameterSchema = Schema.deserialize(bytes.asReadOnlyByteBuffer()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same issue applies here.
for (int index = 0; index < rows; index++) { | ||
final String tableName = tableNameVector.getObject(index).toString(); | ||
final Schema schema = new Schema(tableToFields.get(tableName)); | ||
saveToVector(schema.toByteArray(), tableSchemaVector, index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to IPC-encapsulate the schema.
Field.nullable("key_sequence", MinorType.INT.getType()), | ||
Field.nullable("key_name", MinorType.VARCHAR.getType())); | ||
|
||
return new SchemaResult(new Schema(fields)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it useful letting this be overridable vs just a static final constant or static method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a lot of the branches in getSchema could just be handled inline instead of calling out here. Not that it makes a big difference runtime-wise, but it would reduce the surface of this interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on reducing the surface, but can probably replace with statics or something of the sort. I wouldn't see a case where you'd want to override, except some people add extra columns for more information. If possible, I'd like to avoid that here as it encourages implementing for specifics instead of generals on the client-side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Statics or something sounds fine. I agree we shouldn't encourage extending the protocol ad-hoc.
@lidavidm I think everything has been addressed, let me know if not as we're moving on to the C++ implementation. |
@kylepbit on the format side, yes, I think everything is good. Of course, feel free to ping me once the C++ client is up. |
9dcec43
to
aa6cafe
Compare
* Utilities to work with Flight SQL semantics. | ||
*/ | ||
public final class FlightSqlUtils { | ||
public static final ActionType FLIGHT_SQL_CREATEPREPAREDSTATEMENT = new ActionType("CreatePreparedStatement", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it CREATE_PREPARED_STATEMENT? (seems like this constant and the one below are missing underscores between worded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes. This looks good to me.
format/FlightSql.proto
Outdated
* > | ||
* The returned data should be ordered by fk_catalog_name, fk_schema_name, fk_table_name, fk_key_name, then key_sequence. | ||
* update_rule and delete_rule returns a byte that is equivalent to actions: | ||
* - 0 = CASCADE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to have this be a dictionary encoded arrow column? If we think that is overkill, I think having an enum defined in protobuf so it is easy to avoid bug in mapping the action type.
format/FlightSql.proto
Outdated
option (experimental) = true; | ||
|
||
// Specifies the catalog to search for the foreign key table. | ||
google.protobuf.StringValue catalog = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is StringValue necessary here? Is it useful to distinguish between not set and empty string in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we wanted to distinguish just like JDBC does: "" retrieves those without a catalog; null means that the catalog name should not be used to narrow the search
- source: https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getExportedKeys-java.lang.String-java.lang.String-java.lang.String-
We should clarify this in the documentation
format/FlightSql.proto
Outdated
option (experimental) = true; | ||
|
||
// The SQL syntax. | ||
// The query should be treated as an opaque value, that is, clients should not attempt to parse this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious on why the warning that clients shouldn't attempt to parse it?
Is this because the SQL query could be any SQL dialect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I think that was meant to go below: https://github.com/apache/arrow/pull/10906/files#r692388125
The ticket shouldn't be parsed since it's private to the server. Of course, the client can parse this field here since it's the one generating it in the first place.
} | ||
|
||
/* | ||
* Wrap the result of a "GetPreparedStatement" action. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For resource like this did we ever clarify requirements around close? Or are the semantics left up to the server? We should probably document one way or another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prepared statement is only closed when the client calls the ClosePreparedStatement action.
Do you think it should be mentioned in this action?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention that the prepared statement should be automatically closed by the server if not used after some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should or may?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should - given that the close is a client-side initiated action, the server should clean up unused statements as it cannot depend upon the close occurring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But there's possibly also scenario where the server doesn't hold resources for a prepared statement (for example, if the handle were self contained), and it seems to me it's a case where the client SHOULD close the prepared statement but where the server MAY close it for external reason (low on resources, timeout, etc...)
* | ||
* Initially, Flight SQL will support the following information types: | ||
* - Server Information - Range [0-500) | ||
* - Syntax Information - Range [500-1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that server and syntax information are separated into two different ranges?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason for the separation aside from categorization and keeping the grouping roughly together, it allows you to better mentally group them but programatically there’s no reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically there could have been since protobuf uses varint so maybe it could have been a case of using lower values for the most frequently accessed info and to create more compact messages?
format/FlightSql.proto
Outdated
* - Syntax Information - Range [500-1000) | ||
* Range [0-10,000) is reserved for defaults. Custom options should start at 10,000. | ||
* | ||
* 1. Server Information [0-500): Provides basic information about the Flight SQL Server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
providing a separate enum definition for reserved values wold be useful here.
format/FlightSql.proto
Outdated
* < | ||
* catalog_name: utf8, | ||
* schema_name: utf8, | ||
* table_name: utf8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be not-null? (I think I'm OK leaving it nullable but we might want o list which fields are required or not in the schemas).
@@ -0,0 +1,454 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment. Instead of StringValue we should use "optional string field = 1". Proto3 was revised to support this for checking for presence
Its not clear to me if optional should be used in all cases (it might be sufficient to denote empty as interpreted as not specified).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for your review @emkornfield !
We tried to use optional
before, but it is not available in the protobuf version the project is using (3.7.1), I wonder if it's better to keep it using StringValue or update the library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be to update the library. This might be painful on the C++ side but I think it is the right thing to do in the long term. @lidavidm thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be a little worried about supporting LTS releases of Ubuntu and such. It does seem to be in conda-forge already, at least.
And we already vary our Protobuf version based on features requested:
arrow/cpp/cmake_modules/ThirdpartyToolchain.cmake
Lines 1415 to 1424 in 04575bc
if(ARROW_WITH_GRPC) | |
# gRPC 1.21.0 or later require Protobuf 3.7.0 or later. | |
set(ARROW_PROTOBUF_REQUIRED_VERSION "3.7.0") | |
elseif(ARROW_GANDIVA_JAVA) | |
# google::protobuf::MessageLite::ByteSize() is deprecated since | |
# Protobuf 3.4.0. | |
set(ARROW_PROTOBUF_REQUIRED_VERSION "3.4.0") | |
else() | |
set(ARROW_PROTOBUF_REQUIRED_VERSION "2.6.1") | |
endif() |
We may also have to bump Protobuf if/when we upgrade gRPC, though, I cannot figure out what exactly gRPC requires even after some searching. It seems v1.38 uses Protobuf 3.15 though it is not clear if that is a hard floor or not.
@kszucs how much trouble would it be to bump our Protobuf version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kou can probably tell. I'd say that we should try to bump it, trigger the nightly builds and check where can we switch to bundled builds instead of using system protobuf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no problem we require Protobuf 3.15 or later only when Apache Arrow Flight is enabled.
We just need to bump bundled Protobuf version for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you both for chiming in, I filed https://issues.apache.org/jira/browse/ARROW-13760 and will give it a run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#11006 was just merged thanks to Kou. You should be able to use the Protobuf optionals now on the C++ side (you should be able to bump the Java version easily enough?)
java/pom.xml
Outdated
<groupId>org.hamcrest</groupId> | ||
<artifactId>hamcrest</artifactId> | ||
<version>2.2</version> | ||
<scope>test</scope> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the convention to leave the default scope for dependencyManagement
and specify it on usage if not compile? (so that when used it's clear what the scope is by reading pom.xml
)?
|
||
<packaging>pom</packaging> | ||
|
||
<modules> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably not a good idea to define modules here and also at the top level.
return false; | ||
} | ||
final StatementContext<?> that = (StatementContext<?>) other; | ||
return getStatement().equals(that.getStatement()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the class is final (which is good), there's no need to call the method and the field can directly be used...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(looks like it's been resolved but cannot close that thread)
* | ||
* @return the SQL query if present; empty otherwise. | ||
*/ | ||
public Optional<String> getQuery() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this method returns an optional, why the constructor doesn't accept an optional too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(looks like it's been resolved but cannot close that thread)
@Override | ||
public void close() throws Exception { | ||
Connection connection = statement.getConnection(); | ||
AutoCloseables.close(statement, connection); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems potentially confusing that StatementContext
to close both statement
and connection
. Shouldn't connection
be closed transitively when statement
is closed too instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but I am not sure if it's possible to make the Statement
close the Connection
, those are being implemented by Derby (part of the example implementation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurentgo - when you say closed transitively, are you thinking that closing the JDBC Statement
will also close the Connection
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I guess I misread the code a bit and I can see the lifecycle model is that a connection + a statement are created for each StatementContext
instance being created.
On an unrelated note, because caches are separate, it seems it is possible for a statement to be evicted which would in turn invalid the resultset cached separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurentgo , you are right, I managed to remove the cache ResultSets now
|
||
listener.putNext(); | ||
listener.completed(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it also means that the same prepared statement cannot be used concurrently with multiple set of parameters?
final VarCharVector tableTypeVector = new VarCharVector("table_type", allocator); | ||
|
||
final List<FieldVector> vectors = | ||
new ArrayList<>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why wrapping the immutable list into an array list? Is vectors supposed to be modifiable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this particular method - yes.
If includeSchema == true
then the VectorSchemaRoot will have an additional table_schema
vector
* | ||
* Initially, Flight SQL will support the following information types: | ||
* - Server Information - Range [0-500) | ||
* - Syntax Information - Range [500-1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically there could have been since protobuf uses varint so maybe it could have been a case of using lower values for the most frequently accessed info and to create more compact messages?
<pluginId>grpc-java</pluginId> | ||
<pluginArtifact>io.grpc:protoc-gen-grpc-java:${dep.grpc.version}:exe:${os.detected.classifier}</pluginArtifact> | ||
</configuration> | ||
<executions> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't the protobuf classes been already generated by flight-core?
* | ||
* @return a FlightInfo object representing the stream(s) to fetch. | ||
*/ | ||
public FlightInfo getSqlInfo(final int... info) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a Java developer, I would prefer a strongly typed parameter vs simply passing int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurentgo , I would think an overload for this method with final FlightSql.SqlInfo... info
parameters would be better, but there is still the case in which the client can pass custom values that would not be in the enum SqlInfo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that scenario, one possible design pattern is to define an interface like this:
interface SqlInfo {
String name();
int code();
}
, use it as the argument type, and have the enum implements the interface for standard values.
But it also begs the question of allowing custom values or not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we made the decision early on to allow custom values, given that some data sources will have them and we're unlikely to provide support for each and every one of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, is the interface + enum for standard values a reasonable compromise?
* Implement SqlInfoProvider helper class * Added further javadocs to SqlInfoBuilder * Properly links the SqlInfoBuilder Javadocs to the SqlInfo one Co-authored-by: Vinicius Fraga <sxvinifp@gmail.com>
e1fb1f3
to
d5cc2bc
Compare
* Refactor reference to schema in the database to db_schema_* * Add tableRef class and refactor call from client to use it * Remove tableRef from GetTables * Fix checkstyle issues * Set fields as final in the TableRef
It looks like the Protobuf here needs to be updated with the changes in #11507. |
It might be good to add a status or link to a new PR. When one Googles "arrow flight sql jdbc driver", the top result is this article which references this PR. As the public comes to this PR and sees that it was closed, I think the assumption will be that work on Arrow Flight JDBC has stopped. |
I would say that's on Dremio to keep blog posts up to date (and until recently, all PRs were 'closed' due to how GitHub treats things - this PR is actually merged) |
I also don't really want to modify a PR after it's been merged, not sure what we can do besides adding a comment down here |
Thanks, the clarification on "Closed" meaning "Merged" is certainly helpful. Is there a current issue or PR to track? |
Flight SQL was merged and is documented here: https://arrow.apache.org/docs/format/FlightSql.html The JDBC driver is at #13800 and is in need of review, I have been trying to get CI and such passing |
Awesome, thank you! |
Introduction
This experimental PR implements Flight SQL, which formalizes SQL semantics on top of Flight. This follows designs and is a continuation of a PR started here. You can find the original proposal here, although the document has since drifted from the actual implementation.
An Overview of this PR
This PR adds a new module within Flight called flight-sql.
flight-sql is a new Flight API that provides a standard way for clients and servers to communicate with SQL-like semantics.
Like other Flight APIs, flight-sql does not provide implementation details that dictate how a client and server communicates with each other, it simply provides the SQL semantics and apply them onto the Flight API.
A Walkthrough of the New Module
FlightSql.proto introduces new SQL protobuf objects.
FlightSqlClient introduces a new wrapper for a FlightClient that adds the Flight SQL semantics on the client side.
FlightSqlProducer introduces a new FlightProducer API that adapts classic Flight requests into SQL operations.
FlighSqlExample is a sample FlightSQL server implementation.
Note that there are likely a few remaining items to be fleshed out, but they mostly pertain to metadata and adding to the list of formally specified metadata items. Also, this is experimental and has not formally been adopted yet.