Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-12922: [Java] Add flight-sql to the flight package. #10906

Closed
wants to merge 248 commits into from

Conversation

kylepbit
Copy link

@kylepbit kylepbit commented Aug 10, 2021

Introduction
This experimental PR implements Flight SQL, which formalizes SQL semantics on top of Flight. This follows designs and is a continuation of a PR started here. You can find the original proposal here, although the document has since drifted from the actual implementation.

An Overview of this PR
This PR adds a new module within Flight called flight-sql.
flight-sql is a new Flight API that provides a standard way for clients and servers to communicate with SQL-like semantics.
Like other Flight APIs, flight-sql does not provide implementation details that dictate how a client and server communicates with each other, it simply provides the SQL semantics and apply them onto the Flight API.

A Walkthrough of the New Module
FlightSql.proto introduces new SQL protobuf objects.
FlightSqlClient introduces a new wrapper for a FlightClient that adds the Flight SQL semantics on the client side.
FlightSqlProducer introduces a new FlightProducer API that adapts classic Flight requests into SQL operations.
FlighSqlExample is a sample FlightSQL server implementation.

Note that there are likely a few remaining items to be fleshed out, but they mostly pertain to metadata and adding to the list of formally specified metadata items. Also, this is experimental and has not formally been adopted yet.

@github-actions
Copy link

@lidavidm lidavidm self-requested a review August 10, 2021 19:48
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. I think the FlightSQL proto itself is pretty much there, I left some comments. I also looked over the Java implementation and left some comments.

format/FlightSql.proto Outdated Show resolved Hide resolved
format/FlightSql.proto Outdated Show resolved Hide resolved
format/FlightSql.proto Outdated Show resolved Hide resolved
format/FlightSql.proto Outdated Show resolved Hide resolved
format/FlightSql.proto Outdated Show resolved Hide resolved
@emkornfield
Copy link
Contributor

I might have missed this, but lets add some comments that this experimental. i think for formal adoption of this protocol, we likely need an implementation in C++ and then a vote on the mailing list.

@kylepbit
Copy link
Author

@emkornfield - Updated the PR description indicating this is experimental, let me know if that's enough.
@lidavidm - Working with team to address items you raised.

@emkornfield
Copy link
Contributor

Sorry, I think we should clarify in the proto file itself as well (assuming we want this to be merged before the other necessities for formal adoption are ready).

@kylepbit
Copy link
Author

Ah - thanks @emkornfield - do you have an example, or did you just want the proto file to say experimental somewhere?

@emkornfield
Copy link
Contributor

Probably the best thing to do would be to create our own custom option for annotating items as experimental and use that for all the types. If that is too much work, then I think just something in the proto file is fine for now.

format/FlightSql.proto Outdated Show resolved Hide resolved
format/FlightSql.proto Outdated Show resolved Hide resolved
format/FlightSql.proto Outdated Show resolved Hide resolved
@lidavidm lidavidm self-requested a review August 17, 2021 17:29
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I left a few more comments.

At this point, it looks like the spec itself is pretty much there, and the implementation has a bit more work. It might be good to kick this up to the mailing list again to make sure people see the new PR, and we can start thinking about a format vote.

format/FlightSql.proto Outdated Show resolved Hide resolved

// Opaque handle for the prepared statement on the server.
bytes prepared_statement_handle = 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the client handle not still necessary here and below? It's still used in the client code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was actually needed for CommandStatementQuery and not for CommandPreparedStatementQuery nor CommandPreparedStatementUpdate.
I mistakenly removed it from CommandStatementQuery in the last commit and now fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I suppose I don't see what the intended use or semantics of the field are. It's used to index into a cache, but that cache seems like it should be indexed by query, not by client handle (else if a client maliciously or accidentally reuses a handle they can get the wrong FlightInfo). And if it's unique, then why cache anything anyways? It'll never come up again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lidavidm ,

Just to make sure, we are actually talking about CommandStatementQuery, right?
If so, we could not index the cache by using the query itself, as it can have the same query running concurrently.

What was confusing about this is that we are using CommandStatementQuery as both the command in getFlightInfo (which needs the query) and ticket when calling getStream (which needs the handle).

I think a solution to that is to split CommandStatementQuery into two messages:

message CommandStatementQuery {
  option (experimental) = true;

  // The SQL syntax.
  string query = 1;
}

message TicketStatementQuery {
  option (experimental) = true;
  
  // Unique identifier for the instance of the statement to execute.
  bytes statement_handle = 1;
}

CommandStatementQuery would be used by the client when calling getFlightInfo, the server would create and cache the statement and return a FlightInfo with a ticket containing TicketStatementQuery .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - I see now, thanks. Yes, this is the intent of the Ticket, and I agree splitting messages like this makes more sense. Also, I think it should be stated (at least for now) that clients should not attempt to parse TicketStatementQuery / clients should treat the Ticket payload as opaque.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lidavidm , we just made this change! Can you take a look please? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've kicked off CI, I will take a look later this evening.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. On the format side, I think things look good. (I left two minor nits just about TODOs and inconsistencies.) Let's give people a bit of time to review things here, but I think we should start a vote on the format additions this coming week.

During/after that, the Java implementation here can be polished (I still see some TODOs) and reviewed more thoroughly. I might ask to split the format additions into its own PR, so it can be merged after the vote, and then we can focus on the Java side.

Also something worth thinking about is if the old spec document can be updated and included as part of the Sphinx docs in the repo, since it included useful information like the request sequences, and would also serve as a more accessible introduction to FlightSQL. It doesn't have to be done here, though, and actually I can offer to help put that together after the fact.

format/FlightSql.proto Outdated Show resolved Hide resolved
* More information types can be added in future releases.
* E.g. more SQL syntax support types, scalar functions support, type conversion support etc.
*
* // TODO: Flesh out the available set of metadata below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to remove the TODO? Or replace it with a note that the set of metadata may expand.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another pass through the Java code and left some questions; I also noticed that the Schemas here aren't being IPC encapsulated (very common mistake since the obvious Java methods aren't clear on this point).

*
* @param query The query to execute.
* @param options RPC-layer hints for this call.
* @return a FlightInfo object representing the stream(s) to fetch.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return a FlightInfo object representing the stream(s) to fetch.
* @return the number of rows affected.

* @param options RPC-layer hints for this call.
*/
public SchemaResult getSchema(FlightDescriptor descriptor, CallOption... options) {
return client.getSchema(descriptor, options);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't really see the value of having getSchema/getStream since they just wrap the FlightClient method with no other changes. It would make more sense if this class was intended to completely wrap FlightClient, but this class doesn't manage the lifecycle of the underlying client.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were added to avoid having to go back to the underlying FlightClient for most things, perhaps the FlightSqlClient should fully implement and manage the lifecycle of the underlying FlightClient. We were debating this, so happy for suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either FlightSqlClient can keep these methods, close the FlightClient it contains, and perhaps offer a getter to retrieve the underlying client; or it can drop these methods and continue not closing the FlightClient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way it would also be clear that some Flight methods are not quite relevant (e.g. DoExchange).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose simply adding a close to the FlightSqlClient and closing the FlightClient, and not exposing a getter for the underlying FlightClient. The idea would be to fully rely on the FlightSqlClient now, so adding a getter seems to me to allow for messing more with the underlying client.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. I suppose the client can do any configuration (middleware, auth, etc.) before creating the FlightSqlClient in any case.

if (bytes.isEmpty()) {
return new Schema(Collections.emptyList());
}
resultSetSchema = Schema.deserialize(bytes.asReadOnlyByteBuffer());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema.deserialize assumes the input is the raw Schema flatbuffer and not the IPC-encapsulated message as laid out in FlightSQL.proto, this should probably be MessageSerializer.deseralizeSchema(Message.getRootAsMessage(bytes.asReadOnlyByteBuffer())).

bytes = ByteString.EMPTY;
} else {
bytes = ByteString.copyFrom(
jdbcToArrowSchema(metaData, DEFAULT_CALENDAR).toByteArray());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this serializes the schema to a raw flatbuffer and doesn't IPC-encapsulate it - this should be MessageSerializer.serializeMetadata(schema).

}
final ActionCreatePreparedStatementResult result = ActionCreatePreparedStatementResult.newBuilder()
.setDatasetSchema(bytes)
.setParameterSchema(copyFrom(parameterSchema.toByteArray()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue applies here.

if (bytes.isEmpty()) {
return new Schema(Collections.emptyList());
}
parameterSchema = Schema.deserialize(bytes.asReadOnlyByteBuffer());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue applies here.

for (int index = 0; index < rows; index++) {
final String tableName = tableNameVector.getObject(index).toString();
final Schema schema = new Schema(tableToFields.get(tableName));
saveToVector(schema.toByteArray(), tableSchemaVector, index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to IPC-encapsulate the schema.

Field.nullable("key_sequence", MinorType.INT.getType()),
Field.nullable("key_name", MinorType.VARCHAR.getType()));

return new SchemaResult(new Schema(fields));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useful letting this be overridable vs just a static final constant or static method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a lot of the branches in getSchema could just be handled inline instead of calling out here. Not that it makes a big difference runtime-wise, but it would reduce the surface of this interface.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on reducing the surface, but can probably replace with statics or something of the sort. I wouldn't see a case where you'd want to override, except some people add extra columns for more information. If possible, I'd like to avoid that here as it encourages implementing for specifics instead of generals on the client-side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Statics or something sounds fine. I agree we shouldn't encourage extending the protocol ad-hoc.

@kylepbit
Copy link
Author

@lidavidm I think everything has been addressed, let me know if not as we're moving on to the C++ implementation.

@lidavidm
Copy link
Member

@kylepbit on the format side, yes, I think everything is good. Of course, feel free to ping me once the C++ client is up.

* Utilities to work with Flight SQL semantics.
*/
public final class FlightSqlUtils {
public static final ActionType FLIGHT_SQL_CREATEPREPAREDSTATEMENT = new ActionType("CreatePreparedStatement",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it CREATE_PREPARED_STATEMENT? (seems like this constant and the one below are missing underscores between worded.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes. This looks good to me.

* >
* The returned data should be ordered by fk_catalog_name, fk_schema_name, fk_table_name, fk_key_name, then key_sequence.
* update_rule and delete_rule returns a byte that is equivalent to actions:
* - 0 = CASCADE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have this be a dictionary encoded arrow column? If we think that is overkill, I think having an enum defined in protobuf so it is easy to avoid bug in mapping the action type.

option (experimental) = true;

// Specifies the catalog to search for the foreign key table.
google.protobuf.StringValue catalog = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is StringValue necessary here? Is it useful to distinguish between not set and empty string in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we wanted to distinguish just like JDBC does: "" retrieves those without a catalog; null means that the catalog name should not be used to narrow the search - source: https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getExportedKeys-java.lang.String-java.lang.String-java.lang.String-

We should clarify this in the documentation

option (experimental) = true;

// The SQL syntax.
// The query should be treated as an opaque value, that is, clients should not attempt to parse this.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious on why the warning that clients shouldn't attempt to parse it?

Is this because the SQL query could be any SQL dialect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I think that was meant to go below: https://github.com/apache/arrow/pull/10906/files#r692388125

The ticket shouldn't be parsed since it's private to the server. Of course, the client can parse this field here since it's the one generating it in the first place.

}

/*
* Wrap the result of a "GetPreparedStatement" action.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For resource like this did we ever clarify requirements around close? Or are the semantics left up to the server? We should probably document one way or another.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prepared statement is only closed when the client calls the ClosePreparedStatement action.
Do you think it should be mentioned in this action?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to mention that the prepared statement should be automatically closed by the server if not used after some time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should or may?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should - given that the close is a client-side initiated action, the server should clean up unused statements as it cannot depend upon the close occurring.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there's possibly also scenario where the server doesn't hold resources for a prepared statement (for example, if the handle were self contained), and it seems to me it's a case where the client SHOULD close the prepared statement but where the server MAY close it for external reason (low on resources, timeout, etc...)

*
* Initially, Flight SQL will support the following information types:
* - Server Information - Range [0-500)
* - Syntax Information - Range [500-1000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that server and syntax information are separated into two different ranges?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason for the separation aside from categorization and keeping the grouping roughly together, it allows you to better mentally group them but programatically there’s no reason

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically there could have been since protobuf uses varint so maybe it could have been a case of using lower values for the most frequently accessed info and to create more compact messages?

* - Syntax Information - Range [500-1000)
* Range [0-10,000) is reserved for defaults. Custom options should start at 10,000.
*
* 1. Server Information [0-500): Provides basic information about the Flight SQL Server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

providing a separate enum definition for reserved values wold be useful here.

* <
* catalog_name: utf8,
* schema_name: utf8,
* table_name: utf8,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be not-null? (I think I'm OK leaving it nullable but we might want o list which fields are required or not in the schemas).

@@ -0,0 +1,454 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment. Instead of StringValue we should use "optional string field = 1". Proto3 was revised to support this for checking for presence

Its not clear to me if optional should be used in all cases (it might be sufficient to denote empty as interpreted as not specified).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your review @emkornfield !

We tried to use optional before, but it is not available in the protobuf version the project is using (3.7.1), I wonder if it's better to keep it using StringValue or update the library?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to update the library. This might be painful on the C++ side but I think it is the right thing to do in the long term. @lidavidm thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be a little worried about supporting LTS releases of Ubuntu and such. It does seem to be in conda-forge already, at least.

And we already vary our Protobuf version based on features requested:

if(ARROW_WITH_GRPC)
# gRPC 1.21.0 or later require Protobuf 3.7.0 or later.
set(ARROW_PROTOBUF_REQUIRED_VERSION "3.7.0")
elseif(ARROW_GANDIVA_JAVA)
# google::protobuf::MessageLite::ByteSize() is deprecated since
# Protobuf 3.4.0.
set(ARROW_PROTOBUF_REQUIRED_VERSION "3.4.0")
else()
set(ARROW_PROTOBUF_REQUIRED_VERSION "2.6.1")
endif()
so I suppose this wouldn't change the situation there (20.04 LTS ships Protobuf 3.6 anyways).

We may also have to bump Protobuf if/when we upgrade gRPC, though, I cannot figure out what exactly gRPC requires even after some searching. It seems v1.38 uses Protobuf 3.15 though it is not clear if that is a hard floor or not.

@kszucs how much trouble would it be to bump our Protobuf version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou can probably tell. I'd say that we should try to bump it, trigger the nightly builds and check where can we switch to bundled builds instead of using system protobuf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no problem we require Protobuf 3.15 or later only when Apache Arrow Flight is enabled.
We just need to bump bundled Protobuf version for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for chiming in, I filed https://issues.apache.org/jira/browse/ARROW-13760 and will give it a run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11006 was just merged thanks to Kou. You should be able to use the Protobuf optionals now on the C++ side (you should be able to bump the Java version easily enough?)

java/pom.xml Outdated
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest</artifactId>
<version>2.2</version>
<scope>test</scope>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the convention to leave the default scope for dependencyManagement and specify it on usage if not compile? (so that when used it's clear what the scope is by reading pom.xml)?


<packaging>pom</packaging>

<modules>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably not a good idea to define modules here and also at the top level.

return false;
}
final StatementContext<?> that = (StatementContext<?>) other;
return getStatement().equals(that.getStatement());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the class is final (which is good), there's no need to call the method and the field can directly be used...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(looks like it's been resolved but cannot close that thread)

*
* @return the SQL query if present; empty otherwise.
*/
public Optional<String> getQuery() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this method returns an optional, why the constructor doesn't accept an optional too?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(looks like it's been resolved but cannot close that thread)

@Override
public void close() throws Exception {
Connection connection = statement.getConnection();
AutoCloseables.close(statement, connection);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems potentially confusing that StatementContext to close both statement and connection. Shouldn't connection be closed transitively when statement is closed too instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but I am not sure if it's possible to make the Statement close the Connection, those are being implemented by Derby (part of the example implementation)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurentgo - when you say closed transitively, are you thinking that closing the JDBC Statement will also close the Connection?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I guess I misread the code a bit and I can see the lifecycle model is that a connection + a statement are created for each StatementContext instance being created.

On an unrelated note, because caches are separate, it seems it is possible for a statement to be evicted which would in turn invalid the resultset cached separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurentgo , you are right, I managed to remove the cache ResultSets now


listener.putNext();
listener.completed();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it also means that the same prepared statement cannot be used concurrently with multiple set of parameters?

final VarCharVector tableTypeVector = new VarCharVector("table_type", allocator);

final List<FieldVector> vectors =
new ArrayList<>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why wrapping the immutable list into an array list? Is vectors supposed to be modifiable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular method - yes.
If includeSchema == true then the VectorSchemaRoot will have an additional table_schema vector

*
* Initially, Flight SQL will support the following information types:
* - Server Information - Range [0-500)
* - Syntax Information - Range [500-1000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically there could have been since protobuf uses varint so maybe it could have been a case of using lower values for the most frequently accessed info and to create more compact messages?

<pluginId>grpc-java</pluginId>
<pluginArtifact>io.grpc:protoc-gen-grpc-java:${dep.grpc.version}:exe:${os.detected.classifier}</pluginArtifact>
</configuration>
<executions>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't the protobuf classes been already generated by flight-core?

*
* @return a FlightInfo object representing the stream(s) to fetch.
*/
public FlightInfo getSqlInfo(final int... info) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a Java developer, I would prefer a strongly typed parameter vs simply passing int

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurentgo , I would think an overload for this method with final FlightSql.SqlInfo... info parameters would be better, but there is still the case in which the client can pass custom values that would not be in the enum SqlInfo

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that scenario, one possible design pattern is to define an interface like this:

  interface SqlInfo {
    String name();
    int code();
  }

, use it as the argument type, and have the enum implements the interface for standard values.

But it also begs the question of allowing custom values or not...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we made the decision early on to allow custom values, given that some data sources will have them and we're unlikely to provide support for each and every one of them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, is the interface + enum for standard values a reasonable compromise?

vfraga and others added 3 commits October 28, 2021 17:31
* Refactor reference to schema in the database to db_schema_*

* Add tableRef class and refactor call from client to use it

* Remove tableRef from GetTables

* Fix checkstyle issues

* Set fields as final in the TableRef
@lidavidm
Copy link
Member

lidavidm commented Dec 6, 2021

It looks like the Protobuf here needs to be updated with the changes in #11507.

@avantgardnerio
Copy link

It might be good to add a status or link to a new PR. When one Googles "arrow flight sql jdbc driver", the top result is this article which references this PR.

As the public comes to this PR and sees that it was closed, I think the assumption will be that work on Arrow Flight JDBC has stopped.

@lidavidm
Copy link
Member

I would say that's on Dremio to keep blog posts up to date (and until recently, all PRs were 'closed' due to how GitHub treats things - this PR is actually merged)

@lidavidm
Copy link
Member

I also don't really want to modify a PR after it's been merged, not sure what we can do besides adding a comment down here

@avantgardnerio
Copy link

Thanks, the clarification on "Closed" meaning "Merged" is certainly helpful. Is there a current issue or PR to track?

@lidavidm
Copy link
Member

Flight SQL was merged and is documented here: https://arrow.apache.org/docs/format/FlightSql.html

The JDBC driver is at #13800 and is in need of review, I have been trying to get CI and such passing

@avantgardnerio
Copy link

The JDBC driver is at #13800

Awesome, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.