[WIP] [POC] Flight SQL #9368

tifflhl · 2021-01-30T02:13:23Z

Introduction
This proof-of-concept in progress illustrates how Flight SQL can be implemented. This follows designs proposed in the Flight SQL proposal socialized last year. You can find the original proposal here. Since this is still a proof-of-concept, design feedback and suggestions are very much appreciated.

An Overview of this PR

This PR adds a new module within Flight called flight-sql.
flight-sql is a new Flight API that provides a standard way for clients and servers to communicate with SQL-like semantics.
Like other Flight APIs, flight-sql does not provide implementation details that dictate how a client and server communicates with each other, it simply provides the SQL semantics and apply them onto the Flight API.

A Walkthrough of the New Module

FlightSQL.proto introduces new SQL protobuf objects.
FlightSQLProducer introduces a new FlightProducer API that adapts classic Flight requests into SQL operations.
FlighSQLExample is a sample FlightSQL server implementation.

As a proof-of-concept, this isn't complete yet. There are a few TODO items:

FlightSQLExample is not completed yet.
As of now, all flight-sql client related operations are I FlightSQLClientUtils. I would like to break this utility class apart and offer an API for a FlightSQLClient add-on.

Add extensions in the Apache Arrow project’s Arrow Flight modules to provide a standard way for clients and servers to communicate with SQL-like semantics. Do not pull to master. A message to the mailing list will accompany this and another proposal in the coming days for discussion.

Minor updates to fix field numbering and clarify usage in comments.

Update with initial formatting/naming feedback and add ResultsOrder enum.

Alter GetSQLCapabilities to model after Hive Thrift's TCLIService.

Update new message to snake_case.

Update comments to reflect current field names.

- Address code review comments from ryannicholson#2

github-actions · 2021-01-30T02:14:17Z

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

emkornfield · 2021-01-30T06:10:38Z

Small style nit: Generally camel casing even for acronyms is preferred (i.e. FlightSql.proto, there is probably inconsistency within the code base but I think most new code uses this convention)

emkornfield · 2021-01-30T06:12:35Z

format/FlightSQL.proto

+/*
+ * Wrapper for values returned in ActionGetSQLInfoResult.
+ */
+message GetSQLInfoValue {


nit: preference for Sql camel case.

emkornfield · 2021-01-30T06:13:29Z

format/FlightSQL.proto

+  oneof value {
+    string string_value = 1;
+    int32 integer_value = 2;
+    int32 integer_bitmask = 3;


Is this also modelled after ODBC? is long_bitmask not preferred?

Also if these types aren't used initially maybe leave them out for now?

A long is much more suitable for a bit mask given there are 8 bits available then.
I suggest we revisit what types can be used after we've identified a 'standardized' set of GetSqlInfo properties.

I also think it'd be better to use integral type codes instead of string values for keys.

per comments elsewhere. I don't think we should be introduce a new type system here. Lets please use the Arrow type system for returning metadata.

emkornfield · 2021-01-31T22:02:17Z

format/FlightSQL.proto

+ */
+message ActionGetTablesRequest {
+  /*
+   * Specifies the order of result values with prescendence:


Suggested change

* Specifies the order of result values with prescendence:

* Specifies the order of result values with precendence:

precedence (third time's the charm :)

emkornfield · 2021-01-31T22:05:24Z

format/FlightSQL.proto

+ */
+message ActionGetTableTypesResult {
+  /*
+   * Indicates the type of the table. E.g. table (regular data table) , view, system table etc.


Suggested change

* Indicates the type of the table. E.g. table (regular data table) , view, system table etc.

* Indicates the type of the table. E.g. table (regular data table), view, system table etc.

emkornfield · 2021-01-31T22:10:13Z

java/flight/flight-sql/src/test/protobuf/flightSQLExample.proto

@@ -0,0 +1,26 @@
+/*


nit: be consistent with file naming. Uppercase Flight

Additionally shouldn't Protobuf file names be in snake_case? See https://developers.google.com/protocol-buffers/docs/style#file_structure

I guess Flight.proto itself is already an exception...we should be consistent with what we have, though, yes (so FlightSQLExample.proto or preferably FlightSql.proto and FlightSqlExample.proto)

Yes I this is why I didn't comment on file naming

emkornfield · 2021-01-31T22:10:47Z

java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/PreparedStatementContext.java

+
+import org.apache.arrow.util.AutoCloseables;
+
+class PreparedStatementContext implements AutoCloseable {


javadoc please.

I think instead of placing this in tests, we should have a separate top level package for a JDBC reference implementation.

I agree with this. Let's create a separate module FlightSQLForJDBC or something similar.

emkornfield · 2021-01-31T22:14:52Z

java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSQLUtils.java

+   * @param scale        Scale of the type.
+   * @return The Arrow equivalent type.
+   */
+  public static ArrowType getArrowTypeFromJDBCType(int jdbcDataType, int precision, int scale) {


this duplicates code in the jdbc to arrow package i believe, can we consolidate it? I also thing there might be the need for custom mappings (we needed it JDBC conversion)

emkornfield · 2021-01-31T22:21:27Z

Thank you for the PR. I think since there isn't a fully implementation example yet for the all the features it would make this easier to review if we cut down the FlightSql proto to what is implemented and focus on each use-case in a separate PR. I think it makes sense to handle read, write and metadata operations separately?

Like other Flight APIs, flight-sql does not provide implementation details that dictate how a client and server communicates with each other, it simply provides the SQL semantics and apply them onto the Flight API.

IMO, I think we should be be trying to specify more about how client and server interact with each other in terms of error handling and retries, these are important aspects and without it I think some of the utility of standardization will be diminished.

lidavidm · 2021-02-09T14:28:20Z

format/FlightSQL.proto

+   * FLIGHT_SQL_SERVER_VERSION
+   *
+   * The Arrow version of the Flight SQL Server.
+   * FLIGHT_SQL_SERVER_ARROW_VERSION


Library version? Or format version?

IMO, RPC protocol version is what matters here from a 'capabilities-discovery' perspective, though the Arrow library release (including the commit hash) would be good as a separate field for debugging purposes.

By 'RPC protocol' do you mean the version of Flight and of FlightSQL? That does make sense.

lidavidm · 2021-02-09T14:29:23Z

format/FlightSQL.proto

+   * FLIGHT_SQL_SERVER_READ_ONLY
+   *
+   * Indicates whether the Flight SQL Server supports both read and write.
+   * FLIGHT_SQL_SERVER_READ_WRITE


These two seem redundant.

lidavidm · 2021-02-09T14:31:53Z

format/FlightSQL.proto

+  int64 record_count = 1; 
+}
+
+message ResultsOrder {


Shouldn't this message have a field?

Alternatively, if you wanted a top-level enum, you could just have

enum ResultsOrder { RESULTS_ORDER_UNKNOWN = 0; // ... }

I would also suggest prefixing enum variants with the enum name since Protobuf enums aren't namespaced.

I think this concept should be removed entirely - under what circumstances would you want to specify an order? It seems cleaner to specify an order and then allow filtering to specifics.

format/FlightSQL.proto

lidavidm · 2021-02-09T14:33:45Z

format/FlightSQL.proto

+  string table_name_filter_pattern = 4;
+
+  // Specifies a filter of table types which must match.
+  repeated string table_types = 5;


Is this an AND or an OR? Are these filters matched by substring or with the pattern syntax above?

see above for another proposal, I think this becomes more elegantly expressed with pseudotables, so we aren't trying to create arbitrary SQL semantics through protobuf. @lidavidm thoughts?

I do worry it may be a bit of an implementation burden, but maybe that's the kind of thing that the standard adapter defined here can handle. And limiting the available SQL features as you do below helps. Many databases do already expose this kind of metadata in special tables, and the client can just do the necessary conversions from Arrow data to structured data.

I guess - how do we expect these metadata values to be used? Are they in turn going to back a JDBC implementation or a SQLAlchemy connector? That might help inform what we need to expose and how. I'd also be in favor of paring this to the bare minimum and exposing additional metadata as the need becomes clear, instead of trying to map all of JDBC or ODBC in one shot.

table_types should be OR'd together (same for ODBC/JDBC). Pseudo-tables that require full SQL support may be too difficult for vendors to adopt, but if we permit implementations to support just the filters that are proposed here it reduces the burden a fair bit.

I think just submitting metadata requests via SQL has other challenges from an implementation perspective. It necessitates either having the backend validate parsing of the pseudo table (essentially making it a real table/view) or having the FlightProducer parse the query, identify the pseudo table and its filters, then translating the request to a backend call.

I think just submitting metadata requests via SQL has other challenges from an implementation perspective. It necessitates either having the backend validate parsing of the pseudo table (essentially making it a real table/view) or having the FlightProducer parse the query, identify the pseudo table and its filters, then translating the request to a backend call.

I agree it can be complicated, I think in many cases vendors already support this, and JDBC drivers end up doing this under the hood. I haven't looked deeply but I was hoping this type of parsing would not be hard with something like Calcite.

Yes, it'd be doable with Calcite. The gotchas to me are:

Having the FlightSQLProducer parse to find pseudo-tables must be optional.

We can provide a partial implementation of this that invokes Calcite's parser, but it's the implementer's responsibility to configure Calcite in a way that matches their SQL backend's syntax (eg identifier quote characters, LIKE escape sequences).

We fallback silently if the table used is a non-Flight table. There should be nothing indicating a possible error since this is the primary use case.

If the implementer uses the pre-built pseudo-table parser, its filtering and table transformation capabilities are limited to what similar ODBC/JDBC catalog functions can do (eg can't re-order columns, invoke scalar/aggregate functions or arithmetic expressions, pattern value filtering on only certain columns)

We should expand the set of meta-tables to include more than just table discovery. For parity with ODBC/JDBC applications we should support tables that report primary key constraints, foreign key relations, statistics, and table privileges. I would consider this beyond an MVP though.

lidavidm · 2021-02-09T14:38:39Z

java/flight/flight-sql/src/test/protobuf/flightSQLExample.proto

@@ -0,0 +1,26 @@
+/*


Additionally shouldn't Protobuf file names be in snake_case? See https://developers.google.com/protocol-buffers/docs/style#file_structure

lidavidm · 2021-02-09T14:39:30Z

java/flight/flight-sql/src/test/protobuf/flightSQLExample.proto

@@ -0,0 +1,26 @@
+/*


I guess Flight.proto itself is already an exception...we should be consistent with what we have, though, yes (so FlightSQLExample.proto or preferably FlightSql.proto and FlightSqlExample.proto)

lidavidm · 2021-02-09T14:43:18Z

format/FlightSQL.proto

+/*
+ * Represents an instance of executing a prepared statement. Used in the 
+ * command member of FlightDescriptor for the following RPC calls:
+ * TODO: Is the idea that a Put with parameter values would execute multiple bound versions of the prepared statement


Just a thought: you could use DoExchange to combine the steps of providing parameter values and retrieving the results.

@jduo and I were discussing, and realized that there's no allowance for output or I/O parameters. Do you happen to have any ideas about how we could integrate that into the flow without breaking anything too much? I'd hate to have to use separate calls to get the parameter data back.

For a stored procedure or something? DoExchange would let you both send and receive data on the same call. You could encode parameters into application metadata. That has the unfortunate requirement that you do separate serialization.

Or, you could have the input/output datatypes be unions of struct types, and encode both the data and the parameters into the union. This has overhead in generating the union indices, though at least you wouldn't be copying the actual data around.

Or, with the Deephaven people on the mailing list, it's been proposed that we allow a stream to update its schema midway. This could be used to send the data, then switch the schema and send the parameters.

I agree we should mostly avoid a separate call, however, this design already assumes statefulness on the part of the server, so that ship has sailed.

Agreed that if we could change schema we could have this streamed back normally, but I'm not sure how far along that is.

I think, given the current design, I'd propose adding another command that's issued after the GetFlightInfo but before the DoGet to retrieve the parameter data, if any. I think this is the last bit to work out, if we agree on this I'll update the PR with the changes to reflect the current state and then we can update the examples around that.

It 'just' needs someone to write it up and propose it. I'll have to check in on the existing thread. I would very much prefer not to require more statefulness if we don't have to, since it'll complicate implementations. (Also, if it's before DoGet, that means the server might have to execute the same query in response to either of two distinct RPC calls?)

Yes - although we could also say it's after the DoGet as well if that simplified things. I'll work on updating this exempting the output parameters then so we can get this moving meaningfully again.

Thanks. There's been some more progress on the schema evolution proposal so please feel free to chime in there.

I think the initial approach should be to exclude output parameters and then rev the FlightSQL version afterwards to include them once the schema change proposal has been finalized, so we can get this moving forward. As mentioned in the weekly status call, there is a JDBC implementation being built to take advantage of this so we want to ensure everything moves forward together.

Ah sorry if I wasn't clear - yes, that sounds good, we can add output parameters hopefully soon afterward.

lidavidm · 2021-02-09T14:46:21Z

java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSQLProducer.java

+   */
+  @Override
+  public FlightInfo getFlightInfo(CallContext context, FlightDescriptor descriptor) {
+    final Any command = FlightSQLUtils.parseOrThrow(descriptor.getCommand());


Instead of Any.pack above, what if we define an explicit top-level message that has the possible messages in a oneof?

lidavidm · 2021-02-09T14:47:48Z

java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSQLUtils.java

+  private static final boolean IS_SIGNED_FALSE = false;
+  private static final boolean IS_SIGNED_TRUE = true;
+
+  public static final ActionType FLIGHT_SQL_GETSQLINFO = new ActionType("GetSQLINFO",


nit: GetSqlInfo?

I think we should drop the SQL portion here and just have GetInfo.

emkornfield · 2021-02-09T15:44:58Z

Thinking about this a bit more I think an alternative to a lot of the metadata querying is to define a minimal sql table schema and querying capabilities for it. I'll expand more in a little bit

emkornfield · 2021-02-09T16:20:50Z

format/FlightSQL.proto

+/*
+ * Wrapper for the result of a "GetSQLInfo" action.
+ */
+message ActionGetSQLInfoResult {


Suggested change

message ActionGetSQLInfoResult {

message ActionGetSqlInfoResult {

emkornfield · 2021-02-09T16:31:50Z

format/FlightSQL.proto

+   * 1. Server Information: Provides basic information about the Flight SQL Server.
+   *
+   * The name of the Flight SQL Server.
+   * FLIGHT_SQL_SERVER_NAME


I assume these are intended to be keys in the maps below? For some of them should they be promoted to first class concepts?

Per my comment above. I think I would have a preference for moving as much as possible to sql queries. so if we defined a special table like

__APACHE_FLIGHT_SQL_INFO__ with a schema

< flight_sql_server_name: utf8 flight_sql_server_version: utf8 flight_sql_server_arrow_version: utf8 (or maybe modeled as a struct) flight_sql_info: map<utf8, dense_union<string_value: string, int_value: int32, bigint_value: int64, int32_bitmask: int32 >> >

where the int32 bitmask would also have a defined extension metadata for a custom type of bitmask.

The minimal table requirement is that it supports projection:
i.e. only select * from __APACHE_FLIGHT_SQL_INFO__ or select FLIGHT_SQL_VERSION from __APACHE_FLIGHT_SQL_INFO__

I think using pseudo tables is an interesting concept, with some interesting pros/cons worth discussing to decide on an approach.

Using pseudo tables allows for:

a simple proto definition

more expressiveness in terms of what can be done with the table, since it is then available for general SQL

allowing later additions without needing to modify any proto

Drawbacks are mostly around two cases: those where metadata is not retrieved via a query mechanism and/or where it's more efficient not to use a query mechanism, and where you need to layer an additional parser in front of the actual database parser to support the pseudo. If we add a partial parser implementation to make it easier, we are having to implement this in all the languages that Flight SQL supports.

If we use the existing proposed proto definitions, then I see the following:

less expressive in terms of what can be done

less flexibility in terms of adding new metadata calls

much less effort in that we don't need to provide a partial parser implementation to help

It's easy to go from a message to a SQL query if that's what's used for metadata, it's hard to go the other way around

I think it would be easier all around to use what's defined here (with tweaks) vs. going the SQL table route, even though the SQL table route is more elegant from a protocol perspective. The effort to provide the hand-up for those data sources which don't have any support for querying metadata would be large, and may be a hinderance to adoption.

I'll chime in and agree that while the pseudo tables are more elegant, it would make it drastically harder to support 'wrapping' existing databases without having to either modify the database implementation or pull in a SQL parser. Also, it sidesteps any concern or potential incompatibility about what subset of SQL is exactly supported.

I like the idea of standardizing on information_schema proposed elsewhere, it seems a lot of DBs already support this, IIUC it is part of the SQL standard. For those that don't support, I agree it might be a high barrier. I'm a little concerned on cargo culting on one API standard or another which seems one level removed from the data we are trying to get at.

for this particular feature, I'm not sure if information schema would supports it let me take a look more closely at this API.

Given that, I'd agree with @lidavidm that we should probably go with the initially proposed approach vs. pseudo tables.

Yes, sounds good. separately if I can find some time (probably unlikely), I'd like to see how hard it would be to make pseudo-tables work with Calcite and maybe ZetaSQL for c++.

If we're going to return data as an Arrow record batch (I agree this makes sense) we should probably make all this data fetched through DoGet or DoExchange so that we're not specifying a separate encoding of Arrow data into Flight.

We should also specify the expected schema inline here.

Updated to use the existing DoGet mechanism.

emkornfield · 2021-02-09T16:34:51Z

format/FlightSQL.proto

+ * Flight SQL enabled backend. 
+ * Requests a list of catalogs available in the server.
+ */
+message ActionGetCatalogsRequest {


In the SQL approach define a table like:
__APACHE_ARROW_FLIGHT_CATALOGS minimal schema:

<catalog_name: utf8>

Supports project and optionally sorting on catalog_name

emkornfield · 2021-02-09T16:39:14Z

format/FlightSQL.proto

+   *    - "%" means to match any substring with 0 or more characters.
+   *    - "_" means to match any one character.
+   */
+  string schema_filter_pattern = 3;


SQL first approach:
__APACHE_ARROW_FLIGHT_SCHEMAS:

catalog: utf8 schema: utf8

Minimal SQL supported. Projection, Ordering (might be limited to required catalog and schema) and LIKE or equality clauses on schema field.
i.e.

select catalog, schema from __APACHE_ARROW_FLIGHT_SCHEMAS order by catalog, schema descending where schema LIKE 'A%'

ODBC and JDBC define orderings for result sets like these, and I think it would likely be better to copy that rather than allow arbitrary ordering. I was trying to determine a use-case for when you'd want arbitrary ordering and wasn't able to think of a reasonable one for the additional effort required.

@emkornfield - rather than define new pseudo tables, would it be better to align to what's already in the INFORMATION_SCHEMA?

yes, if we can just use information_schema that seems reasonable to me.

emkornfield · 2021-02-09T16:44:20Z

format/FlightSQL.proto

+ * Flight SQL enabled backend. 
+ * Requests a list of tables available in the server.
+ */
+message ActionGetTablesRequest {


SQL first approach:
__APACHE_ARROW_FLIGHT_TABLES

< catalog: utf8 schema: utf8 table_type: utf8 # Implementations are recommended to dictionary encode this field table: utf8 serialized_flatbuffer_table_schema: bytes >

Table supports equality and like predicates on schema and table. Queries must specify catalog with an equality predicate. Support projection and ordering. Ordering might be limited to requiring field catalog, schema, table_type, table in that order.

In many cases retrieval of the schema of a table is non-trivial, but BI tools often list many tables before getting detailed metadata about one or two of the tables. This is the reason for the breakdown in ODBC/JDBC of getTables vs. getColumns. I think we should break the schemas for a table out to a separate pseudo table for efficiency.

would using information_schema combine the two?

No - it has separate tables: TABLES and COLUMNS IIRC

Should this be CommandGetTables now?

Yes - oversight.

emkornfield · 2021-02-09T16:55:50Z

Thinking about this a bit more I think an alternative to a lot of the metadata querying is to define a minimal sql table schema and querying capabilities for it. I'll expand more in a little bit

I did a rough sketch on how the metadata layer could be modeled as pseudo-tables as comments on the proto. I think this addresses some uneasiner I had with the existing model:

It uses one consistent data model for tabular data (the arrow model).
It makes query semantics clearer instead of trying to model them in an ad-hoc manner through protobufs

This comes at slightly more complex for server implementations. And potential confusion on limitations for specific tables. This might be limited to some extend if we had a different GetMetadata action so implementations know to expect a query to specific pseudo-table. I expect we would provide library wrappers over JDBC to also ease implementation.

lidavidm · 2021-02-09T16:58:29Z

This comes at slightly more complex for server implementations. And potential confusion on limitations for specific tables. This might be limited to some extend if we had a different GetMetadata action so implementations know to expect a query to specific pseudo-table. I expect we would provide library wrappers over JDBC to also ease implementation.

We don't have a GetMetadata action, but we could encode it in GetFlightInfo by sending a different command payload. That would let us cleanly namespace the special tables and the actual tables.

lidavidm · 2021-06-30T16:27:24Z

@kylepbit yes, agreed. Thanks for the updates!

I'll try and remember to approve/kick off the CI when updated but also feel free to ping me to do so.

I'll note that Micah left one comment way at the top about naming the file FlightSql.proto instead of FlightSQL.proto for consistency (I think doing that rename might lose the comments so now is probably an OK time to do it).

Specify a prefix to identify the custom metadata.

kylepbit · 2021-06-30T16:30:13Z

@lidavidm - thanks for the reminder, I'll do that since I've already fixed the casing everywhere else.

kylepbit · 2021-07-05T21:52:10Z

format/FlightSQL.proto

+ * The returned schema will be:
+ * <
+ *  info_name: utf8,
+ *  value: dense_union<string_value: string, int_value: int32, bigint_value: int64, int32_bitmask: int32>


@lidavidm - how do you actually create a FieldType for a dense union like this? I have the following:

new ArrowType.Union(UnionMode.Dense, new int[2] {ArrowTypeID.Utf8, ArrowTypeID.Int}), /*dictionary=*/null), although I could not find an ArrowTypeID for a bigint, and it seems repetitive to have the int_value and int32_bitmask.

Slightly revised, but I think still incorrect: new ArrowType.Union(UnionMode.Dense, new int[] {Type.Utf8, Type.Int})

The typeIds aren't actually Flatbuffer type IDs - they're user-chosen sentinel values used to distinguish the cases on the wire; they're the tag of a tagged union. So they can be whatever. The actual types of the children are defined by the child fields, like so:

arrow/java/vector/src/test/java/org/apache/arrow/vector/TestDenseUnionVector.java

Lines 328 to 343 in 41c4143

@Test

public void testGetFieldTypeInfo() throws Exception {

Map<String, String> metadata = new HashMap<>();

metadata.put("key1", "value1");

int[] typeIds = new int[2];

typeIds[0] = 0;

typeIds[1] = 1;

List<Field> children = new ArrayList<>();

children.add(new Field("int", FieldType.nullable(MinorType.INT.getType()), null));

children.add(new Field("varchar", FieldType.nullable(MinorType.VARCHAR.getType()), null));

final FieldType fieldType = new FieldType(false, new ArrowType.Union(UnionMode.Dense, typeIds),

/*dictionary=*/null, metadata);

final Field field = new Field("union", fieldType, children);

It took me a while to grok unions initially. The docs have a page explaining the layout: https://arrow.apache.org/docs/format/Columnar.html#dense-union

It might be clearer to think of type id as type discriminant in this case.

Thanks - I think I have it now.

Rename files for SQL -> Sql. Correct compilation errors in client code, but design needs to be updated. Tests do not yet compile.

Correct some additional SQL -> Sql file renames. Reduce the test compilation problems (still more to do).

kylepbit · 2021-07-06T20:13:50Z

@lidavidm @wesm - how should we try to move this towards getting merged - do you want one PR (this one) where the Java client/server are there, and the test has a working example, or should we try and split it up?

lidavidm · 2021-07-06T20:16:39Z

I think it's OK to have one PR with the format changes and a working client/server and test (that's how Flight itself was originally added, looking at it: db0ef22). Then we can put it up for a vote. Having the implementation will be useful so people voting can evaluate it as well.

jduo · 2021-07-06T22:11:16Z

java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSqlClientUtils.java

+     */
+    public FlightInfo executeQuery() throws IOException {
+      if (isClosed) {
+        throw new IOException("Prepared statement has already been closed on the server.");


Nit: This isn't an IO error and is more of an IllegalStateException.

jduo · 2021-07-06T22:15:01Z

java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSqlProducer.java

+    } else if (command.is(CommandGetSqlInfo.class)) {
+      getStreamSqlInfo(FlightSqlUtils.unpackOrThrow(command, CommandGetSqlInfo.class), context, ticket, listener);
+    } else {
+      throw Status.INVALID_ARGUMENT.asRuntimeException();


This needs to get set on the listener (the server can log this error though).

All calls to unpackOrThrow need to be in a try/catch that writes the result to the listener, the client will hang.

I looked at the caller, and this is set on the listener as the caller wraps this already. I think it would be redundant to do so again.

jduo · 2021-07-06T22:15:26Z

java/flight/flight-sql/src/main/java/org/apache/arrow/flight/sql/FlightSqlProducer.java

+              context, flightStream, ackStream);
+    }
+
+    throw Status.INVALID_ARGUMENT.asRuntimeException();


The listener needs the error.

I looked at the caller, and this is set on the listener as the caller wraps this already. I think it would be redundant to do so again.

jduo · 2021-07-06T22:18:16Z